Counting Words

Program CountWords outputs each word in a text file together with the number of times that word occurs in the file. It has much in common with the last program: it identifies words in a text file and checks whether they are in an array of words. In this case the array contains the words already encountered in the file, and it grows with each new word. If a word is already in the array, its occurrence must be noted by incrementing the corresponding integer in the parallel WordCount array. This means that we need to know the position of the found word in the Words array. Usefully, the function ansiIndexText returns exactly the integer that we require (provided that we use zero-based arrays), or -1 if the word is not in the array.
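
As a quick illustration of that return value, the following minimal sketch calls ansiIndexText on a small hand-filled array. The program name IndexDemo and the three sample words are our own choices for illustration; the array is declared with ansiString elements, as in the program below.

```pascal
program IndexDemo;
  {$APPTYPE CONSOLE}
uses
  SysUtils, StrUtils;
var
  Words : array[0 .. 2] of ansiString;
begin
  Words[0] := 'this';
  Words[1] := 'is';
  Words[2] := 'a';
  //The comparison ignores case, so 'IS' matches 'is' at index 1
  writeln(ansiIndexText('IS', Words));   //outputs 1
  //A word that is not in the array gives -1
  writeln(ansiIndexText('test', Words)); //outputs -1
  readln;
end.
```

Because the array is zero-based, the returned integer can be used directly as the index into a parallel array.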

For your convenience, we include a procedure that saves the test prose.txt file into your program folder.

program CountWords;
  {$APPTYPE CONSOLE}
uses
  SysUtils, StrUtils;
const
  MAX_WORDS = 100;
  FILENAME = 'prose.txt';
var
  Words: array[0 .. MAX_WORDS - 1] of ansiString;
  WordCount : array[0 .. MAX_WORDS - 1] of integer;
  i, UniqueWordTotal, CurrentIndex : integer;
  ProseFile : textfile;
  CurrentLine, CurrentWord : string;
  IsWord : Boolean;

procedure SaveProse;
var
  OutFile : textFile;  
begin
  assignFile(OutFile, FILENAME);
  rewrite(OutFile);
  writeln(OutFile, 'This is a test piece of writing. ' +
                   'The words will be counted by program CountWords. ');
  writeln(OutFile, 'Numbers such as 53 should be included in the' +
                   ' count and symbols such as ;: # £$ should not. ');
  writeln(OutFile, 'The program converts each word to lower case ' +
                           'to ensure that a word is not counted ');
  writeln(OutFile, 'separately if it is the first word in a sentence.');
  closeFile(OutFile);
end;

begin
  SaveProse;
  //Initialise
  for i := 0 to MAX_WORDS - 1 do
    begin
      Words[i] := '';
      WordCount[i] := 0;
    end;
  UniqueWordTotal := 0;
  assignFile(ProseFile, FILENAME);
  reset(ProseFile);
  //Populate arrays
  while not eof(ProseFile) do
    begin
      readln(ProseFile, CurrentLine);
      CurrentLine := CurrentLine + ' '; //marker for end of last word
      //WordDelimiters is declared in StrUtils (every non-alphanumeric character)
      IsWord := True;
      if CurrentLine[1] in WordDelimiters then
        IsWord := False;
      CurrentWord := '';
      for i := 1 to length(CurrentLine) do
        begin
          if IsWord then
            begin
              if not (CurrentLine[i] in WordDelimiters) then
                CurrentWord := CurrentWord + CurrentLine[i]
              else //word is complete
                begin
                  IsWord := False;
                  CurrentIndex := ansiIndexText(CurrentWord, Words);
                  if CurrentIndex = -1 then //first occurrence
                    begin
                      Words[UniqueWordTotal] := lowerCase(CurrentWord);
                      WordCount[UniqueWordTotal] := 1;
                      inc(UniqueWordTotal);
                    end
                  else //Word already used
                    inc(WordCount[CurrentIndex]);
                  CurrentWord := '';
                end; //else
            end //if IsWord
          else //IsWord is False
            begin
              if not (CurrentLine[i] in WordDelimiters) then
                begin
                  CurrentWord := CurrentWord + CurrentLine[i];
                  IsWord := True;
                end;
            end; //else
        end;  //for
    end; //while
  closeFile(ProseFile);
  //Output results
  for i := 0 to UniqueWordTotal - 1  do
    begin
      writeln(Words[i], ': ', WordCount[i]);
    end;
  readln;
end.  

The program produced the following initial output from our test file.

Start of the output from program CountWords
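
Program CountWords assumes that the file contains no more than MAX_WORDS different words. As a hedged sketch only, the step that updates the parallel arrays could be isolated in a function that also refuses to write past the end of the arrays. The function RecordWord, the program name, the Sample array and the deliberately small MAX_WORDS are all hypothetical additions for this illustration and are not part of the program above.

```pascal
program RecordWordDemo;
  {$APPTYPE CONSOLE}
uses
  SysUtils, StrUtils;
const
  MAX_WORDS = 3; //deliberately small so that the guard is exercised
  Sample : array[0 .. 4] of ansiString = ('The', 'cat', 'the', 'mat', 'dog');
var
  Words : array[0 .. MAX_WORDS - 1] of ansiString;
  WordCount : array[0 .. MAX_WORDS - 1] of integer;
  i, UniqueWordTotal : integer;

//Hypothetical helper: counts NewWord and returns False if there was no room
function RecordWord(NewWord : ansiString) : Boolean;
var
  Index : integer;
begin
  RecordWord := True;
  Index := ansiIndexText(NewWord, Words);
  if Index <> -1 then //word already used
    inc(WordCount[Index])
  else if UniqueWordTotal < MAX_WORDS then //first occurrence
    begin
      Words[UniqueWordTotal] := lowerCase(NewWord);
      WordCount[UniqueWordTotal] := 1;
      inc(UniqueWordTotal);
    end
  else //arrays are full
    RecordWord := False;
end;

begin
  UniqueWordTotal := 0;
  for i := 0 to 4 do
    if not RecordWord(Sample[i]) then
      writeln('No room for: ', Sample[i]);
  for i := 0 to UniqueWordTotal - 1 do
    writeln(Words[i], ': ', WordCount[i]);
  readln;
end.
```

With the five sample words, the fourth distinct word (dog) is reported as not stored, and the remaining lines show the: 2, cat: 1 and mat: 1.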
