Lexical Scan Revisited

                              By
        
                      Jack W. Crenshaw, Ph.D.
        
                            3 June 1989                     
        
*****************************************************************
*                                                               *
*                        COPYRIGHT NOTICE                       *
*                                                               *
*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *
*                                                               *
*****************************************************************

Introduction

I've got some good news and some bad news. The bad news is that this instalment is not the one I promised last time. What's more, the one after this one won't be, either.

The good news is the reason for this instalment: I've found a way to simplify and improve the lexical scanning part of the compiler. Let me explain.

Background

If you'll remember, we talked at length about the subject of lexical scanners in Part VII, and I left you with a design for a distributed scanner that I felt was about as simple as I could make it ... more than most that I've seen elsewhere. We used that idea in Part X. The compiler structure that resulted was simple, and it got the job done.

Recently, though, I've begun to have problems, and they're the kind that send a message that you might be doing something wrong.

The whole thing came to a head when I tried to address the issue of semicolons. Several people have asked me about them, and whether or not KISS will have them separating the statements. My intention has been not to use semicolons, simply because I don't like them and, as you can see, they have not proved necessary.

But I know that many of you, like me, have gotten used to them, and so I set out to write a short instalment to show you how they could easily be added, if you were so inclined.

Well, it turned out that they weren't easy to add at all. In fact it was darned difficult.

I guess I should have realized that something was wrong, because of the issue of newlines. In the last couple of instalments we've addressed that issue, and I've shown you how to deal with newlines with a procedure called, appropriately enough, NewLine. In TINY Version 1.0, I sprinkled calls to this procedure in strategic spots in the code.

It seems that every time I've addressed the issue of newlines, though, I've found it to be tricky, and the resulting parser turned out to be quite fragile ... one addition or deletion here or there and things tended to go to pot. Looking back on it, I realize that there was a message in this that I just wasn't paying attention to.

When I tried to add semicolons on top of the newlines, that was the last straw. I ended up with much too complex a solution. I began to realize that something fundamental had to change.

So, in a way this instalment will cause us to backtrack a bit and revisit the issue of scanning all over again. Sorry about that. That's the price you pay for watching me do this in real time. But the new version is definitely an improvement, and will serve us well for what is to come.

As I said, the scanner we used in Part X was about as simple as one can get. But anything can be improved. The new scanner is more like the classical scanner, and not as simple as before. But the overall compiler structure is even simpler than before. It's also more robust, and easier to add to and/or modify. I think that's worth the time spent in this digression. So in this instalment, I'll be showing you the new structure. No doubt you'll be happy to know that, while the changes affect many procedures, they aren't very profound and so we lose very little of what's been done so far.

Ironically, the new scanner is much more conventional than the old one, and is very much like the more generic scanner I showed you earlier in Part VII. Then I started trying to get clever, and I almost clevered myself clean out of business. You'd think one day I'd learn: KISS!

The Problem

The problem begins to show itself in procedure Block, which I've reproduced below:

{ Parse and translate a block of statements }
procedure Block;
begin
  Scan;
  while not (Token in ['e', 'l']) do 
    begin
      case Token of
        'i' : DoIf;
        'w' : DoWhile;
        'R' : DoRead;
        'W' : DoWrite;
      else 
        Assignment;
      end;
      Scan;
    end;
end;    

As you can see, Block is oriented to individual program statements. At each pass through the loop, we know that we are at the beginning of a statement. We exit the block when we have scanned an END or an ELSE.

But suppose that we see a semicolon instead. The procedure as shown above can't handle that, because procedure Scan expects, and can accept, only tokens that begin with a letter.

I tinkered around for quite awhile to come up with a fix. I found many possible approaches, but none were very satisfying. I finally figured out the reason.

Recall that when we started with our single-character parsers, we adopted a convention that the lookahead character would always be pre-fetched. That is, we would have the character that corresponds to our current position in the input stream fetched into the global character Look, so that we could examine it as many times as needed. The rule we adopted was that every recognizer, if it found its target token, would advance Look to the next character in the input stream.

That simple and fixed convention served us very well when we had single-character tokens, and it still does. It would make a lot of sense to apply the same rule to multi-character tokens.
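
As a reminder, the character-level version of that rule is embodied in the Cradle's procedure Match: it tests the pre-fetched character in Look and, on a hit, immediately fetches the next one, so Look always marks the current position in the input stream:

```pascal
{ Match a specific input character }
procedure Match(x : char);
begin
  if Look = x then
    GetChar               { found it: advance Look past it }
  else
    Expected('''' + x + '''');
end;
```

Every single-character recognizer in the earlier instalments follows this same pattern: consume what you recognized, and leave Look pointing at what comes next.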

But when we got into lexical scanning, I began to violate that simple rule. The scanner of Part X did indeed advance to the next token if it found an identifier or keyword, but it didn't do that if it found a carriage return, a white-space character, or an operator.

Now, that sort of mixed-mode operation gets us into deep trouble in procedure Block, because whether or not the input stream has been advanced depends upon the kind of token we encounter. If it's a keyword or the target of an assignment statement, the "cursor", as defined by the contents of Look, has been advanced to the next token or to the beginning of white space. If, on the other hand, the token is a semicolon, or if we have hit a carriage return, the cursor has not advanced.

Needless to say, we can add enough logic to keep us on track. But it's tricky, and makes the whole parser very fragile.

There's a much better way, and that's just to adopt that same rule that's worked so well before, to apply to tokens as well as single characters. In other words, we'll pre-fetch tokens just as we've always done for characters. It seems so obvious once you think about it that way.

Interestingly enough, if we do things this way the problem that we've had with newline characters goes away. We can just lump them in as white-space characters, and the handling of newlines becomes trivial and much less error-prone than it has been in the past.

The Solution

Let's begin to fix the problem by re-introducing the two procedures:

{ Get an identifier }
procedure GetName;
begin
  SkipWhite;
  if not IsAlpha(Look) then 
    Expected('Identifier');
  Token := 'x';
  Value := '';
  repeat
    Value := Value + UpCase(Look);
    GetChar;
  until not IsAlNum(Look);
end;

{ Get a number }
procedure GetNum;
begin
  SkipWhite;
  if not IsDigit(Look) then 
    Expected('Number');
  Token := '#';
  Value := '';
  repeat
    Value := Value + Look;
    GetChar;
  until not IsDigit(Look);
end;  
 

These two procedures are functionally almost identical to the ones I showed you in Part VII. They each fetch the current token, either an identifier or a number, into the global string Value. They also set the encoded version, Token, to the appropriate code. The input stream is left with Look containing the first character not part of the token.

We can do the same thing for operators, even multi-character operators, with a procedure such as:

{ Get an operator }
procedure GetOp;
begin
  Token := Look;
  Value := '';
  repeat
    Value := Value + Look;
    GetChar;
  until IsAlpha(Look) or IsDigit(Look) or IsWhite(Look);
end;   
 

Note that GetOp returns, as its encoded token, the first character of the operator. This is important, because it means that we can now use that single character to drive the parser, instead of the lookahead character.

We need to tie these procedures together into a single procedure that can handle all three cases. The following procedure will read any one of the token types and always leave the input stream advanced beyond it:

{ Get the next input token }
procedure Next;
begin
  SkipWhite;
  if IsAlpha(Look) then
    GetName
  else if IsDigit(Look) then
    GetNum
  else 
    GetOp;
end;        
      

Note that here I have put SkipWhite before the calls rather than after. This means that, in general, the variable Look will not have a meaningful value in it, and therefore we should not use it as a test value for parsing, as we have been doing so far. That's the big departure from our normal approach.

Now, remember that before I was careful not to treat the carriage return (CR) and line feed (LF) characters as white space. This was because, with SkipWhite called as the last thing in the scanner, the encounter with LF would trigger a read statement. If we were on the last line of the program, we couldn't get out until we input another line with a non-white character. That's why I needed the second procedure, NewLine, to handle the CRLF's.

But now, with the call to SkipWhite coming first, that's exactly the behaviour we want. The compiler must know there's another token coming or it wouldn't be calling Next. In other words, it hasn't found the terminating END yet. So we're going to insist on more data until we find something.

All this means that we can greatly simplify both the program and the concepts, by treating CR and LF as white-space characters, and eliminating NewLine. You can do that simply by modifying the function IsWhite:

{ Recognize white space }
function IsWhite(c : char): boolean;
begin
  IsWhite := c in [' ', TAB, CR, LF];
end;        
      

We've already tried similar routines in Part VII, but you might as well try these new ones out. Add them to a copy of the Cradle and call Next with the following main program:

{ Main program }
begin
  Init;
  repeat
    Next;
    WriteLn(Token, ' ', Value);
  until Token = '.';
end.  

Compile it and verify that you can separate a program into a series of tokens, and that you get the right encoding for each token.

This almost works, but not quite. There are two potential problems: First, in KISS/TINY almost all of our operators are single-character operators. The only exceptions are the relops >=, <=, and <>. It seems a shame to treat all operators as strings and do a string compare, when only a single character compare will almost always suffice. Second, and much more important, the thing doesn't work when two operators appear together, as in (a+b)*(c+d). Here the string following 'b' would be interpreted as a single operator ")*(".
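
To see that second failure concretely, here is a small standalone Pascal program (not part of the compiler; the input string and starting position are contrived just for this illustration) that runs the greedy operator-gathering loop of GetOp over (a+b)*(c+d), starting at the ')' that follows the 'b':

```pascal
program GreedyOp;
{ Standalone illustration, NOT part of TINY: demonstrates how
  the greedy version of GetOp lumps adjacent operators together. }
var
  S  : string;
  i  : integer;
  Op : string;

function IsAlNum(c : char) : boolean;
begin
  IsAlNum := c in ['A'..'Z', 'a'..'z', '0'..'9'];
end;

begin
  S  := '(a+b)*(c+d)';
  i  := 5;              { index of the ')' that follows the b }
  Op := '';
  repeat                { the same loop as the greedy GetOp }
    Op := Op + S[i];
    Inc(i);
  until IsAlNum(S[i]) or (S[i] = ' ');
  WriteLn('Operator token: ', Op);  { writes: Operator token: )*( }
end.
```

The loop has no way of knowing that ')' and '*' belong to different tokens; it just keeps collecting characters until it sees a letter, a digit, or white space.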

It's possible to fix that problem. For example, we could just give GetOp a list of legal characters, and we could treat the parentheses as different operator types than the others. But this begins to get messy.

Fortunately, there's a better way that solves all the problems. Since almost all the operators are single characters, let's just treat them that way, and let GetOp get only one character at a time. This not only simplifies GetOp, but also speeds things up quite a bit. We still have the problem of the relops, but we were treating them as special cases anyway.

So here's the final version of GetOp:

{ Get an operator }
procedure GetOp;
begin
  SkipWhite;
  Token := Look;
  Value := Look;
  GetChar;
end;  

Note that I still give the string Value a value. If you're truly concerned about efficiency, you could leave this out. When we're expecting an operator, we will only be testing Token anyhow, so the value of the string won't matter. But to me it seems to be good practice to give the thing a value just in case.

Try this new version with some realistic-looking code. You should be able to separate any program into its individual tokens, with the caveat that the two-character relops will scan into two separate tokens. That's OK ... we'll parse them that way.

Now, in Part VII the function of Next was combined with procedure Scan, which also checked every identifier against a list of keywords and encoded each one that was found. As I mentioned at the time, the last thing we would want to do is to use such a procedure in places where keywords should not appear, such as in expressions. If we did that, the keyword list would be scanned for every identifier appearing in the code. Not good.

The right way to deal with that is to simply separate the functions of fetching tokens and looking for keywords. The version of Scan shown below does nothing but check for keywords. Notice that it operates on the current token and does not advance the input stream.

{ Scan the current identifier for keywords }
procedure Scan;
begin
  if Token = 'x' then
    Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];
end;        
      

There is one last detail. In the compiler there are a few places that we must actually check the string value of the token. Mainly, this is done to distinguish between the different END's, but there are a couple of other places. (I should note in passing that we could always eliminate the need for matching END characters by encoding each one to a different character. Right now we are definitely taking the lazy man's route.)
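
For reference, the lazy route is plainly visible in the keyword table: ENDIF, ENDWHILE, and END all encode to the same character 'e' in KWcode, which is exactly why the string comparison is needed. The alternative mentioned above would be a table something like the following, where the codes 'E' and 'I' are hypothetical and are not used anywhere in TINY 1.1:

```pascal
{ Hypothetical alternative, NOT used in this chapter: distinct
  codes for each END-type keyword, so that Token alone could
  tell ENDIF ('E'), ENDWHILE ('I'), and END ('e') apart }
KWcode : string[NKW1] = 'xilEwIRWve';
```

Of course, every case statement that currently tests for 'e', such as the exit test in Block, would then have to be extended to cover the new codes, which is precisely the bookkeeping the lazy route avoids.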

The following version of MatchString takes the place of the character-oriented Match. Note that, like Match, it does advance the input stream.

{ Match a specific input string }
procedure MatchString(x : string);
begin
  if Value <> x then
    Expected('''' + x + '''');
  Next;
end;  

Fixing up the Compiler

Armed with these new scanner procedures, we can now begin to fix the compiler to use them properly. The changes are all quite minor, but there are quite a few places where changes are necessary. Rather than showing you each place, I will give you the general idea and then just give the finished product.

First of all, the code for procedure Block doesn't change, though its function does:

{ Parse and translate a block of statements }
procedure Block;
begin
  Scan;
  while not(Token in ['e', 'l']) do
    begin
      case Token of
        'i' : DoIf;
        'w' : DoWhile;
        'R' : DoRead;
        'W' : DoWrite;
      else 
        Assignment;
      end;
      Scan;
    end;
end;  
 

Remember that the new version of Scan doesn't advance the input stream, it only scans for keywords. The input stream must be advanced by each procedure that Block calls.

In general, we have to replace every test on Look with a similar test on Token. For example:

{ Parse and translate a Boolean expression }
procedure BoolExpression;
begin
  BoolTerm;
  while IsOrop(Token) do 
    begin
      Push;
      case Token of
        '|' : BoolOr;
        '~' : BoolXor;
      end;
    end;
end;  

In procedures like Add, we don't have to use Match any more. We need only call Next to advance the input stream:

{ Recognize and translate an add }
procedure Add;
begin
  Next;
  Term;
  PopAdd;
end;  

Control structures are actually simpler. We just call Next to advance over the control keywords:

{ Recognize and translate an IF construct }
procedure Block; forward;
procedure DoIf;
var 
  L1, L2 : string;
begin
  Next;
  BoolExpression;
  L1 := NewLabel;
  L2 := L1;
  BranchFalse(L1);
  Block;
  if Token = 'l' then 
    begin
      Next;
      L2 := NewLabel;
      Branch(L2);
      PostLabel(L1);
      Block;
    end;
  PostLabel(L2);
  MatchString('ENDIF');
end;  

That's about the extent of the required changes. In the listing of TINY Version 1.1 below, I've also made a number of other "improvements" that aren't really required. Let me explain them briefly:
  1. I've deleted the two procedures Prog and Main, and combined their functions into the main program. They didn't seem to add to program clarity ... in fact they seemed to just muddy things up a little.
  2. I've deleted the keywords PROGRAM and BEGIN from the keyword list. Each one only occurs in one place, so it's not necessary to search for it.
  3. Having been bitten by an overdose of cleverness, I've reminded myself that TINY is supposed to be a minimalist program. Therefore I've replaced the fancy handling of unary minus with the dumbest one I could think of. A giant step backwards in code quality, but a great simplification of the compiler. KISS is the right place to use the other version.
  4. I've added some error-checking routines such as CheckTable and CheckDup, and replaced in-line code by calls to them. This cleans up a number of routines.
  5. I've taken the error checking out of code generation routines like Store, and put it in the parser where it belongs. See Assignment, for example.
  6. There was an error in InTable and Locate that caused them to search all locations instead of only those with valid data in them. They now search only valid cells. This allows us to eliminate the initialization of the symbol table, which was done in Init.
  7. Procedure AddEntry now has two arguments, which helps to make things a bit more modular.
  8. I've cleaned up the code for the relational operators by the addition of the new procedures CompareExpression and NextExpression.
  9. I fixed an error in the Read routine ... the earlier version did not check for a valid variable name.

Conclusion

The resulting compiler for TINY is given below. Other than the removal of the keyword PROGRAM, it parses the same language as before. It's just a bit cleaner, and more importantly it's considerably more robust. I feel good about it.

The next instalment will be another digression: the discussion of semicolons and such that got me into this mess in the first place. Then we'll press on into procedures and types. Hang in there with me. The addition of those features will go a long way towards removing KISS from the "toy language" category. We're getting very close to being able to write a serious compiler.


TINY Version 1.1

program TINY11;
{$Apptype Console}
uses
  SysUtils;

{ Type declarations }
type
  Symbol = string[8];
  SymTab = array[1 .. 1000] of Symbol;
  TabPtr = ^SymTab;

{ Constant declarations }
const
  TAB = ^I;
  CR  = ^M;
  LF  = ^J;
  MaxEntry = 100;

  { Definition of keywords and token types }
  NKW =   9;
  NKW1 = 10;
  KWlist : array[1 .. NKW] of Symbol =
                ('IF', 'ELSE', 'ENDIF', 'WHILE', 'ENDWHILE',
                 'READ', 'WRITE', 'VAR', 'END');

  KWcode : string[NKW1] = 'xileweRWve';

{ Variable declarations }
var
  Look : char;             { Lookahead character }
  Token : char;            { Encoded token       }
  Value : string[16];      { Unencoded token     }

  ST : array[1 .. MaxEntry] of Symbol;
  SType : array[1 .. MaxEntry] of char;
  LCount : integer = 0;
  NEntry : integer = 0;

{ Read new character from input stream }
procedure GetChar;
begin
  Read(Look);
end;

{ Report an error }
procedure Error(s : string);
begin
  WriteLn;
  WriteLn(^G, 'Error: ', s, '.');
  ReadLn;
  ReadLn;
end;

{ Report error and halt }
procedure Abort(s : string);
begin
  Error(s);
  Halt;
end;

{ Report what was expected }
procedure Expected(s : string);
begin
   Abort(s + ' Expected');
end;

{ Report an undefined identifier }
procedure Undefined(n : string);
begin
  Abort('Undefined Identifier ' + n);
end;

{ Report a duplicate identifier }
procedure Duplicate(n : string);
begin
  Abort('Duplicate Identifier ' + n);
end;

{ Check to make sure the current token is an identifier }
procedure CheckIdent;
begin
if Token <> 'x' then
  Expected('Identifier');
end;

{ Recognize an alpha character }
function IsAlpha(c : char) : boolean;
begin
  IsAlpha := UpCase(c) in ['A'..'Z'];
end;

{ Recognize a decimal digit }
function IsDigit(c : char) : boolean;
begin
  IsDigit := c in ['0' .. '9'];
end;

{ Recognize an alphanumeric character }
function IsAlNum(c : char): boolean;
begin
  IsAlNum := IsAlpha(c) or IsDigit(c);
end;

{ Recognize an addop }
function IsAddop(c : char) : boolean;
begin
  IsAddop := c in ['+', '-'];
end;

{ Recognize a mulop }
function IsMulop(c : char): boolean;
begin
  IsMulop := c in ['*', '/'];
end;

{ Recognize a Boolean orop }
function IsOrop(c : char): boolean;
begin
  IsOrop := c in ['|', '~'];
end;

{ Recognize a relop }
function IsRelop(c : char): boolean;
begin
  IsRelop := c in ['=', '#', '<', '>'];
end;

{ Recognize white space }
function IsWhite(c : char) : boolean;
begin
  IsWhite := c in [' ', TAB, CR, LF];
end;

{ Skip over leading white space }
procedure SkipWhite;
begin
  while IsWhite(Look) do
    GetChar;
end;

{ Table lookup }
function Lookup(T : TabPtr; s : string; n : integer) : integer;
var
  i : integer;
  found : Boolean;
begin
  found := false;
  i := n;
  while (i > 0) and not found do
    if s = T^[i] then
      found := true
    else
      dec(i);
  Lookup := i;
end;

{ Locate a symbol in table
  Returns the index of the entry.  Zero if not present. }

function Locate(N : Symbol) : integer;
begin
  Locate := Lookup(@ST, n, NEntry);
end;

{ Look for symbol in table }
function InTable(n : Symbol) : Boolean;
begin
  InTable := Lookup(@ST, n, NEntry) <> 0;
end;

{ Check to see if an identifier is in the symbol table
  Report an error if it's not. }
procedure CheckTable(N : Symbol);
begin
  if not InTable(N) then
    Undefined(N);
end;

{ Check the symbol table for a duplicate identifier
 Report an error if identifier is already in table. }
procedure CheckDup(N : Symbol);
begin
  if InTable(N) then
    Duplicate(N);
end;

{ Add a new entry to symbol table }
procedure AddEntry(N : Symbol; T : char);
begin
  CheckDup(N);
  if NEntry = MaxEntry then
    Abort('Symbol Table Full');
  Inc(NEntry);
  ST[NEntry] := N;
  SType[NEntry] := T;
end;

{ Get an identifier }
procedure GetName;
begin
  SkipWhite;
  if not IsAlpha(Look) then
    Expected('Identifier');
  Token := 'x';
  Value := '';
  repeat
    Value := Value + UpCase(Look);
    GetChar;
  until not IsAlNum(Look);
end;

{ Get a number }
procedure GetNum;
begin
  SkipWhite;
  if not IsDigit(Look) then
    Expected('Number');
  Token := '#';
  Value := '';
  repeat
    Value := Value + Look;
    GetChar;
  until not IsDigit(Look);
end;

{ Get an operator }
procedure GetOp;
begin
  SkipWhite;
  Token := Look;
  Value := Look;
  GetChar;
end;

{ Get the next input token }
procedure Next;
begin
  SkipWhite;
  if IsAlpha(Look) then
    GetName
  else if IsDigit(Look) then
    GetNum
  else
    GetOp;
end;

{ Scan the current identifier for keywords }
procedure Scan;
begin
  if Token = 'x' then
    Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];
end;

{ Match a specific input string }
procedure MatchString(x : string);
begin
  if Value <> x then
    Expected('''' + x + '''');
  Next;
end;

{ Output a string with tab }
procedure Emit(s : string);
begin
  Write(TAB, s);
end;

{ Output a string with tab and CRLF }
procedure EmitLn(s : string);
begin
  Emit(s);
  WriteLn;
end;

{ Generate a unique label }
function NewLabel : string;
var
  S : string;
begin
  Str(LCount, S);
  NewLabel := '@L' + S;
  Inc(LCount);
end;

{ Post a label to output }
procedure PostLabel(L : string);
begin
  WriteLn(L, ':');
end;

{ Clear the primary register }
procedure Clear;
begin
   EmitLn('XOR EAX, EAX');
end;

{ Negate the primary register }
procedure Negate;
begin
  EmitLn('NEG EAX');
end;

{ Complement the primary register }
procedure NotIt;
begin
  EmitLn('NOT EAX');
end;

{ Load a constant value to primary register }
procedure LoadConst(n : string);
begin
  EmitLn('MOV EAX, ' + n);
end;

{ Load a variable to primary register }
procedure LoadVar(Name : string);
begin
  if not InTable(Name) then
     Undefined(Name);
  EmitLn('MOV EAX, ' + Name);
end;

{ Push primary onto stack }
procedure Push;
begin
  EmitLn('PUSH EAX');
end;

{ Add top of stack to primary }
procedure PopAdd;
begin
  EmitLn('POP EDX');
  EmitLn('ADD EAX, EDX');
end;

{ Subtract primary from top of stack }
procedure PopSub;
begin
  EmitLn('POP EDX');
  EmitLn('SUB EAX, EDX');
  EmitLn('NEG EAX');
end;

{ Multiply top of stack by primary }
procedure PopMul;
begin
  EmitLn('POP EDX');
  EmitLn('IMUL EDX');
end;

{ Divide top of stack by primary }
procedure PopDiv;
begin
  EmitLn('MOV ECX, EAX');
  EmitLn('POP EAX');
  EmitLn('CDQ');  { Sign-extend EAX into EDX for signed IDIV }
  EmitLn('IDIV ECX');
end;

{ AND top of stack with primary }
procedure PopAnd;
begin
  EmitLn('POP EDX');
  EmitLn('AND EAX, EDX');
end;

{ OR top of stack with primary }
procedure PopOr;
begin
  EmitLn('POP EDX');
  EmitLn('OR EAX, EDX');
end;

{ XOR top of stack with primary }
procedure PopXor;
begin
  EmitLn('POP EDX');
  EmitLn('XOR EAX, EDX');
end;

{ Compare top of stack with primary }
procedure PopCompare;
begin
  EmitLn('POP EDX');
  EmitLn('CMP EDX, EAX');
end;

{ Set EAX if compare was = }
procedure SetEqual;
begin
  EmitLn('CMOVE EAX, T');
  EmitLn('CMOVNE EAX, F');
end;

{ Set EAX if compare was != }
procedure SetNEqual;
begin
  EmitLn('CMOVE EAX, F');
  EmitLn('CMOVNE EAX, T');
end;

{ Set EAX if compare was > }
procedure SetGreater;
begin
  EmitLn('CMOVG EAX, T');
  EmitLn('CMOVLE EAX, F');
end;

{ Set EAX if compare was < }
procedure SetLess;
begin
  EmitLn('CMOVL EAX, T');
  EmitLn('CMOVGE EAX, F');
end;

{ Set EAX if compare was <= }
procedure SetLessOrEqual;
begin
  EmitLn('CMOVLE EAX, T');
  EmitLn('CMOVG EAX, F');
end;

{ Set EAX if compare was >= }
procedure SetGreaterOrEqual;
begin
  EmitLn('CMOVGE EAX, T');
  EmitLn('CMOVL EAX, F');
end;

{ Store primary to variable }
procedure Store(Name : string);
begin
   EmitLn('MOV ' + Name + ', EAX');
end;

{ Branch unconditional }
procedure Branch(L : string);
begin
  EmitLn('JMP ' + L);
end;

{ Branch False }
procedure BranchFalse(L : string);
begin
  EmitLn('TEST EAX, -1');
  EmitLn('JE ' + L);
end;

{ Read variable to primary register }
procedure ReadIt(Name : string);
begin
  EmitLn('CALL READ');
  Store(Name);
end;

{ Write from primary register }
procedure WriteIt;
begin
   EmitLn('CALL WRITE');
end;

{ Write header info }
procedure Header;
begin
  WriteLn('Place-holder for MASM start-up code');
  EmitLn('LIB TINYLIB');
end;

{ Write the prolog }
procedure Prolog;
begin
  PostLabel('MAIN');
end;

{ Write the epilog }
procedure Epilog;
begin
  EmitLn('Place-holder for epilog');
end;

{ Allocate storage for a static variable }
procedure Allocate(Name, Val : string);
begin
  WriteLn('Var ', Name, ' : integer = ', Val, ';');
end;

{ Parse and translate a math factor }
procedure BoolExpression; forward;
procedure Factor;
begin
  if Token = '(' then
    begin
      Next;
      BoolExpression;
      MatchString(')');
    end
  else
    begin
      if Token = 'x' then
        LoadVar(Value)
      else if Token = '#' then
        LoadConst(Value)
      else
        Expected('Math Factor');
      Next;
    end;
end;

{ Recognize and translate a multiply }
procedure Multiply;
begin
  Next;
  Factor;
  PopMul;
end;

{ Recognize and translate a divide }
procedure Divide;
begin
  Next;
  Factor;
  PopDiv;
end;

{ Parse and translate a math term }
procedure Term;
begin
  Factor;
  while IsMulop(Token) do
    begin
      Push;
      case Token of
        '*' : Multiply;
        '/' : Divide;
      end;
    end;
end;

{ Recognize and translate an add }
procedure Add;
begin
  Next;
  Term;
  PopAdd;
end;

{ Recognize and translate a subtract }
procedure Subtract;
begin
  Next;
  Term;
  PopSub;
end;

{ Parse and translate an expression }
procedure Expression;
begin
  if IsAddop(Token) then
    Clear
  else
    Term;
  while IsAddop(Token) do
    begin
      Push;
      case Token of
        '+' : Add;
        '-' : Subtract;
      end;
    end;
end;

{ Get another expression and compare }
procedure CompareExpression;
begin
  Expression;
  PopCompare;
end;

{ Get the next expression and compare }
procedure NextExpression;
begin
  Next;
  CompareExpression;
end;

{ Recognize and translate a relational "Equals" }
procedure Equal;
begin
  NextExpression;
  SetEqual;
end;

{ Recognize and translate a relational "Less Than or Equal" }
procedure LessOrEqual;
begin
  NextExpression;
  SetLessOrEqual;
end;

{ Recognize and translate a relational "Not Equals" }
procedure NotEqual;
begin
  NextExpression;
  SetNEqual;
end;

{ Recognize and translate a relational "Less Than" }
procedure Less;
begin
  Next;
  case Token of
    '=' : LessOrEqual;
    '>' : NotEqual;
  else
    begin
      CompareExpression;
      SetLess;
    end;
  end;
end;

{ Recognize and translate a relational "Greater Than" }
procedure Greater;
begin
  Next;
  if Token = '=' then
    begin
      NextExpression;
      SetGreaterOrEqual;
    end
  else
    begin
      CompareExpression;
      SetGreater;
    end;
end;

{ Parse and translate a relation }
procedure Relation;
begin
  Expression;
  if IsRelop(Token) then
    begin
      Push;
      case Token of
        '=' : Equal;
        '<' : Less;
        '>' : Greater;
      end;
    end;
end;

{ Parse and translate a Boolean factor with leading NOT }
procedure NotFactor;
begin
  if Token = '!' then
    begin
      Next;
      Relation;
      NotIt;
    end
  else
    Relation;
end;

{ Parse and translate a Boolean term }
procedure BoolTerm;
begin
  NotFactor;
  while Token = '&' do
    begin
      Push;
      Next;
      NotFactor;
      PopAnd;
    end;
end;

{ Recognize and translate a Boolean OR }
procedure BoolOr;
begin
  Next;
  BoolTerm;
  PopOr;
end;

{ Recognize and translate an exclusive Or }
procedure BoolXor;
begin
  Next;
  BoolTerm;
  PopXor;
end;

{ Parse and translate a Boolean expression }
procedure BoolExpression;
begin
  BoolTerm;
  while IsOrop(Token) do
    begin
      Push;
      case Token of
        '|' : BoolOr;
        '~' : BoolXor;
      end;
    end;
end;

{ Parse and translate an assignment statement }
procedure Assignment;
var
  Name : string;
begin
  CheckTable(Value);
  Name := Value;
  Next;
  MatchString('=');
  BoolExpression;
  Store(Name);
end;

{ Recognize and translate an IF construct }
procedure Block; forward;
procedure DoIf;
var
  L1, L2 : string;
begin
  Next;
  BoolExpression;
  L1 := NewLabel;
  L2 := L1;
  BranchFalse(L1);
  Block;
  if Token = 'l' then
    begin
      Next;
      L2 := NewLabel;
      Branch(L2);
      PostLabel(L1);
      Block;
    end;
  PostLabel(L2);
  MatchString('ENDIF');
end;

{ Parse and translate a WHILE statement }
procedure DoWhile;
var
  L1, L2 : string;
begin
  Next;
  L1 := NewLabel;
  L2 := NewLabel;
  PostLabel(L1);
  BoolExpression;
  BranchFalse(L2);
  Block;
  MatchString('ENDWHILE');
  Branch(L1);
  PostLabel(L2);
end;

{ Read a single variable }
procedure ReadVar;
begin
  CheckIdent;
  CheckTable(Value);
  ReadIt(Value);
  Next;
end;

{ Process a read statement }
procedure DoRead;
begin
  Next;
  MatchString('(');
  ReadVar;
  while Token = ',' do
    begin
      Next;
      ReadVar;
    end;
  MatchString(')');
end;

{ Process a write statement }
procedure DoWrite;
begin
  Next;
  MatchString('(');
  Expression;
  WriteIt;
  while Token = ',' do
    begin
      Next;
      Expression;
      WriteIt;
    end;
  MatchString(')');
end;

{ Parse and translate a block of statements }
procedure Block;
begin
  Scan;
  while not(Token in ['e', 'l']) do
    begin
      case Token of
        'i' : DoIf;
        'w' : DoWhile;
        'R' : DoRead;
        'W' : DoWrite;
      else
        Assignment;
      end;
      Scan;
    end;
end;

{ Allocate storage for a variable }
procedure Alloc;
begin
  Next;
  if Token <> 'x' then
    Expected('Variable Name');
  CheckDup(Value);
  AddEntry(Value, 'v');
  Allocate(Value, '0');
  Next;
end;

{ Parse and translate global declarations }
procedure TopDecls;
begin
  Scan;
  while Token = 'v' do
    begin
      Alloc;
      while Token = ',' do
        Alloc;
      Scan;
    end;
end;

{ Initialize }
procedure Init;
begin
  GetChar;
  Next;
end;

{ Main program }
begin
  Init;
  MatchString('PROGRAM');
  Header;
  TopDecls;
  MatchString('BEGIN');
  Prolog;
  Block;
  MatchString('END');
  Epilog;
  ReadLn;
  ReadLn;
end.
