Compilers: The Basics

The three main stages of compilation are lexical analysis, syntax analysis and code generation. The code can also be optimised in many different ways both to increase speed of execution or to decrease the size of the binary code. The following paragraphs summarise some of the key points you will find in a school textbook. The demonstration programs in Let's Build a Compiler illustrate a variety of approaches to writing a translator and give you a real feeling for the topic. Jack Crenshaw explains carefully his reasons for each step and informs you when he is departing from the usual practice.

Lexical analysis comprises the examination of the stream of characters in the source code and the representation by a token of a single character or a group of characters. Each token may represent a keyword of the language such as FOR, an operator such as +, an identifier (for the program, variable, constant, procedure etc.) or a literal. Typically, the identifiers are stored in a symbol table at this stage. The scanner (lexical analyser) must skip comments and white spaces such as spaces and tabs. The output from this first stage is used by the syntax analyser.

The syntax analysis ensures that the code obeys the rules of the language (the grammar). Symbol tables gain additional information about the type and scope of each variable and constant. An error message should be given if a variables or constant is of an incorrect type for the way in which it is used, or if an attempt is made to use it outside its scope. Production rules may be expressed in Backus-Naur form. Let's Build a Compiler! gives many examples of writing compiler code that follows production rules and detects and reports errors when the source code does not conform to the specified grammar. Usually the syntax analyser produces a parse tree. See this excellent demonstration where "the parse tree is constructed as control flows through a parsing function. When a parsing function is first entered, a node without any children is created for it. As its constituents are parsed, the parse trees corresponding to the constituents are added to the node for the parsing function." See also the left panel with the grammar including the terminology term and factor.

The output from syntax analysis is passed on to code generation in the absence of syntax errors. Code generation requires that variables and constants be given addresses. These addresses are added to the symbol table(s) before the code is generated. The generated code may be an intermediate language or executable code.

Some compilers convert the source code into an intermediate language, which can then be translated for different platforms. A well known example is Java, where the Java Virtual Machine on the target computer interprets intermediate bytecode to binary instructions applicable to its processor(s).

Pascaland maintains a list of compilers for the Pascal language.

Libraries

A compiler can produce a library file instead of an executable program. This compiled file is free from syntax errors and is available for use by multiple programs. A library is especially suited to common tasks. A library should be produced by the most able programmers and tested thoroughly before being made available for linkage by other programs. The language of the source file of a library does not need to be the same as that of the program that uses it. The Windows .dll file type is an example of a dynamically linked library. It can be loaded only when required by a program and one copy of a library is accessible concurrently by different applications on the computer. The binary code of a library linked statically will be included in the executable file.

The Delphi 7 help file prompted this demonstration of access to a library from a Pascal program. A screenshot follows the code. If you want to change the type of message box and use the return value of the MessageBoxA function, consult Microsoft's documentation of the function. Have fun experimenting with this and other libraries!

program LibraryDemo;
{$Apptype Console}
uses
  SysUtils;
var
  MyCaption : pchar = 'Message Box from Console Program';
  MyText : pchar = 'This works!';

function MessageBoxA(HWnd : Integer; Text, Caption: PChar;
                     Flags: Integer): Integer; stdcall; external 'user32.dll';
begin
  MessageBoxA(0, MyText, MyCaption, 0);
  readln;
end.  
Message box from user32.dll

Message box from user32.dll

Within a few minutes of being asked to experiment Steven Binns studied the documentation and produced this code.

program librarything;
{$Apptype Console}
uses
  SysUtils;
var
  MyCaption : pchar = 'Capshuns';
  MyText : pchar = 'Can haz yes/no?';

function MessageBoxA(HWnd : Integer; Text, Caption: PChar;
                     Flags: Integer): Integer; stdcall; external 'user32.dll';
begin
  //Multiple flags can be set using multiple or statements
  case MessageBoxA(0, MyText, MyCaption, ($00000004) or ($00000020)) of //$00000004 makes the message box a yes/no box and 
                                                                        //$00000020 makes the ? symbol appear in the textbox.
    7 : MessageBoxA(0, PChar('Awwww... D:'), Pchar('Answer: No!'), 0); //a No button will always return 7
    6 : MessageBoxA(0, PChar('Yay! :D'), Pchar('Answer: Yes!'), 0);    //a Yes button will always return 6
  end;
  readln;

end.

This screenshot shows one of the three message boxes that the program creates.

Second message box after "No" button was pressed on the first

Second message box after "No" button was pressed on the first

Our Pascal demo that controls the PiFace Digital interface for the Raspberry Pi accesses a C library.

Programming - a skill for life!

Compilers (including Jack Crenshaw's "Let's Build a Compiler" applied to Intel processors and the Raspberry Pi), assemblers and interpreters