Procedures

                              By
        
                      Jack W. Crenshaw, Ph.D.
        
                          27 August 1989                     
        
*****************************************************************
*                                                               *
*                        COPYRIGHT NOTICE                       *
*                                                               *
*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *
*                                                               *
*****************************************************************

Introduction

At last we get to the good part!

At this point we've studied almost all the basic features of compilers and parsing. We have learned how to translate arithmetic expressions, Boolean expressions, control constructs, data declarations, and I/O statements. We have defined a language, TINY 1.3, that embodies all these features, and we have written a rudimentary compiler that can translate them. By adding some file I/O we could indeed have a working compiler that could produce executable object files from programs written in TINY. With such a compiler, we could write simple programs that could read integer data, perform calculations with it, and output the results.

That's nice, but what we have is still only a toy language. We can't read or write even a single character of text, and we still don't have procedures.

It's the features to be discussed in the next couple of instalments that separate the men from the toys, so to speak. "Real" languages have more than one data type, and they support procedure calls. More than any others, it's these two features that give a language much of its character and personality. Once we have provided for them, our languages, TINY and its successors, will cease to be toys and will take on the character of real languages, suitable for serious programming jobs.

For several instalments now, I've been promising you sessions on these two important subjects. Each time, other issues came up that required me to digress and deal with them. Finally, we've been able to put all those issues to rest and can get on with the mainstream of things. In this instalment, I'll cover procedures. Next time, we'll talk about the basic data types.

One Last Digression

This has been an extraordinarily difficult instalment for me to write. The reason has nothing to do with the subject itself ... I've known what I wanted to say for some time, and in fact I presented most of this at Software Development '89, back in February. It has more to do with the approach. Let me explain.

When I first began this series, I told you that we would use several "tricks" to make things easy, and to let us learn the concepts without getting too bogged down in the details. Among these tricks was the idea of looking at individual pieces of a compiler at a time, i.e. performing experiments using the Cradle as a base. When we studied expressions, for example, we dealt with only that part of compiler theory. When we studied control structures, we wrote a different program, still based on the Cradle, to do that part. We only incorporated these concepts into a complete language fairly recently. These techniques have served us very well indeed, and led us to the development of a compiler for TINY version 1.3.

When I first began this session, I tried to build upon what we had already done, and just add the new features to the existing compiler. That turned out to be a little awkward and tricky ... much too much to suit me.

I finally figured out why. In this series of experiments, I had abandoned the very useful techniques that had allowed us to get here, and without meaning to I had switched over into a new method of working, that involved incremental changes to the full TINY compiler.

You need to understand that what we are doing here is a little unique. There have been a number of articles, such as the Small C articles by Cain and Hendrix, that presented finished compilers for one language or another. This is different. In this series of tutorials, you are watching me design and implement both a language and a compiler, in real time.

In the experiments that I've been doing in preparation for this article, I was trying to inject the changes into the TINY compiler in such a way that, at every step, we still had a real, working compiler. In other words, I was attempting an incremental enhancement of the language and its compiler, while at the same time explaining to you what I was doing.

That's a tough act to pull off! I finally realized that it was dumb to try. Having gotten this far using the idea of small experiments based on single-character tokens and simple, special-purpose programs, I had abandoned them in favour of working with the full compiler. It wasn't working.

So we're going to go back to our roots, so to speak. In this instalment and the next, I'll be using single-character tokens again as we study the concepts of procedures, unfettered by the other baggage that we have accumulated in the previous sessions. As a matter of fact, I won't even attempt, at the end of this session, to merge the constructs into the TINY compiler. We'll save that for later.

After all this time, you don't need more build-up than that, so let's waste no more time and dive right in.

The Basics

All modern CPUs provide direct support for procedure calls, and the Intel x86 family is no exception. On the x86, the call is a CALL instruction and the return is a RET. All we have to do is to arrange for the compiler to issue these instructions at the proper place.

Actually, there are really three things we have to address. One of them is the call/return mechanism. The second is the mechanism for defining the procedure in the first place. And, finally, there is the issue of passing parameters to the called procedure. None of these things is really very difficult, and we can of course borrow heavily from what people have done in other languages ... there's no need to reinvent the wheel here. Of the three issues, that of parameter passing will occupy most of our attention, simply because there are so many options available.

A Basis for Experiments

As always, we will need some software to serve as a basis for what we are doing. We don't need the full TINY compiler, but we do need enough of a program so that some of the other constructs are present. Specifically, we need at least to be able to handle statements of some sort, and data declarations.

The program shown below is that basis. It's a vestigial form of TINY, with single-character tokens. It has data declarations, but only in their simplest form ... no lists or initializers. It has assignment statements, but only of the kind

         <ident> = <ident>
 

In other words, the only legal expression is a single variable name. There are no control constructs ... the only legal statement is the assignment.

Most of the program is just the standard Cradle routines. I've shown the whole thing here, just to make sure we're all starting from the same point:

program Procs01;
{$Apptype Console}
uses
  SysUtils;

{ Constant declarations }
const
  TAB = ^I;
  CR  = ^M;
  LF  = ^J;

{ Variable declarations }
var
  Look : char;              { Lookahead Character }
  ST : array['A' .. 'Z'] of char;

{ Read new character from input stream }
procedure GetChar;
begin
  Read(Look);
end;

{ Report an error }
procedure Error(s : string);
begin
  WriteLn;
  WriteLn(^G, 'Error: ', s, '.');
  ReadLn;
  ReadLn;
end;

{ Report error and halt }
procedure Abort(s : string);
begin
  Error(s);
  Halt;
end;

{ Report what was expected }
procedure Expected(s : string);
begin
  Abort(s + ' Expected');
end;

{ Report an undefined identifier }
procedure Undefined(n : string);
begin
  Abort('Undefined Identifier ' + n);
end;

{ Report a duplicate identifier }
procedure Duplicate(n : string);
begin
  Abort('Duplicate Identifier ' + n);
end;

{ Get type of symbol }
function TypeOf(n : char) : char;
begin
  TypeOf := ST[n];
end;

{ Look for symbol in table }
function InTable(n : char) : Boolean;
begin
  InTable := ST[n] <> ' ';
end;

{ Add a new symbol to table }
procedure AddEntry(Name, T : char);
begin
  if InTable(Name) then
    Duplicate(Name);
  ST[Name] := T;
end;

{ Check an entry to make sure it's a variable }
procedure CheckVar(Name : char);
begin
  if not InTable(Name) then
    Undefined(Name);
  if  TypeOf(Name)  <>  'v'  then
    Abort(Name  +  ' is not a variable');
end;

{ Recognize an alpha character }
function IsAlpha(c : char) : boolean;
begin
  IsAlpha := upcase(c) in ['A' .. 'Z'];
end;

{ Recognize a decimal digit }
function IsDigit(c : char) : boolean;
begin
  IsDigit := c in ['0' .. '9'];
end;

{ Recognize an alphanumeric character }
function IsAlNum(c : char) : boolean;
begin
  IsAlNum := IsAlpha(c) or IsDigit(c);
end;

{ Recognize an addop }
function IsAddop(c : char) : boolean;
begin
  IsAddop := c in ['+', '-'];
end;

{ Recognize a mulop }
function IsMulop(c : char) : boolean;
begin
  IsMulop := c in ['*', '/'];
end;

{ Recognize a Boolean orop }
function IsOrop(c : char) : boolean;
begin
  IsOrop := c in ['|', '~'];
end;

{ Recognize a relop }
function IsRelop(c : char) : boolean;
begin
  IsRelop := c in ['=', '#', '<', '>'];
end;

{ Recognize white space }
function IsWhite(c : char) : boolean;
begin
  IsWhite := c in [' ', TAB];
end;

{ Skip over leading white space }
procedure SkipWhite;
begin
  while IsWhite(Look) do
    GetChar;
end;

{ Skip over an end-of-line: CR, LF, or CR followed by LF,
  so that input with bare LF line endings is also accepted }
procedure Fin;
begin
  if Look = CR then
    GetChar;
  if Look = LF then
    GetChar;
end;

{ Match a specific input character }
procedure Match(x : char);
begin
  if Look = x then
    GetChar
  else
    Expected('''' + x + '''');
  SkipWhite;
end;

{ Get an identifier }
function GetName : char;
begin
  if not IsAlpha(Look) then
    Expected('Name');
  GetName := UpCase(Look);
  GetChar;
  SkipWhite;
end;

{ Get a number }
function GetNum : char;
begin
  if not IsDigit(Look) then
    Expected('Integer');
  GetNum := Look;
  GetChar;
  SkipWhite;
end;

{ Output a string with tab }
procedure Emit(s : string);
begin
  Write(TAB, s);
end;

{ Output a string with tab and CRLF }
procedure EmitLn(s : string);
begin
  Emit(s);
  WriteLn;
end;

{ Post a label to output }
procedure PostLabel(L : string);
begin
  WriteLn('@', L, ':');
end;

{ Load a variable to the primary register }
procedure LoadVar(Name : char);
begin
  CheckVar(Name);
  EmitLn('MOV EAX, ' + Name);
end;

{ Store the primary register }
procedure StoreVar(Name : char);
begin
  CheckVar(Name);
  EmitLn('MOV ' + Name + ', EAX');
end;

{ Initialize }
procedure Init;
var
  i : char;
begin
  GetChar;
  SkipWhite;
  for i := 'A' to 'Z' do
    ST[i] := ' ';
end;

{ Parse and translate an expression
  vestigial version }
procedure Expression;
begin
  LoadVar(GetName);
end;

{ Parse and translate an assignment statement }
procedure Assignment;
var
  Name : char;
begin
  Name := GetName;
  Match('=');
  Expression;
  StoreVar(Name);
end;

{ Parse and translate a block of statements }
procedure DoBlock;
begin
  while not (Look in ['e']) do
    begin
      Assignment;
      Fin;
    end;
end;

{ Parse and translate a begin-block }
procedure BeginBlock;
begin
  Match('b');
  Fin;
  DoBlock;
  Match('e');
  Fin;
end;

{ Allocate storage for a variable }
procedure Alloc(N : char);
begin
  if InTable(N) then
    Duplicate(N);
  ST[N] := 'v';
  WriteLn('var ', N, ': integer;');
end;

{ Parse and translate a data declaration }
procedure Decl;
begin
  Match('v');
  Alloc(GetName);
end;

{ Parse and translate global declarations }
procedure TopDecls;
begin
  while Look <> 'b' do
    begin
      case Look of
        'v' : Decl;
      else
        Abort('Unrecognized Keyword ' + Look);
      end;
      Fin;
    end;
end;

{ Main program }
begin
  Init;
  TopDecls;
  BeginBlock;
  ReadLn;
  ReadLn;
end.

Note that we do have a symbol table, and there is logic to check a variable name to make sure it's a legal one. It's also worth noting that I have included the code you've seen before to provide for white space and newlines. Finally, note that the main program is delimited, as usual, by BEGIN-END brackets.

Once you've copied the program to Lazarus or Delphi, the first step is to compile it and make sure it works. Give it a few declarations, and then a begin-block. Try something like:

      va             (for VAR A)
      vb             (for VAR B)
      vc             (for VAR C)
      b              (for BEGIN)
      a=b
      b=c
      e.             (for END.)

As usual, you should also make some deliberate errors, and verify that the program catches them correctly. Save the program as Procs02 to prepare for declaring a procedure.

Declaring a Procedure

If you're satisfied that our little program works, then it's time to deal with the procedures. Since we haven't talked about parameters yet, we'll begin by considering only procedures that have no parameter lists.

As a start, let's consider a simple program with a procedure, and think about the code we'd like to see generated for it:

     PROGRAM FOO;
     .
     .
     PROCEDURE BAR;                     BAR:
     BEGIN                                   .
     .                                       .
     .                                       .
     END;                                    RET

     BEGIN { MAIN PROGRAM }             MAIN:
     .                                       .
     .                                       .
     BAR;                                    CALL BAR
     .                                       .
     .                                       .
     END.                                    END MAIN

Here I've shown the high-order language constructs on the left, and the desired assembler code on the right. The first thing to notice is that we certainly don't have much code to generate here! For the great bulk of both the procedure and the main program, our existing constructs take care of the code to be generated.

The key to dealing with the body of the procedure is to recognize that although a procedure may be quite long, declaring it is really no different than declaring a variable. It's just one more kind of declaration. We can write the BNF:

        <declaration> ::= <data decl> | <procedure>        

This means that it should be easy to modify TopDecls to deal with procedures. What about the syntax of a procedure? Well, here's a suggested syntax, which is essentially that of Pascal:

        <procedure> ::= PROCEDURE <ident> <begin-block>

There is practically no code generation required, other than that generated within the begin-block. We need only emit a label at the beginning of the procedure, and a RET at the end.

Here's the required code:

{ Parse and translate a procedure declaration }
procedure DoProc;
var 
  N : char;
begin
  Match('p');
  N := GetName;
  Fin;
  if InTable(N) then 
    Duplicate(N);
  ST[N] := 'p';
  PostLabel(N);
  BeginBlock;
  Return;
end;        
      

Note that I've added a new code generation routine, Return, which merely emits a RET instruction. The creation of that routine is "left as an exercise for the student".
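
In case you'd like to check your answer to that exercise, here is one way Return might be written. Treat it as a sketch rather than the official version: the wrapper program and its name, ReturnSketch, are made up for the illustration, and EmitLn is repeated here in condensed form only so that the fragment compiles by itself. In the real translator you would add just the Return procedure, somewhere ahead of DoProc.

```pascal
program ReturnSketch;
{$APPTYPE CONSOLE}

const
  TAB = ^I;

{ Output a string with tab and end-of-line
  (condensed from the Cradle's Emit/EmitLn pair) }
procedure EmitLn(s : string);
begin
  WriteLn(TAB, s);
end;

{ Generate a return from a procedure: one possible answer to the exercise }
procedure Return;
begin
  EmitLn('RET');
end;

{ Illustration only: emit a single RET so there is something to look at }
begin
  Return;
end.
```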

To finish this version, add the following line within the CASE statement in TopDecls:

        'p' : DoProc;

I should mention that this structure for declarations, and the BNF that drives it, differs from standard Pascal. In the Jensen & Wirth definition of Pascal, variable declarations, in fact all kinds of declarations, must appear in a specific sequence, i.e. labels, constants, types, variables, procedures, and main program. To follow such a scheme, we should separate the two declarations, and have code in the main program something like

 DoVars;
 DoProcs;
 DoMain;        

However, most implementations of Pascal don't require that order and let you freely mix up the various declarations, as long as you still don't try to refer to something before it's declared. Although it may be more aesthetically pleasing to declare all the global variables at the top of the program, it certainly doesn't do any harm to allow them to be sprinkled around. In fact, it may do some good, in the sense that it gives you the opportunity to do a little rudimentary information hiding. Variables that should be accessed only by the main program, for example, can be declared just before it and will thus be inaccessible by the procedures.

OK, try this new version out. Note that we can declare as many procedures as we choose (as long as we don't run out of single-character names!), and the labels and RET's all come out in the right places.

It's worth noting here that I do not allow for nested procedures. In TINY, all procedures must be declared at the global level, the same as in C. There has been quite a discussion about this point in the Computer Language Forum of CompuServe. It turns out that there is a significant penalty in complexity that must be paid for the luxury of nested procedures. What's more, this penalty gets paid at run time, because extra code must be added and executed every time a procedure is called. I also happen to believe that nesting is not a good idea, simply on the grounds that I have seen too many abuses of the feature.

Before going on to the next step, it's also worth noting that the "main program" as it stands is incomplete, since it doesn't have the label and END statement. Let's fix that little oversight:

{ Dummy prolog }
procedure Prolog;
begin
  PostLabel('MAIN');
end;

{ Dummy epilog }
procedure Epilog;
begin
  EmitLn('//epilog');
end;
        
{ Parse and translate a main program }
procedure DoMain;
begin
  Match('b');
  Fin;
  Prolog;
  DoBlock;
  Epilog;
end;

{ Main program }
begin
  Init;
  TopDecls;
  DoMain;
  ReadLn;
  ReadLn;
end.

PPS: We advise you to save your testing until the end of this section and then refer to our example of acceptable input in the Appendix.

Note that DoProc and DoMain are not quite symmetrical. DoProc uses a call to BeginBlock, whereas DoMain cannot. That's because a procedure is signalled by the keyword PROCEDURE (abbreviated by a 'p' here), while the main program gets no keyword other than the BEGIN itself.

And that brings up an interesting question: Why?

If we look at the structure of C programs, we find that all functions are treated just alike, except that the main program happens to be identified by its name, "main". Since C functions can appear in any order, the main program can also be anywhere in the compilation unit.

In Pascal, on the other hand, all variables and procedures must be declared before they're used, which means that there is no point putting anything after the main program ... it could never be accessed. The "main program" is not identified at all, other than being that part of the code that comes after the global BEGIN. In other words, if it ain't anything else, it must be the main program.

This causes no small amount of confusion for beginning programmers, and for big Pascal programs sometimes it's difficult to find the beginning of the main program at all. This leads to conventions such as identifying it in comments:

     BEGIN { of MAIN }

This has always seemed to me to be a bit of a kludge. The question comes up: Why should the main program be treated so much differently than a procedure? In fact, now that we've recognized that procedure declarations are just that ... part of the global declarations ... isn't the main program just one more declaration, also?

The answer is yes, and by treating it that way, we can simplify the code and make it considerably more orthogonal. I propose that we use an explicit keyword, PROGRAM, to identify the main program. (Note that this means that we can't start the file with it, as in Pascal). In this case, our BNF becomes:

     <declaration> ::= <data decl> | <procedure> | <main program>

     <procedure> ::= PROCEDURE <ident> <begin-block>

     <main program> ::= PROGRAM <ident> <begin-block>

The code also looks much better, at least in the sense that DoMain and DoProc look more alike:

{ Parse and translate a main program }
procedure DoMain;
var
  N : char;
begin
  Match('P');
  N := GetName;
  Fin;
  if InTable(N) then
    Duplicate(N);
  Prolog;
  BeginBlock;
end;

{ Parse and translate global declarations }
procedure TopDecls;
begin
  while Look <> '.' do
    begin
      case Look of
        'v' : Decl;
        'p' : DoProc;
        'P' : DoMain;
      else
        Abort('Unrecognized Keyword ' + Look);
      end;
      Fin;
    end;
end;

{ Main program }
begin
  Init;
  TopDecls;
  Epilog;
  ReadLn;
  ReadLn;
end.                    

Since the declaration of the main program is now within the loop of TopDecls, that does present some difficulties. How do we ensure that it's the last thing in the file? And how do we ever exit from the loop? My answer for the second question, as you can see, was to bring back our old friend the period. Once the parser sees that, we're done.

To answer the first question: it depends on how far we're willing to go to protect the programmer from dumb mistakes. In the code that I've shown, there's nothing to keep the programmer from adding code after the main program ... even another main program. The code will just not be accessible. However, we could access it via a forward statement, which we'll be providing later. As a matter of fact, many assembler language programmers like to use the area just after the program to declare large, uninitialized data blocks, so there may indeed be some value in not requiring the main program to be last. We'll leave it as it is.

If we decide that we should give the programmer a little more help than that, it's pretty easy to add some logic to kick us out of the loop once the main program has been processed. Or we could at least flag an error if someone tries to include two mains.
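
If you want the second kind of help, a single Boolean flag is probably enough. The sketch below is only one possible approach, and the names MainSeen and ClaimMain are hypothetical ... they are not part of the compiler as shown. The idea is that DoMain would call ClaimMain before doing anything else, and Abort with a suitable message if it came back False.

```pascal
program MainGuardSketch;
{$APPTYPE CONSOLE}

var
  MainSeen : Boolean;   { hypothetical flag: True once a main program has been parsed }

{ Claim the right to be the main program.
  Returns True on the first call and False on every later call. }
function ClaimMain : Boolean;
begin
  ClaimMain := not MainSeen;
  MainSeen := True;
end;

begin
  MainSeen := False;    { in the translator this would go in Init }
  WriteLn(ClaimMain);   { first PROGRAM declaration: accepted }
  WriteLn(ClaimMain);   { second PROGRAM declaration: rejected }
end.
```

Kicking out of the TopDecls loop instead would be a matter of testing the same flag in the loop condition.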

Save the program as Procs03 to prepare for calling the procedure.

Calling the Procedure

If you're satisfied that things are working, let's address the second half of the equation ... the call.

Consider the BNF for a procedure call:

      <proc_call> ::= <identifier>

For an assignment statement, on the other hand, the BNF is:

      <assignment> ::= <identifier> '=' <expression>

At this point we seem to have a problem. The two BNF statements both begin on the right-hand side with the token <identifier>. How are we supposed to know, when we see the identifier, whether we have a procedure call or an assignment statement? This looks like a case where our parser ceases being predictive, and indeed that's exactly the case. However, it turns out to be an easy problem to fix, since all we have to do is to look at the type of the identifier, as recorded in the symbol table. As we've discovered before, a minor local violation of the predictive parsing rule can be easily handled as a special case.

Here's how to do it:

{ Parse and translate an assignment statement }
procedure Assignment(Name : char);
begin
  Match('=');
  Expression;
  StoreVar(Name);
end;

{ Decide if a statement is an assignment or procedure call }
procedure AssignOrProc;
var
  Name : char;
begin
  Name := GetName;
  case TypeOf(Name) of
    ' ' : Undefined(Name);
    'v' : Assignment(Name);
    'p' : CallProc(Name);
  else
    Abort('Identifier ' + Name +
          ' Cannot Be Used Here');
  end;
end;

{ Parse and translate a block of statements }
procedure DoBlock;
begin
  while not(Look in ['e']) do
    begin
      AssignOrProc;
      Fin;
    end;
end;               
 
As you can see, procedure DoBlock now calls AssignOrProc instead of Assignment. The function of this new procedure is simply to read the identifier, determine its type, and then call whichever procedure is appropriate for that type. Since the name has already been read, we must pass it to the two procedures, and modify Assignment to match. Procedure CallProc, which Pascal needs to see declared ahead of AssignOrProc, is a simple code generation routine:

{ Call a procedure }
procedure CallProc(N : char);
begin
  EmitLn('CALL @' + N);
end;        
      

Well, at this point we have a compiler that can deal with procedures. It's worth noting that procedures can call procedures to any depth. So even though we don't allow nested declarations, there is certainly nothing to keep us from nesting calls, just as we would expect to do in any language. We're getting there, and it wasn't too hard, was it?

Of course, so far we can only deal with procedures that have no parameters. The procedures can only operate on the global variables by their global names. So at this point we have the equivalent of BASIC's GOSUB construct. Not too bad ... after all lots of serious programs were written using GOSUBs, but we can do better, and we will. That's the next step.

Save the program as Procs04 in readiness for parsing parameters.

Passing Parameters

Again, we all know the basic idea of passed parameters, but let's review them just to be safe.

In general the procedure is given a parameter list, for example

PROCEDURE FOO(X, Y, Z)  

In the declaration of a procedure, the parameters are called formal parameters, and may be referred to in the body of the procedure by those names. The names used for the formal parameters are really arbitrary. Only the position really counts. In the example above, the name 'X' simply means "the first parameter" wherever it is used.

When a procedure is called, the "actual parameters" passed to it are associated with the formal parameters, on a one-for-one basis.

The BNF for the syntax looks something like this:

     <procedure> ::= PROCEDURE <ident>
                    '(' <param-list> ')' <begin-block>

     <param-list> ::= <parameter> ( ',' <parameter> )* | null

Similarly, the procedure call looks like:

     <proc call> ::= <ident> '(' <param-list> ')'  

Note that there is already an implicit decision built into this syntax. Some languages, such as Pascal and Ada, permit parameter lists to be optional. If there are no parameters, you simply leave off the parentheses completely. Other languages, like C and Modula 2, require the parentheses even if the list is empty. Clearly, the example we just finished corresponds to the former point of view. But to tell the truth I prefer the latter. For procedures alone, the decision would seem to favour the "listless" approach.

The statement

     Initialize;

standing alone, can only mean a procedure call. In the parsers we've been writing, we've made heavy use of parameterless procedures, and it would seem a shame to have to write an empty pair of parentheses for each case.

But later on we're going to be using functions, too. And since functions can appear in the same places as simple scalar identifiers, you can't tell the difference between the two. You have to go back to the declarations to find out. Some folks consider this to be an advantage. Their argument is that an identifier gets replaced by a value, and what do you care whether it's done by substitution or by a function? But we sometimes do care, because the function may be quite time-consuming. If, by writing a simple identifier into a given expression, we can incur a heavy run-time penalty, it seems to me we ought to be made aware of it.

Anyway, Niklaus Wirth designed both Pascal and Modula 2. I'll give him the benefit of the doubt and assume that he had a good reason for changing the rules the second time around!

Needless to say, it's an easy thing to accommodate either point of view as we design a language, so this one is strictly a matter of personal preference. Do it whichever way you like best.

Before we go any further, let's alter the translator to handle a (possibly empty) parameter list. For now we won't generate any extra code ... just parse the syntax. The code for processing the declaration has very much the same form we've seen before when dealing with VAR-lists:

{ Process the formal parameter list of a procedure }
procedure FormalList;
begin
  Match('(');
  if Look <> ')' then
    begin
      FormalParam;
      while Look = ',' do
        begin
          Match(',');
          FormalParam;
        end;
    end;
  Match(')');
end;        
      

Procedure DoProc needs to have a line added to call FormalList:

{ Parse and translate a procedure declaration }
procedure DoProc;
var
  N : char;
begin
  Match('p');
  N := GetName;
  FormalList;
  Fin;
  if InTable(N) then
    Duplicate(N);
  ST[N] := 'p';
  PostLabel(N);
  BeginBlock;
  Return;
end; 

For now, FormalParam is just a dummy that simply skips the parameter name. Because FormalList calls it, it has to be declared ahead of FormalList:

{ Process a formal parameter }
procedure FormalParam;
var 
  Name : char;
begin
  Name := GetName;
end;  

For the actual procedure call, there must be similar code to process the actual parameter list:

{ Process an actual parameter }
procedure Param;
var 
  Name : char;
begin
  Name := GetName;
end;

{ Process the parameter list for a procedure call }
procedure ParamList;
begin
  Match('(');
  if Look <> ')' then 
    begin
      Param;
      while Look = ',' do 
        begin
          Match(',');
          Param;
        end;
    end;
  Match(')');
end;

{ Call a procedure }
procedure Call(N : char);
begin
  EmitLn('CALL @' + N);
end;

{ Process a procedure call }
procedure CallProc(Name : char);
begin
  ParamList;
  Call(Name);
end;  

Note here that CallProc is no longer just a simple code generation routine. It has some structure to it. To handle this, I've renamed the code generation routine CallProc to just Call, and called it from within the new CallProc.

OK, if you'll add all this code to your translator and try it out, you'll find that you can indeed parse the syntax properly. I'll note in passing that there is no checking to make sure that the number (and, later, types) of formal and actual parameters match up. In a production compiler, we must of course do this. We'll ignore the issue now if for no other reason than that the structure of our symbol table doesn't currently give us a place to store the necessary information. Later on, we'll have a place for that data and we can deal with the issue then.
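
Just to show that the missing check is not a big deal once there is somewhere to keep the data, here is a rough sketch of how a parameter count might be recorded and compared. Everything in it is an assumption made for illustration: the array NParams and the routines SetParamCount and ParamsMatch do not exist in our translator, and the real symbol table may well end up looking different.

```pascal
program ParamCountSketch;
{$APPTYPE CONSOLE}

var
  { Hypothetical companion to ST: the number of formal parameters
    recorded for each single-character procedure name }
  NParams : array['A' .. 'Z'] of Integer;

{ Record the formal parameter count when a procedure is declared }
procedure SetParamCount(Name : char; Count : Integer);
begin
  NParams[Name] := Count;
end;

{ Compare the number of actual parameters in a call with the recorded count }
function ParamsMatch(Name : char; Count : Integer) : Boolean;
begin
  ParamsMatch := NParams[Name] = Count;
end;

begin
  SetParamCount('F', 3);          { as if we had just parsed  pf(x,y,z) }
  WriteLn(ParamsMatch('F', 3));   { a call with three parameters agrees }
  WriteLn(ParamsMatch('F', 2));   { a call with two parameters does not }
end.
```

In this scheme FormalList would count the parameters it parses and hand the total to SetParamCount, while ParamList would count the actual parameters and complain if ParamsMatch returned False.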

Save the program as Procs05 to prepare for passing by value.

The Semantics of Parameters

So far we've dealt with the syntax of parameter passing, and we've got the parsing mechanisms in place to handle it. Next, we have to look at the semantics, i.e., the actions to be taken when we encounter parameters. This brings us square up against the issue of the different ways parameters can be passed.

There is more than one way to pass a parameter, and the way we do it can have a profound effect on the character of the language. So this is another of those areas where I can't just give you my solution. Rather, it's important that we spend some time looking at the alternatives so that you can go another route if you choose to.

There are two main ways parameters are passed:

  • By value
  • By reference (address)

The differences are best seen in the light of a little history.

The old FORTRAN compilers passed all parameters by reference. In other words, what was actually passed was the address of the parameter. This meant that the called subroutine was free to either read or write that parameter, as often as it chose to, just as though it were a global variable. This was actually quite an efficient way to do things, and it was pretty simple since the same mechanism was used in all cases, with one exception that I'll get to shortly.

There were problems, though. Many people felt that this method created entirely too much coupling between the called subroutine and its caller. In effect, it gave the subroutine complete access to all variables that appeared in the parameter list.

Many times, we didn't want to actually change a parameter, but only use it as an input. For example, we might pass an element count to a subroutine, and wish we could then use that count within a DO-loop. To avoid changing the value in the calling program, we had to make a local copy of the input parameter, and operate only on the copy. Some FORTRAN programmers, in fact, made it a practice to copy all parameters except those that were to be used as return values. Needless to say, all this copying defeated a good bit of the efficiency associated with the approach.
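
The element-count situation is easy to reproduce in the Pascal we're using for the translator, where a VAR parameter is passed by reference and an ordinary parameter is passed by value. This is only an analogy to the FORTRAN behaviour, not FORTRAN itself, and the routine names CountDownRef and CountDownVal are invented for the purpose.

```pascal
program PassingSketch;
{$APPTYPE CONSOLE}

{ By reference: the routine works on the caller's own variable }
procedure CountDownRef(var N : Integer);
begin
  while N > 0 do
    Dec(N);
end;

{ By value: the routine works on a private copy of the argument }
procedure CountDownVal(N : Integer);
begin
  while N > 0 do
    Dec(N);
end;

var
  Count : Integer;
begin
  Count := 5;
  CountDownVal(Count);
  WriteLn(Count);        { still 5: only the copy was counted down }
  CountDownRef(Count);
  WriteLn(Count);        { now 0: the caller's variable was counted down }
end.
```

The by-value version is, in effect, the compiler making the local copy that those FORTRAN programmers had to make by hand.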

There was, however, an even more insidious problem, which was not really just the fault of the "pass by reference" convention, but a bad convergence of several implementation decisions.

Suppose we have a subroutine:

      SUBROUTINE FOO(X, Y, N)        
      

where N is some kind of input count or flag. Many times, we'd like to be able to pass a literal or even an expression in place of a variable, such as:

      CALL FOO(A, B, J + 1)       
     

Here the third parameter is not a variable, and so it has no address. The earliest FORTRAN compilers did not allow such things, so we had to resort to subterfuges like:

      K = J + 1
      CALL FOO(A, B, K)  

Here again, there was copying required, and the burden was on the programmer to do it. Not good.

Later FORTRAN implementations got rid of this by allowing expressions as parameters. What they did was to assign a compiler-generated variable, store the value of the expression in the variable, and then pass the address of that variable.

So far, so good. Even if the subroutine mistakenly altered the anonymous variable, who was to know or care? On the next call, it would be recalculated anyway.

The problem arose when someone decided to make things more efficient. They reasoned, rightly enough, that the most common kind of "expression" was a single integer value, as in:

      CALL FOO(A, B, 4)  

It seemed inefficient to go to the trouble of "computing" such an integer and storing it in a temporary variable, just to pass it through the calling list. Since we had to pass the address of the thing anyway, it seemed to make lots of sense to just pass the address of the literal integer, 4 in the example above.

To make matters more interesting, most compilers, then and now, identify all literals and store them separately in a "literal pool", so that we only have to store one value for each unique literal. That combination of design decisions: passing expressions, optimization for literals as a special case, and use of a literal pool, is what led to disaster.

To see how it works, imagine that we call subroutine FOO as in the example above, passing it a literal 4. Actually, what gets passed is the address of the literal 4, which is stored in the literal pool. This address corresponds to the formal parameter, N, in the subroutine itself.

Now suppose that, unbeknownst to the programmer, subroutine FOO actually modifies N to be, say, -7. Suddenly, that literal 4 in the literal pool gets changed, to a -7. From then on, every expression that uses a 4 and every subroutine that passes a 4 will be using the value of -7 instead! Needless to say, this can lead to some bizarre and difficult-to-find behaviour. The whole thing gave the concept of pass-by-reference a bad name, although as we have seen, it was really a combination of design decisions that led to the problem.
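
To see the mechanism in miniature, here is a small sketch in Pascal rather than FORTRAN. The global variable Four is only a stand-in for the literal pool's single copy of 4, and the VAR parameter plays the part of FORTRAN's pass-by-reference. Both the program and its names are made up for this illustration; no real compiler is being quoted.

```pascal
program LiteralPoolDemo;

var
  Four : integer;   { hypothetical stand-in for the pooled literal 4 }

{ Plays the part of a FORTRAN subroutine that writes to its argument }
procedure Foo(var N : integer);
begin
  N := -7;
end;

begin
  Four := 4;
  Foo(Four);              { like CALL FOO(A, B, 4): the address of the pooled 4 is passed }
  WriteLn(Four + Four);   { a later "4 + 4" now sees the corrupted value: prints -14 }
end.
```

Once Foo has written through the address, every later use of the pooled value is wrong, which is exactly the behaviour described above.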

In spite of the problem, the FORTRAN approach had its good points. Chief among them is the fact that we don't have to support multiple mechanisms. The same scheme, passing the address of the argument, works for every case, including arrays. So the size of the compiler can be reduced.

Partly because of the FORTRAN gotcha, and partly just because of the reduced coupling involved, modern languages like C, Pascal, Ada, and Modula 2 generally pass scalars by value.

This means that the value of the scalar is copied into a separate location used only for the call. Since the value passed is a copy, the called procedure can use it as a local variable and modify it any way it likes. The value in the caller will not be changed.

It may seem at first that this is a bit inefficient, because of the need to copy the parameter. But remember that we're going to have to fetch some value to pass anyway, whether it be the parameter itself or an address for it. Inside the subroutine, using pass-by-value is definitely more efficient, since we eliminate one level of indirection. Finally, we saw earlier that with FORTRAN, it was often necessary to make copies within the subroutine anyway, so pass-by-value reduces the number of local variables. All in all, pass-by-value is better.

Except for one small little detail: if all parameters are passed by value, there is no way for a called procedure to return a result to its caller! The parameter passed is not altered in the caller, only in the called procedure. Clearly, that won't get the job done.

There have been two answers to this problem, which are equivalent. In Pascal, Wirth provides for VAR parameters, which are passed by reference. What a VAR parameter is, in fact, is none other than our old friend the FORTRAN parameter, with a new name and paint job for disguise. Wirth neatly gets around the "changing a literal" problem as well as the "address of an expression" problem, by the simple expedient of allowing only a variable to be the actual parameter. In other words, it's the same restriction that the earliest FORTRANs imposed.

C does the same thing, but explicitly. In C, all parameters are passed by value. One kind of variable that C supports, however, is the pointer. So by passing a pointer by value, you in effect pass what it points to by reference. In some ways this works even better yet, because even though you can change the variable pointed to all you like, you still can't change the pointer itself. In a function such as strcpy, for example, where the pointers are incremented as the string is copied, we are really only incrementing copies of the pointers, so the values of those pointers in the calling procedure still remain as they were. To modify a pointer, you must pass a pointer to the pointer.
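
In Pascal terms, the two conventions can be sketched as follows. The procedure names ByValue and ByReference are invented for the illustration.

```pascal
program ValueVsVar;

{ N is a private copy: the caller never sees the change }
procedure ByValue(N : integer);
begin
  N := N + 1;
end;

{ N is an alias for the caller's variable }
procedure ByReference(var N : integer);
begin
  N := N + 1;
end;

var
  Count : integer;

begin
  Count := 10;
  ByValue(Count);
  WriteLn(Count);       { prints 10: only the copy was incremented }
  ByReference(Count);
  WriteLn(Count);       { prints 11: the caller's variable was changed }
end.
```

Note that a call such as ByReference(Count + 1) will not compile, which is Wirth's restriction at work.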

Since we are simply performing experiments here, we'll look at both pass-by-value and pass-by-reference. That way, we'll be able to use either one as we need to. It's worth mentioning that it's going to be tough to use the C approach to pointers here, since a pointer is a different type and we haven't studied types yet!

Pass-by-Value

Let's just try some simple-minded things and see where they lead us. Let's begin with the pass-by-value case. Consider the procedure call:

      FOO(X, Y)  

Almost the only reasonable way to pass the data is through the CPU stack. So the code we'd like to see generated might look something like this:

      PUSH X   
      PUSH Y 
      CALL FOO   

That certainly doesn't seem too complex!

When the CALL is executed, the CPU pushes the return address onto the stack and jumps to FOO. At this point the stack will look like this:

      .
      .
      Value of X (4 bytes)
      Value of Y (4 bytes)
      ESP -->  Return Address (4 bytes)

So the values of the parameters have addresses that are fixed offsets from the stack pointer. In this example, the addresses are:

      X : offset 8 from ESP
      Y : offset 4 from ESP

Now consider what the called procedure might look like:

      PROCEDURE FOO(A, B)
      BEGIN
        A = B
      END  

(Remember, the names of the formal parameters are arbitrary ... only the positions count.)

The desired output code might look like:

 FOO: MOV EAX, [ESP + 4]
      MOV [ESP + 8], EAX
      RET  

Note that, in order to address the formal parameters, we're going to have to know which position they have in the parameter list. This means some changes to the symbol table stuff. In fact, for our single-character case it's best to just create a new symbol table for the formal parameters.

Let's begin by declaring a new table:

var 
  Params : array['A' .. 'Z'] of integer;  

We also will need to keep track of how many parameters a given procedure has:

var 
  NumParams : integer;  

And we need to initialize the new table. Now, remember that the formal parameter list will be different for each procedure that we process, so we'll need to initialize that table anew for each procedure. Here's the initializer:

{ Initialize parameter table to null }      
procedure ClearParams;
var 
  i : char;
begin
  for i := 'A' to 'Z' do
    Params[i] := 0;
  NumParams := 0;
end;  

We'll put a call to this procedure in Init, and also at the end of DoProc:

{ Initialize }
procedure Init;
var
  i : char;
begin
  GetChar;
  SkipWhite;
  for i := 'A' to 'Z' do
    ST[i] := ' ';
  ClearParams;
end;

{ Parse and translate a procedure declaration }
procedure DoProc;
var
  N : char;
begin
  Match('p');
  N := GetName;
  FormalList;
  Fin;
  if InTable(N) then
    Duplicate(N);
  ST[N] := 'p';
  PostLabel(N);
  BeginBlock;
  Return;
  ClearParams;
end;            

Note that the call within DoProc ensures that the table will be clear when we're in the main program.

OK, now we need a few procedures to work with the table. The next few functions are essentially copies of InTable, TypeOf, etc.:

{ Find the parameter number }
function ParamNumber(N : char) : integer;
begin
  ParamNumber := Params[N];
end;

{ See if an identifier is a parameter }
function IsParam(N : char) : boolean;
begin
  IsParam := Params[N] <> 0;
end;

{ Add a new parameter to table }
procedure AddParam(Name : char);
begin
  if IsParam(Name) then
    Duplicate(Name);
  Inc(NumParams);
  Params[Name] := NumParams;
end;  
 

Finally, we need some code generation routines:

{ Load a parameter to the primary register }
procedure LoadParam(N : integer);
var
  Offset : integer;
begin
  Offset := 4 + 4 * (NumParams - N);
  EmitLn('MOV EAX, [ESP + ' + IntToStr(Offset) + ']');
end;

{ Store a parameter from the primary register }
procedure StoreParam(N : integer);
var
  Offset : integer;
begin
  Offset := 4 + 4 * (NumParams - N);
  EmitLn('MOV [ESP + ' + IntToStr(Offset) + '], EAX');
end;

{ Push the primary register to the stack }
procedure Push;
begin
  EmitLn('PUSH EAX');
end;  

(The last routine is one we've seen before, but it wasn't in this vestigial version of the program.)
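
As a quick check of the offset arithmetic in LoadParam and StoreParam, here's a throwaway program. EspOffset is a hypothetical helper, not part of the compiler; it simply repeats the formula, and the procedure is assumed to have three 4-byte parameters.

```pascal
program EspOffsets;

var
  i : integer;

{ Offset from ESP, on entry to the procedure, of parameter N (1-based) }
{ when NumParams 4-byte parameters were pushed left to right }
function EspOffset(NumParams, N : integer) : integer;
begin
  EspOffset := 4 + 4 * (NumParams - N);
end;

begin
  for i := 1 to 3 do
    WriteLn('parameter ', i, ' is at [ESP + ', EspOffset(3, i), ']');
end.
```

The first parameter comes out at offset 12 and the last at offset 4, just above the return address, which agrees with the stack picture drawn earlier.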

With those preliminaries in place, we're ready to deal with the semantics of procedures with calling lists (remember, the code to deal with the syntax is already in place).

Let's begin by processing a formal parameter. All we have to do is to add each parameter to the parameter symbol table:

{ Process a formal parameter }
procedure FormalParam;
begin
  AddParam(GetName);
end;  
 
Now, what about dealing with a formal parameter when it appears in the body of the procedure? That takes a little more work. We must first determine that it is a formal parameter. To do this, I've written a modified version of TypeOf:

{ Get type of symbol }
function TypeOf(n : char) : char;
begin
  if IsParam(n) then
    TypeOf := 'f'
  else
    TypeOf := ST[n];
end;   
 

(Note that, since TypeOf now calls IsParam, it may need to be relocated in your source. PPS: You can use forward declarations of procedures at least temporarily if you have trouble positioning all of the new procedures in an acceptable order. )

We also must modify AssignOrProc to deal with this new type:

{ Decide if a statement is an assignment or procedure call }
procedure AssignOrProc;
var
  Name : char;
begin
  Name := GetName;
  case TypeOf(Name) of
    ' ' : Undefined(Name);
    'v', 'f' : Assignment(Name);
    'p' : CallProc(Name);
  else
    Abort('Identifier ' + Name +  ' cannot be used here');
  end;
end;   

Finally, the code to process an assignment statement and an expression must be extended:

{ Parse and translate an expression 
  vestigial version }
procedure Expression;
var
  Name : char;
begin
  Name := GetName;
  if IsParam(Name) then
    LoadParam(ParamNumber(Name))
  else
    LoadVar(Name);
end;

{ Parse and translate an assignment statement }
procedure Assignment(Name : char);
begin
  Match('=');
  Expression;
  if IsParam(Name) then
    StoreParam(ParamNumber(Name))
  else
    StoreVar(Name);
end;  

As you can see, these procedures will treat every variable name encountered as either a formal parameter or a global variable, depending on whether or not it appears in the parameter symbol table. Remember that we are using only a vestigial form of Expression. In the final program, the change shown here will have to be added to Factor, not Expression.

The rest is easy. We need only add the semantics to the actual procedure call, which we can do with one new line of code:

{ Process an actual parameter }
procedure Param;
begin
  Expression;
  Push;
end;  

That's it. Add these changes to your program and give it a try. Try declaring one or two procedures, each with a formal parameter list. Then do some assignments, using combinations of global variables and formal parameters. You can call one procedure from within another, but you cannot declare a nested procedure. You can even pass formal parameters from one procedure to another. If we had the full syntax of the language here, you'd also be able to do things like read or write formal parameters or use them in complicated expressions.

Save this as Procs06 to prepare for making safe use of the stack.

What's Wrong?

At this point, you might be thinking: Surely there's more to this than a few pushes and pops. There must be more to passing parameters than this.

You'd be right. As a matter of fact, the code that we're generating here leaves a lot to be desired in several respects.

The most glaring oversight is that it's wrong! If you'll look back at the code for a procedure call, you'll see that the caller pushes each actual parameter onto the stack before it calls the procedure. The procedure uses that information, but it doesn't change the stack pointer. That means that the stuff is still there when we return. Somebody needs to clean up the stack, or we'll soon be in very hot water!

Fortunately, that's easily fixed. All we have to do is to increment the stack pointer when we're finished.

Should we do that in the calling program, or the called procedure? Some folks let the called procedure clean up the stack, since that requires less code to be generated per call, and since the procedure, after all, knows how many parameters it's got. But that means that it must do something with the return address so as not to lose it.

I prefer letting the caller clean up, so that the callee need only execute a return. Also, it seems a bit more balanced, since the caller is the one who "messed up" the stack in the first place. But that means that the caller must remember how many items it pushed. To make things easy, I've modified the procedure ParamList to be a function instead of a procedure, returning the number of bytes pushed:

{ Process the parameter list for a procedure  call }
function ParamList : integer;
var 
  N : integer;
begin
  N := 0;
  Match('(');
  if Look <> ')' then
    begin
      Param;
      inc(N);
      while Look = ',' do 
        begin
          Match(',');
          Param;
          inc(N);
        end;
    end;
  Match(')');
  ParamList := 4 * N;  
end;        
      

Procedure CallProc then uses this to clean up the stack:

{ Process a procedure call }
procedure CallProc(Name : char);
var 
  N : integer;
begin
  N := ParamList;
  Call(Name);
  CleanStack(N);
end;  

Here I've created yet another code generation procedure:

{ Adjust the stack pointer upwards by N bytes }
procedure CleanStack(N : integer);
begin
  if N > 0 then 
    EmitLn('ADD ESP, ' + IntToStr(N));
end;  

OK, if you'll add this code to your compiler, I think you'll find that the stack is now under control.
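
If you'd like to convince yourself, here's a rough model of the bookkeeping. It only tracks the stack pointer as a number; the starting value of 1000 and the procedure names are assumptions made for the sketch, and no real memory is involved.

```pascal
program StackBalance;

var
  ESP : integer;   { simulated stack pointer }

{ PUSH EAX: the stack grows downward by one 4-byte item }
procedure Push;
begin
  Dec(ESP, 4);
end;

{ CALL pushes the return address; the matching RET pops it again }
procedure CallAndReturn;
begin
  Dec(ESP, 4);
  Inc(ESP, 4);
end;

{ ADD ESP, N: the caller discards the parameters it pushed }
procedure CleanStack(N : integer);
begin
  Inc(ESP, N);
end;

begin
  ESP := 1000;
  Push;                 { first parameter }
  Push;                 { second parameter }
  CallAndReturn;
  WriteLn(ESP);         { prints 992: both parameters are still on the stack }
  CleanStack(8);
  WriteLn(ESP);         { prints 1000: back where we started }
end.
```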

The next problem has to do with our way of addressing relative to the stack pointer. That works fine in our simple examples, since with our rudimentary form of expressions nobody else is messing with the stack. But consider a different example as simple as:

     PROCEDURE FOO(A, B)
     BEGIN
       A = A + B
     END  

PPS: The following discussion is tied closely to the 68k processor and two byte integers. We note the possible pitfalls and hope to avoid them with our routines for Intel processors.

The code generated by a simple-minded parser might be:

     FOO: MOVE 6(SP),D0       ; Fetch A
          MOVE D0,-(SP)       ; Push it
          MOVE 4(SP),D0       ; Fetch B
          ADD (SP)+,D0        ; Add A
          MOVE D0,6(SP)       ; Store A
          RTS

This would be wrong. When we push the first argument onto the stack, the offsets for the two formal parameters are no longer 4 and 6, but are 6 and 8. So the second fetch would not pick up B at all; it would read the word that now sits at offset 4, which is part of the return address.

This is not the end of the world. I think you can see that all we really have to do is to alter the offset every time we do a push, and that in fact is what's done if the CPU has no support for other methods.

Fortunately, though, the 68000 does have such support. Recognizing that this CPU would be used a lot with high-order language compilers, Motorola decided to add direct support for this kind of thing.

The problem, as you can see, is that as the procedure executes, the stack pointer bounces up and down, and so it becomes an awkward thing to use as a reference to access the formal parameters. The solution is to define some other register, and use it instead. This register is typically set equal to the original stack pointer, and is called the frame pointer.
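
Here's a small sketch of why the frame pointer helps. It uses the x86 register names and 4-byte stack items from our Intel version rather than the 68000, and it only does arithmetic on simulated register values; the starting address of 1000 is arbitrary.

```pascal
program FramePointerDemo;

var
  ESP, EBP, AddrOfB : integer;   { simulated registers and one address }

begin
  ESP := 1000;
  Dec(ESP, 4);                   { caller pushes A }
  Dec(ESP, 4);                   { caller pushes B }
  AddrOfB := ESP;                { remember where B lives }
  Dec(ESP, 4);                   { CALL pushes the return address }
  Dec(ESP, 4);                   { PUSH EBP }
  EBP := ESP;                    { MOV EBP, ESP }

  { distance to B from ESP, then from EBP: prints 8 8 }
  WriteLn(AddrOfB - ESP, ' ', AddrOfB - EBP);

  Dec(ESP, 4);                   { the procedure body pushes a temporary }

  { now prints 12 8: the ESP offset moved, the EBP offset did not }
  WriteLn(AddrOfB - ESP, ' ', AddrOfB - EBP);
end.
```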

The 68000 instruction LINK lets you declare such a frame pointer, and sets it equal to the stack pointer, all in one instruction. As a matter of fact, it does even more than that. Since this register may have been in use for something else in the calling procedure, LINK also pushes the current value of that register onto the stack. It can also add a value to the stack pointer, to make room for local variables.

The complement of LINK is UNLK, which simply restores the stack pointer and pops the old value back into the register.

Using these two instructions, the code for the previous example becomes:

     FOO: LINK A6,#0
          MOVE 10(A6),D0      ; Fetch A
          MOVE D0,-(SP)       ; Push it
          MOVE 8(A6),D0       ; Fetch B
          ADD (SP)+,D0        ; Add A
          MOVE D0,10(A6)      ; Store A
          UNLK A6
          RTS        
      

Fixing the compiler to generate this code is a lot easier than it is to explain it. All we need to do is to modify the code generation created by DoProc. Since that makes the code a little more than one line, I've created new procedures to deal with it, paralleling the Prolog and Epilog procedures called by DoMain:

PPS: According to the Pentium III manual, the ENTER and LEAVE instructions use the EBP register to handle frames as follows.

"The ENTER and companion LEAVE instructions are provided to support block structured languages. The ENTER instruction (when used) is typically the first instruction in a procedure and is used to set up a new stack frame for a procedure. The LEAVE instruction is then used at the end of the procedure (just before the RET instruction) to release the stack frame.

If the nesting level is 0, the processor pushes the frame pointer from the EBP register onto the stack, copies the current stack pointer from the ESP register into the EBP register, and loads the ESP register with the current stack-pointer value minus the value in the size operand."

These worked for us in Delphi in-line assembler but not in Lazarus. We can, however, use PUSH EBP followed by MOV EBP, ESP instead of ENTER 0, 0. As long as the procedure leaves ESP where it found it, a plain POP EBP can stand in for LEAVE.

{ Write the prolog for a procedure }
procedure ProcProlog(N : char);
begin
  PostLabel(N);
  EmitLn('PUSH EBP');
  EmitLn('MOV EBP, ESP');
end;

{ Write the epilog for a procedure }
procedure ProcEpilog;
begin
  EmitLn('POP EBP');
  EmitLn('RET');
end;       
      

Procedure DoProc now just calls these:

procedure DoProc;
var
  N : char;
begin
  Match('p');
  N := GetName;
  FormalList;
  Fin;
  if InTable(N) then
    Duplicate(N);
  ST[N] := 'p';
  ProcProlog(N);
  BeginBlock;
  ProcEpilog;
  ClearParams;
end; 
 
Finally, we need to change the references to ESP in procedures LoadParam and StoreParam:

{ Load a parameter to the primary register }
procedure LoadParam(N : integer);
var
  Offset : integer;
begin
  Offset := 8 + 4 * (NumParams - N);
  EmitLn('MOV EAX, [EBP + ' + IntToStr(Offset) + ']');
end;

{ Store a parameter from the primary register }
procedure StoreParam(N : integer);
var
  Offset : integer;
begin
  Offset := 8 + 4 * (NumParams - N);
  EmitLn('MOV [EBP + ' + IntToStr(Offset) + '], EAX');
end;             
 

(Note that the Offset computation changes to allow for the extra push of EBP.)

That's all it takes. Try this out and see how you like it.

At this point we are generating some relatively nice code for procedures and procedure calls. Within the limitation that there are no local variables (yet) and that no procedure nesting is allowed, this code is just what we need.

There is still just one little small problem remaining:

      We have no way to return results to the caller!

But that, of course, is not a limitation of the code we're generating, but one inherent in the call-by-value protocol. Notice that we can use formal parameters in any way inside the procedure. We can calculate new values for them, use them as loop counters (if we had loops, that is!), etc. So the code is doing what it's supposed to. To get over this last problem, we need to look at the alternative protocol.

Call-by-Reference

This one is easy, now that we have the mechanisms already in place. We only have to make a few changes to the code generation. Instead of pushing a value onto the stack, we must push an address. We can load an effective address with LEA and then PUSH.

We'll be making a new version of the test program for this. Before we do anything else, retain Procs06 because we'll be needing it again later, and save as Procs07 for calling by reference.

Let's begin by looking at the code we'd like to see generated for the new case. Using the same example as before, we need the call

      FOO(X, Y)  

to be translated to:

      LEA EAX, X
      PUSH EAX            ; Push the address of X
      LEA EAX, Y
      PUSH EAX            ; Push the address of Y
      CALL FOO            ; Call FOO  

That's a simple matter of a slight change to Param:

{ Process an actual parameter }
procedure Param;
begin
  EmitLn('LEA EAX, ' + GetName);
  EmitLn('PUSH EAX');
end;  

(Note that with pass-by-reference, we can't have expressions in the calling list, so Param can just read the name directly.)

At the other end, the references to the formal parameters must be given one level of indirection:

     FOO: ENTER 0, 0
          MOV EDX, [EBP + 12] ; Fetch the address of A
          MOV EAX, [EDX]      ; Fetch A
          PUSH EAX            ; Push it
          MOV EDX, [EBP + 8]  ; Fetch the address of B
          MOV EAX, [EDX]      ; Fetch B
          POP EDX
          ADD EAX, EDX        ; Add A
          MOV EDX, [EBP + 12] ; Fetch the address of A
          MOV [EDX], EAX      ; Store A
          LEAVE
          RET  

All of this can be handled by changes to LoadParam and StoreParam:

{ Load a parameter to the primary register }
procedure LoadParam(N : integer);
var 
  Offset : integer;
begin
  Offset := 8 + 4 * (NumParams - N);
  EmitLn('MOV EDX, [EBP + ' + IntToStr(Offset) + ']');
  EmitLn('MOV EAX, [EDX]');
end;

{ Store a parameter from the primary register }
procedure StoreParam(N : integer);
var 
  Offset : integer;
begin
  Offset := 8 + 4 * (NumParams - N);
  EmitLn('MOV EDX, [EBP + ' + IntToStr(Offset) + ']');
  EmitLn('MOV [EDX], EAX');
end;  

That should do it. Give it a try and see if it's generating reasonable-looking code. As you will see, the code is hardly optimal, since we reload the address register every time a parameter is needed. But that's consistent with our KISS approach here, of just being sure to generate code that works. We'll just make a little note here, that here's yet another candidate for optimization, and press on.

Now we've learned to process parameters using pass-by-value and pass-by-reference. In the real world, of course, we'd like to be able to deal with both methods. We can't do that yet, though, because we have not yet had a session on types, and that has to come first.

If we can only have one method, then of course it has to be the good ol' FORTRAN method of pass-by-reference, since that's the only way procedures can ever return values to their caller.

This, in fact, will be one of the differences between TINY and KISS. In the next version of TINY, we'll use pass-by-reference for all parameters. KISS will support both methods.

PPS: This is the only instalment that uses parameters.

Local Variables

So far, we've said nothing about local variables, and our definition of procedures doesn't allow for them. Needless to say, that's a big gap in our language, and one that needs to be corrected.

Here again we are faced with a choice: static or dynamic storage?

In those old FORTRAN programs, local variables were given static storage just like global ones. That is, each local variable got a name and allocated address, like any other variable, and was referenced by that name.

That's easy for us to do, using the allocation mechanisms already in place. Remember, though, that local variables can have the same names as global ones. We need to somehow deal with that by assigning unique names for these variables.

The characteristic of static storage, of course, is that the data survives a procedure call and return. When the procedure is called again, the data will still be there. That can be an advantage in some applications. In the FORTRAN days we used to do tricks like initialize a flag, so that you could tell when you were entering a procedure for the first time and could do any one-time initialization that needed to be done.

Of course, the same "feature" is also what makes recursion impossible with static storage. Any new call to a procedure will overwrite the data already in the local variables.
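
A small experiment makes the point. In this sketch the globals SavedN and Partial imitate FORTRAN-style static locals, with one copy shared by every activation, while FactDynamic keeps its data on the stack in the usual Pascal way. All the names are invented for the illustration.

```pascal
program StaticVsDynamic;

var
  SavedN, Partial : integer;   { imitate static locals: a single shared copy }

{ Factorial whose working storage is static }
function FactStatic(N : integer) : integer;
begin
  SavedN := N;
  if SavedN <= 1 then
    Partial := 1
  else
  begin
    Partial := FactStatic(SavedN - 1);
    Partial := Partial * SavedN;   { SavedN was overwritten by the inner calls }
  end;
  FactStatic := Partial;
end;

{ Factorial whose parameter lives on the stack, one copy per call }
function FactDynamic(N : integer) : integer;
begin
  if N <= 1 then
    FactDynamic := 1
  else
    FactDynamic := FactDynamic(N - 1) * N;
end;

begin
  WriteLn(FactStatic(4));    { prints 1: each inner call clobbered SavedN }
  WriteLn(FactDynamic(4));   { prints 24 }
end.
```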

The alternative is dynamic storage, in which storage is allocated on the stack just as for passed parameters. We also have the mechanisms already for doing this. In fact, the same routines that deal with passed (by value) parameters on the stack can easily deal with local variables as well ... the code to be generated is the same. The size operand of the ENTER instruction is there for just that reason: we can use it to adjust the stack pointer to make room for locals. Dynamic storage, of course, inherently supports recursion.

When I first began planning TINY, I must admit to being prejudiced in favour of static storage. That's simply because those old FORTRAN programs were pretty darned efficient ... the early FORTRAN compilers produced a quality of code that's still rarely matched by modern compilers. Even today, a given program written in FORTRAN is likely to outperform the same program written in C or Pascal, sometimes by wide margins. (Whew! Am I going to hear about that statement!)

I've always supposed that the reason had to do with the two main differences between FORTRAN implementations and the others: static storage and pass-by-reference. I know that dynamic storage supports recursion, but it's always seemed to me a bit peculiar to be willing to accept slower code in the 95% of cases that don't need recursion, just to get that feature when you need it. The idea is that, with static storage, you can use absolute addressing rather than indirect addressing, which should result in faster code.

More recently, though, several folks have pointed out to me that there really is no performance penalty associated with dynamic storage. For example, you shouldn't use absolute addressing anyway ... most operating systems require position independent code. And the 68000 instruction

        MOVE 8(A6),D0

has exactly the same timing as

        MOVE X(PC),D0

So I'm convinced, now, that there is no good reason not to use dynamic storage.

Since this use of local variables fits so well into the scheme of pass-by-value parameters, we'll use that version of the translator to illustrate it (Procs06 saved as Procs08). I sure hope you kept a copy!

The general idea is to keep track of how many local variables there are. Then we use the size operand of the ENTER instruction to adjust the stack pointer downward to make room for them. Formal parameters are addressed as positive offsets from the frame pointer, and locals as negative offsets. With a little bit of work, the same procedures we've already created can take care of the whole thing.

Let's start by creating a new variable, Base:

var 
  Base : integer;

We'll use this variable, instead of NumParams, to compute stack offsets. That means changing the two references to NumParams in LoadParam and StoreParam:

{ Load a parameter to the primary register }
procedure LoadParam(N : integer);
var
  Offset : integer;
begin
  Offset := 8 + 4 * (Base - N);
  EmitLn('MOV EAX, [EBP + ' + IntToStr(Offset) + ']');
end;

{ Store a parameter from the primary register }
procedure StoreParam(N : integer);
var
  Offset : integer;
begin
  Offset := 8 + 4 * (Base - N);
  EmitLn('MOV [EBP + ' + IntToStr(Offset) + '], EAX');
end;                                                             

The idea is that the value of Base will be frozen after we have processed the formal parameters, and won't increase further as the new local variables are inserted in the symbol table. This is taken care of at the end of FormalList:

{ Process the formal parameter list of a procedure }
procedure FormalList;
begin
  Match('(');
  if Look <> ')' then
    begin
      FormalParam;
      while Look = ',' do
        begin
          Match(',');
          FormalParam;
        end;
    end;
  Match(')');
  Fin;
  Base := NumParams;
  NumParams := NumParams + 2;
end;      
 

(We skip two extra slots to allow for the return address and old frame pointer, which end up between the formal parameters and the locals.)
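
To see what the numbering buys us, here's a rough sketch that prints the frame layout for a procedure assumed to have two formal parameters and two locals. FrameOffset is a hypothetical helper that just repeats the formula from LoadParam and StoreParam; the constants are made up for the example.

```pascal
program FrameLayout;

const
  NumFormals = 2;   { assumed number of formal parameters }
  NumLocals  = 2;   { assumed number of local variables }

var
  NumParams, Base, i : integer;

{ Same arithmetic as LoadParam and StoreParam }
function FrameOffset(N : integer) : integer;
begin
  FrameOffset := 8 + 4 * (Base - N);
end;

begin
  Base := NumFormals;        { frozen at the end of FormalList }
  NumParams := Base + 2;     { skip the return address and the saved EBP }

  for i := 1 to Base do
    WriteLn('formal ', i, ': EBP offset ', FrameOffset(i));

  for i := 1 to NumLocals do
  begin
    Inc(NumParams);          { what AddParam does for each local }
    WriteLn('local ', i, ': EBP offset ', FrameOffset(NumParams));
  end;
end.
```

The formals come out at offsets 12 and 8, and the locals at -4 and -8, so nothing lands on the saved EBP at offset 0 or the return address at offset 4.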

About all we need to do next is to install the semantics for declaring local variables into the parser. The routines are very similar to Decl and TopDecls:

{ Parse and translate a local data declaration }
procedure LocDecl;
var 
  Name : char;
begin
  Match('v');
  AddParam(GetName);
  Fin;
end;

{ Parse and translate local declarations }
function LocDecls : integer;
var 
  n : integer;
begin
  n := 0;
  while Look = 'v' do 
    begin
      LocDecl;
      inc(n);
    end;
  LocDecls := n;
end;        
      

Note that LocDecls is a function, returning the number of locals to DoProc.

Next, we modify DoProc to use this information:

{ Parse and translate a procedure declaration }
procedure DoProc;
var
  N : char;
  k : integer;
begin
  Match('p');
  N := GetName;
  if InTable(N) then
    Duplicate(N);
  ST[N] := 'p';
  FormalList;
  k := LocDecls;
  ProcProlog(N, k);
  BeginBlock;
  ProcEpilog;
  ClearParams;
end;         
      

(I've made a couple of changes here that weren't really necessary. Aside from rearranging things a bit, I moved the call to Fin to within FormalList, and placed one inside LocDecl as well. Don't forget to put one at the end of FormalList, so that we stay in step here.)

Note the change in the call to ProcProlog. The new argument is the number of words (not bytes) to allocate space for.

PPS: The word size is usually the size of the main registers of the computer - 2 bytes for the 68k and 4 bytes for 32-bit machines. However, the word type in Delphi has 16 bits, so be careful!

We simulate the ENTER and LEAVE instructions so that the generated code should work in both Lazarus and Delphi. For the ENTER instruction, if the nesting level is 0 as in our usage, the processor pushes EBP, copies the current stack pointer from ESP to EBP, then subtracts from ESP the value in the size operand. The LEAVE instruction sets ESP to EBP then pops EBP.

Here are the new versions of ProcProlog and ProcEpilog:

{ Write the prolog for a procedure }
procedure ProcProlog(N : char; k : integer);
begin
  PostLabel(N);
  EmitLn('PUSH EBP');
  EmitLn('MOV EBP, ESP');
  EmitLn('SUB ESP, ' + IntToStr(4 * k));
end;

{ Write the epilog for a procedure }
procedure ProcEpilog;
begin
  EmitLn('MOV ESP, EBP');
  EmitLn('POP EBP');
  EmitLn('RET');
end;

That should do it. Add these changes and see how they work.

Conclusion

At this point you know how to compile procedure declarations and procedure calls, with parameters passed by reference and by value. You can also handle local variables. As you can see, the hard part is not in providing the mechanisms, but in deciding just which mechanisms to use. Once we make these decisions, the code to translate the constructs is really not that difficult. I didn't show you how to deal with the combination of local variables and pass-by-reference parameters, but that's a straightforward extension to what you've already seen. It just gets a little more messy, that's all, since we need to support both mechanisms instead of just one at a time. I'd prefer to save that one until after we've dealt with ways to handle different variable types.

That will be the next instalment, which will be coming soon to a Forum near you. See you then.
