By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,865 Members | 1,717 Online
Bytes IT Community
Submit an Article
Got Smarts?
Share your bits of IT knowledge by writing an article on Bytes.

Compilers - 7: Instructions

Expert 10K+
P: 11,448


This part of the article is one week late; I apologize for that; my excuse is:
bizzy, bizzy, bizzy; I attended a nice course and I had to lecture a bit and
there simply was no time left for writing.

we've come a long way; the previous articles discussed the following aspects
of compilers:

1: a general introduction outlining the overall structure of compilers and
the languages they compile;

2: tokenizers (or lexical analyzers) that take care of transforming a character
input stream to a token input stream;

3: formal grammars that describe the syntax of a (programming) language and
form a guideline for the implementation of language parsers;

4: all the necessary bookkeeping needed by several parts of the compiler; the
tables define functions, operators and other predefined symbols;

5: parsers that take care whether or not a token stream obeys a certain syntax.
Parsers closely cooperate with the code generator part of the compiler;

6: interpreters that execute or evaluate the generated instruction sequence.


The last part of this article discusses the instructions themselves. Every
instruction implements this interface:

Expand|Select|Wrap|Line Numbers
  1. public interface Instruction extends Serializable {
  3.     public void execute(Interpreter interpreter) throws InterpreterException;
  4.     public void initialize(String token, int arity);
  5. }
The second method is only used by the code generator when the instruction is
constructed. The first method is invoked by the interpreter. That's all the
interpreter is interested in: to invoke every instruction in a sequence of

Here's the top part of the Instruction class hierarchy:

Expand|Select|Wrap|Line Numbers
  1. Instruction
  2.     |
  3.     +-------+- AbstractInstruction
  4.         |
  5.         +-------+- AbstractFunctionInstruction
  6.         |    |
  7.         |    +----------- FunctionInstruction
  8.         |    |
  9.         |    +----------- UserInstruction
  10.         |
  11.         +--------- QuoteInstruction
The AbstractInstruction class implements the name and arity of a particular
instruction. The arity of the instruction equals the number of arguments on
the stack taken by that particular instruction; e.g. the arity of the '+'
operator equals 2 and the name equals "+".

Below the AbstractInstruction there are two subtypes of instructions: the
AbstractFunctionInstruction and the QuoteInstruction. The first subclass
implements ordinary built-in functions as well as user defined functions.

The QuoteInstruction implements special function type instruction. An example
of such an instruction is the IfInstruction: the arity equals 3 and its name
is "if". The allows us to type things like this:

Expand|Select|Wrap|Line Numbers
  1. if (x, y, z)
If expression 'x' is not equal to zero the value of the 'if' function equals
'y', otherwise the value equals 'z'. As you could have read in a previous
part of this article, the generated instructions for x, y and z are passed to
the IfInstruction as is, i.e. the 'if' instruction passed the sequence of
instructions that make up expression 'x' to the interpreter to be executed.
If the top element of the stack is non-zero, the instruction sequence for 'y'
is passed to the interpreter, othewise the instruction sequence for expression
'z' is passed to the interpreter.

The FunctionInstruction is the base class for ordinary built-in functions
such as 'sin', 'cos', 'log' etc. The UserInstruction handles all user defined


Operators are implemented as FunctionInstructions. There's nothing strange
about it; compare this:

Expand|Select|Wrap|Line Numbers
  1. add(x, y)
  2. x+y
It's just the notation in the source code that is a bit 'strange', i.e. the '+'
operator doubles as the 'add' function; both are the same.

A function pops its arguments from the stack and computes its value. For the '+'
operator the operands are added and the result is pushed back on the stack. For,
say, the 'sin' function, one operand is popped from the stack, the sine is
computed and the result is pushed back on the stack again.

Instead of implementing all that pushing and popping in every class the
AbstractFunctionInstruction takes care of it all. See the source code attached
with the previous part of this article.

The following functions are implemented:

+ - ! (unary operators)
< <= == != >= >
+ -
* /
sin cos tan
exp log
max min abs

All functions have well known semantics.

Stack functions

On top of those 'normal' functions, other functions are available that are able
to explicitly manipulate the data stack.

The following functions are implemented:

arg(n) : move the n-th element from the top to the top of the stack;
top(n) : copy the n-th element from the top to the top of the stack;
clear() : clear the entire stack;
depth() : push the number of stack elements on the stack.

You can play with these functions using the software available in the attachment
of the previous part of this article. If you ruin the stack no harm is done, i.e.
the interpreter signals an incorrect stack and throws an exception which will
be printed and execution simply stops. If you want to use the Interpreter class
in you own software you can print a message and start evaluating another sequence
of instructions. Feel free to play with it.


Our little language supports simple lists, e.g. the '+' function/operator can
be invoked on simple double operands as in, say "3+4" but one or both of its
operands can also be lists. If more than one operand is a list all lists have
to be 'simple' which means: one dimensional lists of the same length; here
are a few examples:

Expand|Select|Wrap|Line Numbers
  1. {3, 4} + 5         // {8, 9}
  2. {3, 4} + {5, 6}        // {8, 10}
  3. {3, 4} + {5, 6, 7}    // error
  4. {3, 4} + {5, {6, 7}}    // error
  5. min({3, 6}, {5, 4})    // {3, 4}
  6. max({3, 6}, 5)        // {5, 6}
The AbstractFunctionInstruction takes care of it all; if at least one of the
operands is a list then all lists that represent the operand(s) of the function
need to be simple and of the same length. The AbstractFunctionInstruction passes
the double operands to the function and collects the results. Finally it builds
a list out of the results again.

Some functions bypass this mechanism, i.e. they handle list operands themselve.
Remember the 'listfunc' reserved word? User functions defined using the 'listfunc'
reserved word are marked such that the mechanism described above is bypassed.
There are also a few built-in functions that handle lists:

append(x, y) // append y to x
aslist(n) // store the top n stack elements in a list
concat(x, y) // catenate x and y in a new list
head(x) // first element of list x
tail(x) // all but the first element of list x
join(x, y) // join the two lists x and y
size(x) // number of elements in list x

Again, feel free to play with those functions; it's fun. See if you can find what
append, concat and join really do and how they differ.

Quote instructions

One of the QuoteInstructions has been mentioned above already; there are three
of those instructions defined:

if(x, y, z) // if (x) then y else z
while(x, y) // 0 or the last result of y
generate(x, y) // while (x) append y to a list

Please do play a bit with the included software (see previous article part)
and if you feel adventurous use the already implemented classes as a blueprint
for your own functions. If won't be much of an adventure. Don't forget to alter
the .properties files accordingly.

Concluding remarks

Given all the software described in this large article it is a breeze to
implement, say, an expression evaluator. Just a few lines of code will do the
job. This interpreter can do much more than just that but I leave it to your
imagination to come up with great ideas that can put this little system to
good work. Even just printing out the generated instructions (try it!) can
be an educational experience: you'll see infix to postfix translation for free.

Here's that evaluator method:

Expand|Select|Wrap|Line Numbers
  1. // evaluate the expression, throws InterpreterExcetpion if anything's wrong
  2. public double evaluate(String expression) throws InterpreterException {
  4.     // parse the string
  5.     Code code= new Parser().parse(new StringReader(expression));
  7.     // do the evaluation and get the stack top
  8.     Value value= new Interpreter().executeSub(code);
  10.     // convert the value back to a double
  11.     return value.getDbl();
  12. }
I hope people have actually read this all. If you did: congratulations, you
must've learned quite a bit and I hope the included software can be of any
use for you.

The next tip will be about text processing and querying. I haven't actually
implemented anything yet but I'll come up with something.

kind regards,

Jul 4 '07 #1
Share this Article
Share on Google+