Introduction
This part of the article is one week late; I apologize for that; my excuse is:
bizzy, bizzy, bizzy; I attended a nice course and I had to lecture a bit and
there simply was no time left for writing.
we've come a long way; the previous articles discussed the following aspects
of compilers:
1: a general introduction outlining the overall structure of compilers and
the languages they compile;
2: tokenizers (or lexical analyzers) that take care of transforming a character
input stream to a token input stream;
3: formal grammars that describe the syntax of a (programming) language and
form a guideline for the implementation of language parsers;
4: all the necessary bookkeeping needed by several parts of the compiler; the
tables define functions, operators and other predefined symbols;
5: parsers that take care whether or not a token stream obeys a certain syntax.
Parsers closely cooperate with the code generator part of the compiler;
6: interpreters that execute or evaluate the generated instruction sequence.
Instructions
The last part of this article discusses the instructions themselves. Every
instruction implements this interface:
Expand|Select|Wrap|Line Numbers
- public interface Instruction extends Serializable {
- public void execute(Interpreter interpreter) throws InterpreterException;
- public void initialize(String token, int arity);
- }
constructed. The first method is invoked by the interpreter. That's all the
interpreter is interested in: to invoke every instruction in a sequence of
instructions.
Here's the top part of the Instruction class hierarchy:
Expand|Select|Wrap|Line Numbers
- Instruction
- |
- +-------+- AbstractInstruction
- |
- +-------+- AbstractFunctionInstruction
- | |
- | +----------- FunctionInstruction
- | |
- | +----------- UserInstruction
- |
- +--------- QuoteInstruction
instruction. The arity of the instruction equals the number of arguments on
the stack taken by that particular instruction; e.g. the arity of the '+'
operator equals 2 and the name equals "+".
Below the AbstractInstruction there are two subtypes of instructions: the
AbstractFunctionInstruction and the QuoteInstruction. The first subclass
implements ordinary built-in functions as well as user defined functions.
The QuoteInstruction implements special function type instruction. An example
of such an instruction is the IfInstruction: the arity equals 3 and its name
is "if". The allows us to type things like this:
Expand|Select|Wrap|Line Numbers
- if (x, y, z)
'y', otherwise the value equals 'z'. As you could have read in a previous
part of this article, the generated instructions for x, y and z are passed to
the IfInstruction as is, i.e. the 'if' instruction passed the sequence of
instructions that make up expression 'x' to the interpreter to be executed.
If the top element of the stack is non-zero, the instruction sequence for 'y'
is passed to the interpreter, othewise the instruction sequence for expression
'z' is passed to the interpreter.
The FunctionInstruction is the base class for ordinary built-in functions
such as 'sin', 'cos', 'log' etc. The UserInstruction handles all user defined
functions.
Functions
Operators are implemented as FunctionInstructions. There's nothing strange
about it; compare this:
Expand|Select|Wrap|Line Numbers
- add(x, y)
- x+y
operator doubles as the 'add' function; both are the same.
A function pops its arguments from the stack and computes its value. For the '+'
operator the operands are added and the result is pushed back on the stack. For,
say, the 'sin' function, one operand is popped from the stack, the sine is
computed and the result is pushed back on the stack again.
Instead of implementing all that pushing and popping in every class the
AbstractFunctionInstruction takes care of it all. See the source code attached
with the previous part of this article.
The following functions are implemented:
+ - ! (unary operators)
< <= == != >= >
+ -
* /
^
sin cos tan
exp log
max min abs
All functions have well known semantics.
Stack functions
On top of those 'normal' functions, other functions are available that are able
to explicitly manipulate the data stack.
The following functions are implemented:
arg(n) : move the n-th element from the top to the top of the stack;
top(n) : copy the n-th element from the top to the top of the stack;
clear() : clear the entire stack;
depth() : push the number of stack elements on the stack.
You can play with these functions using the software available in the attachment
of the previous part of this article. If you ruin the stack no harm is done, i.e.
the interpreter signals an incorrect stack and throws an exception which will
be printed and execution simply stops. If you want to use the Interpreter class
in you own software you can print a message and start evaluating another sequence
of instructions. Feel free to play with it.
Lists
Our little language supports simple lists, e.g. the '+' function/operator can
be invoked on simple double operands as in, say "3+4" but one or both of its
operands can also be lists. If more than one operand is a list all lists have
to be 'simple' which means: one dimensional lists of the same length; here
are a few examples:
Expand|Select|Wrap|Line Numbers
- {3, 4} + 5 // {8, 9}
- {3, 4} + {5, 6} // {8, 10}
- {3, 4} + {5, 6, 7} // error
- {3, 4} + {5, {6, 7}} // error
- min({3, 6}, {5, 4}) // {3, 4}
- max({3, 6}, 5) // {5, 6}
operands is a list then all lists that represent the operand(s) of the function
need to be simple and of the same length. The AbstractFunctionInstruction passes
the double operands to the function and collects the results. Finally it builds
a list out of the results again.
Some functions bypass this mechanism, i.e. they handle list operands themselve.
Remember the 'listfunc' reserved word? User functions defined using the 'listfunc'
reserved word are marked such that the mechanism described above is bypassed.
There are also a few built-in functions that handle lists:
append(x, y) // append y to x
aslist(n) // store the top n stack elements in a list
concat(x, y) // catenate x and y in a new list
head(x) // first element of list x
tail(x) // all but the first element of list x
join(x, y) // join the two lists x and y
size(x) // number of elements in list x
Again, feel free to play with those functions; it's fun. See if you can find what
append, concat and join really do and how they differ.
Quote instructions
One of the QuoteInstructions has been mentioned above already; there are three
of those instructions defined:
if(x, y, z) // if (x) then y else z
while(x, y) // 0 or the last result of y
generate(x, y) // while (x) append y to a list
Please do play a bit with the included software (see previous article part)
and if you feel adventurous use the already implemented classes as a blueprint
for your own functions. If won't be much of an adventure. Don't forget to alter
the .properties files accordingly.
Concluding remarks
Given all the software described in this large article it is a breeze to
implement, say, an expression evaluator. Just a few lines of code will do the
job. This interpreter can do much more than just that but I leave it to your
imagination to come up with great ideas that can put this little system to
good work. Even just printing out the generated instructions (try it!) can
be an educational experience: you'll see infix to postfix translation for free.
Here's that evaluator method:
Expand|Select|Wrap|Line Numbers
- // evaluate the expression, throws InterpreterExcetpion if anything's wrong
- public double evaluate(String expression) throws InterpreterException {
- // parse the string
- Code code= new Parser().parse(new StringReader(expression));
- // do the evaluation and get the stack top
- Value value= new Interpreter().executeSub(code);
- // convert the value back to a double
- return value.getDbl();
- }
must've learned quite a bit and I hope the included software can be of any
use for you.
The next tip will be about text processing and querying. I haven't actually
implemented anything yet but I'll come up with something.
kind regards,
Jos