Compilers - 4: Bookkeeping

Greetings,

this week's compiler article is all about bookkeeping; boring, I admit it, but
we need it for our Tokenizer and Parser(s). Two weeks ago I showed the Tokenizer
class code. It uses a TokenTable which contains the data needed by the Tokenizer.
The TokenTable contains the data alright, but how is this table initialized? I
could've hard coded all data in that table but I didn't. I want to be able to
play and alter that data without altering a single letter in the code itself.
I also want that table to initialize itself. Properties files are good external
resources for that purpose. A Properties file is just a list of key/value pairs,
one per line, with the key and the value separated by an equals sign ('=').

Here's the 'tokens.properties' file:

space    = ^\\s*
number   = ^\\d+(\\.\\d*)?([eE][+-]?\\d+)?
word     = ^[A-Za-z_]\\w*
symbol2  = ^(==|<=|>=|!=|\\^=|\\+=|-=|\\*=|/=|\\+\\+|--)
symbol1  = ^[:,=!<>+*/(){}^-]
char     = ^\\S

We've seen those contents before, in the part of this article that describes our
Tokenizer class.
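
One detail of that file is easy to trip over: the backslashes are doubled because
the Properties loader treats a backslash as an escape character, so '\\d' in the
file reaches the regular expression compiler as '\d'. Here's a minimal sketch that
shows this; the class PropertiesEscapeSketch and its methods are made up for this
illustration and are not part of the compiler.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: the names in this class are not part of the compiler.
class PropertiesEscapeSketch {

    // Parses text in .properties syntax and returns the value stored under 'key'.
    static String valueOf(String text, String key) {
        Properties properties= new Properties();
        try {
            properties.load(new StringReader(text));
        }
        catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
        return properties.getProperty(key);
    }

    // Returns the prefix of 'input' matched by 'regex', or null if there is none.
    static String prefix(String regex, String input) {
        Matcher m= Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Java source doubles every backslash once more; the text being parsed
        // is the 'number' entry of the tokens.properties file.
        String line= "number = ^\\\\d+(\\\\.\\\\d*)?([eE][+-]?\\\\d+)?";
        String regex= valueOf(line, "number");
        System.out.println(regex);                      // the regexp as Pattern sees it
        System.out.println(prefix(regex, "3.14e-2+x")); // the number at the start
    }
}
```

The white space around the '=' sign is dropped by the loader as well, which is why
the entries in the file can be lined up in columns.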

Resources

We need a mechanism that reads a file from somewhere and produces a
Properties object. If the file has been read before, we don't want to read it
again but we simply want to produce the same Properties object again and again.
We need a cache for that:

private static Map<String, Properties> cache=
                new HashMap<String, Properties>();

Given a String for a key (such as "tokens") we want to cache a Properties object
such as the one listed above as the value element of the map.

The following method finds a Properties object for us given a String key:

// resource names always use '/' as their separator, on every platform
private static final String DIR= "resources/";

protected static synchronized Properties getProperties(String name) {

    Properties properties= cache.get(name);

    if (properties != null) return properties;

    // the stream is closed automatically when the try block is left
    try (InputStream is= Resources.class.getResourceAsStream(
                DIR+name+".properties")) {

        if (is == null) return null; // no such resource

        properties= new Properties();
        properties.load(is);

        cache.put(name, properties);

        return properties;
    }
    catch (IOException ioe) {
        return null;
    }
}

Given a name (such as "tokens"), this method either finds an already cached
Properties object in the cache or it searches for a resource named
"resources/<name>.properties". In our example it searches for a resource named
"resources/tokens.properties". If all is fine it creates a new Properties object,
loads the contents of the file into the object, adds the object to the cache and
finally returns the Properties object. If the resource can't be found or can't be
read this method returns a 'null' value.

Note that the separator is a hard-coded forward slash. A resource name is not a
file name: getResourceAsStream() expects a '/' on every platform, so the value of
the "file.separator" system property (a backslash on Windows) must not be used
here. Java maps the resource name to whatever the underlying storage needs, be it
a directory or a jar file; we don't want to know about that: let Java deal with it.

The next time a Properties object is wanted it is either in the cache already and
returned straight away, or it is freshly loaded and put into the cache. This little
scenario makes sure that every distinct Properties object is loaded at most once.
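
To see the load-once behaviour in isolation, here's a small sketch. It is not the
Resources class: the loader is a hypothetical stand-in that builds a Properties
object in memory instead of reading a file, and the 'loads' counter exists only
to show how often that loader really runs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Illustration only: a load-once cache with a counter that shows how often
// the (stand-in) loader really runs.
class LoadOnceSketch {

    private static final Map<String, Properties> cache=
                    new HashMap<String, Properties>();

    static int loads= 0; // number of real loads so far

    static synchronized Properties get(String name) {
        Properties properties= cache.get(name);
        if (properties == null) {
            properties= load(name);
            cache.put(name, properties);
        }
        return properties;
    }

    // Hypothetical stand-in for reading "resources/<name>.properties".
    private static Properties load(String name) {
        loads++;
        Properties properties= new Properties();
        properties.setProperty("loadedFor", name);
        return properties;
    }

    public static void main(String[] args) {
        Properties first= get("tokens");
        Properties second= get("tokens");
        System.out.println(first == second); // the very same object both times
        System.out.println(loads);           // how often the loader ran
    }
}
```

Because the method is synchronized, the test for presence and the put happen as
one step, so two threads asking for the same name can't both end up loading it.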

There's one interesting little thing going on here: because the resource name
doesn't start with a slash, this method loads a Properties object relative to
where the class that defines this method is stored. Suppose the class itself is
stored in '/usr/jos/java/compiler'. Then the resource will be searched for in
directory '/usr/jos/java/compiler/resources'.
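
The rule behind this is the one documented for Class.getResourceAsStream(): a name
that starts with a '/' is used as is (minus that slash); any other name is prefixed
with the package name of the class, with the dots turned into slashes. The sketch
below merely mimics that rule on Strings so you can see which resource name is
asked for in the end. The helper 'resolve' is made up for this illustration (it is
not a JDK method) and the package name 'compiler' is an assumption.

```java
// Illustration only: mimics the documented name resolution of
// Class.getResourceAsStream(); 'resolve' is not a JDK method.
class ResourceNameSketch {

    // className: fully qualified name of the class used for the lookup
    // name:      the resource name passed to getResourceAsStream()
    static String resolve(String className, String name) {
        if (name.startsWith("/"))
            return name.substring(1);          // absolute: used as is
        int dot= className.lastIndexOf('.');
        if (dot < 0)
            return name;                       // class in the default package
        String pkg= className.substring(0, dot).replace('.', '/');
        return pkg+"/"+name;                   // relative to the package
    }

    public static void main(String[] args) {
        System.out.println(resolve("compiler.Resources",
                                   "resources/tokens.properties"));
        System.out.println(resolve("compiler.Resources",
                                   "/resources/tokens.properties"));
    }
}
```

The resulting name is then looked up by the class loader along the class path,
which is why the same code works for classes in a directory and in a jar file.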

We stick this method and the cache itself in a class named 'Resources':

abstract class Resources {

    protected Resources() { }

    private static Map<String, Properties> cache=
                    new HashMap<String, Properties>();

    protected static synchronized Properties getProperties(String name) {
        ...
    }
}

I've made it an abstract class with a single protected default constructor
because I want the table classes to extend from it. Note that the method for
supplying Properties objects is synchronized, just in case two separate threads
want to get the same Properties object at the same time, i.e. we don't want to
load the thing twice or more at the same time.

Also note that this method is static, just like the cache itself. We don't need
a Resources object just to load Properties objects. The class is a utility class;
all it does is supply functionality.

TokenTable

Let's get on with the boring stuff; here's the TokenTable class:

abstract class TokenTable extends Resources {

    private TokenTable() { }

    // different types of tokens:
    static final int T_ENDT= -1; // end of stream reached
    static final int T_CHAR=  0; // an ordinary character
    static final int T_NUMB=  1; // a number
    static final int T_TEXT=  2; // a recognized token
    static final int T_NAME=  3; // an identifier

    // regexps for different token types
    static final Pattern spcePattern= Pattern.compile(pat("space"));
    static final Pattern numbPattern= Pattern.compile(pat("number"));
    static final Pattern wordPattern= Pattern.compile(pat("word"));
    static final Pattern sym2Pattern= Pattern.compile(pat("symbol2"));
    static final Pattern sym1Pattern= Pattern.compile(pat("symbol1"));
    static final Pattern charPattern= Pattern.compile(pat("char"));

    private static String pat(String pattern) {

        return getProperties("tokens").getProperty(pattern);
    }
}

We've extended the (abstract) Resources class and made the TokenTable class
abstract too. The default constructor of this class is private so no one can
instantiate an object from this class. No instance is needed: all members are
static, so every Tokenizer that gets instantiated shares the same single table.

The first part of this class defines a couple of int constants which we have seen
before in the part of this article that showed the Tokenizer class.

The last part initializes the Patterns needed by a Tokenizer. We make use of
the caching facility in the base class Resources by getting property values
from the same Properties object over and over again for all six values.
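
The Tokenizer class itself was shown in an earlier part and isn't repeated here.
As a rough sketch of how six anchored patterns like these can cooperate, the loop
below skips white space and then tries the other patterns in a fixed order, taking
the first one that matches at the start of the remaining input. The order, the
class PatternOrderSketch and its split method are assumptions made for this
illustration, not the Tokenizer's real code, and the patterns are hard-coded here
only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: not the Tokenizer from the earlier article part.
class PatternOrderSketch {

    static final Pattern SPACE= Pattern.compile("^\\s*");

    // tried in this order; the first pattern that matches wins
    static final Pattern[] TOKENS= {
        Pattern.compile("^\\d+(\\.\\d*)?([eE][+-]?\\d+)?"),               // number
        Pattern.compile("^[A-Za-z_]\\w*"),                                // word
        Pattern.compile("^(==|<=|>=|!=|\\^=|\\+=|-=|\\*=|/=|\\+\\+|--)"), // symbol2
        Pattern.compile("^[:,=!<>+*/(){}^-]"),                            // symbol1
        Pattern.compile("^\\S")                                           // char
    };

    // Splits 'input' into the token texts found from left to right.
    static List<String> split(String input) {
        List<String> tokens= new ArrayList<String>();
        String rest= input;
        while (true) {
            Matcher space= SPACE.matcher(rest);
            if (space.lookingAt()) rest= rest.substring(space.end());
            if (rest.isEmpty()) return tokens;
            // 'rest' now starts with a non-space character, so at least
            // the last pattern matches and the loop always makes progress
            for (Pattern p : TOKENS) {
                Matcher m= p.matcher(rest);
                if (m.lookingAt()) {
                    tokens.add(m.group());
                    rest= rest.substring(m.end());
                    break;
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(split("x1 += 2.5*(y--)"));
    }
}
```

Trying the two-character symbols before the one-character symbols matters: with
the order reversed, '+=' would be cut into a '+' and a '='.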

Resources again

The Parser(s) need to know about some more or less fixed matters also:

1) the names of the built-in functions;
2) the names of 'special' built-in functions;
3) the unary operator tokens;
4) the binary operator tokens;
5) the assignment operator tokens;
6) the arity of all operators and functions.

On top of that, for every operator or function name, whether 'special' or not,
a class name is associated with the name/token. The class names are needed by
the code generator but we store that information in the ParserTable for simplicity.

The arity of an operator or function is the number of arguments needed by that
operator or function, e.g. the arity of the 'sin' function is 1, the arity of
the '*' operator is 2 etc.

All the properties files have the same structure: every line in it looks like this:

<description> = <arity> <name> <class>

The <description> is a single word unique description of the function or operator.
The <arity> is a number which is, *ahem*, the arity of the function or operator.
The <name> is the name or token of the function or operator itself and the <class>
is the fully qualified name of the class that implements the function or operator.

As an example, here is part of the contents of the 'functions.properties' file:

sine         = 1    sin    compiler.instruction.SinInstruction
cosine       = 1    cos    compiler.instruction.CosInstruction
tangent      = 1    tan    compiler.instruction.TanInstruction
exponential  = 1    exp    compiler.instruction.ExpInstruction
logarithm    = 1    log    compiler.instruction.LogInstruction
abs          = 1    abs    compiler.instruction.AbsInstruction

minimum      = 2    min    compiler.instruction.MinInstruction
maximum      = 2    max    compiler.instruction.MaxInstruction

The first column is the descriptive single word which makes up the key of the
Properties object. The value String makes up three columns: the arity, the
name or token of the function or operator and the last column is the fully
qualified name of the class that implements the function or operator.
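
Splitting such a value into its three columns is a one-liner with a regular
expression for white space. The little holder class below is only a sketch: the
name ValueColumnsSketch and the check on the number of columns are my own
additions for this illustration.

```java
// Illustration only: splits one property value of the form
// "<arity> <name> <class>" into its three columns.
class ValueColumnsSketch {

    final int arity;
    final String name;
    final String className;

    ValueColumnsSketch(String value) {
        // columns are separated by one or more white space characters
        String[] entries= value.trim().split("\\s+");
        if (entries.length != 3)
            throw new IllegalArgumentException(
                "expected <arity> <name> <class> but got: "+value);
        arity= Integer.parseInt(entries[0]);
        name= entries[1];
        className= entries[2];
    }

    public static void main(String[] args) {
        ValueColumnsSketch sine= new ValueColumnsSketch(
            "1    sin    compiler.instruction.SinInstruction");
        System.out.println(sine.arity+" "+sine.name+" "+sine.className);
    }
}
```

A value with a missing column is rejected right away here, which is handy while
editing the .properties files by hand.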

The Resources class stores the class names in another table: the GeneratorTable,
which also defines a Map:

protected static Map<String, String> classes= new HashMap<String, String>();

This Map stores tuples of the form <name or token> - <class name> and is built
by the Resources class on the fly.

Next we add a method to the Resources class to handle a Properties object like
this:

protected static Set<String> getResource(String name) {

    Properties properties= getProperties(name);
    Set<String> tokens;

    if (properties == null) return null;

    tokens= new HashSet<String>();

    for (Map.Entry<Object, Object> property : properties.entrySet()) {

        // the value columns: <arity> <name> <class>
        String[] entries= ((String)property.getValue()).
                    trim().split("\\s+");

        int marked= entries[1].indexOf('@');

        ParserTable.arity.put(entries[1], Integer.valueOf(entries[0]));

        // the token Set gets the name without its '@' mark
        if (marked < 0)
            tokens.add(entries[1]);
        else
            tokens.add(entries[1].substring(0, marked));

        GeneratorTable.classes.put(entries[1], entries[2]);
    }

    return tokens;
}

This is quite a large method; let's see what it does. It returns a Set of names
or tokens; the Set consists of all the names in the <name> column of the file
(see above). It builds up the Set as follows:

It reads every value (the String following the '=' character) and splits the
value into a String array 'entries'. entries[0] is the arity, entries[1] is the
name of the function or operator and entries[2] is the class name. We check if
the name contains a '@' sign and put every element in its place in the appropriate
map. The 'arity' and 'classes' maps are updated, and the 'tokens' Set is updated
and returned at the end of this method. The '@' sign is removed from the name for
the 'tokens' Set but it is used as part of the key for the 'arity' and 'classes'
Maps.
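
Here's a small sketch of that bookkeeping with two local Maps instead of the real
tables. The two entries are hypothetical: I assume, purely for illustration, that
a '@' mark tells two operators apart that share the same token text, such as a
binary and a unary minus. The class names 'demo.SubInstruction' and
'demo.NegInstruction' are made up as well.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Illustration only: local stand-ins for the 'arity' and 'classes' Maps.
class MarkedNameSketch {

    static final Map<String, Integer> arity= new HashMap<String, Integer>();
    static final Map<String, String> classes= new HashMap<String, String>();

    // Fills both Maps and returns the token texts without any '@' mark.
    static Set<String> tokensOf(Properties properties) {
        Set<String> tokens= new HashSet<String>();
        for (String key : properties.stringPropertyNames()) {
            String[] entries= properties.getProperty(key).trim().split("\\s+");
            String name= entries[1];
            int marked= name.indexOf('@');
            arity.put(name, Integer.valueOf(entries[0])); // key keeps the '@'
            classes.put(name, entries[2]);                // key keeps the '@'
            tokens.add(marked < 0 ? name : name.substring(0, marked));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Properties properties= new Properties();
        // hypothetical entries, not taken from the real .properties files
        properties.setProperty("minus",  "2 -  demo.SubInstruction");
        properties.setProperty("negate", "1 -@ demo.NegInstruction");
        System.out.println(tokensOf(properties)); // the token texts
        System.out.println(arity);                // marked and unmarked keys
    }
}
```

Both entries end up as the single token '-' in the Set, while the Maps still keep
them apart under the keys '-' and '-@'.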

Note that this method directly refers to a member of its subclass: the 'arity'
Map. It updates that map for every line in the .properties file. Purists would
argue that we're breaking the Liskov Substitution Principle (see another article
for an explanation) but this base class as well as its subclasses TokenTable
and ParserTable are not really classes; they're just a bunch of static methods
grouped in abstract classes for convenience. I could've dumped those methods
and Maps in one big utility class as well but I didn't. I want to group the
functionalities in classes for a bit of clarity.

We're still not done with this Resources class. A ParserTable needs to store
information about 'special' functions and reserved words. The ParserTable needs
sets of Strings for that. Here's another utility method in our Resources class:

protected static Set<String> getSet(String name) {

    Properties properties= getProperties(name);
    Set<String> set= new HashSet<String>();

    if (properties != null)
        for (Object elem : properties.keySet())
            set.add((String)elem);

    return set;
}

This method uses a Properties object and stores the keys of that object in a Set;
the values are ignored. Here's the file that lists the reserved words of our little
language; I named it 'reserved.properties':

function  =    a declaration or definition of a user function
listfunc  =    a declaration or definition of a list user function

The return value of the method above given this file is a Set with two elements
in it: 'function' and 'listfunc', both reserved words.
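
The following sketch shows the same idea without a file: it parses the two lines
above from a String and keeps only the keys. The class ReservedWordsSketch and its
method keysOf are made up for this illustration; the real code goes through
getProperties() and its cache.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: keeps the keys of a .properties text, ignores the values.
class ReservedWordsSketch {

    static Set<String> keysOf(String text) {
        Properties properties= new Properties();
        try {
            properties.load(new StringReader(text));
        }
        catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
        // a sorted copy, just to get a stable order when printing
        return new TreeSet<String>(properties.stringPropertyNames());
    }

    public static void main(String[] args) {
        String text=
            "function  =    a declaration or definition of a user function\n"+
            "listfunc  =    a declaration or definition of a list user function\n";
        Set<String> reserved= keysOf(text);
        System.out.println(reserved);
        System.out.println(reserved.contains("function")); // a reserved word?
    }
}
```

The values in such a file serve as documentation only, so a reserved word can be
added by adding one line with any description you like.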

ParserTable

Finally, after all this boring bookkeeping, here's the ParserTable class:

public abstract class ParserTable extends Resources {

    // total number of binary operator precedences
    private static final int PREC= 6;

    private ParserTable() { }

    public static final int T_FUNC=  TokenTable.T_NAME+1;
    public static final int T_QUOT=  TokenTable.T_NAME+2;
    public static final int T_USER=  TokenTable.T_NAME+3;
    public static final int T_WORD=  TokenTable.T_NAME+4;

    public static final Map<String, Integer> arity=
                new HashMap<String, Integer>();

    public static final Set<String> funcs= getResource("functions");
    public static final Set<String> quots= getResource("quotes");
    public static final Set<String> rword= getSet("reserved");
    public static final Set<String> lfncs= getSet("listfuncs");

    // all unary, binary operators and assignments
    static final Set<String> unaops= getResource("unaops");
    static final Set<String> pstops= getResource("postops");
    static final Set<String> asgns= getResource("assigns");

    static final List<Set<String>> binops= new ArrayList<Set<String>>(PREC);

    static {
        for (int i= 0; i < PREC; i++)
            binops.add(getResource("binops"+i));
    }
}

Similar to the TokenTable (see above), this class is nothing more than a couple
of Sets, a Map and a List of Sets. The first part of this class defines a couple of
constants needed by the Parser(s) while the rest of the class uses the methods
in its superclass (Resources) to fill its Sets and its Map.

The 'arity' Map is filled on the fly when the Sets are loaded.

The List of Sets represents the binary operators; the 0th element of this List
stores the binary operators with the lowest precedence ( ':' ), and so on.
See the previous article part for a description of all binary operators and their
precedence.
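
With such a List, finding the precedence of a binary operator is a simple scan
over the levels. The sketch below assumes three demo levels; only the ':' at
level 0 comes from this article, the other two levels are hypothetical and are
not the contents of the real binops files. The names PrecedenceSketch,
precedenceOf and demoLevels are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a List of Sets indexed by precedence level.
class PrecedenceSketch {

    // Returns the level (0 is the lowest precedence) whose Set contains
    // 'token', or -1 if 'token' is not a binary operator at all.
    static int precedenceOf(List<Set<String>> binops, String token) {
        for (int i= 0; i < binops.size(); i++)
            if (binops.get(i).contains(token)) return i;
        return -1;
    }

    // Hypothetical levels; only ':' at level 0 is taken from the article.
    static List<Set<String>> demoLevels() {
        List<Set<String>> binops= new ArrayList<Set<String>>();
        binops.add(new HashSet<String>(Arrays.asList(":")));
        binops.add(new HashSet<String>(Arrays.asList("+", "-")));
        binops.add(new HashSet<String>(Arrays.asList("*", "/")));
        return binops;
    }

    public static void main(String[] args) {
        List<Set<String>> binops= demoLevels();
        System.out.println(precedenceOf(binops, ":"));
        System.out.println(precedenceOf(binops, "*"));
        System.out.println(precedenceOf(binops, "sin"));
    }
}
```

A parser that handles one precedence level per method can also skip the scan and
simply ask the Set at its own level whether the current token belongs to it.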

The Sets of the ParserTable will be discussed when we implement our Parser(s).
A short peek:

- the Set funcs stores the names of the normal built-in functions;
- the Set quots stores the names of the special built-in functions;
- the Set rword stores the reserved words;
- the Set lfncs stores the names of the built-in functions that take lists as
their parameters.

GeneratorTable

There's one more table to be discussed: the GeneratorTable. This table stores
the names of the operators or functions and associates them with their fully
qualified class name. The data for this table is picked up by the Resources
class from the .properties files and stored in a Map in this table.

A second Map in this class caches instructions:

private static Map<String, Instruction> instructions=
                new HashMap<String, Instruction>();

Given the name of the instruction the real instruction can be associated with it.
This caching mechanism implements the Flyweight pattern, e.g. there's only one
'sin' instruction instantiation needed, no matter how many times the sine function
is used in expressions. This Map takes care of the mechanics needed by this pattern.

The GeneratorTable class implements two methods for just this:

protected static Instruction getInstruction(String name) {

    try {
        // instantiate through the no-argument constructor of the class
        return (Instruction)Class.forName(classes.get(name)).
                getDeclaredConstructor().newInstance();
    }
    catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}

This method instantiates a new Instruction given the name of the operator or
function. When it fails (it shouldn't) it prints the complete stack trace telling
us *why* it failed. Most likely the .properties files contain erroneous data which
should be fixed.

The second method takes care of the caching of instructions:

protected static Instruction cacheInstruction(String name) {

    Instruction instruction= instructions.get(name);

    if (instruction == null)
        instructions.put(name, instruction= getInstruction(name));

    return instruction;
}

If the instruction was cached before, it is returned, otherwise the previous method
is invoked which instantiates a new instruction for us given the Map that stores
the fully qualified class name of the wanted instruction.
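
To watch the Flyweight behaviour at work, here's a self-contained sketch of the
two methods. The Instruction interface and the SinDemo class are hypothetical
stand-ins (the real instructions are the subject of a later article part), and
the 'classes' Map is filled by hand here instead of from a .properties file.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: one shared instance per instruction name.
class FlyweightSketch {

    // Hypothetical stand-ins for the real Instruction type and one
    // of its implementations.
    interface Instruction { }
    static class SinDemo implements Instruction { }

    // name -> class name, the role played by the 'classes' Map
    static final Map<String, String> classes= new HashMap<String, String>();

    // name -> the one shared instance
    private static final Map<String, Instruction> instructions=
                    new HashMap<String, Instruction>();

    static {
        classes.put("sin", SinDemo.class.getName());
    }

    static Instruction getInstruction(String name) {
        String className= classes.get(name);
        if (className == null) return null; // unknown name
        try {
            return (Instruction)Class.forName(className).
                    getDeclaredConstructor().newInstance();
        }
        catch (ReflectiveOperationException e) {
            e.printStackTrace();
            return null;
        }
    }

    static Instruction cacheInstruction(String name) {
        Instruction instruction= instructions.get(name);
        if (instruction == null)
            instructions.put(name, instruction= getInstruction(name));
        return instruction;
    }

    public static void main(String[] args) {
        Instruction first= cacheInstruction("sin");
        Instruction second= cacheInstruction("sin");
        System.out.println(first == second); // one shared 'sin' instruction?
    }
}
```

Sharing instances like this only works when an instruction keeps no state of its
own between uses; that is an assumption of this sketch.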

Both methods are protected because the outside world shouldn't even know that this
class exists. It is solely used by the Parser(s) and/or the Interpreter.

Concluding remarks

There's nothing interesting about the four classes described above. The Resources
class is the base class of the TokenTable, ParserTable and GeneratorTable classes
for convenience.

The four classes are simple utility classes, no more and no less, and don't need
any instances at all. The base class Resources does all the work: it loads contents
from files into Properties objects and fills the maps of the subclasses.

The main purpose of those four classes is that I don't want to hard-code every
single name or token or whatever in the rest of the code. I want to be able to
change simple .properties files and alter the language without having to alter
much Java code.

This was the boring chapter of the Compilers article; it had to be written for
reasons of completeness; it was quite a big chapter too; I showed how tables fill
themselves when loaded by the JVM. The data comes from .properties files which
are easy to edit. Those .properties files are not a 'user feature'; the contents
of those .properties files have to be correct in order for the entire compilation
system to behave correctly.

I do hope I haven't scared you off too much with this part of the Compilers article,
but that's life: compiler writing is 20% inspiration and 80% perspiration. This
boring article part took care of most if not all of the perspiration.

The inspiration will be back in the sequel(s) of this article. I'm sure we'll
see each other again next week when we can finally dig into Parsers.

kind regards,

Jos

Jun 6 '07 #1
