473,549 Members | 2,591 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Good Representation for an Abstract Syntax Tree

I'm in the process of writing an interpreter for lambda calculus (i.e.
a small functional programming language) in C. I've previously written
one in Haskell so I understand at least some of the concepts involved
in writing an interpreter and I know how to define a grammar and use
lex/yacc to parse it. What I don't know is how to represent the
abstract syntax tree (AST) of my language in C. In Haskell I used
what's called an algebraic data type like so:

data Exp = Lam Id Exp
| Var Id
| App Exp Exp
| Lit Int

Legend:
Exp - expression (i.e. one of Lam, Var, App or Lit)
Lam - lambda abstraction (i.e. a function)
App - function application
Lit - integer literal
Id - variable name (i.e. a string)

How would I represent such a structure in C, perhaps with a union with
some sort of tag? I guess I would use a class hierarchy in an
object-oriented language but I really want to stay in C since I
consider this an educational experience for me (i.e. the goal is not to
write an interpreter as quickly as possible but rather learn how to do
it in an efficient way in C).

Once I have an AST I can begin applying transformations such as closure
and CPS (continuation-passing style) conversions to it and I think I
can achieve that on my own but right now finding a good AST
representation is holding me back.

P.S. For those who are interested my plan is to write an all C
interpreter for a strict functional language and use the Boehm garbage
collector.

Thanks in advance,

Johan Tibell

Jul 18 '06 #1
9 4004
jo**********@gm ail.com wrote:
I'm in the process of writing an interpreter for lambda calculus (i.e.
a small functional programming language) in C. I've previously written
one in Haskell so I understand at least some of the concepts involved
in writing an interpreter and I know how to define a grammar and use
lex/yacc to parse it. What I don't know is how to represent the
abstract syntax tree (AST) of my language in C. In Haskell I used
what's called an algebraic data type like so:

data Exp = Lam Id Exp
| Var Id
| App Exp Exp
| Lit Int

Legend:
Exp - expression (i.e. one of Lam, Var, App or Lit)
Lam - lambda abstraction (i.e. a function)
App - function application
Lit - integer literal
Id - variable name (i.e. a string)

How would I represent such a structure in C, perhaps with a union with
some sort of tag? I guess I would use a class hierarchy in an
object-oriented language but I really want to stay in C since I
consider this an educational experience for me (i.e. the goal is not to
write an interpreter as quickly as possible but rather learn how to do
it in an efficient way in C).
Yes, a series of struct within a union would be reasonably efficient.
Here is one way to lay them out:

struct Exp;

struct Lam
{
char *id;
struct Exp *exp;
};

struct Var
{
char *id;
};

struct App
{
struct Exp *exp1;
struct Exp *exp2;
};

struct Lit
{
int value;
};

enum Type
{
LAM,
VAR,
APP,
LIT
};

struct Exp
{
enum Type type;
union {
struct Lam *lam;
struct Var *var;
struct App *app;
struct Lit *lit;
} form;
};

For those types of expression that only contain a single data member
(Var only contains a string and Lit only contains an integer), you may
choose to place their data directly into the union rather than adding a
level of indirection.

--
Simon.
Jul 18 '06 #2
jo**********@gm ail.com wrote:
I'm in the process of writing an interpreter for lambda calculus (i.e.
a small functional programming language) in C. I've previously written
one in Haskell so I understand at least some of the concepts involved
in writing an interpreter and I know how to define a grammar and use
lex/yacc to parse it. What I don't know is how to represent the
abstract syntax tree (AST) of my language in C. In Haskell I used
what's called an algebraic data type like so:

data Exp = Lam Id Exp
| Var Id
| App Exp Exp
| Lit Int

Legend:
Exp - expression (i.e. one of Lam, Var, App or Lit)
Lam - lambda abstraction (i.e. a function)
App - function application
Lit - integer literal
Id - variable name (i.e. a string)

How would I represent such a structure in C, perhaps with a union with
some sort of tag?
I would (and, for a different language, have) do it in two layers.

I'd have a bunch of functions expressing the expression structure, eg,

isLambda(Exp), lambdaId(Exp), lambdaBody(Exp)
isVar(Exp), varId(Exp)
isApp(Exp), appFun(Exp), appArg(Exp)
isLit(Exp), litInt(Exp)

In the C file that defines those functions, I'd have whatever representation
I choose. It might be

struct ExpStruct
{
enum { LAM, VAR, APP, LIT } type;
union
{
struct LamStruct lam;
struct VarStruct var;
struct AppStruct app;
struct LitStruct lit; /* or perhaps `int lit` */
}
};

(And this /wouldn't/ be in the public header file, which would have
the minimum needed to give the function signatures. As a first
cut, it would say things like:

typedef struct ExpStruct *Exp;

While some people argue against typedefing struct pointers in this
way, it allows you, if necessary, to /completely/ change your
representation later, eg:

typedef int Exp;

and have expressions encoded as integers, which might be indexes
into an array of expression objects or immediate values or
whatever. The access-via-functions means that the code doesn't
care.
)

If this representation turned out to be naff, I'd pick a different one,
but all being well, the rest of the code would be protected from the
representation change by the functional API.

If performance of the API turned out to be a problem, I might be able to
judiciously inline or macroise the API.

[For the language I did, I had a fiendish representation that played
not-everso-portable games with pointers-as-integers so that I could
reduce the space taken up by nodes, since there was some assumption
that the code would be running in smallish devices with "sane" pointer
semantics. Don't do that unless you /have/ to, and I bet you won't.]

--
Chris "seeker" Dollin
"Reaching out for mirrors hidden in the web." - Renaissance, /Running Hard/

Jul 18 '06 #3
In article <e9**********@m alatesta.hpl.hp .com>
Chris Dollin <ch**********@h p.comwrote:
struct ExpStruct
{
enum { LAM, VAR, APP, LIT } type;
union
{
struct LamStruct lam;
struct VarStruct var;
struct AppStruct app;
struct LitStruct lit; /* or perhaps `int lit` */
}
};
Note that the union-member of a "struct ExpStruct" needs a name.
For instance, the second-to-last line above should perhaps read
"} un;".

I occasionally wish that C had un-named union-members of outer
"struct"s. So I sometimes fake them, using code like this:

struct exp {
enum { LAM, VAR, APP, LIT } exp_type;
union {
struct lam un_lam;
struct var un_var;
struct app un_app;
struct lit un_lit;
} exp_un;
};
#define exp_lam exp_un.un_lam
#define exp_var exp_un.un_var
#define exp_app exp_un.un_app
#define exp_lit exp_un.un_lit

Then, given a "struct exp *ep" pointing to a valid "exp", I can
write:

switch (ep->exp_type) {
case LAM:
do something with ep->exp_lam;
...
case VAR:
do something with ep->exp_var;
...
...
}

The "#define"s make these expand to ep->exp_un.un_la m, ep->exp_un.un_va r,
and so on.

Without the intermediate "#define"s (and using the names in Chris
Dollin's example code, with the small correction), this becomes:

switch (ep->type) {
case LAM:
do something with ep->un.lam;
...
...
}

You have to name the union member, then write the name of the
union member.

Some non-C languages (including Plan 9 C) do have a way to insert
anonymous items that "bubble out" to an outer enclosing struct,
but neither C89 nor C99 support this.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Jul 18 '06 #4
Chris Torek wrote:
In article <e9**********@m alatesta.hpl.hp .com>
Chris Dollin <ch**********@h p.comwrote:
> struct ExpStruct
{
enum { LAM, VAR, APP, LIT } type;
union
{
struct LamStruct lam;
struct VarStruct var;
struct AppStruct app;
struct LitStruct lit; /* or perhaps `int lit` */
}
};


Note that the union-member of a "struct ExpStruct" needs a name.
For instance, the second-to-last line above should perhaps read
"} un;".

I occasionally wish that C had un-named union-members of outer
"struct"s. So I sometimes fake them, using code like this:

struct exp {
enum { LAM, VAR, APP, LIT } exp_type;
union {
struct lam un_lam;
struct var un_var;
struct app un_app;
struct lit un_lit;
} exp_un;
};
#define exp_lam exp_un.un_lam
#define exp_var exp_un.un_var
#define exp_app exp_un.un_app
#define exp_lit exp_un.un_lit

Then, given a "struct exp *ep" pointing to a valid "exp", I can
write:

switch (ep->exp_type) {
case LAM:
do something with ep->exp_lam;
...
case VAR:
do something with ep->exp_var;
...
...
}

The "#define"s make these expand to ep->exp_un.un_la m, ep->exp_un.un_va r,
and so on.

Without the intermediate "#define"s (and using the names in Chris
Dollin's example code, with the small correction), this becomes:

switch (ep->type) {
case LAM:
do something with ep->un.lam;
...
...
}

You have to name the union member, then write the name of the
union member.

Some non-C languages (including Plan 9 C) do have a way to insert
anonymous items that "bubble out" to an outer enclosing struct,
but neither C89 nor C99 support this.
lcc-win32 supports this.
Jul 18 '06 #5
Chris Torek wrote:
In article <e9**********@m alatesta.hpl.hp .com>
Chris Dollin <ch**********@h p.comwrote:
> struct ExpStruct
{
enum { LAM, VAR, APP, LIT } type;
union
{
struct LamStruct lam;
struct VarStruct var;
struct AppStruct app;
struct LitStruct lit; /* or perhaps `int lit` */
}
};

Note that the union-member of a "struct ExpStruct" needs a name.
For instance, the second-to-last line above should perhaps read
"} un;".
ARGH. Thanks, Chris. Sorry, Johan.

It was a typo. Well, a braino.

--
Chris "Brain-O, Bray-ay-ay-ayn-O, daylight come & ..." Dollin
"Who are you? What do you want?" /Babylon 5/

Jul 19 '06 #6
jacob navia <ja***@jacob.re mcomp.frwrote:
Chris Torek wrote:
[ SNIP! over 60 lines ]
Some non-C languages (including Plan 9 C) do have a way to insert
anonymous items that "bubble out" to an outer enclosing struct,
but neither C89 nor C99 support this.

<spamsupports this.
You just quoted _seventy lines_ just to add a oneliner that says that
<spammy spammy spamis non-conforming (which, btw, we already know all
too well)? Grief, have you so few customers that you need such tactics?

Richard
Jul 19 '06 #7
jacob navia said:
Chris Torek wrote:
<snip>
>>
Some non-C languages (including Plan 9 C) do have a way to insert
anonymous items that "bubble out" to an outer enclosing struct,
but neither C89 nor C99 support this.

lcc-win32 supports this.
That might be relevant in an lcc newsgroup, but what matters in comp.lang.c
is that the construct is non-standard and non-portable, and should thus be
avoided by those who need to write portable code.

Please stop confusing "Navia's implementation" with "the real world of
portable programming".

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Jul 19 '06 #8
Simon Biber wrote:
Yes, a series of struct within a union would be reasonably efficient.
Here is one way to lay them out:

struct Exp;

struct Lam
{
char *id;
struct Exp *exp;
};

struct Var
{
char *id;
};

struct App
{
struct Exp *exp1;
struct Exp *exp2;
};

struct Lit
{
int value;
};

enum Type
{
LAM,
VAR,
APP,
LIT
};

struct Exp
{
enum Type type;
union {
struct Lam *lam;
struct Var *var;
struct App *app;
struct Lit *lit;
} form;
};

For those types of expression that only contain a single data member
(Var only contains a string and Lit only contains an integer), you may
choose to place their data directly into the union rather than adding a
level of indirection.

--
Simon.
When dynamically allocating memory for such a struct using malloc will
sizeof(Exp) be enough to get sufficient memory allocated for the
biggest union member? Will this work:

struct Exp *exp;
exp = malloc(sizeof(s truct Exp));
if (!exp)
fprintf(stderr, "out of memory\n");

exp->form.lam = malloc(sizeof(s truct Lam));
if (!exp->form.lam)
fprintf(stderr, "out of memory\n");

(Also, is it good style?)

Jul 20 '06 #9
"Johan Tibell" <jo**********@g mail.comwrites:
Simon Biber wrote:
[...]
>struct Exp
{
enum Type type;
union {
struct Lam *lam;
struct Var *var;
struct App *app;
struct Lit *lit;
} form;
};
[...]
>
When dynamically allocating memory for such a struct using malloc will
sizeof(Exp) be enough to get sufficient memory allocated for the
biggest union member?
Yes. sizeof for a union yields at least the size of its largest
member. (It's typically the size of its largest member rounded up to
the alignment of its most strictly aligned member.)
Will this work:

struct Exp *exp;
exp = malloc(sizeof(s truct Exp));
if (!exp)
fprintf(stderr, "out of memory\n");

exp->form.lam = malloc(sizeof(s truct Lam));
if (!exp->form.lam)
fprintf(stderr, "out of memory\n");

(Also, is it good style?)
Yes, and mostly.

Rather than
exp = malloc(sizeof(s truct Exp));
...
exp->form.lam = malloc(sizeof(s truct Lam));
I'd write:
exp = malloc(sizeof *exp);
...
exp->form.lam = malloc(sizeof *(exp->form.lam));

(The parentheses on the second malloc aren't strictly necessary, but I
find it easier to read with them.)

--
Keith Thompson (The_Other_Keit h) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jul 20 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

59
3886
by: seberino | last post by:
I've heard 2 people complain that word 'global' is confusing. Perhaps 'modulescope' or 'module' would be better? Am I the first peope to have thought of this and suggested it? Is this a candidate for Python 3000 yet? Chris
14
4609
by: dreamcatcher | last post by:
I always have this idea that typedef a data type especially a structure is very convenient in coding, but my teacher insisted that I should use the full struct declaration and no further explanations, so I wonder is there any good using typedef ? and I also know that when a data type being typedefed become an abstract data type, so what...
6
4004
by: Melkor Ainur | last post by:
Hello, I'm attempting to build an interpreter for a pascal-like language. Currently, I don't generate any assembly. Instead, I just build an abstract syntax tree representing what I've parsed from a user's program and use a C backend to evaluate the tree. I'm using Lex, Yacc and C. Now, there are some functions that are built into this...
5
1587
by: The Cool Giraffe | last post by:
I'm designing an ABC and in connection to that i have run into some "huh!" and "oh...". Let me put it as a list. 1. Since the class will only contain bodies of the methods, only the header file is needed. There will be no definitions provided until i derive the ABC. True or false? 2. Since i'll have two different classes (both derived...
1
1946
by: techdesk100 | last post by:
Does anyone know why this is not possible int main(void) { goto labelb; labela: { int i = c; goto labelc; }
206
8202
by: WaterWalk | last post by:
I've just read an article "Building Robust System" by Gerald Jay Sussman. The article is here: http://swiss.csail.mit.edu/classes/symbolic/spring07/readings/robust-systems.pdf In it there is a footprint which says: "Indeed, one often hears arguments against building exibility into an engineered sys- tem. For example, in the philosophy of...
5
2475
by: Dave the Funkatron | last post by:
Hey all, I'm using MinGW as part of my toolchain in Eclipse, and I am trying to figure out why I am getting a compiler error when I include the <limitsheader. The command that eclipse is running is g++ -IC:\Dave\School\common\cxx -IC:\Dave\School\common\third party \packages\glut-3.7.6-bin\include -IC:\Dave\School\common\third party
75
3351
by: Amkcoder | last post by:
http://amkcoder.fileave.com/L_BitWise.zip http://amkcoder.fileave.com/L_ptr2.zip
23
3120
by: tonytech08 | last post by:
What I like about the C++ object model: that the data portion of the class IS the object (dereferencing an object gets you the data of a POD object). What I don't like about the C++ object model: that most OO features are not available for class object design without loss of POD-ness. So, I'm more than leaning toward "bad" because of...
0
7734
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. ...
0
7979
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...
0
7826
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
1
5385
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...
0
5107
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3493
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
1960
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
1
1074
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
781
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.