C parser yielding syntax tree data structure?

For a research project, we're looking for a reliable parser for C
that will take an ANSI C program and yield a tree representation of
the program (as a Java or C++ object). Of course a grammar e.g. in
jflex/jbison that will yield the same thing is fine too. We have been
able to find some grammars and parsers, of unknown reliability, that
don't yield a syntax tree; we want to avoid starting with a flaky
parser and/or adding the syntax tree code.

Preferably the tokens in the tree will contain information
on the line number and character number of the token, but if it
is sufficiently easy to add that code, then we can do that too.

Thanks for any info you can give.

--Jamie. (efil4dreN)
Apr 8 '06 #1
"Jamie Andrews" <an*****@csd.uwo.ca> wrote in message
For a research project, we're looking for a reliable parser for C
that will take an ANSI C program and yield a tree representation of
the program (as a Java or C++ object). Of course a grammar e.g. in
jflex/jbison that will yield the same thing is fine too. We have been
able to find some grammars and parsers, of unknown reliability, that
don't yield a syntax tree; we want to avoid starting with a flaky
parser and/or adding the syntax tree code.

Preferably the tokens in the tree will contain information
on the line number and character number of the token, but if it
is sufficiently easy to add that code, then we can do that too.


(Since this is cross-posted, for those on comp.lang.c: yes, I've
posted most of these links previously...)

I don't know which, if any, of these will fulfill your needs, but they may be
worth a look. While posting I noticed that some of the links are dead, but
they may still help you track the projects down.

CIL - C Intermediate Language - C to C transformation
http://manju.cs.berkeley.edu/cil/

WCC - A C Subset Compiler (DECUS ftp links now appear to be dead...sorry)
http://www.decus.org/libcatalog/desc...ml/v00281.html
ftp://ftp.encompassus.org/lib/

npath - C Source Complexity Measures
http://www.geonius.com/software/tools/npath.html

Check: A unit test framework for C
http://check.sourceforge.net/

CTool Library (call-graph generator, source transformations)
http://ctool.sourceforge.net/

Cproto automatically generates C function prototypes
http://cproto.sourceforge.net/

JSCPP - a C preprocessor + parser with special modes
http://www.die-schoens.de/prg/

CXREF C language cross referencing program
in volume1 of comp.sources.unix:
http://ftp.sunet.se/pub/usenet/ftp.u....sources.unix/

CSur Le projet Csur (in French)
An analyzer of C code to detect common program execution errors
http://www.lsv.ens-cachan.fr/~goubault/Csur/csur.html

Chico State Mini-C Compiler (CSMCC) is a student training load-and-go
compiler (incomplete, teaching tool)
http://www.ecst.csuchico.edu/~sameerg/compproj.html
http://www.ecst.csuchico.edu/~hilzer/csci250/proj/

Edward Willink's C++ grammars:
http://www.computing.surrey.ac.uk/research/dsrg/fog/
(some of the links contain an extra '/v'; just delete it)

ISO C/C++ grammars version 1.2 (c-c++-grammars-1.2.tar.gz)
http://www.sigala.it/sandro/download.php

A C99 Parser, a recursive descent parser
http://www.mazumdar.demon.co.uk/c_parser.html

Ctags generates an index (or tag) file of language objects
http://ctags.sourceforge.net/

Cdecl English<->C translator for C declarations
cdecl in volume6 of comp.sources.unix:
cdecl2 in volume14 of comp.sources.unix:
http://ftp.sunet.se/pub/usenet/ftp.u....sources.unix/
Rod Pemberton
Apr 9 '06 #2
(Jamie Andrews) wrote:
For a research project, we're looking for a reliable parser for C
that will take an ANSI C program and yield a tree representation of
the program (as a Java or C++ object). ...


I've not done it, but if I had to solve the same problem, my first
step would be to see whether a C compiler can "dump" the tree in a readable
format. For example, gcc accepts options of the form -fdump-tree-xxxx

It could work, ... or not.
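
As a rough sketch of how one might try this: the file below is a tiny translation unit, and the comment shows an assumed invocation using -fdump-tree-original, one concrete member of the -fdump-tree-xxxx family. The file name example.c is hypothetical. These dumps are aimed at compiler debugging rather than at tools, so their format may differ between GCC versions and could be fragile to parse.

```c
/* example.c (hypothetical file name): a tiny translation unit to dump.
 *
 * Assumed invocation, using one member of the -fdump-tree-xxxx family:
 *
 *     gcc -c -fdump-tree-original example.c
 *
 * GCC then writes a dump file whose name is derived from the source
 * file name; the exact name and the dump format depend on the GCC version.
 */
int add(int a, int b)
{
    return a + b;
}
```

Whether the dumped text is regular enough to load back into a Java or C++ object would have to be checked against the GCC version in use.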

Apr 9 '06 #3
(Jamie Andrews) wrote:
For a research project, we're looking for a reliable parser for C
that will take an ANSI C program and yield a tree representation of
the program (as a Java or C++ object). Of course a grammar e.g. in
jflex/jbison that will yield the same thing is fine too. We have been
able to find some grammars and parsers, of unknown reliability, that
don't yield a syntax tree; we want to avoid starting with a flaky
parser and/or adding the syntax tree code.


In my search for a C++ parser that yields an AST, I tried two
parsers that look fine for C, although neither can parse all C++
constructs.

- C or C++ grammar for ANTLR (http://www.antlr.org/grammar/list)

- ELSA/Elkhound (http://www.cs.berkeley.edu/~smcpeak/elkhound/)

I am currently using ELSA, hoping that the few remaining bugs
(with respect to C++) will be fixed at some point.

Cheers,
Arndt
Apr 13 '06 #4
"Jamie Andrews" <an*****@csd.uwo.ca> wrote in message
For a research project, we're looking for a reliable parser for C
that will take an ANSI C program and yield a tree representation of
the program (as a Java or C++ object). ...


The DMS Software Reengineering Toolkit provides a full ANSI C front
end, with preprocessor, builds ASTs and symbol table information, and
provides facilities for constructing custom analyzers and
source-to-source transformations. See
http://www.semdesigns.com/Products/F...CFrontEnd.html

--
Ira Baxter, CTO
www.semanticdesigns.com

Apr 13 '06 #5
On Wednesday 12 April 2006 22:48, Arndt Muehlenfeld wrote:
(Jamie Andrews) wrote:
For a research project, we're looking for a reliable parser for C
that will take an ANSI C program and yield a tree representation of
the program (as a Java or C++ object). Of course a grammar e.g. in
jflex/jbison that will yield the same thing is fine too. We have been
able to find some grammars and parsers, of unknown reliability, that
don't yield a syntax tree; we want to avoid starting with a flaky
parser and/or adding the syntax tree code.


Consider ROSE
http://www.llnl.gov/CASC/rose/

I understand that another version is due within a month or two.

-paul-
--
Paul E. Black (p.*****@acm.org)

Apr 14 '06 #6
For a research project, we're looking for a reliable parser for C
that will take an ANSI C program and yield a tree representation of
the program (as a Java or C++ object). ...


These additional links may be of some use. ASTRÉE appears to be great
but I don't see any code release...

CCURED memory safe C transformations (for CIL)
http://manju.cs.berkeley.edu/ccured/

C Code Checker (for CIL)
http://www.drugphish.ch/~jonny/cca.html

PScan Scan C files for format string overflows
http://www.striker.ottawa.on.ca/~aland/pscan/

CQUAL C checking through extended type qualifiers
http://www.cs.umd.edu/~jfoster/cqual/

Smatch - Source Matcher, C source checker for Linux Kernel
http://smatch.sourceforge.net/

SPLint Secure Programming Lint error detection
http://www.splint.org

BOON Buffer Overrun detectiON
http://www.cs.berkeley.edu/~daw/boon/

CZECH, project pedantic error detection
http://pedantic.sourceforge.net/

Flawfinder for C (in Python)
http://www.dwheeler.com/flawfinder/

ASTRÉE determines absence of runtime errors (in OCAML)
http://www.astree.ens.fr/
"In Nov. 2003, ASTRÉE was able to prove completely automatically the absence
of any RTE in the primary flight control software of the Airbus A340
fly-by-wire system, a program of 132,000 lines of C"
Rod Pemberton

Apr 16 '06 #7
(Jamie Andrews) wrote:
For a research project, we're looking for a reliable parser for C
that will take an ANSI C program and yield a tree representation of
the program (as a Java or C++ object). Of course a grammar e.g. in
jflex/jbison that will yield the same thing is fine too. We have been
able to find some grammars and parsers, of unknown reliability, that
don't yield a syntax tree; we want to avoid starting with a flaky
parser and/or adding the syntax tree code.

Preferably the tokens in the tree will contain information
on the line number and character number of the token, but if it
is sufficiently easy to add that code, then we can do that too.

Thanks for any info you can give.


Linus Torvalds of Linux fame once wrote one of these called "sparse".
See http://freshmeat.net/projects/sparse/

I think the latest version is in
http://www.kernel.org/pub/scm/devel/sparse/ but, unfortunately, you need
to use the "git" version control system to download it.

It doesn't give C++ or Java objects as output. But, as you'll find,
there is not much gained by treating a whole tree as an object; the
important thing is that it has a single root.

I have no idea how good this parser is; you'll have to find that out
for yourself. It's hard to tell how good any C parser is if it doesn't
get heavy use, and hard enough to tell even if it does.
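
To make the "single root" point concrete, here is a minimal sketch in plain C of a tree in which every node records the line and column of its first token. All names (struct node, node_new, node_add_child and so on) are hypothetical and chosen for illustration; this is not sparse's data structure or API, and a real C AST needs far more node kinds.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node kinds, for illustration only. */
enum node_kind { NODE_TRANSLATION_UNIT, NODE_FUNCTION, NODE_RETURN, NODE_INT_LITERAL };

/* One tree node; children form a singly linked list via next_sibling. */
struct node {
    enum node_kind kind;
    int line;                   /* 1-based line of the node's first token */
    int column;                 /* 1-based column of the node's first token */
    struct node *first_child;
    struct node *next_sibling;
};

/* Allocate a childless node; returns NULL if allocation fails. */
struct node *node_new(enum node_kind kind, int line, int column)
{
    struct node *n = calloc(1, sizeof *n);
    if (n != NULL) {
        n->kind = kind;
        n->line = line;
        n->column = column;
    }
    return n;
}

/* Append child after parent's existing children and return child. */
struct node *node_add_child(struct node *parent, struct node *child)
{
    assert(parent != NULL && child != NULL);
    struct node **slot = &parent->first_child;
    while (*slot != NULL)
        slot = &(*slot)->next_sibling;
    *slot = child;
    return child;
}

/* Count root plus all of its descendants. */
size_t node_count(const struct node *root)
{
    if (root == NULL)
        return 0;
    size_t total = 1;
    for (const struct node *c = root->first_child; c != NULL; c = c->next_sibling)
        total += node_count(c);
    return total;
}

/* Print the subtree in pre-order, indenting two spaces per level. */
void node_dump(const struct node *root, int depth, FILE *out)
{
    static const char *const names[] = {
        "translation-unit", "function", "return", "int-literal"
    };
    if (root == NULL)
        return;
    fprintf(out, "%*s%s @%d:%d\n", depth * 2, "", names[root->kind],
            root->line, root->column);
    for (const struct node *c = root->first_child; c != NULL; c = c->next_sibling)
        node_dump(c, depth + 1, out);
}

/* Free root and every node below it. */
void node_free(struct node *root)
{
    if (root == NULL)
        return;
    struct node *c = root->first_child;
    while (c != NULL) {
        struct node *next = c->next_sibling;
        node_free(c);
        c = next;
    }
    free(root);
}
```

Every operation (count, dump, free) needs only the root pointer, which is why a single root matters more than wrapping the tree in an object. A parser would call node_new with the position its scanner recorded for each token and attach the result with node_add_child.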
Apr 22 '06 #8
