Bytes IT Community

Choosing the right parser for parsing C headers

Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)
I chose to make my UT-generator in Python 2.4. However, I am now
encountering problems in choosing the right parser for the job. I
struggle in choosing between the inappropriate, the out-of-date, the
alpha, or the too-big-for-the-task...
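A minimal sketch of the "shaking" step, assuming the header has already been parsed into a function name and a list of C parameter types. The `BAD_VALUES` table, `shake_calls()` and the sample `copy_buf()` prototype are hypothetical; each generated call breaks exactly one argument, so a failing test points at a single parameter.

```python
# Hypothetical table of "wrong" argument values per C parameter type.
# The right-hand sides are C source text (INT_MAX etc. come from <limits.h>,
# SIZE_MAX from <stdint.h>).
BAD_VALUES = {
    "int": ["-1", "0", "INT_MAX", "INT_MIN"],
    "unsigned int": ["0", "UINT_MAX"],
    "size_t": ["0", "SIZE_MAX"],
}
POINTER_BAD_VALUES = ["NULL"]

def bad_values_for(ctype):
    """Return candidate wrong arguments for one parameter type."""
    ctype = " ".join(ctype.split())  # normalise whitespace
    if ctype.endswith("*"):
        return POINTER_BAD_VALUES
    return BAD_VALUES.get(ctype, [])

def shake_calls(name, param_types, good_args):
    """Yield C call expressions in which exactly one argument is broken.

    name        -- function name, e.g. "copy_buf"
    param_types -- list of C type strings, one per parameter
    good_args   -- list of valid argument expressions, same length
    """
    for i, ctype in enumerate(param_types):
        for bad in bad_values_for(ctype):
            args = list(good_args)
            args[i] = bad  # replace only the i-th argument
            yield "%s(%s);" % (name, ", ".join(args))

if __name__ == "__main__":
    for call in shake_calls("copy_buf", ["char *", "const char *", "size_t"],
                            ["dst", "src", "16"]):
        print(call)
```

Run as a script, it prints four calls, starting with `copy_buf(NULL, src, 16);` and ending with `copy_buf(dst, src, SIZE_MAX);`. Types missing from the table simply produce no mutated calls.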

So far I've identified 9(!) potential candidates (mostly taken from
the http://www.python.org/moin/LanguageParsing page):

- Plex:
Only a lexical analyser as far as I understand. Kinda RE++, no syntax
processing
- ply:
Lex / Yacc for python! Tackle the Beast! Syntax processing looks
complex...
- Pyggy:
Lex / Yacc -styled too. More recent, but will a 0.4 version be good
enough?
- PyLR:
fast parser with core functions in C... hasn't moved since '97
- Pyparsing:
quick and easy parser... but I don't think it does more than lexical
analysis
- spark:
Here's some wood. Now build your house.
- yapps2 / yapps2+ (I hesitate to call it yapps3):
chosen by http://www.python.org/sigs/parser-si...-standard.html.
Is the choice up-to-date?
But will it do for parsing C?
- TPG (Toy Parser Generator):
looks cool
- ANTLR (latest version from Jan 28 produces Python code):
Seems powerful and has a lot of support, but I don't want to have to
use an external Java tool. Furthermore, does it let me control what
happens at each stage easily, or does it just make me a compiler?
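Plex and ply's lex module both build a lexer from a table of regular expressions. A minimal standard-library sketch of that lexical stage for C header declarations; the token names and keyword list are assumptions, not any of those libraries' APIs, and string literals, numeric suffixes and multi-line macros are not handled.

```python
import re

# (token name, regex) pairs, tried left to right: comments and preprocessor
# lines first, keywords before identifiers, MISMATCH last as a catch-all.
TOKEN_SPEC = [
    ("COMMENT",  r"/\*.*?\*/|//[^\n]*"),
    ("PREPROC",  r"#[^\n]*"),
    ("KEYWORD",  r"\b(?:const|volatile|unsigned|signed|short|long|struct|enum"
                 r"|void|char|int|float|double)\b"),
    ("IDENT",    r"[A-Za-z_]\w*"),
    ("NUMBER",   r"\d+"),
    ("PUNCT",    r"[(){}\[\];,*]"),
    ("SKIP",     r"\s+"),
    ("MISMATCH", r"."),
]
# re.DOTALL lets block comments span several lines.
MASTER_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC),
                       re.DOTALL)

def tokenize(text):
    """Yield (kind, value) pairs, dropping whitespace, comments and # lines."""
    for m in MASTER_RE.finditer(text):
        kind = m.lastgroup  # name of the alternative that matched
        if kind in ("SKIP", "COMMENT", "PREPROC"):
            continue
        if kind == "MISMATCH":
            raise SyntaxError("unexpected character %r at offset %d"
                              % (m.group(), m.start()))
        yield kind, m.group()
```

The `\b` anchors keep identifiers such as `interval` or `constant` from being split into a keyword plus a remainder; everything past this point (syntax) is what ply's yacc half or a hand-written parser would add.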

I've omitted these: shlex, kwparsing (webpage?), PyBison, Trap
(webpage?), DParser, and SimpleParse (I don't want the extra
dependency).

I was hoping for a quick and easy choice, but got caught in the tar pit
of Too Much Information. Parsing is a large and complex field. As an
added handicap, I'm new to the dark minefield of parsers... I've had
some experience with Lex/Yacc, and have some knowledge of parser
theory from a course on compilers. I am thus used to EBNF-style
grammars.
I was disappointed to see that Parser-SIG has died out.
Would you have any ideas on which parser is best suited for the task?

John

Jul 18 '05 #1
11 Replies


"Jean de Largentaye" <jl*********@gmail.com> writes:
Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)


IMO, for parsing 'real-world' C header files, nothing can beat gccxml.
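A minimal Python 3 sketch of running GCC-XML from a script. It assumes the `gccxml` executable is on the PATH and accepts gcc-style `-I` include flags plus `-fxml=FILE` to name the output file; the helper names are hypothetical.

```python
import subprocess

def gccxml_command(header, xml_out, include_dirs=()):
    """Build the gccxml argument list (assumed form: gccxml [-Idir...] header -fxml=out)."""
    cmd = ["gccxml"]
    cmd += ["-I" + d for d in include_dirs]  # same spelling as gcc include flags
    cmd.append(header)
    cmd.append("-fxml=" + xml_out)           # where GCC-XML writes its XML dump
    return cmd

def run_gccxml(header, xml_out, include_dirs=()):
    """Run gccxml; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(gccxml_command(header, xml_out, include_dirs), check=True)
    return xml_out
```

For example, `gccxml_command("api.h", "api.xml", ["include"])` yields `["gccxml", "-Iinclude", "api.h", "-fxml=api.xml"]`; passing a list rather than a shell string avoids quoting problems with paths.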

Thomas
Jul 18 '05 #2

Jean de Largentaye wrote:
I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)

I chose to make my UT-generator in Python 2.4. However, I am now
encountering problems in choosing the right parser for the job. I
struggle in choosing between the inappropriate, the out-of-date, the
alpha, or the too-big-for-the-task...


why not use a real compiler?

http://www.boost.org/libs/python/pyste/
http://www.gccxml.org/HTML/Index.html

</F>

Jul 18 '05 #3

Hello Jean,
- ply:
Lex / Yacc for python! Tackle the Beast! Syntax processing looks

mini_c is a C compiler written using ply. You can just use it as is.
http://people.cs.uchicago.edu/~varmaa/mini_c/

HTH.
--
------------------------------------------------------------------------
Miki Tebeka <mi*********@zoran.com>
http://tebeka.bizhat.com
The only difference between children and adults is the price of the toys
Jul 18 '05 #4

Thomas Heller wrote:
IMO, for parsing 'real-world' C header files, nothing can beat gccxml.


no free tool, at least. if a budget is involved, I'd recommend checking
out the Edison Design Group stuff.

</F>

Jul 18 '05 #5

GCC-XML looks like a very interesting alternative, as Python includes
tools to parse XML.
The mini-C compiler looks like a step in the right direction for me.
I'm going to look into that.
I'm not comfortable with C++ yet, and am not sure how I'd use Pyste.
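Since Python ships `xml.etree.ElementTree`, turning GCC-XML output into function signatures takes little code. A minimal sketch, assuming the element and attribute names used below (`Function` with `name`/`returns`, child `Argument` elements with `type`, and `FundamentalType`/`PointerType`/`CvQualifiedType` nodes linked by `id`). The sample document is hand-written and abridged, not real gccxml output, and only `const` qualifiers are rendered.

```python
import xml.etree.ElementTree as ET

# Abridged, hand-written sample in the assumed shape of GCC-XML output for
#   int add(int a, const char *s);
SAMPLE_XML = """<GCC_XML>
  <Function id="_3" name="add" returns="_5">
    <Argument name="a" type="_5"/>
    <Argument name="s" type="_6"/>
  </Function>
  <FundamentalType id="_5" name="int"/>
  <PointerType id="_6" type="_7"/>
  <CvQualifiedType id="_7" type="_8" const="1"/>
  <FundamentalType id="_8" name="char"/>
</GCC_XML>"""

def type_name(by_id, type_id):
    """Turn a GCC-XML type id into a readable C type string."""
    node = by_id[type_id]
    if node.tag == "PointerType":
        return type_name(by_id, node.get("type")) + " *"
    if node.tag == "CvQualifiedType":
        base = type_name(by_id, node.get("type"))
        return ("const " + base) if node.get("const") == "1" else base
    # FundamentalType, Typedef, Struct, ... are assumed to carry a name
    return node.get("name", "?")

def gccxml_functions(xml_text):
    """Return [(name, return_type, [(arg_name, arg_type), ...]), ...]."""
    root = ET.fromstring(xml_text)
    # Index every top-level element that has an id, so references resolve.
    by_id = {node.get("id"): node for node in root if node.get("id")}
    result = []
    for fn in root.iter("Function"):
        args = [(arg.get("name"), type_name(by_id, arg.get("type")))
                for arg in fn.findall("Argument")]
        result.append((fn.get("name"), type_name(by_id, fn.get("returns")), args))
    return result
```

On the sample, `gccxml_functions(SAMPLE_XML)` gives `[("add", "int", [("a", "int"), ("s", "const char *")])]`. The compiler has already done the hard parsing; the script only follows id references.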

Thanks for the information guys, you've been quite helpful!

John

Jul 18 '05 #6

Jean de Largentaye wrote:
GCC-XML looks like a very interesting alternative, as Python includes
tools to parse XML.
The mini-C compiler looks like a step in the right direction for me.
I'm going to look into that.
I'm not comfortable with C++ yet, and am not sure how I'd use Pyste.


to clarify, Pyste is a Python tool that uses GCCXML to generate bindings; it might
not be something that you can use out of the box for your project, but it's definitely
something you should study, and perhaps borrow implementation ideas from.

</F>

Jul 18 '05 #7

try http://sourceforge.net/projects/pygccxml
There are a few examples and nice (for me) documentation.

Roman

On Tue, 8 Feb 2005 13:35:57 +0100, Fredrik Lundh <fr*****@pythonware.com> wrote:
to clarify, Pyste is a Python tool that uses GCCXML to generate bindings; it might
not be something that you can use out of the box for your project, but it's definitely
something you should study, and perhaps borrow implementation ideas from.

Jul 18 '05 #8

That looks cool, Roman. However, I'm behind a corporate firewall; is
there any chance you could send me a CVS snapshot?

John

Jul 18 '05 #9

Jean de Largentaye wrote:
Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)
I chose to make my UT-generator in Python 2.4. However, I am now
encountering problems in choosing the right parser for the job. I
struggle in choosing between the inappropriate, the out-of-date, the
alpha, or the too-big-for-the-task...


Why not see if the output from a tags file generator such as ctags or
etags will do what you want.

I often find that some simpler tools do 95% of the work and it is easier
to treat the other five percent as broken input.

try http://ctags.sourceforge.net/
- Paddy.
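A minimal Python 3 sketch of the ctags route, assuming Exuberant Ctags: `--c-kinds=+p` to include prototypes (off by default for C), `--fields=+S` to add a `signature:` field, and `-f -` to write tags to standard output. The helper names are hypothetical, and `split_params()` deliberately splits on commas, so a function-pointer parameter is one of the "other five percent" cases it gets wrong.

```python
import subprocess

def ctags_lines(header):
    """Run Exuberant Ctags on one header; return tag lines (pseudo-tags dropped)."""
    out = subprocess.run(
        ["ctags", "--c-kinds=+p", "--fields=+S", "-f", "-", header],
        check=True, capture_output=True, text=True).stdout
    return [line for line in out.splitlines() if not line.startswith("!_")]

def parse_tag_line(line):
    """Split 'name<TAB>file<TAB>excmd;"<TAB>fields...' into (name, kind, signature)."""
    name, _file, rest = line.split("\t", 2)
    _excmd, _sep, ext = rest.partition(';"\t')
    kind = signature = None
    for field in ext.split("\t"):
        if field.startswith("signature:"):
            signature = field[len("signature:"):]
        elif field.startswith("kind:"):
            kind = field[len("kind:"):]
        elif len(field) == 1:           # bare one-letter kind, e.g. "p" or "f"
            kind = field
    return name, kind, signature

def split_params(signature):
    """'(int a, char *b)' -> ['int a', 'char *b']; '(void)' or '()' -> []."""
    inner = signature.strip()[1:-1].strip()
    if inner in ("", "void"):
        return []
    # Naive comma split: wrong for function-pointer parameters.
    return [p.strip() for p in inner.split(",")]
```

Given a tag line such as `foo<TAB>foo.h<TAB>/^int foo(int a, char *b);$/;"<TAB>p<TAB>signature:(int a, char *b)`, `parse_tag_line()` returns `("foo", "p", "(int a, char *b)")`, and `split_params()` turns the signature into `["int a", "char *b"]`.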
Jul 18 '05 #10

Jean, Paddy

I use "pym" to extract bits of Pascal out of Delphi code for documentation
purposes. You have to add some stuff to the Delphi code (in your case, the C
header), but these additions go inside comment blocks, and the interesting
thing is that you add Python code(!) as a kind of dynamic markup which pym
executes while parsing the file.

In other words, you can write python code within a comment block in your
C-header to generate unit-tests into other files, and get that code
executed with pym.

Keep well
Caleb
On Tue, 08 Feb 2005 19:58:33 GMT, Paddy McCarthy <pa********@netscape.net>
wrote:
Why not see if the output from a tags file generator such as ctags or
etags will do what you want.


Jul 18 '05 #11


Jean de Largentaye wrote:
Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking the broken tree" :)


I was thinking "cdecl", and googling brought up this:

http://arrowtheory.com/software/python/

Another option, which I used recently when I had to parse a whole bunch
of Oracle 'create table' scripts [with semi-structured comments which
had to be mined for additional info]: write a recursive descent parser
-- but maybe the grammar of C function declarations is too complicated
for this.
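A minimal recursive descent sketch for a deliberately small subset of C prototypes, written for Python 3. The grammar in the comments is an assumption about what "simple" headers contain: no function pointers, arrays, `const` after `*`, comments or preprocessor lines (strip those first). All names are hypothetical.

```python
import re

# Supported subset, one method per rule:
#   prototype  = specifiers declarator "(" params ")" ";"
#   params     = [ "void" ] | param { "," param }
#   param      = specifiers declarator
#   specifiers = { keyword } [ IDENT ] { keyword }   (IDENT = typedef or tag)
#   declarator = { "*" } [ IDENT ]
KEYWORDS = {"const", "volatile", "struct", "enum", "unsigned", "signed",
            "short", "long", "void", "char", "int", "float", "double"}
# Keywords that already name a type, so a following IDENT is the declarator.
IMPLIES_TYPE = KEYWORDS - {"const", "volatile", "struct", "enum"}
IDENT_RE = re.compile(r"[A-Za-z_]\w*")
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\S")

class PrototypeParser:
    def __init__(self, text):
        self.toks = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.toks[i] if i < len(self.toks) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError("expected %r, got %r" % (tok, self.peek()))
        self.pos += 1

    def is_ident(self, tok):
        return (tok is not None and tok not in KEYWORDS
                and IDENT_RE.fullmatch(tok) is not None)

    def specifiers(self):
        words, have_type = [], False
        while True:
            tok = self.peek()
            if tok in KEYWORDS:
                have_type = have_type or tok in IMPLIES_TYPE
            elif self.is_ident(tok) and not have_type:
                have_type = True            # typedef name or struct/enum tag
            else:
                break
            words.append(tok)
            self.pos += 1
        if not have_type:
            raise SyntaxError("missing type before %r" % self.peek())
        return " ".join(words)

    def declarator(self):
        stars = ""
        while self.peek() == "*":
            stars += "*"
            self.pos += 1
        name = self.peek() if self.is_ident(self.peek()) else None
        if name is not None:
            self.pos += 1
        return stars, name

    def param(self):
        spec = self.specifiers()
        stars, name = self.declarator()
        return (spec + " " + stars if stars else spec), name

    def prototype(self):
        ret = self.specifiers()
        stars, name = self.declarator()
        if name is None:
            raise SyntaxError("function name missing")
        self.expect("(")
        params = []
        if self.peek() == "void" and self.peek(1) == ")":
            self.pos += 1                   # "(void)": no parameters
        elif self.peek() != ")":
            params.append(self.param())
            while self.peek() == ",":
                self.pos += 1
                params.append(self.param())
        self.expect(")")
        self.expect(";")
        return name, (ret + " " + stars if stars else ret), params

def parse_prototypes(text):
    """Parse a sequence of prototypes into (name, return_type, [(type, name)])."""
    parser = PrototypeParser(text)
    result = []
    while parser.peek() is not None:
        result.append(parser.prototype())
    return result
```

For example, `parse_prototypes("size_t copy(struct buf *dst, const struct buf *src);")` returns `[("copy", "size_t", [("struct buf *", "dst"), ("const struct buf *", "src")])]`. The only real difficulty is the one every C parser meets: deciding whether an identifier is a type name or a declarator name, handled here by the `have_type` flag.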

HTH,
John

Jul 18 '05 #12
