Choosing the right parser for parsing C headers

Jean de Largentaye
#1: Jul 18 '05
Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)
I chose to write my UT-generator in Python 2.4. However, I am now
having trouble choosing the right parser for the job. I
struggle to choose between the inappropriate, the out-of-date, the
alpha, and the too-big-for-the-task...

So far I've identified 9(!) potential candidates (mostly taken from
the http://www.python.org/moin/LanguageParsing page):

- Plex:
Only a lexical analyser as far as I understand. Kinda RE++, no syntax
processing
- ply:
Lex / Yacc for python! Tackle the Beast! Syntax processing looks
complex..
- Pyggy:
Lex / Yacc -styled too. More recent, but will a 0.4 version be good
enough?
- PyLR:
fast parser with core functions in C... hasn't moved since '97
- Pyparsing:
quick and easy parser... but I don't think it does more than lexical
analysis
- spark:
Here's some wood. Now build your house.
- yapps2 :
yapps2+ (I hesitate to call it yapps3):
chosen by http://www.python.org/sigs/parser-si...-standard.html.
Is the choice up-to-date?
But will it do for parsing C?
- TPG (Toy Parser Generator):
looks cool
- ANTLR (latest version from Jan 28 produces Python code) :
Seems powerful and has a lot of support, but I don't want to have to
use an exterior Java tool. Furthermore, does it let me control what
happens at each stage easily, or does it just make me a compiler?

I've omitted these: shlex, kwparsing (webpage?), PyBison, Trap
(webpage?), DParser, and SimpleParse (I don't want the extra
dependency).

I was hoping for a quick and easy choice, but got caught in the tar pit
of Too Much Information. Parsing is a large and complex field. As an
added handicap, I'm new to the dark minefield of parsers... I've had
some experience with Lex/Yacc, and have some knowledge of parser
theory, through a course on compilers. I am thus used to EBNF-style
grammar.
I was disappointed to see that Parser-SIG has died out.
Would you have any ideas on which parser is best suited for the task?

John
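
To make the goal concrete, here is a rough sketch of the "rewrite function calls with wrong parameters" step, assuming the header has already been parsed by some other means. The prototype layout (a name plus a list of (type, name) pairs), the BAD_VALUES table and the copy_name function are all hypothetical and only there for illustration. It is written for a current Python 3, not the Python 2.4 mentioned in the post.

```python
# Hypothetical table of "suspicious" C expressions per parameter type.
BAD_VALUES = {
    "int": ["-1", "0", "INT_MAX"],
    "char *": ["NULL", '""'],
}


def bad_calls(name, params):
    """Yield C call statements, each with exactly one argument replaced.

    params is a list of (c_type, param_name) pairs; the unchanged
    arguments are simply passed through by name.
    """
    good = [pname for _, pname in params]
    for i, (ptype, _) in enumerate(params):
        # Unknown types fall back to a plain 0 (an arbitrary choice).
        for bad in BAD_VALUES.get(ptype, ["0"]):
            args = list(good)
            args[i] = bad
            yield "%s(%s);" % (name, ", ".join(args))


# Hypothetical prototype: int copy_name(char *dst, int len);
calls = list(bad_calls("copy_name", [("char *", "dst"), ("int", "len")]))
for call in calls:
    print(call)
```

Whichever parser is chosen only has to produce something like the (type, name) pairs used here; the generation step itself stays small.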




Thomas Heller
#2: Jul 18 '05


"Jean de Largentaye" <jlargentaye@gmail.com> writes:
> Hi,
>
> I need to parse a subset of C (a header file), and generate some unit
> tests for the functions listed in it. I thus need to parse the code,
> then rewrite function calls with wrong parameters. What I call "shaking
> the broken tree" :)

IMO, for parsing 'real-world' C header files, nothing can beat gccxml.

Thomas
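
For readers who have not seen gccxml output, a minimal sketch of reading it from Python with the standard library follows. The SAMPLE document is hand-written for illustration; the element and attribute names (Function, Argument, FundamentalType, PointerType, id, type, returns) follow the GCC-XML format as I recall it, and real output carries many more node kinds (typedefs, cv-qualified types, structs) that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of GCC-XML output (an assumption,
# not captured from a real gccxml run).
SAMPLE = """<GCC_XML>
  <FundamentalType id="_1" name="int"/>
  <FundamentalType id="_2" name="char"/>
  <PointerType id="_3" type="_2"/>
  <Function id="_4" name="copy_name" returns="_1">
    <Argument name="dst" type="_3"/>
    <Argument name="len" type="_1"/>
  </Function>
</GCC_XML>"""


def type_name(nodes, type_id):
    """Resolve a type id to a C-like spelling, following pointer nodes."""
    node = nodes[type_id]
    if node.tag == "PointerType":
        return type_name(nodes, node.get("type")) + " *"
    return node.get("name")


def list_functions(xml_text):
    """Return [(return_type, name, [(arg_type, arg_name), ...]), ...]."""
    root = ET.fromstring(xml_text)
    # Index every top-level node by its id so type references can be resolved.
    nodes = {n.get("id"): n for n in root if n.get("id")}
    result = []
    for fn in root.iter("Function"):
        args = [(type_name(nodes, a.get("type")), a.get("name"))
                for a in fn.findall("Argument")]
        result.append((type_name(nodes, fn.get("returns")), fn.get("name"), args))
    return result


functions = list_functions(SAMPLE)
print(functions)
```

The appeal of this route is that a real compiler front end has already dealt with macros, typedefs and includes before Python sees anything.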
Fredrik Lundh
#3: Jul 18 '05


Jean de Largentaye wrote:
> I need to parse a subset of C (a header file), and generate some unit
> tests for the functions listed in it. I thus need to parse the code,
> then rewrite function calls with wrong parameters. What I call "shaking
> the broken tree" :)
>
> I chose to make my UT-generator in Python 2.4. However, I am now
> encountering problems in choosing the right parser for the job. I
> struggle in choosing between the inappropriate, the out-of-date, the
> alpha, or the too-big-for-the task...

why not use a real compiler?

http://www.boost.org/libs/python/pyste/
http://www.gccxml.org/HTML/Index.html

</F>



Miki Tebeka
#4: Jul 18 '05


Hello Jean,
> - ply:
> Lex / Yacc for python! Tackle the Beast! Syntax processing looks
mini_c is a C compiler written using ply. You can just use it as is.
http://people.cs.uchicago.edu/~varmaa/mini_c/

HTH.
--
------------------------------------------------------------------------
Miki Tebeka <miki.tebeka@zoran.com>
http://tebeka.bizhat.com
The only difference between children and adults is the price of the toys
Fredrik Lundh
#5: Jul 18 '05


Thomas Heller wrote:
> IMO, for parsing 'real-world' C header files, nothing can beat gccxml.

no free tool, at least. if a budget is involved, I'd recommend checking
out the Edison Design Group stuff.

</F>



Jean de Largentaye
#6: Jul 18 '05


GCC-XML looks like a very interesting alternative, as Python includes
tools to parse XML.
The mini-C compiler looks like a step in the right direction for me.
I'm going to look into that.
I'm not comfortable with C++ yet, and am not sure how I'd use Pyste.

Thanks for the information guys, you've been quite helpful!

John

Fredrik Lundh
#7: Jul 18 '05


Jean de Largentaye wrote:
> GCC-XML looks like a very interesting alternative, as Python includes
> tools to parse XML.
> The mini-C compiler looks like a step in the right direction for me.
> I'm going to look into that.
> I'm not comfortable with C++ yet, and am not sure how I'd use Pyste.

to clarify, Pyste is a Python tool that uses GCCXML to generate bindings; it might
not be something that you can use out of the box for your project, but it's definitely
something you should study, and perhaps borrow implementation ideas from.

</F>



Roman Yakovenko
#8: Jul 18 '05


Try http://sourceforge.net/projects/pygccxml
There are a few examples, and the documentation is nice (for me, at least).

Roman

On Tue, 8 Feb 2005 13:35:57 +0100, Fredrik Lundh <fredrik@pythonware.com> wrote:
> Jean de Largentaye wrote:
>
> > GCC-XML looks like a very interesting alternative, as Python includes
> > tools to parse XML.
> > The mini-C compiler looks like a step in the right direction for me.
> > I'm going to look into that.
> > I'm not comfortable with C++ yet, and am not sure how I'd use Pyste.
>
> to clarify, Pyste is a Python tool that uses GCCXML to generate bindings; it might
> not be something that you can use out of the box for your project, but it's definitely
> something you should study, and perhaps borrow implementation ideas from.
>
> </F>

Jean de Largentaye
#9: Jul 18 '05


That looks cool, Roman. However, I'm behind a corporate firewall; is
there any chance you could send me a CVS snapshot?

John

Paddy McCarthy
#10: Jul 18 '05


Jean de Largentaye wrote:
> Hi,
>
> I need to parse a subset of C (a header file), and generate some unit
> tests for the functions listed in it. I thus need to parse the code,
> then rewrite function calls with wrong parameters. What I call "shaking
> the broken tree" :)
> I chose to make my UT-generator in Python 2.4. However, I am now
> encountering problems in choosing the right parser for the job. I
> struggle in choosing between the inappropriate, the out-of-date, the
> alpha, or the too-big-for-the task...

Why not see whether the output from a tags file generator such as ctags or
etags will do what you want?

I often find that some simpler tools do 95% of the work and it is easier
to treat the other five percent as broken-input.

try http://ctags.sourceforge.net/


- Paddy.
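
As a rough illustration of the tags-file route, the sketch below pulls the name, kind and signature out of one line of an extended-format tags file. The sample line is hand-written; as far as I recall, Exuberant Ctags emits prototypes and signatures when run with something like `--c-kinds=+p --fields=+S`, but treat those flags and the exact field layout as assumptions to check against your ctags version.

```python
def parse_tag_line(line):
    """Split one extended-format tags line into (name, kind, signature).

    Assumes tab-separated fields: name, file, ex command, then extension
    fields. A source line containing a literal tab inside the ex command
    would break this simple split.
    """
    fields = line.rstrip("\n").split("\t")
    name = fields[0]
    kind, signature = None, None
    for field in fields[3:]:
        if field.startswith("signature:"):
            signature = field[len("signature:"):]
        elif ":" not in field:
            # A bare extension field is the one-letter kind (p = prototype).
            kind = field
    return name, kind, signature


# Hand-written sample line for a hypothetical header api.h.
SAMPLE_LINE = ('copy_name\tapi.h\t/^int copy_name(char *dst, int len);$/;"'
               '\tp\tsignature:(char *dst, int len)\n')

tag = parse_tag_line(SAMPLE_LINE)
print(tag)
```

Header lines in a real tags file start with `!_TAG_` and would need to be skipped before calling this function. The return type is not a separate field here and would have to be recovered from the ex command pattern.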
Caleb Hattingh
#11: Jul 18 '05


Jean, Paddy

I use "pym" to extract bits of pascal out of delphi code for documentation
purposes. You have to add some stuff to the delphi code (in your case, C
header), but these are added within comment blocks, and the interesting
thing is that you add python code(!) as a kind of dynamic markup which pym
executes while parsing the file.

In other words, you can write python code within a comment block in your
C-header to generate unit-tests into other files, and get that code
executed with pym.

Keep well
Caleb


On Tue, 08 Feb 2005 19:58:33 GMT, Paddy McCarthy <paddy3118x@netscape.net>
wrote:
> Jean de Largentaye wrote:
>> Hi,
>> I need to parse a subset of C (a header file), and generate some unit
>> tests for the functions listed in it. I thus need to parse the code,
>> then rewrite function calls with wrong parameters. What I call "shaking
>> the broken tree" :)
>> I chose to make my UT-generator in Python 2.4. However, I am now
>> encountering problems in choosing the right parser for the job. I
>> struggle in choosing between the inappropriate, the out-of-date, the
>> alpha, or the too-big-for-the task...
>
> Why not see if the output from a tags file generator such as ctags or
> etags will do what you want.
>
> I often find that some simpler tools do 95% of the work and it is easier
> to treat the other five percent as broken-input.
>
> try http://ctags.sourceforge.net/
>
>
> - Paddy.

John Machin
#12: Jul 18 '05



Jean de Largentaye wrote:
> Hi,
>
> I need to parse a subset of C (a header file), and generate some unit
> tests for the functions listed in it. I thus need to parse the code,
> then rewrite function calls with wrong parameters. What I call "shaking
> the broken tree" :)

I was thinking "cdecl", and googling brought up this:

http://arrowtheory.com/software/python/

Another option, which I used recently when I had to parse a whole bunch
of Oracle 'create table' scripts [with semi-structured comments which
had to be mined for additional info]: write a recursive descent parser
-- but maybe the grammar of C function declarations is too complicated
for this.

HTH,
John
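
To give a feel for the hand-written route, here is a minimal sketch in the recursive-descent style (one method per grammar rule) for a deliberately tiny subset: a return type, a name, and a comma-separated list of simple parameters. It is not a C parser. Function pointers, arrays, unnamed multi-word parameters such as `unsigned int`, and anything the preprocessor would touch are out of scope, and real recursion only shows up once nested declarators are added. All names in it are made up for illustration.

```python
import re

TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|[*(),;])")
DELIMITERS = {"(", ")", ",", ";"}


def tokenize(text):
    """Split a declaration into identifiers and punctuation tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError("unexpected character at offset %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError("expected %r, got %r" % (tok, self.peek()))
        self.pos += 1

    def typed_name(self):
        """Rule: type words and stars, then an optional identifier."""
        words = []
        while self.peek() is not None and self.peek() not in DELIMITERS:
            words.append(self.peek())
            self.pos += 1
        if not words:
            raise SyntaxError("expected a type, got %r" % self.peek())
        # Heuristic: the last word is the name unless it is the only word or a '*'.
        if len(words) >= 2 and words[-1] != "*":
            return " ".join(words[:-1]), words[-1]
        return " ".join(words), None

    def declaration(self):
        """Rule: typed_name '(' [typed_name {',' typed_name}] ')' ';'"""
        rtype, name = self.typed_name()
        self.expect("(")
        params = []
        if self.peek() != ")":
            params.append(self.typed_name())
            while self.peek() == ",":
                self.pos += 1
                params.append(self.typed_name())
        self.expect(")")
        self.expect(";")
        if params == [("void", None)]:
            params = []  # f(void) means no parameters
        return rtype, name, params


def parse_declaration(text):
    return Parser(tokenize(text)).declaration()


decl = parse_declaration("int copy_name(char *dst, int len);")
print(decl)
```

For a header that sticks to plain prototypes this may be enough; once typedef'd function pointers or macros appear, the compiler-based suggestions earlier in the thread are probably the safer bet.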
