Choosing the right parser for parsing C headers

Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)
I chose to make my UT-generator in Python 2.4. However, I am now
encountering problems in choosing the right parser for the job. I
struggle in choosing between the inappropriate, the out-of-date, the
alpha, or the too-big-for-the-task...

So far I've identified 9(!) potential candidates (mostly taken from
the http://www.python.org/moin/LanguageParsing page):

- Plex:
Only a lexical analyser as far as I understand. Kinda RE++, no syntax
processing
- ply:
Lex / Yacc for python! Tackle the Beast! Syntax processing looks
complex..
- Pyggy:
Lex / Yacc -styled too. More recent, but will a 0.4 version be good
enough?
- PyLR:
fast parser with core functions in C... hasn't moved since '97
- Pyparsing:
quick and easy parser... but I don't think it does more than lexical
analysis
- spark:
Here's some wood. Now build your house.
- yapps2 :
yapps2+ (I hesitate to call it yapps3):
chosen by http://www.python.org/sigs/parser-si...-standard.html.
Is the choice up-to-date?
But will it do for parsing C?
- TPG (Toy Parser Generator):
looks cool
- ANTLR (latest version from Jan 28 produces Python code) :
Seems powerful and has a lot of support, but I don't want to have to
use an exterior Java tool. Furthermore, does it let me control what
happens at each stage easily, or does it just make me a compiler?

I've omitted these: shlex, kwparsing (webpage?), PyBison, Trap
(webpage?), DParser, and SimpleParse (I don't want the extra
dependency).

I was hoping for a quick and easy choice, but got caught in the tar pit
of Too Much Information. Parsing is a large and complex field. As an
added handicap, I'm new to the dark minefield of parsers... I've had
some experience with Lex/Yacc, and have some knowledge of parser
theory, through a course on compilers. I am thus used to EBNF-style
grammar.
I was disappointed to see that Parser-SIG has died out.
Would you have any ideas on which parser is best suited for the task?

John

Jul 18 '05 #1
"Jean de Largentaye" <jl*********@gmail.com> writes:
Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)


IMO, for parsing 'real-world' C header files, nothing can beat gccxml.

Thomas
Jul 18 '05 #2
Jean de Largentaye wrote:
I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)

I chose to make my UT-generator in Python 2.4. However, I am now
encountering problems in choosing the right parser for the job. I
struggle in choosing between the inappropriate, the out-of-date, the
alpha, or the too-big-for-the-task...


why not use a real compiler?

http://www.boost.org/libs/python/pyste/
http://www.gccxml.org/HTML/Index.html

</F>

Jul 18 '05 #3
Hello Jean,
- ply:
Lex / Yacc for python! Tackle the Beast! Syntax processing looks

mini_c is a C compiler written using ply. You can just use it as is.
http://people.cs.uchicago.edu/~varmaa/mini_c/

HTH.
--
------------------------------------------------------------------------
Miki Tebeka <mi*********@zoran.com>
http://tebeka.bizhat.com
The only difference between children and adults is the price of the toys
Jul 18 '05 #4
Thomas Heller wrote:
IMO, for parsing 'real-world' C header files, nothing can beat gccxml.


no free tool, at least. if a budget is involved, I'd recommend checking
out the Edison Design Group stuff.

</F>

Jul 18 '05 #5
GCC-XML looks like a very interesting alternative, as Python includes
tools to parse XML.
The mini-C compiler looks like a step in the right direction for me.
I'm going to look into that.
I'm not comfortable with C++ yet, and am not sure how I'd use Pyste.
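
For what it's worth, reading function declarations out of GCC-XML output with Python's XML tools could look roughly like the sketch below. It is written for a current Python 3 (xml.etree.ElementTree was not yet in the standard library in 2.4). The SAMPLE string is hand-written in the general shape of GCC-XML output rather than captured from a real run, so the exact element and attribute names should be checked against what the tool actually emits; type_name and list_functions are illustrative names only.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of GCC-XML output.
# This is an assumption for illustration, not output from a real run.
SAMPLE = """\
<GCC_XML>
  <Function id="_1" name="add" returns="_3">
    <Argument name="a" type="_3"/>
    <Argument name="b" type="_3"/>
  </Function>
  <Function id="_2" name="length" returns="_3">
    <Argument name="s" type="_5"/>
  </Function>
  <FundamentalType id="_3" name="int"/>
  <FundamentalType id="_4" name="char"/>
  <PointerType id="_5" type="_4"/>
</GCC_XML>
"""


def type_name(nodes, type_id):
    """Resolve a type id to a readable C type string (pointers only)."""
    node = nodes[type_id]
    if node.tag == "PointerType":
        return type_name(nodes, node.get("type")) + " *"
    return node.get("name")


def list_functions(xml_text):
    """Return [(name, return_type, [(arg_name, arg_type), ...]), ...]."""
    root = ET.fromstring(xml_text)
    # Index every top-level element by its id so type references resolve.
    nodes = {n.get("id"): n for n in root if n.get("id")}
    functions = []
    for fn in root.findall("Function"):
        args = [(a.get("name"), type_name(nodes, a.get("type")))
                for a in fn.findall("Argument")]
        functions.append(
            (fn.get("name"), type_name(nodes, fn.get("returns")), args))
    return functions


if __name__ == "__main__":
    for name, returns, args in list_functions(SAMPLE):
        print(returns, name, args)
```

With the argument types resolved like this, a test generator can pick a deliberately wrong value per type when it rewrites the calls.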

Thanks for the information guys, you've been quite helpful!

John

Jul 18 '05 #6
Jean de Largentaye wrote:
GCC-XML looks like a very interesting alternative, as Python includes
tools to parse XML.
The mini-C compiler looks like a step in the right direction for me.
I'm going to look into that.
I'm not comfortable with C++ yet, and am not sure how I'd use Pyste.


to clarify, Pyste is a Python tool that uses GCCXML to generate bindings; it might
not be something that you can use out of the box for your project, but it's definitely
something you should study, and perhaps borrow implementation ideas from.

</F>

Jul 18 '05 #7
try http://sourceforge.net/projects/pygccxml
There are a few examples and nice (for me) documentation.

Roman

On Tue, 8 Feb 2005 13:35:57 +0100, Fredrik Lundh <fr*****@pythonware.com> wrote:
Jean de Largentaye wrote:
GCC-XML looks like a very interesting alternative, as Python includes
tools to parse XML.
The mini-C compiler looks like a step in the right direction for me.
I'm going to look into that.
I'm not comfortable with C++ yet, and am not sure how I'd use Pyste.


to clarify, Pyste is a Python tool that uses GCCXML to generate bindings; it might
not be something that you can use out of the box for your project, but it's definitely
something you should study, and perhaps borrow implementation ideas from.

</F>

Jul 18 '05 #8
That looks cool, Roman. However, I'm behind a corporate firewall. Is
there any chance you could send me a CVS snapshot?

John

Jul 18 '05 #9
Jean de Largentaye wrote:
Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)
I chose to make my UT-generator in Python 2.4. However, I am now
encountering problems in choosing the right parser for the job. I
struggle in choosing between the inappropriate, the out-of-date, the
alpha, or the too-big-for-the-task...


Why not see if the output from a tags file generator such as ctags or
etags will do what you want?

I often find that some simpler tools do 95% of the work and it is easier
to treat the other five percent as broken-input.

try http://ctags.sourceforge.net/
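
As a rough sketch of the tags-file idea: each line of a ctags file is tab-separated (name, file, search pattern, then extension fields), so pulling out prototypes takes only a few lines of Python. The SAMPLE text below is hand-written for illustration, not real ctags output. To get prototypes and their argument lists from a header you would, I believe, need something like `--c-kinds=+p --fields=+S` with Exuberant Ctags; treat those flags as an assumption and check your ctags version. parse_tags is an illustrative name.

```python
# Hand-written sample in the extended tags format (an assumption for
# illustration): name, file, search pattern, then extension fields.
SAMPLE = (
    '!_TAG_FILE_FORMAT\t2\t/extended format/\n'
    'add\tmath.h\t/^int add(int a, int b);$/;"\tp\tsignature:(int a, int b)\n'
    'length\tmath.h\t/^int length(const char *s);$/;"\tp\t'
    'signature:(const char *s)\n'
)


def parse_tags(text):
    """Return (name, kind, signature) for each tag line in ctags output."""
    entries = []
    for line in text.splitlines():
        if not line or line.startswith("!_TAG_"):
            continue  # skip blank lines and pseudo-tag header lines
        # The search pattern ends with ;" followed by the extension fields.
        head, sep, ext = line.partition(';"\t')
        if not sep:
            continue  # plain (non-extended) tag line: no kind or signature
        name = head.split("\t", 1)[0]
        kind = None
        signature = None
        for field in ext.split("\t"):
            if field.startswith("signature:"):
                signature = field[len("signature:"):]
            elif field.startswith("kind:"):
                kind = field[len("kind:"):]
            elif ":" not in field and kind is None:
                kind = field  # short form: a bare kind letter such as "p"
        entries.append((name, kind, signature))
    return entries


if __name__ == "__main__":
    for entry in parse_tags(SAMPLE):
        print(entry)
```

The signature comes back as one unparsed string, so the argument list still has to be split by hand; that part is the "other five percent".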
- Paddy.
Jul 18 '05 #10
Jean, Paddy

I use "pym" to extract bits of pascal out of delphi code for documentation
purposes. You have to add some stuff to the delphi code (in your case, C
header), but these are added within comment blocks, and the interesting
thing is that you add python code(!) as a kind of dynamic markup which pym
executes while parsing the file.

In other words, you can write python code within a comment block in your
C-header to generate unit-tests into other files, and get that code
executed with pym.

Keep well
Caleb
On Tue, 08 Feb 2005 19:58:33 GMT, Paddy McCarthy <pa********@netscape.net>
wrote:
Jean de Largentaye wrote:
Hi,
I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)
I chose to make my UT-generator in Python 2.4. However, I am now
encountering problems in choosing the right parser for the job. I
struggle in choosing between the inappropriate, the out-of-date, the
alpha, or the too-big-for-the-task...


Why not see if the output from a tags file generator such as ctags or
etags will do what you want?

I often find that some simpler tools do 95% of the work and it is easier
to treat the other five percent as broken-input.

try http://ctags.sourceforge.net/
- Paddy.


Jul 18 '05 #11

Jean de Largentaye wrote:
Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking the broken tree" :)


I was thinking "cdecl", and googling brought up this:

http://arrowtheory.com/software/python/

Another option, which I used recently when I had to parse a whole bunch
of Oracle 'create table' scripts [with semi-structured comments which
had to be mined for additional info]: write a recursive descent parser
-- but maybe the grammar of C function declarations is too complicated
for this.
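
To give an idea of the scale, here is a minimal sketch of a hand-written parser in the recursive-descent style (one method per grammar rule) for the simplest kind of prototype only. It assumes plain declarations such as `int add(int a, int b);` and deliberately does not handle function pointers, arrays, structs, typedef tracking or preprocessor directives, which is where the grammar gets complicated. It is written for a current Python 3, and all the names (tokenize, Parser, parse_prototype, TYPE_WORDS) are made up for this example.

```python
import re

TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|[*(),;])")
STOP = {"(", ")", ",", ";"}
# Words that can end a type; used to tell "unsigned int" from "int x".
TYPE_WORDS = {"void", "char", "short", "int", "long", "float", "double",
              "signed", "unsigned", "const"}


def tokenize(text):
    """Split a prototype into identifiers and punctuation tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError("expected %r, got %r" % (token, self.peek()))
        self.pos += 1

    def declaration(self):
        """declaration := (IDENT | '*')+   returns (type, name or None)"""
        parts = []
        while self.peek() is not None and self.peek() not in STOP:
            parts.append(self.peek())
            self.pos += 1
        if not parts:
            raise SyntaxError("expected a declaration")
        name = None
        # The last identifier is the name unless it is itself a type word.
        if len(parts) > 1 and parts[-1] != "*" and parts[-1] not in TYPE_WORDS:
            name = parts.pop()
        return " ".join(parts), name

    def params(self):
        """params := empty | 'void' | declaration (',' declaration)*"""
        if self.peek() == ")":
            return []
        result = [self.declaration()]
        while self.peek() == ",":
            self.pos += 1
            result.append(self.declaration())
        return [] if result == [("void", None)] else result

    def prototype(self):
        """prototype := declaration '(' params ')' ';'"""
        returns, name = self.declaration()
        if name is None:
            raise SyntaxError("missing function name")
        self.expect("(")
        params = self.params()
        self.expect(")")
        self.expect(";")
        return {"name": name, "returns": returns, "params": params}


def parse_prototype(text):
    parser = Parser(tokenize(text))
    proto = parser.prototype()
    if parser.peek() is not None:
        raise SyntaxError("trailing tokens after prototype")
    return proto


if __name__ == "__main__":
    print(parse_prototype("const char *lookup(int key, unsigned int flags);"))
```

A function-pointer parameter such as `int apply(int (*f)(int));` makes this sketch raise SyntaxError, which gives a feel for how quickly real declarations outgrow it.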

HTH,
John

Jul 18 '05 #12
