
Parsing C header files with python

I've got a header file which lists a whole load of C functions of the form

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

It's a numerical library so all functions return an int and accept varying
combinations of float pointers, ints and doubles.

What's the easiest way of breaking down this header file into a list of
functions and their arguments using python? Is there something that will
parse this (perhaps a protoize.py)? I don't want (or understand!) a full C
parser, just this simple case.

It seems like someone should have done something like this before, but
googling for python, header file and protoize just gives me information on
compiling python. If there isn't anything I'll have a go with regexps.

The reason for parsing the header file is that I want to generate (using
python) a wrapper to allow the library to be called from a different language.
I've only got to generate this wrapper once, so the python doesn't have to
be efficient.

Thanks,
Ian

--
"Thinks: I can't think of a thinks. End of thinks routine": Blue Bottle
Jul 18 '05 #1
6 Replies


>>>>> "Ian" == Ian McConnell <ia*@emit.demon.co.uk> writes:

Ian> I've got a header file which lists a whole load of C functions of the form
Ian> int func1(float *arr, int len, double arg1);
Ian> int func2(float **arr, float *arr2, int len, double arg1, double arg2);

Ian> It's a numerical library so all functions return an int and
Ian> accept varying combinations of float pointers, ints and
Ian> doubles.

Ian> What's the easiest way of breaking down this header file into a
Ian> list of functions and their arguments using python? Is there

Well, what comes immediately to mind (I might be overlooking
something) is that the function name is immediately before '(', and
arguments come after it separated by ','. Start with regexps and work
from there...
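A minimal sketch of that regexp approach in Python 3, assuming every prototype has the simple shape shown in the question (one-word types, optional pointer stars, a trailing semicolon). `parse_header`, `PROTO_RE` and `ARG_RE` are illustrative names, not from any library. Anything outside the assumed shape (`void`, `unsigned int`, function-pointer arguments) raises ValueError rather than being guessed at:

```python
import re

# A prototype: return type, function name, the text inside "(...)", then ";".
# [^)]* also matches newlines, so a declaration split over lines is still found.
PROTO_RE = re.compile(r'\b(\w+)\s+(\w+)\s*\(([^)]*)\)\s*;')
# An argument: one type word, then spaces and/or pointer stars, then the name.
ARG_RE = re.compile(r'^\s*(\w+)([\s*]+)(\w+)\s*$')


def parse_header(text):
    """Return [(name, [(type, argname), ...]), ...] for each prototype in text."""
    functions = []
    for match in PROTO_RE.finditer(text):
        _ret_type, name, arg_text = match.groups()
        args = []
        for arg in arg_text.split(','):
            arg_match = ARG_RE.match(arg)
            if arg_match is None:
                # e.g. "void", "unsigned int x" or a function-pointer argument
                raise ValueError('cannot parse argument %r of %s' % (arg, name))
            base, sep, argname = arg_match.groups()
            # Move the pointer stars onto the type: "float **arr" -> ("float**", "arr")
            args.append((base + '*' * sep.count('*'), argname))
        functions.append((name, args))
    return functions


header = """
int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2,
          int len, double arg1, double arg2);
"""

for name, args in parse_header(header):
    print(name, args)
```

Comments and preprocessor lines are not treated specially, so a commented-out prototype would still be picked up by this sketch.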
--
Ville Vainio http://tinyurl.com/2prnb
Jul 18 '05 #2

"Ian McConnell" <ia*@emit.demon.co.uk> wrote in message
news:87************@emit.demon.co.uk...
> I've got a header file which lists a whole load of C functions of the form
>
> int func1(float *arr, int len, double arg1);
> int func2(float **arr, float *arr2, int len, double arg1, double arg2);
>
> It's a numerical library so all functions return an int and accept varying
> combinations of float pointers, ints and doubles.


If regexps give you pause, try this pyparsing example. It makes heavy use
of results names, so that the parsed tokens can be retrieved from the
results as if they were named attributes.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul
------------------------
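from pyparsing import (Combine, Group, Literal, Optional, Word, alphanums,
                       alphas, delimitedList, oneOf)

testdata = """
int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);
"""

# a C identifier: a letter, then letters, digits or underscores
ident = Word(alphas, alphanums + "_")
# an argument type with any pointer stars joined on: "float", "float*", "float**"
vartype = Combine(oneOf("float double int") + Optional(Word("*")),
                  adjacent=False)
# comma-separated (type, name) pairs, each kept together by Group
arglist = delimitedList(Group(vartype.setResultsName("type") +
                              ident.setResultsName("name")))
# the whole prototype: int name(args);
functionCall = (Literal("int") + ident.setResultsName("name") +
                "(" + arglist.setResultsName("args") + ")" + ";")

# scanString yields (tokens, start, end) for every match found in the text
for fn, start, end in functionCall.scanString(testdata):
    print(fn.name)
    for a in fn.args:
        print(" -", a.type, a.name)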

------------------------
gives the following output:

func1
- float* arr
- int len
- double arg1
func2
- float** arr
- float* arr2
- int len
- double arg1
- double arg2
Jul 18 '05 #3

Ian McConnell <ia*@emit.demon.co.uk> wrote in message news:<87************@emit.demon.co.uk>...
> I've got a header file which lists a whole load of C functions of the form
>
> int func1(float *arr, int len, double arg1);
> int func2(float **arr, float *arr2, int len, double arg1, double arg2);
>
> It's a numerical library so all functions return an int and accept varying
> combinations of float pointers, ints and doubles.
>
> What's the easiest way of breaking down this header file into a list of
> functions and their arguments using python? Is there something that will
> parse this (perhaps a protoize.py)? I don't want (or understand!) a full C
> parser, just this simple case.
> <<SNIP>>
> Thanks,
> Ian

Would this suffice:

<CODE>
import re
import pprint

hdr = '''
int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);
'''

func2args = {}
for line in hdr.split('\n'):
    # split on whitespace, commas, semicolons and parentheses; drop empty strings
    words = [word for word in re.split(r'[\s,;()]+', line) if word]
    # words[0] is the return type, words[1] the function name, the rest the arguments
    if len(words) > 2:
        func2args[words[1]] = words[2:]

pprint.pprint(func2args)
# {'func1': ['float', '*arr', 'int', 'len', 'double', 'arg1'],
#  'func2': ['float',
#            '**arr',
#            'float',
#            '*arr2',
#            'int',
#            'len',
#            'double',
#            'arg1',
#            'double',
#            'arg2']}


</CODE>
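The dictionary produced above still mixes types and names in one flat list. A small follow-up sketch (the `pair_up` helper is hypothetical, not part of the reply) groups each list into (type, name) pairs and moves pointer stars onto the type, assuming every argument is exactly one type word plus one name, as in this header:

```python
def pair_up(tokens):
    """Group a flat token list into (type, name) pairs.

    ['float', '*arr', 'int', 'len'] -> [('float*', 'arr'), ('int', 'len')]
    """
    if len(tokens) % 2:
        raise ValueError('expected type/name pairs, got %r' % (tokens,))
    pairs = []
    for type_word, name in zip(tokens[0::2], tokens[1::2]):
        # re.split leaves the stars on whichever word they touched;
        # move any leading stars from the name onto the type.
        stars = len(name) - len(name.lstrip('*'))
        pairs.append((type_word + '*' * stars, name.lstrip('*')))
    return pairs


func2args = {
    'func1': ['float', '*arr', 'int', 'len', 'double', 'arg1'],
    'func2': ['float', '**arr', 'float', '*arr2', 'int', 'len',
              'double', 'arg1', 'double', 'arg2'],
}
signatures = {name: pair_up(tokens) for name, tokens in func2args.items()}
print(signatures['func2'])
```

A declaration written as `float * arr`, with a free-standing star, splits into three tokens and silently misaligns the pairs unless the total count happens to be odd, so this only suits headers that match the assumed shape.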
Jul 18 '05 #4

"Paul McGuire" <pt***@austin.rr._bogus_.com> writes:
"Ian McConnell" <ia*@emit.demon.co.uk> wrote in message
news:87************@emit.demon.co.uk...
> I've got a header file which lists a whole load of C functions of the form
>
> int func1(float *arr, int len, double arg1);
> int func2(float **arr, float *arr2, int len, double arg1, double arg2);
>
> It's a numerical library so all functions return an int and accept varying
> combinations of float pointers, ints and doubles.
>
>
> If regexps give you pause, try this pyparsing example. It makes heavy use
> of results names, so that the parsed tokens can be retrieved from the
> results as if they were named attributes.
>
> Download pyparsing at http://pyparsing.sourceforge.net.


Thanks. Your example with pyparsing was just what I was looking for. It also
copes very nicely with newlines and spacing in the header file.

Jul 18 '05 #5

"Ian McConnell" <ia*@emit.demon.co.uk> wrote in message
news:87************@emit.demon.co.uk...
"Paul McGuire" <pt***@austin.rr._bogus_.com> writes:
> <snip>
> Thanks. Your example with pyparsing was just what I was looking for. It also
> copes very nicely with newlines and spacing in the header file.

Ian -

It is exactly this kind of one-off parsing job where I think pyparsing really
shines. I am sure you could have accomplished this with regexps, but
a) it would have taken at least a bit longer
b) it would have required more whitespace handling (such as function
declarations that span line breaks)
c) it would have been trickier to accommodate unanticipated changes (support
for other argument types such as char and long, embedded comments, etc.)

BTW, all it takes to make this grammar comment-immune is to add the
following statement before calling scanString():

functionCall.ignore( cStyleComment )

cStyleComment is predefined in the pyparsing module to recognize /* ... */
comments. Adding this will properly handle (i.e., skip over) definitions
like:

/*
int commentedOutFunc(float arg1, float arg2);
*/

Try that with regexps!
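A sketch putting these claims together, assuming pyparsing is installed (it uses the camelCase names from this thread, which pyparsing 3 still accepts alongside its newer snake_case ones). It is the grammar from the earlier pyparsing reply with `char` and `long` added to the type list, one declaration spread over two lines, and `ignore(cStyleComment)` applied before `scanString()`. The `func3` prototype and the `signatures` helper are made up for illustration:

```python
from pyparsing import (Combine, Group, Literal, Optional, Word, alphanums,
                       alphas, cStyleComment, delimitedList, oneOf)

ident = Word(alphas, alphanums + "_")
# "char" and "long" added to the original type list
vartype = Combine(oneOf("float double int char long") + Optional(Word("*")),
                  adjacent=False)
arglist = delimitedList(Group(vartype.setResultsName("type") +
                              ident.setResultsName("name")))
functionCall = (Literal("int") + ident.setResultsName("name") +
                "(" + arglist.setResultsName("args") + ")" + ";")
# skip /* ... */ comments wherever they occur
functionCall.ignore(cStyleComment)

header = """
/*
int commentedOutFunc(float arg1, float arg2);
*/
int func1(float *arr, int len, double arg1);
int func3(char *label,
          long count);   /* hypothetical prototype split over two lines */
"""


def signatures(text):
    """Return [(name, [(type, argname), ...]), ...] for every prototype found."""
    return [(fn["name"], [(a["type"], a["name"]) for a in fn["args"]])
            for fn, _start, _end in functionCall.scanString(text)]


for name, args in signatures(header):
    print(name, args)
```

Only func1 and func3 are reported; the commented-out prototype is skipped, and the line break inside func3's argument list is absorbed by pyparsing's default whitespace skipping.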

-- Paul
Jul 18 '05 #6

Hello Ian,
> I've got a header file which lists a whole load of C functions of the form
>
> int func1(float *arr, int len, double arg1);
> int func2(float **arr, float *arr2, int len, double arg1, double arg2);
>
> It's a numerical library so all functions return an int and accept varying
> combinations of float pointers, ints and doubles.
>
> What's the easiest way of breaking down this header file into a list of
> functions and their arguments using python? Is there something that will
> parse this (perhaps a protoize.py)? I don't want (or understand!) a full C
> parser, just this simple case.

There is an ANSI-C parser in ply (http://systems.cs.uchicago.edu/ply/)
which you can use.

Bye.
--
------------------------------------------------------------------------
Miki Tebeka <mi*********@zoran.com>
http://tebeka.spymac.net
The only difference between children and adults is the price of the toys
Jul 18 '05 #7
