Parsing C header files with python

I've got a header file which lists a whole load of C functions of the form

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

It's a numerical library so all functions return an int and accept varying
combinations of float pointers, ints and doubles.

What's the easiest way of breaking down this header file into a list of
functions and their arguments using Python? Is there something that will
parse this (perhaps a protoize.py)? I don't want (or understand!) a full C
parser, just this simple case.

It seems like someone should have done something like this before, but
googling for python, header file and protoize just gives me information on
compiling python. If there isn't anything I'll have a go with regexps.

The reason for parsing the header file is that I want to generate (using
Python) a wrapper to allow the library to be called from a different language.
I've only got to generate this wrapper once, so the Python doesn't have to
be efficient.

Thanks,
Ian

--
"Thinks: I can't think of a thinks. End of thinks routine": Blue Bottle
Jul 18 '05 #1
>>>>> "Ian" == Ian McConnell <ia*@emit.demon.co.uk> writes:

Ian> I've got a header file which lists a whole load of C functions of the form
Ian> int func1(float *arr, int len, double arg1);
Ian> int func2(float **arr, float *arr2, int len, double arg1, double arg2);

Ian> It's a numerical library so all functions return an int and
Ian> accept varying combinations of float pointers, ints and
Ian> doubles.

Ian> What's the easiest way of breaking down this header file into a
Ian> list of functions and their arguments using Python? Is there

Well, what comes immediately to mind (I might be overlooking
something) is that the function name is immediately before '(', and
arguments come after it separated by ','. Start with regexps and work
from there...
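
A minimal sketch of that regexp starting point, assuming every prototype has the exact form shown in the question (an int return type, no comments, no function-pointer arguments). The names PROTO_RE and parse_prototypes are illustrative and not part of any library:

```python
import re

# Assumption: every prototype looks like "int name(type arg, ...);".
# The name is the word right before '(', the arguments are what sits inside.
PROTO_RE = re.compile(r"\bint\s+(\w+)\s*\(([^)]*)\)\s*;")

def parse_prototypes(text):
    """Return a list of (function_name, [argument strings]) tuples."""
    functions = []
    for match in PROTO_RE.finditer(text):
        name, arg_text = match.groups()
        # Arguments are separated by commas; collapse runs of whitespace.
        args = [" ".join(arg.split()) for arg in arg_text.split(",") if arg.strip()]
        functions.append((name, args))
    return functions

header = """
int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len,
          double arg1, double arg2);
"""

for name, args in parse_prototypes(header):
    print(name, args)
```

Because [^)]* also matches newlines, a declaration that spans several lines is still picked up. Anything beyond this simple shape would need a richer pattern or a real parser.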
--
Ville Vainio http://tinyurl.com/2prnb
Jul 18 '05 #2
"Ian McConnell" <ia*@emit.demon.co.uk> wrote in message
news:87************@emit.demon.co.uk...
> I've got a header file which lists a whole load of C functions of the form
>
> int func1(float *arr, int len, double arg1);
> int func2(float **arr, float *arr2, int len, double arg1, double arg2);
>
> It's a numerical library so all functions return an int and accept varying
> combinations of float pointers, ints and doubles.


If regexp's give you pause, try this pyparsing example. It makes heavy use
of setting results names, so that the parsed tokens can be easily retrieved
from the results as if they were named attributes.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul
------------------------
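from pyparsing import (Word, alphas, alphanums, Combine, oneOf, Optional,
                       delimitedList, Group, Literal)

testdata = """
int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);
"""

# A C identifier: a letter followed by letters, digits or underscores.
ident = Word(alphas, alphanums + "_")
# A base type plus any run of '*'; adjacent=False permits whitespace in between.
vartype = Combine(oneOf("float double int") + Optional(Word("*")),
                  adjacent=False)
arglist = delimitedList(Group(vartype.setResultsName("type") +
                              ident.setResultsName("name")))
functionCall = (Literal("int") + ident.setResultsName("name") +
                "(" + arglist.setResultsName("args") + ")" + ";")

# scanString yields (tokens, start, end) for every match found in the text.
for fn, s, e in functionCall.scanString(testdata):
    print(fn.name)
    for a in fn.args:
        print(" -", a.type, a.name)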

------------------------
gives the following output:

func1
 - float* arr
 - int len
 - double arg1
func2
 - float** arr
 - float* arr2
 - int len
 - double arg1
 - double arg2
Jul 18 '05 #3
Ian McConnell <ia*@emit.demon.co.uk> wrote in message news:<87************@emit.demon.co.uk>...
> I've got a header file which lists a whole load of C functions of the form
>
> int func1(float *arr, int len, double arg1);
> int func2(float **arr, float *arr2, int len, double arg1, double arg2);
>
> It's a numerical library so all functions return an int and accept varying
> combinations of float pointers, ints and doubles.
>
> What's the easiest way of breaking down this header file into a list of
> functions and their arguments using Python? Is there something that will
> parse this (perhaps a protoize.py)? I don't want (or understand!) a full C
> parser, just this simple case.
> <<SNIP>>
> Thanks,
> Ian

Would this suffice:

<CODE>
>>> import re
>>> import pprint
>>> hdr = '''
... int func1(float *arr, int len, double arg1);
... int func2(float **arr, float *arr2, int len, double arg1, double arg2);
... '''
>>> print(hdr)

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

>>> func2args = {}
>>> for line in hdr.split('\n'):
...     # Split on whitespace and punctuation, dropping empty strings.
...     line = [word for word in re.split(r'[\s,;()]+', line) if word]
...     # line[0] is the return type, line[1] the name, the rest are arguments.
...     if len(line) > 2:
...         func2args[line[1]] = line[2:]
...
>>> pprint.pprint(func2args)
{'func1': ['float', '*arr', 'int', 'len', 'double', 'arg1'],
 'func2': ['float',
           '**arr',
           'float',
           '*arr2',
           'int',
           'len',
           'double',
           'arg1',
           'double',
           'arg2']}
</CODE>
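
The flat token lists above can be regrouped into (type, name) pairs with a few more lines. This is only a sketch, assuming the tokens strictly alternate type, name and that any '*' is attached to the name, as in the output shown; pair_arguments is a made-up helper name:

```python
def pair_arguments(tokens):
    """Turn ['float', '**arr', 'int', 'len'] into [('float**', 'arr'), ('int', 'len')]."""
    pairs = []
    # Assumption: tokens strictly alternate type, name, type, name, ...
    for ctype, name in zip(tokens[0::2], tokens[1::2]):
        # Count the leading '*' characters and move them onto the type.
        stars = len(name) - len(name.lstrip("*"))
        pairs.append((ctype + "*" * stars, name.lstrip("*")))
    return pairs

print(pair_arguments(['float', '**arr', 'float', '*arr2', 'int', 'len']))
```

Types made of more than one word (for example "unsigned int") would break the alternation, so this only suits headers as regular as the one in the question.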
Jul 18 '05 #4
"Paul McGuire" <pt***@austin.rr._bogus_.com> writes:
"Ian McConnell" <ia*@emit.demon.co.uk> wrote in message
news:87************@emit.demon.co.uk...
>> I've got a header file which lists a whole load of C functions of the form
>>
>> int func1(float *arr, int len, double arg1);
>> int func2(float **arr, float *arr2, int len, double arg1, double arg2);
>>
>> It's a numerical library so all functions return an int and accept varying
>> combinations of float pointers, ints and doubles.
>
> If regexp's give you pause, try this pyparsing example. It makes heavy use
> of setting results names, so that the parsed tokens can be easily retrieved
> from the results as if they were named attributes.
>
> Download pyparsing at http://pyparsing.sourceforge.net.


Thanks. Your example with pyparsing was just what I was looking for. It also
copes very nicely with newlines and spacing in the header file.

Jul 18 '05 #5
"Ian McConnell" <ia*@emit.demon.co.uk> wrote in message
news:87************@emit.demon.co.uk...
"Paul McGuire" <pt***@austin.rr._bogus_.com> writes:
<snip>
> Thanks. Your example with pyparsing was just what I was looking for. It also
> copes very nicely with newlines and spacing in the header file.

Ian -

It is at just this kind of one-off parsing job that I think pyparsing really
shines. I am sure that you could have accomplished this with regexp's, but
a) it would have taken at least a bit longer
b) it would have required more whitespace handling (such as function decls
that span linebreaks)
c) it would have been trickier to accommodate unanticipated changes (support
for other arg data types such as char or long, embedded comments, etc.)

BTW, all it takes to make this grammar comment-immune is to add the
following statement before calling scanString():

functionCall.ignore( cStyleComment )

cStyleComment is predefined in the pyparsing module to recognize /* ... */
comments. Adding this will properly handle (i.e., skip over) definitions
like:

/*
int commentedOutFunc(float arg1, float arg2);
*/

Try that with regexp's!
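
For comparison only, here is a rough standard-library sketch of the same comment-skipping idea. It assumes comments are plain /* ... */ blocks that are never nested and never appear inside string literals; COMMENT_RE, PROTO_RE and function_names are illustrative names, not pyparsing features:

```python
import re

# Assumption: comments are plain /* ... */ blocks, never nested or inside strings.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)
# Assumption: every prototype of interest returns int, as in the question.
PROTO_RE = re.compile(r"\bint\s+(\w+)\s*\(")

def function_names(text):
    """Return the names of int-returning prototypes that sit outside comments."""
    # Replace each comment with a space so the surrounding tokens stay separated.
    uncommented = COMMENT_RE.sub(" ", text)
    return PROTO_RE.findall(uncommented)

header = """
int func1(float *arr, int len, double arg1);
/*
int commentedOutFunc(float arg1, float arg2);
*/
int func2(float **arr, float *arr2, int len, double arg1, double arg2);
"""

print(function_names(header))
```

It takes two passes and carries more caveats than the single ignore() call, which rather supports the point being made.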

-- Paul
Jul 18 '05 #6
Hello Ian,
> I've got a header file which lists a whole load of C functions of the form
>
> int func1(float *arr, int len, double arg1);
> int func2(float **arr, float *arr2, int len, double arg1, double arg2);
>
> It's a numerical library so all functions return an int and accept varying
> combinations of float pointers, ints and doubles.
>
> What's the easiest way of breaking down this header file into a list of
> functions and their arguments using Python? Is there something that will
> parse this (perhaps a protoize.py)? I don't want (or understand!) a full C
> parser, just this simple case.

There is an ANSI-C parser in ply (http://systems.cs.uchicago.edu/ply/)
which you can use.

Bye.
--
------------------------------------------------------------------------
Miki Tebeka <mi*********@zoran.com>
http://tebeka.spymac.net
The only difference between children and adults is the price of the toys
Jul 18 '05 #7