Parsing C header files with python

I've got a header file which lists a whole load of C functions of the form

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

It's a numerical library so all functions return an int and accept varying
combinations of float pointers, ints and doubles.

What's the easiest way of breaking down this header file into a list of
functions and their arguments using python? Is there something that will
parse this (Perhaps a protoize.py) ? I don't want (or understand!) a full C
parser, just this simple case.

It seems like someone should have done something like this before, but
googling for python, header file and protoize just gives me information on
compiling python. If there isn't anything I'll have a go with regexps.

The reason for parsing the header file is that I want to generate (using
python) a wrapper to allow the library to be called from a different language.
I've only got to generate this wrapper once, so the python doesn't have to
be efficient.

Thanks,
Ian

--
"Thinks: I can't think of a thinks. End of thinks routine": Blue Bottle
Jul 18 '05 #1
>>>>> "Ian" == Ian McConnell <ia*@emit.demon.co.uk> writes:

Ian> I've got a header file which lists a whole load of C functions of the form
Ian> int func1(float *arr, int len, double arg1);
Ian> int func2(float **arr, float *arr2, int len, double arg1, double arg2);

Ian> It's a numerical library so all functions return an int and
Ian> accept varying combinations of float pointers, ints and
Ian> doubles.

Ian> What's the easiest way of breaking down this header file into a
Ian> list of functions and their arguments using python? Is there

Well, what comes immediately to mind (I might be overlooking
something) is that the function name is immediately before '(', and
arguments come after it separated by ','. Start with regexps and work
from there...
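
A minimal sketch of that regexp idea, using only the standard `re` module. It assumes every prototype returns `int`, sits in plain uncommented text, and has no nested parentheses (so no function-pointer arguments). The names `HEADER`, `PROTO_RE` and `parse_prototypes` are made up for this illustration.

```python
import re

HEADER = """
int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);
"""

# "int", then the function name immediately before '(',
# then everything up to the closing ')' and the ';'.
PROTO_RE = re.compile(r"\bint\s+(\w+)\s*\(([^)]*)\)\s*;")


def parse_prototypes(text):
    """Return a list of (function_name, [argument strings])."""
    result = []
    for name, arglist in PROTO_RE.findall(text):
        # Arguments are separated by ','; collapse runs of whitespace.
        args = [" ".join(a.split()) for a in arglist.split(",") if a.strip()]
        result.append((name, args))
    return result


for name, args in parse_prototypes(HEADER):
    print(name, args)
```

Because `[^)]*` also matches newlines, a declaration that wraps over several lines is still picked up, but a commented-out prototype would be matched too.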
--
Ville Vainio http://tinyurl.com/2prnb
Jul 18 '05 #2
"Ian McConnell" <ia*@emit.demon.co.uk> wrote in message
news:87************@emit.demon.co.uk...
I've got a header file which lists a whole load of C functions of the form

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

It's a numerical library so all functions return an int and accept varying
combinations of float pointers, ints and doubles.


If regexp's give you pause, try this pyparsing example. It makes heavy use
of setting results names, so that the parsed tokens can be easily retrieved
from the results as if they were named attributes.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul
------------------------
from pyparsing import (Combine, Group, Literal, Optional, Word, alphanums,
                       alphas, delimitedList, oneOf)

testdata = """
int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);
"""

# C identifier: a letter followed by letters, digits or underscores
ident = Word(alphas, alphanums + "_")
# base type plus any run of '*'; adjacent=False allows whitespace between them
vartype = Combine(oneOf("float double int") + Optional(Word("*")),
                  adjacent=False)
arglist = delimitedList(Group(vartype.setResultsName("type") +
                              ident.setResultsName("name")))
functionCall = (Literal("int") + ident.setResultsName("name") +
                "(" + arglist.setResultsName("args") + ")" + ";")

for fn, s, e in functionCall.scanString(testdata):
    print(fn.name)
    for a in fn.args:
        print(" -", a.type, a.name)

------------------------
gives the following output:

func1
 - float* arr
 - int len
 - double arg1
func2
 - float** arr
 - float* arr2
 - int len
 - double arg1
 - double arg2
Jul 18 '05 #3
Ian McConnell <ia*@emit.demon.co.uk> wrote in message news:<87************@emit.demon.co.uk>...
I've got a header file which lists a whole load of C functions of the form

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

It's a numerical library so all functions return an int and accept varying
combinations of float pointers, ints and doubles.

What's the easiest way of breaking down this header file into a list of
functions and their arguments using python? Is there something that will
parse this (Perhaps a protoize.py) ? I don't want (or understand!) a full C
parser, just this simple case.
<<SNIP>>
Thanks,
Ian

Would this suffice:

<CODE>
>>> import re
>>> import pprint
>>> hdr = '''
... int func1(float *arr, int len, double arg1);
... int func2(float **arr, float *arr2, int len, double arg1, double arg2);
...
... '''
>>> print(hdr)

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);


>>> func2args = {}
>>> for line in hdr.split('\n'):
...     # split on whitespace and punctuation, dropping empty strings
...     line = [word for word in re.split(r'[\s,;()]+', line) if word]
...     # line[0] is the return type, line[1] the function name
...     if len(line) > 2:
...         func2args[line[1]] = line[2:]
...
>>> pprint.pprint(func2args)
{'func1': ['float', '*arr', 'int', 'len', 'double', 'arg1'],
 'func2': ['float',
           '**arr',
           'float',
           '*arr2',
           'int',
           'len',
           'double',
           'arg1',
           'double',
           'arg2']}


</CODE>
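
The flat token lists above can be regrouped into (type, name) pairs with the pointer stars moved onto the type. This is only a sketch: it assumes every argument is exactly two tokens with any `*` attached to the name, so `unsigned int n`, `const float *a` or `float * a` would break it. `pair_arguments` is a hypothetical helper name.

```python
def pair_arguments(tokens):
    """Turn ['float', '*arr', 'int', 'len'] into [('float*', 'arr'), ('int', 'len')]."""
    pairs = []
    # Tokens alternate: type, name, type, name, ...
    for ctype, name in zip(tokens[0::2], tokens[1::2]):
        stars = len(name) - len(name.lstrip("*"))  # count leading '*'
        pairs.append((ctype + "*" * stars, name.lstrip("*")))
    return pairs


# Same data as the func2args dictionary built above.
func2args = {
    "func1": ["float", "*arr", "int", "len", "double", "arg1"],
    "func2": ["float", "**arr", "float", "*arr2", "int", "len",
              "double", "arg1", "double", "arg2"],
}

for fname in sorted(func2args):
    print(fname, pair_arguments(func2args[fname]))
```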
Jul 18 '05 #4
"Paul McGuire" <pt***@austin.rr._bogus_.com> writes:
"Ian McConnell" <ia*@emit.demon.co.uk> wrote in message
news:87************@emit.demon.co.uk...
I've got a header file which lists a whole load of C functions of the form

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

It's a numerical library so all functions return an int and accept varying
combinations of float pointers, ints and doubles.


If regexp's give you pause, try this pyparsing example. It makes heavy use
of setting results names, so that the parsed tokens can be easily retrieved
from the results as if they were named attributes.

Download pyparsing at http://pyparsing.sourceforge.net.


Thanks. Your example with pyparsing was just what I was looking for. It also
copes very nicely with newlines and spacing in the header file.

Jul 18 '05 #5
"Ian McConnell" <ia*@emit.demon.co.uk> wrote in message
news:87************@emit.demon.co.uk...
"Paul McGuire" <pt***@austin.rr._bogus_.com> writes:
<snip>
Thanks. Your example with pyparsing was just what I was looking for. It also copes very nicely with newlines and spacing in the header file.

Ian -

It is at just this kind of one-off parsing job that I think pyparsing really
shines. I am sure that you could have accomplished this with regexp's, but
a) it would have taken at least a bit longer
b) it would have required more whitespace handling (such as function decls
that span linebreaks)
c) it would have been trickier to accommodate unanticipated changes (support
for other arg data types (such as char, long), embedded comments, etc.)

BTW, all it takes to make this grammar comment-immune is to add the
following statement before calling scanString():

functionCall.ignore( cStyleComment )

cStyleComment is predefined in the pyparsing module to recognize /* ... */
comments. Adding this will properly handle (i.e., skip over) definitions
like:

/*
int commentedOutFunc(float arg1, float arg2);
*/

Try that with regexp's!
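
For comparison, a plain-regexp version typically needs a separate comment-removal pass before it looks for prototypes. The sketch below uses only the standard `re` module and is not part of pyparsing. `HEADER`, `COMMENT_RE`, `PROTO_RE` and `function_names` are names made up for this illustration. It handles only `/* ... */` comments, assumes an `int` return type, and would be fooled by a comment marker inside a string literal.

```python
import re

HEADER = """
int func1(float *arr,
          int len, double arg1);
/*
int commentedOutFunc(float arg1, float arg2);
*/
int func2(float **arr, float *arr2, int len, double arg1, double arg2);
"""

# Non-greedy match so each /* ... */ block is removed separately;
# DOTALL lets '.' cross line breaks inside a comment.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)
PROTO_RE = re.compile(r"\bint\s+(\w+)\s*\(([^)]*)\)\s*;")


def function_names(text):
    """Return the names of all uncommented 'int name(...);' prototypes."""
    stripped = COMMENT_RE.sub(" ", text)  # pass 1: drop comments
    return [name for name, _args in PROTO_RE.findall(stripped)]  # pass 2


print(function_names(HEADER))
```

The two passes are the extra bookkeeping that `ignore()` takes care of inside the grammar.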

-- Paul
Jul 18 '05 #6
Hello Ian,
I've got a header file which lists a whole load of C functions of the form

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

It's a numerical library so all functions return an int and accept varying
combinations of float pointers, ints and doubles.

What's the easiest way of breaking down this header file into a list of
functions and their arguments using python? Is there something that will
parse this (Perhaps a protoize.py) ? I don't want (or understand!) a full C
parser, just this simple case.

There is an ANSI-C parser in ply (http://systems.cs.uchicago.edu/ply/)
which you can use.

Bye.
--
------------------------------------------------------------------------
Miki Tebeka <mi*********@zoran.com>
http://tebeka.spymac.net
The only difference between children and adults is the price of the toys
Jul 18 '05 #7
