Parser to list function names in C++?

Hi,

I would like to create a simplistic parser which goes through each .h file
and finds each function prototype (or inline implementation) along with
class names and member functions.

Examples:

test.h:

void f1();
inline int f2() {return 0;}

class A
{
void f3();
};

How would I approach this from a simple viewpoint, without a steep learning
curve? I know there are a dozen parsers out there, all pretty advanced and
requiring lots of background knowledge, but for my simple needs I think they
might be a bit overkill.
The parser should be in C++ too, since the rest of the app is also C++.

Any ideas how to proceed?

-- Henrik
Dec 5 '06 #1

A true C++ parser is a lot of work.

You could take an open source program that has a parser and teach it to
do what you want.

Perhaps you can look at doxygen or gcc.
G
Dec 5 '06 #2
Henrik Goldman wrote:
> I would like to create a simplistic parser which goes through
> each .h file and finds each function prototype (or inline
> implementation) along with class names and member
> functions.
> ....
> Any ideas how to proceed?

One approach would be to use a regular expression engine
to do the searching.

For example if I load your 'test.h' example header file
into Zeus and search for this regular expression:

[_a-z0-9]+[ &*\t]+[_a-z0-9 \t]*[_a-z0-9]+[ \t]*[(]+

it only finds these lines:

void f1();
inline int f2() {return 0;}
void f3();
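
Since the parser is supposed to be C++, the same search can be run outside an editor with std::regex. The following is only a minimal sketch: the function name find_candidate_lines is made up for illustration, and the character classes add A-Z, which is an assumption about how the search above treated case.

```cpp
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every line of `in` that contains a match
// for the pattern posted above. A-Z has been added to the character
// classes (an assumption, so that upper-case identifiers also match).
std::vector<std::string> find_candidate_lines(std::istream& in)
{
    static const std::regex pattern(
        R"([_a-zA-Z0-9]+[ &*\t]+[_a-zA-Z0-9 \t]*[_a-zA-Z0-9]+[ \t]*[(]+)");

    std::vector<std::string> hits;
    std::string line;
    while (std::getline(in, line)) {
        // regex_search looks for the pattern anywhere in the line.
        if (std::regex_search(line, pattern)) {
            hits.push_back(line);
        }
    }
    return hits;
}
```

Feeding it the test.h example through a std::ifstream or std::istringstream should give the same three lines as listed above. Because it works line by line, a declaration split across several lines will be missed.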

Jussi Jumppanen
Zeus For Windows - "The ultimate programmer's editor/IDE"
http://www.zeusedit.com

Dec 5 '06 #3
I suggest you have a look at flex/bison, or ANTLR.

Joseph.

Dec 6 '06 #4
CTG
First of all, sit down and work out the rules.
Example:

The declaration of each function has a '(' followed by a ')' and a ';'
at the end, except in the case of an inline one.
I don't think it's hard at all.
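
Taken literally, that rule fits in a few lines of C++. This is a rough sketch only; the name looks_like_function is hypothetical, and treating '{' as the marker for the inline case is my reading of the rule, not something stated above.

```cpp
#include <string>

// Hypothetical helper implementing the rule above on a single line:
// a '(' followed by a ')' and then either ';' (prototype) or
// '{' (assumed marker for an inline body). Anything else is rejected.
bool looks_like_function(const std::string& line)
{
    const std::string::size_type open = line.find('(');
    if (open == std::string::npos) return false;

    const std::string::size_type close = line.find(')', open + 1);
    if (close == std::string::npos) return false;

    // Skip blanks after ')' and look at the next character.
    const std::string::size_type next = line.find_first_not_of(" \t", close + 1);
    if (next == std::string::npos) return false;

    return line[next] == ';' || line[next] == '{';
}
```

It accepts `void f1();` and `inline int f2() {return 0;}` and rejects `class A`. Note that a plain call such as `f();` satisfies the rule as well, and a definition whose `{` sits on the following line does not, so the rule alone is not enough.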

Dec 6 '06 #5
There are sort of two approaches I see. One is to use text pattern
matching like jussij suggests. (Though remember to also search for A-Z
and, if you want to be pedantic, stuff like $ that you can also use in
identifiers but probably no one actually does. Also, his pattern won't spot
things like constructors (no return value), functions where there are
newlines in the whitespace (you can't use grep for those), operators,
and probably some other special cases.) There's a variant of this which
would use something like Flex to create a lexer, in which case you just
have to deal with whole tokens. This might be easier if you know
at least a little Flex (or the ideas behind it) and can find the file
that GCC or something similar uses to do its lexing. Then again, it might
not.
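
To give an idea of what "dealing with whole tokens" looks like without Flex, here is a small hand-written sketch. The names tokenize and names_before_paren are invented for illustration. It drops whitespace (including newlines) and comments, but it does not handle string literals or preprocessor lines, so treat it as a starting point rather than a real lexer.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits source text into identifier/number tokens and single-character
// punctuation tokens. Whitespace (newlines included) and comments are dropped.
std::vector<std::string> tokenize(const std::string& src)
{
    std::vector<std::string> tokens;
    const std::string::size_type n = src.size();
    std::string::size_type i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (src.compare(i, 2, "//") == 0) {
            while (i < n && src[i] != '\n') ++i;            // line comment
        } else if (src.compare(i, 2, "/*") == 0) {
            const std::string::size_type end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;   // block comment
        } else if (std::isalnum(c) || c == '_') {
            const std::string::size_type start = i;
            while (i < n && (std::isalnum(static_cast<unsigned char>(src[i])) ||
                             src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, src[i]));       // punctuation
            ++i;
        }
    }
    return tokens;
}

// Returns each identifier that is directly followed by a "(" token.
std::vector<std::string> names_before_paren(const std::vector<std::string>& tokens)
{
    std::vector<std::string> names;
    for (std::size_t k = 0; k + 1 < tokens.size(); ++k) {
        const unsigned char first = static_cast<unsigned char>(tokens[k][0]);
        if (tokens[k + 1] == "(" && (std::isalpha(first) || first == '_')) {
            names.push_back(tokens[k]);
        }
    }
    return names;
}
```

Working on tokens takes care of the newline problem and picks up constructors, since only the identifier in front of the parenthesis matters. On the other hand it also reports keywords such as `if` or `while` and every function call, so some filtering is still needed on top.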

The problem with that is that I'm not sure how hard it would be to get
just the lines in question. I mean, I know that jussij probably didn't
spend a lot of time working on that and could get something more to the
point with some more effort, but I suspect that it would be very
difficult to get something that works in full generality. At the same
time, if your results don't have to be perfect, this solution could be
very lightweight, even to the point of running a slightly modified
version of jussij's regex over your code with grep.

Now, if you want exact answers, you might have to go with one of
those parsers. I'll just give a shoutout for one that I know personally,
called Elsa. It is complete and accurate enough to parse its own source,
then output the source again in a form where it can be compiled and the
rebuilt version used to run the regression suite. At least, I think it
is, though I'm not quite sure how, because I'm currently fixing a
number of "pretty-printing" bugs that block correct translation of the
GCC 3.4 headers. (I'm working on a project that uses it for
source-to-source transformations.) There is one semi-show-stopping bug
in the parsing end though, which is that code containing endl or flush
confuses it. However, replacing endl with "\n" except in the definition
(I use a regex for telling apart uses and the definition; it's not
perfect either) will let things work right. (I know it's not quite
semantics preserving.) However, if you can stand to do that change,
it's quite easy to write an extension that will do what you want.
http://www.cs.berkeley.edu/~smcpeak/...lsa/semgrep.cc
has about a two and a half page long program that is "semantic grep";
you give it a variable name, and it will tell you all the places a
variable with that name is declared or used. On the other hand, if you
want to include it in another project... probably this is not the best
option. See www.cubewano.org/oink.

So the pro of the parser approach is that it's very robust, modulo bugs in
the implementation (which, in the case of Elsa, will hopefully go away
in the fairly near future... Mozilla is eyeing the Oink project --
which now more or less includes Elsa -- for helping them), but the con
is that it is pretty much by definition quite heavyweight. And there
are of course other options here. The other one that might be useful is
OpenC++, though I don't know much about that project. You could try to
hack the GCC front end. Those are all the open-source C++ parsers I know
of.

Evan Driscoll

Dec 6 '06 #6

Your tool to do this will depend on what you want to do with
the output.

As someone else mentioned, you could get the output using
doxygen. I spent a day and a half playing around with its
options and got it to produce what you need plus a ton of
other dependency related diagrams - class dependencies,
include file dependencies, and function call dependencies.

It's very flexible. I produced html output but it can also
produce XML output which can then be processed by some
other program.
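
As a rough idea of the "processed by some other program" step: with GENERATE_XML = YES in the Doxyfile, doxygen writes XML files in which, as far as I recall, each function appears as a memberdef element with kind="function" and a name child. The sketch below relies on that layout, so check it against the files your doxygen version produces. The function name is hypothetical, and it uses plain substring search rather than a real XML parser.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collects the text of the <name> child of every
// <memberdef kind="function"> element found in `xml`.
// Assumes the layout described above; no real XML parsing is done.
std::vector<std::string> function_names_from_doxygen_xml(const std::string& xml)
{
    std::vector<std::string> names;
    std::string::size_type pos = 0;
    while ((pos = xml.find("<memberdef", pos)) != std::string::npos) {
        const std::string::size_type tag_end = xml.find('>', pos);
        const std::string::size_type elem_end = xml.find("</memberdef>", pos);
        if (tag_end == std::string::npos || elem_end == std::string::npos) break;

        // Only look inside elements whose opening tag says kind="function".
        const std::string tag = xml.substr(pos, tag_end - pos);
        if (tag.find("kind=\"function\"") != std::string::npos) {
            const std::string::size_type open = xml.find("<name>", tag_end);
            if (open != std::string::npos && open < elem_end) {
                const std::string::size_type start = open + 6;  // skip "<name>"
                const std::string::size_type close = xml.find("</name>", start);
                if (close != std::string::npos && close < elem_end) {
                    names.push_back(xml.substr(start, close - start));
                }
            }
        }
        pos = elem_end;  // continue after this element
    }
    return names;
}
```

For anything beyond a quick experiment a proper XML library would be the safer choice, since substring search breaks as soon as the formatting differs from what is assumed here.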

Dec 6 '06 #7
Hi,

> As someone else mentioned, you could get the output using
> doxygen. I spent a day and a half playing around with its
> options and got it to produce what you need plus a ton of
> other dependency related diagrams - class dependencies,
> include file dependencies, and function call dependencies.
>
> It's very flexible. I produced html output but it can also
> produce XML output which can then be processed by some
> other program.

That actually sounds like a very useful idea. I just had a quick look and it
certainly looks interesting. It seems to give what I need but generates a lot
of output, so I must look into which files need to be parsed etc.

-- Henrik
Dec 6 '06 #8
Hi Evan,

Thanks for the suggestions.

I did look into Elsa but found it rather huge for my simple needs. Basically
I am trying to create an obfuscator which just changes names of functions
and classes. Elsa can probably do a lot more than just this, but the time to
learn how things work far exceeds the needs of my project.
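
Once the list of names is known, the renaming step itself can be fairly small. The sketch below is one possible way to do it, with a made-up function name rename_identifiers: it replaces whole identifiers only, so renaming f1 leaves f10 alone. It does not treat comments or string literals specially, which is an accepted simplification here.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: copies `src`, replacing every whole identifier that
// appears as a key in `renames` with its mapped value. Names inside comments
// and string literals are renamed too, since they are not treated specially.
std::string rename_identifiers(const std::string& src,
                               const std::map<std::string, std::string>& renames)
{
    std::string out;
    std::string::size_type i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isalnum(c) || c == '_') {
            // Read a full run of identifier characters.
            const std::string::size_type start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            const std::string word = src.substr(start, i - start);
            const auto it = renames.find(word);
            // Runs starting with a digit are numbers, never identifiers.
            if (!std::isdigit(c) && it != renames.end()) {
                out += it->second;
            } else {
                out += word;
            }
        } else {
            out += src[i];
            ++i;
        }
    }
    return out;
}
```

For example, with a map that sends f1 to a, the text `void f1(); int f10 = f1();` becomes `void a(); int f10 = a();`. The same map has to be applied to every file that uses the names, otherwise the program will no longer link.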

-- Henrik
Dec 6 '06 #9
CTG wrote:
> you first of all sit down and work out the rules:

Please don't top-post. Your replies belong following or interspersed
with properly trimmed quotes. See the majority of other posts in the
newsgroup, or the group FAQ list:
<http://www.parashift.com/c++-faq-lite/how-to-post.html>

> examples:
>
> declaration of each function has a '(' followed by a ')' and a ';'
> semicolon at the end except in the case of inline one.

How do you distinguish that from a function call?

> I don't think it's hard at all.

That probably means you haven't thought enough.

Such prototype declarations are not required by the language.

You have to be able to handle this as well:
void f()
{
    return;
}

int main()
{
    f();
    return 0;
}
So no semicolon and no inline keyword to help. I recommend not trying
to roll your own on this. Use one of the prefab programs mentioned
elsewhere.
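
For readers who want to see why this is harder than it looks, here is a sketch of one way a hand-rolled scanner could tell the definition of f from the call to f in the example above: it only records names that appear outside function bodies, by keeping track of the braces. All names in it are invented for illustration, and it is a heuristic under several assumptions (no comments, no templates, no macros, no function pointers), not a replacement for a real parser.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Returns the identifier at the end of `text`, ignoring trailing whitespace.
static std::string trailing_identifier(const std::string& text)
{
    std::string::size_type end = text.size();
    while (end > 0 && std::isspace(static_cast<unsigned char>(text[end - 1]))) --end;
    std::string::size_type begin = end;
    while (begin > 0 && (std::isalnum(static_cast<unsigned char>(text[begin - 1])) ||
                         text[begin - 1] == '_')) {
        --begin;
    }
    return text.substr(begin, end - begin);
}

// True if the text in front of a '{' opens a scope that holds declarations.
static bool opens_declaration_scope(const std::string& head)
{
    std::istringstream words(head);
    std::string first;
    words >> first;
    return first == "class" || first == "struct" || first == "namespace";
}

// Collects the name in front of the first '(' of each statement that is
// not nested inside a function body.
std::vector<std::string> declared_functions(const std::string& src)
{
    std::vector<std::string> names;
    std::vector<bool> scopes;  // one entry per open '{'; true = declaration scope
    std::string head;          // text of the current statement so far
    int paren_depth = 0;

    for (const char c : src) {
        if (c == '(') {
            const bool in_body =
                std::find(scopes.begin(), scopes.end(), false) != scopes.end();
            // Skip initialisers such as "int x = g();" at file scope.
            if (!in_body && paren_depth == 0 && head.find('=') == std::string::npos) {
                const std::string name = trailing_identifier(head);
                if (!name.empty()) names.push_back(name);
            }
            ++paren_depth;
            head += c;
        } else if (c == ')') {
            if (paren_depth > 0) --paren_depth;
            head += c;
        } else if (c == '{') {
            scopes.push_back(opens_declaration_scope(head));
            head.clear();
        } else if (c == '}') {
            if (!scopes.empty()) scopes.pop_back();
            head.clear();
        } else if (c == ';') {
            head.clear();
        } else {
            head += c;
        }
    }
    return names;
}
```

On the example above this reports f and main and skips the call inside main. It still goes wrong on plenty of legal C++ (a function pointer declaration would be reported as "void", for instance), which rather supports the advice to use an existing tool.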

Brian
Dec 6 '06 #10
