
C++ Source Reverse Engineer - How to write a parser?

P: n/a
Hi,

I'm interested in reverse engineering C++ source code into a form more
comprehensible than the source itself.

I want to write a basic one myself; obviously I need to write a parser
for the source code.
Although this has some overlap with, say, a compiler, it would also seem
significantly different.

Can anyone provide me with links etc. on how one would go about writing
such a parser?
No doubt I would also need a reference to the syntax rules of C++ etc.
Secondly, can anyone recommend a good tool that currently exists to do
the job?

Thanks.

Jun 6 '07 #1
6 Replies


P: n/a
Herby wrote:
> Hi,
>
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.
>
> I want to write a basic one myself; obviously I need to write a parser
> for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.
>
> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++ etc.

The gcc source.

--
Ian Collins.
Jun 6 '07 #2

P: n/a
Herby wrote:
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.
>
> I want to write a basic one myself; obviously I need to write a parser
> for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.
>
> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++ etc.

If you have to ask this question, IMHO you had better start with a
smaller project.

> Secondly, can anyone recommend a good tool that currently exists to do
> the job?

I don't know if it's a good one, because it's a bit outdated, but it may
be worth a try: gccxml uses the gcc front end to parse the sources and
creates XML output that can be read easily.

Mathias
Jun 6 '07 #3

P: n/a
Herby wrote:
> Hi,
>
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.

An ambitious goal, I think, but I hope you will succeed somehow :)
The problem I see is that human-written source code is usually the most
comprehensible expression of an algorithm that still embodies all of its
details. Of course you can find some sort of compromise: for example,
there are UML tools that can generate diagrams from source files.

> I want to write a basic one myself; obviously I need to write a parser
> for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.

You are right. If you decide up front which abstraction level you want
to stop at, it may be much simpler.

> Can anyone provide me with links etc. on how one would go about writing
> such a parser?

I have a link and a suggestion. The link is:
http://www.cs.berkeley.edu/~smcpeak/.../sources/elsa/
Elsa is an open-source C/C++ parser; I think it's quite accurate. Of
course, using it means you need a deep knowledge of C/C++ syntax.

The suggestion is: look at the source code of some open-source UML
editors. They do something similar to what you are trying to do, so you
can probably find some interesting hints there.

> No doubt I would also need a reference to the syntax rules of C++ etc.

The C++ standard. Everything is in there, including the BNF syntax
specification of the language.

Regards,

Zeppe
Jun 6 '07 #4

P: n/a
On Jun 6, 10:58 am, Herby <prmarjo...@gmail.com> wrote:
> Hi,
>
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.
>
> I want to write a basic one myself; obviously I need to write a parser
> for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.
>
> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++ etc.

The classical approach (besides building one by hand) is to use tools
like lex and yacc (or bison). You should read up on compiler
construction, lexing, and parsing. If I understand you correctly, what
you want to build is a compiler (translating your-own-language-tm into
C++).
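
To make the lexing step concrete, here is a minimal hand-written sketch.
It is nowhere near a real C++ lexer: it only splits input into
identifiers, decimal numbers, and single-character punctuation, and it
knows nothing about comments, string literals, or the preprocessor. The
names Token, TokenKind, and tokenize are made up for this illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for a toy subset; real C++ has many more.
enum class TokenKind { Identifier, Number, Punct };

struct Token {
    TokenKind kind;
    std::string text;
};

// Split `src` into tokens, skipping whitespace between them.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }

        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Identifier or keyword: [A-Za-z_][A-Za-z0-9_]*
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) ||
                    src[i] == '_'))
                ++i;
            out.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            // Decimal integer only; no suffixes, hex, or floating point.
            while (i < src.size() &&
                   std::isdigit(static_cast<unsigned char>(src[i])))
                ++i;
            out.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else {
            // Anything else becomes a one-character punctuation token.
            ++i;
            out.push_back({TokenKind::Punct, src.substr(start, 1)});
        }
        // Every branch must consume input, or the loop would never end.
        assert(i > start);
    }
    return out;
}
```

Calling tokenize("int x = 42;") yields five tokens: int, x, =, 42 and ;.
A tool like lex generates this kind of loop from regular expressions
instead of having you write it by hand.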

If you want to stay inside C++ you can use boost::spirit, which is
similar to using yacc but without the need for an extra tool.

Note that spirit is a library that basically takes a modified form of
EBNF syntax and embeds it into C++. Take a close look at how it is
implemented, because the technique used might be a better approach to
solving your problem (just a wild guess, since I do not know what
problem you are trying to solve).
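
A rough sketch of that embedding technique, in plain C++ without Boost:
parsers are ordinary values, and overloaded operators combine them so
that the code reads like a grammar. This is not spirit's API or
implementation (spirit, as far as I know, builds its parsers with
templates at compile time, while this sketch uses std::function for
brevity). It only borrows the same operator conventions: >> for
sequence, | for alternative, and prefix * for repetition. Parser, ch,
digit, matches, number, and sum are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// A parser takes the input and a position. On success it advances the
// position and returns true; on failure it leaves the position untouched.
struct Parser {
    std::function<bool(const std::string&, std::size_t&)> run;
};

// Match one specific character.
Parser ch(char c) {
    return {[c](const std::string& s, std::size_t& pos) {
        if (pos < s.size() && s[pos] == c) { ++pos; return true; }
        return false;
    }};
}

// Match any decimal digit.
Parser digit() {
    return {[](const std::string& s, std::size_t& pos) {
        if (pos < s.size() && s[pos] >= '0' && s[pos] <= '9') { ++pos; return true; }
        return false;
    }};
}

// Sequence: a followed by b. Restores the position if either part fails.
Parser operator>>(Parser a, Parser b) {
    return {[a, b](const std::string& s, std::size_t& pos) {
        std::size_t save = pos;
        if (a.run(s, pos) && b.run(s, pos)) return true;
        pos = save;
        return false;
    }};
}

// Alternative: try a, and if it fails try b at the same position.
Parser operator|(Parser a, Parser b) {
    return {[a, b](const std::string& s, std::size_t& pos) {
        return a.run(s, pos) || b.run(s, pos);
    }};
}

// Repetition: zero or more of a. Always succeeds.
Parser operator*(Parser a) {
    return {[a](const std::string& s, std::size_t& pos) {
        while (true) {
            std::size_t before = pos;
            // Stop on failure, or if a matched without consuming anything.
            if (!a.run(s, pos) || pos == before) break;
        }
        return true;
    }};
}

// True if `p` consumes the whole input.
bool matches(const Parser& p, const std::string& input) {
    assert(p.run && "parser must not be empty");
    std::size_t pos = 0;
    return p.run(input, pos) && pos == input.size();
}

// EBNF:  number = digit { digit } ;
//        sum    = number { ( "+" | "-" ) number } ;
Parser number() { return digit() >> *digit(); }
Parser sum() { return number() >> *((ch('+') | ch('-')) >> number()); }
```

With these definitions, matches(sum(), "12+3-45") is true and
matches(sum(), "12+") is false. The sketch only recognizes input; a
usable version would also attach actions or build a tree.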

If you go the spirit route there is also boost::wave, which is a full
implementation of the C++ preprocessor (in fact, IIRC, the only FULL
implementation of it). Someone told me that there is also a person
working on a full C++ parser using spirit, but I have not yet seen any
further detail on it.

--
Fabio Fracassi
Jun 6 '07 #5

P: n/a

"Herby" <pr********@gmail.com> wrote in message
news:11**********************@o5g2000hsb.googlegroups.com...
> Hi,
>
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.
>
> I want to write a basic one myself; obviously I need to write a parser
> for the source code.

You don't need to write a parser yourself to do reverse engineering.
It is probably true, though, that you will need a parser to do it.

Building a C++ parser is a lot harder than people who have not done it
think it is. You need a lexer covering all the standard's dark-corner
requirements. You need a preprocessor. You need a non-standard parsing
engine, because C++ isn't LALR and yacc won't work. You need a grammar
not just for ANSI C++ but for the dialect of C++ you actually have
(Sun? GNU? Microsoft?). If you are a realist, you'll need a
scope-accurate symbol table telling you where names are defined and
what they are defined as. Expect building a robust parser to take
several man-years at a minimum; we have considerably more than that in
ours to address the above issues.
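
A small illustration of why the symbol table matters to the parser
itself: the statement "a * b;" declares a pointer named b if a names a
type in the current scope, and is a multiplication expression otherwise.
The sketch below is only a toy under that one assumption, not how DMS or
any real front end works; ScopedSymbols, NameKind, StatementKind, and
classify_star_statement are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

enum class NameKind { Type, Variable, Unknown };
enum class StatementKind { PointerDeclaration, Multiplication };

// A stack of scopes; names in inner scopes shadow those in outer ones.
class ScopedSymbols {
public:
    ScopedSymbols() { enter_scope(); }  // start with the global scope

    void enter_scope() { scopes_.emplace_back(); }

    void leave_scope() {
        assert(scopes_.size() > 1 && "cannot leave the global scope");
        scopes_.pop_back();
    }

    // Record `name` in the innermost scope.
    void declare(const std::string& name, NameKind kind) {
        scopes_.back()[name] = kind;
    }

    // Search from the innermost scope outward.
    NameKind lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return NameKind::Unknown;
    }

private:
    std::vector<std::unordered_map<std::string, NameKind>> scopes_;
};

// Classify the statement "lhs * rhs;" from what is known about lhs.
StatementKind classify_star_statement(const ScopedSymbols& syms,
                                      const std::string& lhs) {
    return syms.lookup(lhs) == NameKind::Type
               ? StatementKind::PointerDeclaration
               : StatementKind::Multiplication;
}
```

If a is declared as a type globally and then redeclared as a variable in
an inner scope, the same token sequence is classified differently in the
two scopes, which is why the table has to be scope accurate.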

> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.

Ours captures comments and most preprocessor conditionals unexpanded.

> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++ etc.

Check comp.compilers and various conferences on reverse engineering.
You won't find a lot of specific detail; you'll find tantalizing hints
of how to solve problems, but that won't remove the sweat equity
required. I've been down that route.

> Secondly, can anyone recommend a good tool that currently exists to do
> the job?

Depends on what you mean by "reverse engineering".
If what you want are all the above features packaged in a form in
which you can construct a reverse engineering tool,
then DMS may suit your needs:

http://www.semanticdesigns.com/Produ...pFrontEnd.html

If you mean "a tool that does reverse engineering", then Scientific
Toolworks may have what you want.

> Thanks.

--
Ira Baxter, CTO
www.semanticdesigns.com
Jun 6 '07 #6

P: n/a
Guys, thanks for all the interesting responses.

I have worked as a software developer for 10+ years, mostly in
maintenance mode on medium to large C++ projects. Usually these
projects do not have any kind of design roadmap to guide you into
them.
I feel this is much more the reality.

At best you have some kind of source browser within your IDE: find all
references, go to definition, etc.

In this time I have come up with some ideas of my own that build on
these, and I would really like to try them out. So I am reversing the
source into something more abstract that allows me to reason more
effectively about the source I may be about to modify.

http://www.objectmentor.com/resources/downloads.html

The above link has a script that gives some design quality metrics for
a set of header files.
It's a good start, but I'd like to write something proper and take the
idea much further...
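
To give a flavour of what a header-based metric can look like, here is a
minimal sketch. It does not reproduce the linked script. It assumes the
headers are already loaded into memory, looks only at #include "..."
lines, and computes fan-out, fan-in, and Robert Martin's instability
I = Ce / (Ca + Ce) per header. HeaderSet, quoted_includes, Coupling, and
measure are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Header name -> full text of that header (hypothetical in-memory input).
using HeaderSet = std::map<std::string, std::string>;

// Collect the names from lines of the form:  #include "name"
// Angle-bracket includes and "# include" with a space are ignored.
std::set<std::string> quoted_includes(const std::string& text) {
    std::set<std::string> result;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::size_t p = line.find_first_not_of(" \t");
        if (p == std::string::npos || line.compare(p, 8, "#include") != 0) continue;
        std::size_t open = line.find('"', p + 8);
        if (open == std::string::npos) continue;
        std::size_t close = line.find('"', open + 1);
        if (close == std::string::npos) continue;
        result.insert(line.substr(open + 1, close - open - 1));
    }
    return result;
}

struct Coupling {
    int fan_out = 0;  // headers in the set that this header includes (Ce)
    int fan_in = 0;   // headers in the set that include this header (Ca)

    // 0.0 = only depended upon, 1.0 = only depends on others.
    double instability() const {
        int total = fan_in + fan_out;
        return total == 0 ? 0.0 : static_cast<double>(fan_out) / total;
    }
};

std::map<std::string, Coupling> measure(const HeaderSet& headers) {
    std::map<std::string, Coupling> result;
    for (const auto& h : headers) result[h.first];  // one entry per header
    for (const auto& h : headers) {
        for (const auto& dep : quoted_includes(h.second)) {
            // Skip self-includes and headers outside the analysed set.
            if (dep == h.first || headers.count(dep) == 0) continue;
            ++result[h.first].fan_out;
            ++result[dep].fan_in;
        }
    }
    // The filter above guarantees no entries for unknown headers appear.
    assert(result.size() == headers.size());
    return result;
}
```

Text matching like this is fooled by conditional compilation and
includes hidden behind macros, which is exactly where a real parser and
preprocessor would be needed.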

Again, these are some of the tools on the market:
http://en.wikipedia.org/wiki/List_of..._code_analysis

So I hope this makes it clear what I'm trying to achieve.


Jun 7 '07 #7
