C++ Source Reverse Engineer - How to write a parser ?

Hi,

I'm interested in reverse engineering C++ source code into a form more
comprehensible than the source itself.

I want to write a basic tool myself, so obviously I need to write a parser
for the source code.
Although this has some overlap with, say, a compiler, it would also seem
significantly different.

Can anyone provide me with links etc. on how one would go about writing
such a parser?
No doubt I would also need a reference to the syntax rules of C++ etc.
Secondly, can anyone recommend a good tool that currently exists to do
the job?

Thanks.

Jun 6 '07 #1
Herby wrote:
> Hi,
>
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.
>
> I want to write a basic tool myself, so obviously I need to write a parser
> for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.
>
> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++ etc.

The GCC source.

--
Ian Collins.
Jun 6 '07 #2
Herby wrote:
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.
>
> I want to write a basic tool myself, so obviously I need to write a parser
> for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.
>
> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++ etc.

If you have to ask this question, you would IMHO be better off starting with a
smaller project.

> Secondly, can anyone recommend a good tool that currently exists to do
> the job?

I don't know if it's a good one, because it's a bit outdated, but it may be worth
a try: gccxml uses the gcc front end to parse the sources and creates XML
output that can be easily read.

Mathias
Jun 6 '07 #3
Herby wrote:
> Hi,
>
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.

An ambitious goal, I think, but I hope you will succeed somehow :)
The problem that I see is that human-written source code is usually
the most comprehensible expression of an algorithm that embodies all the
details of the algorithm itself. Of course you can find some sort of
compromise: for example, there are UML programs that are able to sketch
diagrams from the source files.

> I want to write a basic tool myself, so obviously I need to write a parser
> for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.

You are right. Especially if you decide on the abstraction level that you
want to stop at, it may be much simpler.

> Can anyone provide me with links etc. on how one would go about writing
> such a parser?

I have a link and a suggestion. The link is:
http://www.cs.berkeley.edu/~smcpeak/.../sources/elsa/
Elsa is an open-source C/C++ parser; I think it's quite accurate. Of
course, that means you need to have a deep knowledge of C/C++
syntax.

The suggestion is: look at the source code of some open-source UML editors.
They do something similar to what you are trying to do, so you can probably
find some interesting hints.

> No doubt I would also need a reference to the syntax rules of C++ etc.

The C++ standard. Everything is there, including the BNF syntax
specification of the language.
Regards,

Zeppe
Jun 6 '07 #4
On Jun 6, 10:58 am, Herby <prmarjo...@gmail.com> wrote:
> Hi,
>
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.
>
> I want to write a basic tool myself, so obviously I need to write a parser
> for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.
>
> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++ etc.

The classical approach (besides building one by hand) is to use tools
like lex and yacc (or bison). You should read up on compiler
construction (what you want to build is a compiler, if I understand you
correctly: translating your-own-language(tm) into C++), lexing, and
parsing.
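
To make the lexing step concrete, here is a minimal hand-written tokenizer sketch of the kind that lex would normally generate from regular-expression rules. It is only a toy under stated assumptions: the names `TokenKind`, `Token`, and `tokenize` are hypothetical, and it ignores comments, string literals, preprocessor directives, and multi-character operators, all of which a real C++ lexer must handle.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for a toy lexer (not the full C++ token set).
enum class TokenKind { Identifier, Number, Punctuation };

struct Token {
    TokenKind kind;
    std::string text;
};

// Split the input into identifiers, integer literals and single-character
// punctuation. Whitespace separates tokens and is otherwise skipped.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalpha(c) || c == '_') {
            // Identifier or keyword: [A-Za-z_][A-Za-z0-9_]*
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            // Decimal integer literal: [0-9]+
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else {
            // Anything else is treated as one punctuation character.
            tokens.push_back({TokenKind::Punctuation, std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}
```

Calling `tokenize("int x = 42;")` yields five tokens; a parser (hand-written or generated by yacc/bison) would then consume that token stream rather than raw characters.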

If you want to stay inside C++ you can use boost::spirit, which is
similar to using yacc, but without the need for an extra tool.

Note that spirit is a library that basically takes a modified form of
the EBNF syntax and embeds it into C++. Take a close look at how it is
implemented, because the technique used might be a better approach to
solving your problem (Just a wild guess, since I do not know what
problem you are trying to solve).
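
The idea of embedding a grammar in C++ through operator overloading can be sketched without Boost. The following is not Spirit's API or implementation (Spirit builds its parsers with expression templates); it is a simplified stand-in using `std::function`, and `Parser`, `ch`, `digit`, and `parse_sum` are hypothetical names. It borrows only Spirit's operator conventions: `a >> b` for sequence, `a | b` for alternative, and `*a` for zero or more repetitions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// A parser takes the input and a position; on success it advances the
// position and returns true, on failure it leaves the position unchanged.
struct Parser {
    std::function<bool(const std::string&, std::size_t&)> run;
};

// Match one literal character.
Parser ch(char expected) {
    return {[expected](const std::string& in, std::size_t& pos) {
        if (pos < in.size() && in[pos] == expected) { ++pos; return true; }
        return false;
    }};
}

// Match any decimal digit.
Parser digit() {
    return {[](const std::string& in, std::size_t& pos) {
        if (pos < in.size() && in[pos] >= '0' && in[pos] <= '9') { ++pos; return true; }
        return false;
    }};
}

// Sequence: a >> b matches a followed by b.
Parser operator>>(Parser a, Parser b) {
    return {[a, b](const std::string& in, std::size_t& pos) {
        std::size_t saved = pos;
        if (a.run(in, pos) && b.run(in, pos)) return true;
        pos = saved;  // backtrack on failure
        return false;
    }};
}

// Alternative: a | b tries a first, then b.
Parser operator|(Parser a, Parser b) {
    return {[a, b](const std::string& in, std::size_t& pos) {
        return a.run(in, pos) || b.run(in, pos);
    }};
}

// Kleene star: *a matches a zero or more times (a must consume input).
Parser operator*(Parser a) {
    return {[a](const std::string& in, std::size_t& pos) {
        while (a.run(in, pos)) {}
        return true;
    }};
}

// Grammar, written almost like EBNF:
//   number ::= digit digit*
//   sum    ::= number (('+' | '-') number)*
bool parse_sum(const std::string& input) {
    Parser number = digit() >> *digit();
    Parser sum = number >> *((ch('+') | ch('-')) >> number);
    std::size_t pos = 0;
    return sum.run(input, pos) && pos == input.size();
}
```

With this sketch, `parse_sum("12+3-45")` accepts and `parse_sum("12+")` rejects. The grammar lives in ordinary C++ expressions, so no separate generator step is needed, which is the property that makes the approach attractive.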

If you go the spirit route, there is also boost::wave, which is a full
implementation of the C++ preprocessor (in fact, IIRC, the only FULL
implementation of it). Someone told me that there is also a person
who is working on a full C++ parser using spirit, but I have not yet
seen any further detail on it.

--
Fabio Fracassi
Jun 6 '07 #5

"Herby" <pr********@gmail.com> wrote in message
news:11**********************@o5g2000hsb.googlegroups.com...
> Hi,
>
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.
>
> I want to write a basic tool myself, so obviously I need to write a parser
> for the source code.

You don't need to write a parser to do reverse engineering.
It is probably true that to do reverse engineering, you will need a parser.

Building a C++ parser is a lot harder than people who have not
done it think it is. You need a lexer covering all the standard's
dark-corner requirements.
You need a preprocessor. You need a non-standard parsing
engine because C++ isn't LALR, and yacc won't work.
You need a grammar not just for ANSI C++ but for the dialect
of C++ you actually have (Sun? GNU? Microsoft?).
If you are a realist, you'll need a scope-accurate symbol table
telling you where names are defined and what they are defined as.
Expect building a robust parser to take several man-years
at a minimum; we have considerably more than that in ours
to address the above issues.
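
A small illustration of why the symbol table matters: the statement `a * b;` declares a pointer named `b` if `a` names a type in the current scope, and is a multiplication expression otherwise, so the parser cannot decide from the tokens alone. The sketch below shows a scope-aware lookup making that decision. All names (`SymbolTable`, `SymbolKind`, `classify_star_statement`) are hypothetical, and real C++ name lookup (namespaces, classes, templates, overloads) is far more involved than this.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

enum class SymbolKind { Type, Variable };

// A stack of scopes; names declared in inner scopes shadow outer ones.
class SymbolTable {
public:
    SymbolTable() { enter_scope(); }  // start with the global scope

    void enter_scope() { scopes_.emplace_back(); }

    void leave_scope() {
        assert(scopes_.size() > 1 && "cannot leave the global scope");
        scopes_.pop_back();
    }

    void declare(const std::string& name, SymbolKind kind) {
        scopes_.back()[name] = kind;
    }

    // Search from the innermost scope outwards; returns nullptr if unknown.
    const SymbolKind* lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return &found->second;
        }
        return nullptr;
    }

private:
    std::vector<std::unordered_map<std::string, SymbolKind>> scopes_;
};

// Decide how to read the statement "<lhs> * <something>;".
std::string classify_star_statement(const SymbolTable& table, const std::string& lhs) {
    const SymbolKind* kind = table.lookup(lhs);
    if (kind != nullptr && *kind == SymbolKind::Type) return "declaration";
    return "expression";
}
```

If `a` is declared as a type globally and then redeclared as a variable in an inner scope, the same text `a * b;` classifies as a declaration outside that scope and as an expression inside it, which is why the table has to be scope accurate.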
> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.

Ours captures comments and most preprocessor conditionals unexpanded.

> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++ etc.

Check comp.compilers and various conferences on reverse engineering.
You won't find a lot of specific detail; you'll find tantalizing hints of
how to solve problems, but that won't remove the sweat
equity required. I've been down that route.

> Secondly, can anyone recommend a good tool that currently exists to do
> the job?

Depends on what you mean by "reverse engineering".
If what you want is all the above features packaged in a form in
which you can construct a reverse engineering tool,
then DMS may suit your needs:

http://www.semanticdesigns.com/Produ...pFrontEnd.html

If you mean "a tool that does reverse engineering", then Scientific
Toolworks may have what you want.

> Thanks.

--
Ira Baxter, CTO
www.semanticdesigns.com
Jun 6 '07 #6
Guys, thanks for all the interesting responses.

I have worked as a software developer for 10+ years, mostly in
maintenance mode on medium to large C++ projects. Usually these
projects do not have any kind of design roadmap to guide you into
them.
I feel this is much more the reality.

At best you have some kind of source browser within your IDE: find all
references, go to definition, etc.

Over this time I have come up with some ideas of my own that build on
these, and I would really like to try them out. So I am reversing the
source into something more abstract, allowing me to reason more effectively
about the source I may be about to modify.

http://www.objectmentor.com/resources/downloads.html

The above link is a script that gives some design quality metrics for
a set of header files.
It's a good start, but I'd like to write something proper and take the
idea much further...
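
As a rough idea of what a header-based metric could look like, the sketch below extracts the `#include` targets from a header's text and computes Robert C. Martin's instability, I = Ce / (Ca + Ce), from dependency counts. Which metrics the linked script actually reports is not stated here, so treat the choice of metric as an assumption; `extract_includes` and `instability` are hypothetical names, and the scanner ignores comments, conditional compilation, and macro-generated includes.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Return the names of headers pulled in via #include "name" or #include <name>.
// Toy version: assumes one directive per line, with '#' as the first
// non-blank character on that line.
std::vector<std::string> extract_includes(const std::string& header_text) {
    std::vector<std::string> result;
    std::istringstream lines(header_text);
    std::string line;
    while (std::getline(lines, line)) {
        std::size_t pos = line.find_first_not_of(" \t");
        if (pos == std::string::npos || line[pos] != '#') continue;
        pos = line.find_first_not_of(" \t", pos + 1);
        if (pos == std::string::npos || line.compare(pos, 7, "include") != 0) continue;
        std::size_t open = line.find_first_of("\"<", pos + 7);
        if (open == std::string::npos) continue;
        char closer = (line[open] == '<') ? '>' : '"';
        std::size_t close = line.find(closer, open + 1);
        if (close == std::string::npos) continue;
        result.push_back(line.substr(open + 1, close - open - 1));
    }
    return result;
}

// Instability I = Ce / (Ca + Ce), where Ce counts outgoing dependencies
// (efferent) and Ca counts incoming ones (afferent). Returns 0 when the
// unit has no dependencies in either direction.
double instability(std::size_t afferent, std::size_t efferent) {
    if (afferent + efferent == 0) return 0.0;
    return static_cast<double>(efferent) / static_cast<double>(afferent + efferent);
}
```

A header that includes three project headers and is itself included by one other would get `instability(1, 3)`, i.e. 0.75. Going further than this (classes, inheritance, call graphs) is where a real parser becomes necessary.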

Again, these are some of the tools on the market:
http://en.wikipedia.org/wiki/List_of..._code_analysis

So I hope this makes it clear what I'm trying to achieve.


Jun 7 '07 #7
