C++ Source Reverse Engineer - How to write a parser ?

Hi,

I'm interested in reverse engineering C++ source code into a form more
comprehensible than the source itself.

I want to write a basic tool myself; obviously I need to write a parser
for the source code.
Although this has some overlap with, say, a compiler, it would also seem
significantly different.

Can anyone provide me with links etc. on how one would go about writing
such a parser?
No doubt I would also need a reference to the syntax rules of C++.
Secondly, can anyone recommend a good tool that currently exists to do
the job?

Thanks.

Jun 6 '07 #1
Herby wrote:
> Hi,
>
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.
>
> I want to write a basic tool myself; obviously I need to write a parser
> for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.
>
> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++.

The gcc source.

--
Ian Collins.
Jun 6 '07 #2
Herby wrote:
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.
>
> I want to write a basic tool myself; obviously I need to write a parser
> for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.
>
> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++.

If you have to ask this question, IMHO you had better start with a smaller
project.

> Secondly, can anyone recommend a good tool that currently exists to do
> the job?

I don't know if it's a good one, because it's a bit outdated, but it may be
worth a try: gccxml uses the gcc front end to parse the sources and produces
XML output that can be easily read.

Mathias
Jun 6 '07 #3
Herby wrote:
> Hi,
>
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.

An ambitious goal, I think, but I hope you will succeed somehow :)
The problem I see is that human-written source code is usually
the most comprehensible expression of an algorithm that still embodies all
the details of the algorithm itself. Of course you can find some sort of
compromise: for example, there are UML programs that can sketch
diagrams from the source files.

> I want to write a basic tool myself; obviously I need to write a parser
> for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.

You are right. Especially if you decide up front the abstraction level that
you want to stop at, it may be much simpler.

> Can anyone provide me with links etc. on how one would go about writing
> such a parser?

I have a link and a suggestion. The link is:
http://www.cs.berkeley.edu/~smcpeak/.../sources/elsa/
Elsa is an open-source C/C++ parser; I think it's quite accurate. Of
course, using it means you need a deep knowledge of C/C++
syntax.

The suggestion is: look at the source code of some open-source UML editors.
They do something similar to what you are trying to do, so you can probably
find some interesting hints.

> No doubt I would also need a reference to the syntax rules of C++.

The C++ standard. Everything is there, including the BNF syntax
specification of the language.

Regards,

Zeppe
Jun 6 '07 #4
On Jun 6, 10:58 am, Herby <prmarjo...@gmail.com> wrote:
> Hi,
>
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.
>
> I want to write a basic tool myself; obviously I need to write a parser
> for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.
>
> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++.

The classical approach (besides building one by hand) is to use tools
like lex and yacc (or bison). You should read up on compiler
construction, lexing, and parsing; if I understand you correctly, what
you want to build is a compiler (translating your-own-language-tm into
C++).

If you want to stay inside C++ you can use boost::spirit, which is
similar to using yacc, but without the need for an extra tool.

Note that Spirit is a library that basically takes a modified form of
the EBNF syntax and embeds it into C++. Take a close look at how it is
implemented, because the technique used might be a better approach to
solving your problem (just a wild guess, since I do not know what
problem you are trying to solve).
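
A minimal sketch of the embedding technique mentioned above, for illustration only: it uses operator overloading so that a grammar written in C++ reads like EBNF. This is not Spirit's actual API or implementation; the names Parser, ch, digit and matchesList are made up for this example, and only the standard library is used (C++17).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>

// A parser takes the input and a start offset. On success it returns the
// offset just past what it matched; on failure it returns std::nullopt.
struct Parser {
    std::function<std::optional<std::size_t>(std::string_view, std::size_t)> run;
};

// Terminal: match one exact character.
Parser ch(char c) {
    return {[c](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (pos < in.size() && in[pos] == c) return pos + 1;
        return std::nullopt;
    }};
}

// Terminal: match one decimal digit.
Parser digit() {
    return {[](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos]))) return pos + 1;
        return std::nullopt;
    }};
}

// Sequence: "a >> b" means a followed by b.
Parser operator>>(Parser a, Parser b) {
    return {[a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        auto mid = a.run(in, pos);
        if (!mid) return std::nullopt;
        return b.run(in, *mid);
    }};
}

// Alternative: "a | b" tries a first, then b (ordered choice).
Parser operator|(Parser a, Parser b) {
    return {[a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (auto r = a.run(in, pos)) return r;
        return b.run(in, pos);
    }};
}

// Repetition: "*a" means zero or more a, like { a } in EBNF.
Parser operator*(Parser a) {
    return {[a](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        while (auto next = a.run(in, pos)) {
            if (*next == pos) break;  // guard against parsers that match nothing
            pos = *next;
        }
        return pos;
    }};
}

// Grammar, written in C++ but shaped like EBNF:
//   number = digit { digit } ;
//   list   = number { "," number } ;
bool matchesList(std::string_view text) {
    Parser number = digit() >> *digit();
    Parser list = number >> *(ch(',') >> number);
    auto end = list.run(text, 0);
    assert(!end || *end <= text.size());  // a match never runs past the input
    return end && *end == text.size();    // accept only if all input is consumed
}
```

With these definitions, matchesList("1,22,333") accepts, while "1,,2" and "12," are rejected because the grammar cannot consume the whole input. A real C++ grammar is far larger and needs semantic actions and backtracking control, but the way the rules are composed from overloaded operators is the part worth studying.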

If you go the Spirit route there is also boost::wave, which is a full
implementation of the C++ preprocessor (in fact, IIRC, the only FULL
implementation of it). Someone told me that there is also a person
working on a full C++ parser using Spirit, but I have not yet
seen any further detail on it.

--
Fabio Fracassi
Jun 6 '07 #5

"Herby" <pr********@gmail.com> wrote in message
news:11**********************@o5g2000hsb.googlegroups.com...
> Hi,
>
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.
>
> I want to write a basic tool myself; obviously I need to write a parser
> for the source code.

You don't need to write a parser yourself to do reverse engineering,
though it is probably true that you will need one.

Building a C++ parser is a lot harder than people who have not
done it think it is. You need a lexer covering all the dark corners
of the standard. You need a preprocessor. You need a non-standard
parsing engine, because C++ isn't LALR and yacc won't work.
You need a grammar not just for ANSI C++ but for the dialect
of C++ you actually have (Sun? GNU? Microsoft?).
If you are a realist, you'll need a scope-accurate symbol table telling
you where names are defined and what they are defined as.
Expect building a robust parser to take several man-years
at a minimum; we have considerably more than that in ours
to address the above issues.
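
To make the symbol table point concrete: a statement such as "a * b;" declares a pointer b if a names a type, but is a multiplication if a names a variable, so a parser has to consult a scope-accurate table before it can decide. The sketch below is a hypothetical illustration (SymbolTable, NameKind and classifyStarStatement are invented names), not a description of how DMS or any other front end is implemented.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

enum class NameKind { Unknown, Type, Variable };

// One map per open scope; the innermost scope is at the back.
class SymbolTable {
public:
    SymbolTable() { scopes_.emplace_back(); }  // start with the global scope

    void enterScope() { scopes_.emplace_back(); }

    void leaveScope() {
        assert(scopes_.size() > 1 && "cannot leave the global scope");
        scopes_.pop_back();
    }

    // Record a name in the innermost scope.
    void declare(const std::string& name, NameKind kind) {
        scopes_.back()[name] = kind;
    }

    // Search from the innermost scope outwards, so that inner
    // declarations hide outer ones.
    NameKind lookup(const std::string& name) const {
        for (auto scope = scopes_.rbegin(); scope != scopes_.rend(); ++scope) {
            auto found = scope->find(name);
            if (found != scope->end()) return found->second;
        }
        return NameKind::Unknown;
    }

private:
    std::vector<std::unordered_map<std::string, NameKind>> scopes_;
};

// Classify a statement of the form "lhs * rhs;" by what lhs names right now.
std::string classifyStarStatement(const SymbolTable& table, const std::string& lhs) {
    switch (table.lookup(lhs)) {
        case NameKind::Type:     return "declaration";  // "T * p;" declares pointer p
        case NameKind::Variable: return "expression";   // "x * y;" multiplies
        default:                 return "unknown";
    }
}
```

For example, after declaring "a" as a type in the global scope, "a * b;" classifies as a declaration; if an inner scope then declares a variable named "a", the same text classifies as an expression until that scope is left. The grammar alone cannot make this distinction, which is one reason a plain yacc grammar is not enough.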

> Although this has some overlap with, say, a compiler, it would also seem
> significantly different.

Ours captures comments and most preprocessor conditionals unexpanded.

> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++.

Check comp.compilers and various conferences on reverse engineering.
You won't find a lot of specific detail; you'll find tantalizing hints of
how to solve problems, but that won't remove the sweat
equity required. I've been down that route.

> Secondly, can anyone recommend a good tool that currently exists to do
> the job?

Depends on what you mean by "reverse engineering".
If what you want are all the above features packaged in a form in
which you can construct a reverse engineering tool,
then DMS may suit your needs:

http://www.semanticdesigns.com/Produ...pFrontEnd.html

If you mean "a tool that does reverse engineering", then Scientific
Toolworks may have what you want.

> Thanks.

--
Ira Baxter, CTO
www.semanticdesigns.com
Jun 6 '07 #6
Guys, thanks for all the interesting responses.

I have worked as a software developer for 10+ years, mostly in
maintenance mode on medium to large C++ projects. Usually these
projects do not have any kind of design roadmap to guide you into
them.
I feel this is much more the reality.

At best you have some kind of source browser within your IDE: find all
references, go to definition, etc.

In this time I have come up with some ideas of my own that build on
these, and I would really like to try them out. So I am reversing the
source into something more abstract that lets me reason more effectively
about the source I may be about to modify.

http://www.objectmentor.com/resources/downloads.html

The above link has a script that gives some design quality metrics for
a set of header files.
It's a good start, but I'd like to write something proper and take the
idea much further...

Again, these are some of the tools on the market:
http://en.wikipedia.org/wiki/List_of..._code_analysis

So I hope this makes it clear what I'm trying to achieve.


Jun 7 '07 #7
