Bytes | Software Development & Data Engineering Community

C parsing fun

Hello people!

At work I was given the task of writing a script to analyze C++ code
based on concepts my boss had. To do this I needed to represent C++
code structure in Python somehow. I read the docs for Yapps, pyparsing
and similar tools, then I came up with a very simple idea. I realized
that bracketed code is almost like a Python list, except that I have
to replace the curly brackets with square ones and surround the
remaining text with quotes. This process involves no recursion or node
objects, only pure string manipulation, so I believe it's really fast.
Finally, I get the resulting list by calling eval() on the string.
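A minimal sketch of the transformation described above, for illustration only: it is not the bloppy code from the zip archive. The function name `brackets_to_list` is hypothetical, and the sketch swaps `eval()` for `ast.literal_eval()`, which only accepts literals and so cannot run arbitrary code hidden in the input. Like the original idea, it does no parsing of its own: braces become list brackets and every run of text between them becomes a quoted string.

```python
import ast
import re


def brackets_to_list(source: str) -> list:
    """Convert brace-structured code into nested Python lists (sketch only)."""
    parts = []
    # re.split with a capturing group keeps the braces as separate tokens.
    for token in re.split(r"([{}])", source):
        if token == "{":
            parts.append("[")        # opening brace -> start a nested list
        elif token == "}":
            parts.append("],")       # closing brace -> end it (trailing comma is legal)
        elif token.strip():
            # repr() adds quotes and escapes, so any text becomes a valid string literal
            parts.append(repr(token.strip()) + ",")
    # literal_eval raises SyntaxError if the braces are unbalanced
    return ast.literal_eval("[" + "".join(parts) + "]")


tree = brackets_to_list("class A { int x; void f() { return; } };")
print(tree)  # ['class A', ['int x; void f()', ['return;']], ';']
```

Note that each text run keeps whatever statements precede a brace, so a header such as `void f()` arrives glued to earlier declarations; the nesting, however, mirrors the brace structure exactly.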

For example, when I need to parse a class definition, I only need to
look for a list item matching the pattern "*class*"; the next item
will be the contents of the class as another list.
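Given a nested list of that shape (strings for text runs, lists for brace bodies), the lookup can be sketched as below. `find_classes` is a hypothetical helper, not part of bloppy; it uses `fnmatch.fnmatchcase` to apply the "*class*" glob and recurses so that nested classes are found too. The glob is deliberately naive: it would also match a name such as `subclass_count`.

```python
from fnmatch import fnmatchcase


def find_classes(tree):
    """Yield (header, body) pairs from a nested list of strings and lists.

    A string item matching "*class*" that is directly followed by a list
    is treated as a class header, and that list as the class body.
    """
    for i, item in enumerate(tree):
        if isinstance(item, list):
            yield from find_classes(item)  # search nested blocks as well
        elif (fnmatchcase(item, "*class*")
              and i + 1 < len(tree)
              and isinstance(tree[i + 1], list)):
            yield item, tree[i + 1]


# Hand-written input in the shape the brace-to-list conversion produces.
tree = ["class A", ["int x;", "class B", ["int y;"], ";"], ";"]
for header, body in find_classes(tree):
    print(header, "->", body)
```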

You can grab the code at:

http://kiri.csing.hu/stack/python/bloppy-0.1.zip

(test script [test.py] included)

Feb 5 '07 #1
And the great thing is that the algorithm can be used with any
language that structures its code with curly brackets, like PHP and
many others.

Feb 5 '07 #2
Yes, that's a nice solution. Sometimes it's not enough, though (it
won't work on code obfuscated with macros).

Anyway, if you need something more sophisticated, I'd recommend
gccxml or its Python binding:

http://www.language-binding.net/pygccxml/pygccxml.html

Feb 5 '07 #3
Thanks for responding, Szabolcs! I've already tried that, but couldn't
get it to work. The source I tried to parse is a huge MSVC 7.1
solution containing about 38 projects, and I believe the code has so
many different dependencies that GCC just can't handle it. Btw, I'm not
deeply familiar with C++ compilers, so maybe it was a compiler
misconfiguration, but I really don't know...

Feb 5 '07 #4
In <11**********************@v33g2000cwv.googlegroups.com>,
karoly.kiripolszky wrote:
and the great thing is that the algorithm can be used with any
language that structures the code with brackets, like PHP and many
others.
But it fails if brackets appear in comments or literal strings.

Ciao,
Marc 'BlackJack' Rintsch
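Marc's objection is easy to reproduce. The toy function below (hypothetical, not from bloppy) just tracks brace depth the way any purely textual approach must; a single brace inside a string literal is enough to unbalance the count for otherwise valid C code.

```python
def brace_depth(source: str) -> int:
    """Net brace depth after scanning source, ignoring all C syntax."""
    depth = 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
    return depth


print(brace_depth('void f() { puts("x"); }'))  # 0: balanced
print(brace_depth('void f() { puts("}"); }'))  # -1: the "}" in the string is counted
```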

Feb 5 '07 #5

Feb 5 '07 #6
You're right, thank you for the comment! I will look into how to
avoid this.

Feb 5 '07 #7
Károly Kiripolszky wrote:
You're right, thank you for the comment! I will look after how to
avoid this.
And after you have resolved this 'small' ;-) detail, you will probably
notice that even some fully functional, widely used parsers still
have trouble with this ...

Claudio
Feb 5 '07 #8
I've found a brute-force solution. In the preprocessing phase I simply
strip out the comments (things inside comments won't appear in the
result) and replace the curly brackets with these symbols: #::OPEN::#
and #::CLOSE::#. After parsing I convert them back. In fact, I can
exclude commented lines from the analysis, as I only have to cope with
production code.
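A sketch of that preprocessing step, under stated assumptions: the post does not say exactly which braces are replaced, so this version assumes the placeholders are applied only to braces inside string and character literals, while comments are dropped. The regex is an approximation (no raw strings, no C++14 digit separators such as `1'000`, no trigraphs), and `protect`/`restore` are hypothetical names.

```python
import re

OPEN, CLOSE = "#::OPEN::#", "#::CLOSE::#"

# The regex scans left to right, so whichever construct starts first in
# the text wins (a "//" inside a string literal stays part of the string).
TOKEN_RE = re.compile(
    r"/\*.*?\*/"                 # /* block comment */ (may span lines)
    r"|//[^\n]*"                 # // line comment
    r'|"(?:\\.|[^"\\\n])*"'      # "string literal" with backslash escapes
    r"|'(?:\\.|[^'\\\n])*'",     # 'c' character literal
    re.DOTALL,
)


def protect(source: str) -> str:
    """Drop comments and hide braces inside literals behind placeholders."""
    def replace(match):
        text = match.group(0)
        if text.startswith("/"):
            return " "  # a comment becomes a single space
        return text.replace("{", OPEN).replace("}", CLOSE)
    return TOKEN_RE.sub(replace, source)


def restore(text: str) -> str:
    """Turn the placeholders back into braces after parsing."""
    return text.replace(OPEN, "{").replace(CLOSE, "}")


code = 'void f() { puts("}"); /* { */ }  // }\n'
safe = protect(code)
print(safe.count("{"), safe.count("}"))  # 1 1
```

Because comments are removed rather than restored, `restore(protect(code))` gives back the literals but not the comments, which fits the remark above that commented lines can be left out of the analysis.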

Feb 5 '07 #9
http://kiri.csing.hu/stack/python/bloppy-0.2.zip

Test data now also contains brackets in literal strings.

Feb 5 '07 #10
Hello again!

When I came up with this idea for parsing C files with ease, I was at
home, and I only have access to the sources in question at the office.
So today I tried the previously posted algorithm on the actual source,
and I realized that the example data I had originally run the test
with was so simple that the algorithm still failed on some header
files. I had to make some further changes, and now I am able to parse
1135 header files in 5 seconds with no errors.

This may be considered spamming, but this package is so small that I
don't want to create a page for it on SF or Google Code. Furthermore,
I want to provide people who find this topic with a working solution,
so here's the latest not-so-elegant, brute-force but fast parser:

http://kiri.csing.hu/stack/python/bloppy-0.3.zip


Feb 6 '07 #11
Károly Kiripolszky <ka****************@gmail.com> wrote:
I've found a brute-force solution. In the preprocessing phase I simply
strip out the comments (things inside comments won't appear in the
result) and replace curly brackets with these symbols: #::OPEN::# and
#::CLOSE::#.
This fails when the code already contains the strings "#::OPEN::#" and
"#::CLOSE::#".

--
Roberto Bonvallet
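One way around Roberto's objection, offered as an alternative technique rather than what bloppy does: instead of fixed marker strings, pick single characters from the Unicode Private Use Area (U+E000 to U+F8FF) that do not occur anywhere in the input. Since such a character never appears in the original text, every occurrence after substitution must be a marker, and the round trip is exact. The names below are hypothetical, and only double-quoted string literals are handled to keep the sketch short.

```python
import re

STRING_RE = re.compile(r'"(?:\\.|[^"\\\n])*"')  # double-quoted literals only


def pick_markers(source: str) -> tuple[str, str]:
    """Return two private-use characters that never occur in source."""
    used = set(source)
    free = (chr(cp) for cp in range(0xE000, 0xF900) if chr(cp) not in used)
    return next(free), next(free)


def protect(source: str) -> tuple[str, str, str]:
    """Hide braces inside string literals behind source-specific markers."""
    open_m, close_m = pick_markers(source)
    protected = STRING_RE.sub(
        lambda m: m.group(0).replace("{", open_m).replace("}", close_m), source)
    return protected, open_m, close_m


def restore(text: str, open_m: str, close_m: str) -> str:
    """Undo protect() using the markers it returned."""
    return text.replace(open_m, "{").replace(close_m, "}")


# Source that already contains the fixed placeholder text from the thread.
src = 'puts("#::OPEN::# {"); int a[] = {1, 2};'
protected, open_m, close_m = protect(src)
assert restore(protected, open_m, close_m) == src
```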
Feb 8 '07 #12
Yes, of course. But you can still fine-tune the code for the sources
you want to parse. The C++ header files I needed to analyze contained
no such strings, and I believe there are very few real-life .h files
out there that do. In fact, I chose #::OPEN::# and #::CLOSE::#
because they are more foreign to C++ than, e.g., ::OPEN or #OPEN
would be. I hope this makes sense. :)
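Károly's assumption that the headers never contain the placeholders can at least be checked before parsing. A sketch, assuming the sources are `.h` files under one directory tree; `files_containing_markers` is a hypothetical helper, and files are decoded as Latin-1 only so that any byte sequence can be read without errors.

```python
from pathlib import Path

MARKERS = ("#::OPEN::#", "#::CLOSE::#")


def files_containing_markers(root, pattern="*.h"):
    """Return header files under root whose text already contains a marker."""
    hits = []
    for path in Path(root).rglob(pattern):
        # Latin-1 maps every byte to a character, so decoding never fails.
        text = path.read_text(encoding="latin-1")
        if any(marker in text for marker in MARKERS):
            hits.append(path)
    return sorted(hits)
```

An empty result for the solution's source directory means the fixed markers are safe for that code base; any listed file is one where the brute-force substitution would need a different placeholder.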

Feb 8 '07 #13
