Hello people!
At work I was given the task of writing a script to analyze C++ code
based on concepts my boss had. To do this I needed to represent C++
code structure in Python somehow. I read the docs for Yapps, pyparsing
and other similar tools, then I came up with a very simple idea. I
realized that bracketed code is almost like a Python list, except that I
have to replace curly brackets with square ones and surround the
remaining text with quotes. This process involves no recursion or node
objects, only pure string manipulation, so I believe it's really fast.
Finally, I can get the resulting list by calling eval() on the
string.
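A minimal sketch of the idea described above, not the code from the zip file. The function name brackets_to_list and the sample input are made up for illustration, and ast.literal_eval stands in for eval() as a safer way to evaluate the generated string:

```python
import ast
import re


def brackets_to_list(source):
    """Turn brace-structured text into nested Python lists of strings."""
    pieces = []
    # A capturing group makes re.split keep the braces as separate tokens.
    for token in re.split(r"([{}])", source):
        if token == "{":
            pieces.append("[")
        elif token == "}":
            pieces.append("],")
        elif token.strip():
            # Collapse whitespace, then let repr() quote and escape the text.
            pieces.append(repr(" ".join(token.split())) + ",")
    # Python list literals tolerate trailing commas, so this parses cleanly
    # as long as the braces in the input are balanced.
    return ast.literal_eval("[" + "".join(pieces) + "]")


code = "class Foo { int x; void bar() { return; } };"
tree = brackets_to_list(code)
print(tree)
```

Unbalanced braces make literal_eval raise an error, so this sketch only works on input whose brackets match up.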
For example when I need to parse a class definition, I only need to
look for a list item containing the pattern "*class*", and the next
item will be the contents of the class as another list.
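The lookup could be sketched roughly like this. The helper name find_classes and the hard-coded sample tree are assumptions for illustration only; the search itself recurses into nested lists so that nested classes are found too:

```python
from fnmatch import fnmatchcase


def find_classes(tree):
    """Yield (header, body) pairs for items matching the pattern "*class*"."""
    for index, item in enumerate(tree):
        if isinstance(item, list):
            # Descend into nested blocks to find nested class definitions.
            yield from find_classes(item)
        elif fnmatchcase(item, "*class*"):
            # The class body is expected to be the next item, as a list.
            if index + 1 < len(tree) and isinstance(tree[index + 1], list):
                yield item, tree[index + 1]


# Hypothetical nested list, in the shape the bracket conversion produces.
tree = ["class Foo", ["int x; void bar()", ["return;"]], ";"]
classes = list(find_classes(tree))
print(classes)
```

A plain substring match like this also fires on words such as "subclass", so a real analysis would probably want a stricter pattern.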
You can grab the code at: http://kiri.csing.hu/stack/python/bloppy-0.1.zip
(test script [test.py] included)
And the great thing is that the algorithm can be used with any
language that structures its code with brackets, like PHP and many
others.
Yes, that's a nice solution.
Sometimes it's not enough, though (it won't work on code obfuscated with
macros).
Anyway, if you need something more sophisticated, then I'd recommend
gccxml or its Python binding: http://www.language-binding.net/pygccxml/pygccxml.html
Thanks for responding, Szabolcs! I've already tried that, but couldn't
manage to get it to work. The source I tried to parse is a huge MSVC
7.1 solution containing about 38 projects, and I believe the code is
so complex and has so many different dependencies that GCC just
can't handle it. By the way, I'm not deeply familiar with C++ compilers, so
maybe it was because of a compiler misconfiguration, but I really don't
know...
In <11**********************@v33g2000cwv.googlegroups.com>,
karoly.kiripolszky wrote:
and the great thing is that the algorithm can be used with any
language that structures the code with brackets, like PHP and many
others.
But it fails if brackets appear in comments or literal strings.
Ciao,
Marc 'BlackJack' Rintsch
You're right, thank you for the comment! I will look into how to
avoid this.
And after you have resolved this 'small' ;-) detail, you will probably
notice that some fully functional and widely used parsers still
have trouble with this ...
Claudio
I've found a brute-force solution. In the preprocessing phase I simply
strip out the comments (things inside comments won't appear in the
result) and replace curly brackets with these symbols: #::OPEN::# and
#::CLOSE::#. After parsing I convert them back. In fact, I can exclude
commented lines from the analysis, as I only have to cope with
production code.
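A rough sketch of such a preprocessing step, not the actual bloppy code. It assumes the placeholders are meant for curly brackets inside double-quoted string literals, and the names protect and restore are made up for illustration. Character literals and raw strings are not handled here:

```python
import re

OPEN, CLOSE = "#::OPEN::#", "#::CLOSE::#"

# One pattern for block comments, line comments and double-quoted strings.
# Scanning left to right means a quote inside a comment, or a comment
# marker inside a string, is swallowed by whichever construct starts first.
TOKEN = re.compile(r'/\*.*?\*/|//[^\n]*|"(?:\\.|[^"\\\n])*"', re.DOTALL)


def protect(source):
    """Strip comments and hide braces that appear inside string literals."""
    def fix(match):
        text = match.group(0)
        if text.startswith("/"):
            return " "  # drop the comment entirely
        return text.replace("{", OPEN).replace("}", CLOSE)
    return TOKEN.sub(fix, source)


def restore(text):
    """Convert the placeholders back to curly brackets after parsing."""
    return text.replace(OPEN, "{").replace(CLOSE, "}")


src = 'void f() { /* } */ puts("{x}"); } // {'
safe = protect(src)
print(safe)
print(restore(safe))
```

After protect() only the structural braces are left in the text, so the bracket-to-list conversion sees balanced input; restore() is then applied to the strings in the parsed result.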
http://kiri.csing.hu/stack/python/bloppy-0.2.zip
Test data now also contains brackets in literal strings.
Hello again!
When I came up with this idea for parsing C files with ease, I was
at home, and I only have access to the sources in question at the
office. So I tried the previously posted algorithm on the actual
source today, and I realized that the example data I had originally run the
test with was so simple that the algorithm still failed on some header
files. I had to make some further changes, and by now I was able
to parse 1135 header files in 5 seconds with no errors.
This may be considered spamming, but this package is so small that I
don't want to create a page for it on SF or Google Code. Furthermore,
I want to give people who find this topic a working solution, so
here's the latest not-so-elegant-brute-force-but-fast parser: http://kiri.csing.hu/stack/python/bloppy-0.3.zip
Károly Kiripolszky <ka****************@gmail.com> wrote:
I've found a brute-force solution. In the preprocessing phase I simply
strip out the comments (things inside comments won't appear in the
result) and replace curly brackets with these symbols: #::OPEN::# and
#::CLOSE::#.
This fails when the code already contains the strings "#::OPEN::#" and
"#::CLOSE::#".
--
Roberto Bonvallet
Yes, of course. But you can still fine-tune the code for the sources
you want to parse. The C++ header files I needed to analyze contained
no such strings. I believe there are very few real-life .h files out
there containing those. In fact, I chose #::OPEN::# and #::CLOSE::#
because they're more foreign to C++ than, e.g., ::OPEN or #OPEN would be.
I hope this makes sense. :)
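One way to do that kind of fine-tuning is to check the source before choosing the symbols. This is only a sketch: the helper name choose_markers and the numbered fallback names are made up for illustration and are not part of the package:

```python
def choose_markers(source, stem="#::{}::#"):
    """Return (open, close) marker strings that do not occur in source."""
    suffix = 0
    while True:
        # First try the plain names, then OPEN1/CLOSE1, OPEN2/CLOSE2, ...
        tag = "" if suffix == 0 else str(suffix)
        open_marker = stem.format("OPEN" + tag)
        close_marker = stem.format("CLOSE" + tag)
        if open_marker not in source and close_marker not in source:
            return open_marker, close_marker
        suffix += 1


print(choose_markers("int x;"))
print(choose_markers('const char *s = "#::OPEN::#";'))
```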