Hi Folks,
I'm trying to strip C/C++ style comments (/* ... */ or // ) from
source code using Python regexps.
If I don't have to worry about comments embedded in strings, it seems
pretty straightforward (this is what I'm using now):
import re
cpp_pat = re.compile(r"""
    /\* .*? \*/  |   # C comments
    // [^\n\r]*      # C++ comments
    """, re.S | re.X)
s = open('myprog.cpp').read()
stripped = cpp_pat.sub(' ', s)
However, the sticking point is dealing with tokens like /* embedded
within a string:
const char *mystr = "This is /*trouble*/";
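To make the problem concrete, here is a small self-contained check of the pattern above against that line. The names `line` and `stripped` are only for illustration.

```python
import re

# Same naive pattern as above: it knows nothing about string literals.
cpp_pat = re.compile(r"""
    /\* .*? \*/  |   # C comments
    // [^\n\r]*      # C++ comments
    """, re.S | re.X)

line = 'const char *mystr = "This is /*trouble*/";'
stripped = cpp_pat.sub(' ', line)

# The /*trouble*/ inside the string literal is treated as a comment
# and replaced by a space, which changes the program's data.
print(stripped)
```

The string literal comes back with `/*trouble*/` replaced by a space, which is exactly what I need to avoid.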
I've inherited a working Perl script, which I'd like to reimplement in
Python so that I don't have to spawn a new Perl process in my Python
program each time I want to strip comments from a file. The Perl script
looks like this:
#!/usr/bin/perl -w
$/ = undef; # no line delimiter
$_ = <>; # read entire file
s! ((['"]) (?: \\. | .)*? \2) | # skip quoted strings
/\* .*? \*/ | # delete C comments
// [^\n\r]* # delete C++ comments
! $1 || ' ' # change comments to a single space
!xseg; # ignore white space, treat as single line
# evaluate result, repeat globally
print;
The Perl regexp above uses some sort of conditional to deal with this,
by replacing a quoted string with itself if the initial match is a
quoted string. Is there some equivalent feature in Python regexps?
Lorin
> Is there some equivalent feature in Python regexps?
cpp_pat = re.compile(r'(/\*.*?\*/)|(".*?")', re.S)
def subfunc(match):
    if match.group(2):
        return match.group(2)
    else:
        return ''
stripped_c_code = cpp_pat.sub(subfunc, c_code)
...I suppose this is what the Perl code might do, but I'm not sure,
since trying to read it hurts my brain...
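For comparison, a closer port of the Perl substitution might look like the sketch below. This is only an illustration, not code from the original script: the names `strip_pat`, `keep_strings` and `strip_comments` are made up here, and it assumes the same three alternatives as the Perl regexp (quoted strings, C comments, C++ comments).

```python
import re

# Rough Python port of the Perl s!...!...!xseg substitution (illustrative).
# Group 1 captures a quoted string or character constant; comments match
# without being captured, so group 1 is None for them.
strip_pat = re.compile(r"""
    ( (['"]) (?: \\. | . )*? \2 )  |   # group 1: quoted string, kept as-is
    /\* .*? \*/                    |   # C comment
    // [^\n\r]*                        # C++ comment
    """, re.S | re.X)

def keep_strings(match):
    # Equivalent of Perl's `$1 || ' '`: the string if one matched,
    # otherwise a single space in place of the comment.
    return match.group(1) or ' '

def strip_comments(source):
    return strip_pat.sub(keep_strings, source)

print(strip_comments('const char *mystr = "This is /*trouble*/"; /* gone */'))
```

The function passed to `sub` plays the role of Perl's `/e` flag: it is called once per match and its return value becomes the replacement text.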
#------------------------------------------------------------------------
import re, sys
def q(c):
    """Returns a regular expression that matches a region delimited by c,
    inside which c may be escaped with a backslash"""
    return r"%s(\\.|[^%s])*%s" % (c, c, c)
double_quoted_string = q('"')
single_quoted_string = q("'")
c_comment = r"/\*.*?\*/"
cxx_comment = r"//[^\n]*[\n]"
rx = re.compile("|".join([double_quoted_string, single_quoted_string,
                          c_comment, cxx_comment]), re.DOTALL)
def replace(x):
    x = x.group(0)
    if x.startswith("/"): return ' '
    return x
result = rx.sub(replace, sys.stdin.read())
sys.stdout.write(result)
#------------------------------------------------------------------------
The regular expression matches ""-strings, ''-character-constants,
c-comments, and c++-comments. The replace function returns ' ' (space)
when the matched thing was a comment, or the original thing otherwise.
Depending on your use for this code, replace() should return as many
'\n's as are in the matched thing, or ' ' otherwise, so that line
numbers remain unchanged.
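A line-preserving variant of replace() could look roughly like this. It is a sketch that rebuilds the same pattern so it runs on its own; `replace_keep_lines` and `strip_keep_lines` are names invented for the example.

```python
import re

# Same token pattern as in the script above, rebuilt so this sketch is
# self-contained.
def q(c):
    return r"%s(\\.|[^%s])*%s" % (c, c, c)

rx = re.compile("|".join([q('"'), q("'"), r"/\*.*?\*/", r"//[^\n]*[\n]"]),
                re.DOTALL)

def replace_keep_lines(match):
    text = match.group(0)
    if text.startswith("/"):
        # Comment: emit one newline per newline it contained, or a single
        # space if the comment was entirely on one line.
        return "\n" * text.count("\n") or " "
    return text  # strings and character constants pass through unchanged

def strip_keep_lines(source):
    return rx.sub(replace_keep_lines, source)
```

With this version the output has the same number of lines as the input, so compiler messages and grep results still point at the right place in the original file.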
Basically, the regular expression is a tokenizer, and replace() chooses
what to do with each recognized token. Things not recognized as tokens
by the regular expression are left unchanged.
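One way to make the tokenizer view explicit is to give each alternative a named group and look at `match.lastgroup`. The sketch below is a variation for illustration, not the script above: the group names and the `tokens` helper are made up, the string patterns exclude backslashes from the character class, and the C++ comment pattern leaves the trailing newline alone.

```python
import re

# Each alternative is a named group, so match.lastgroup tells us which
# kind of token was recognized.
token_rx = re.compile(r"""
    (?P<string>      " (?: \\. | [^"\\] )* " )  |
    (?P<char>        ' (?: \\. | [^'\\] )* ' )  |
    (?P<c_comment>   /\* .*? \*/ )              |
    (?P<cxx_comment> // [^\n]* )
    """, re.DOTALL | re.VERBOSE)

def tokens(source):
    """Yield (kind, text) for each recognized token, in source order."""
    for m in token_rx.finditer(source):
        yield m.lastgroup, m.group(0)

for kind, text in tokens('x = "a /*b*/"; /* c */ // d'):
    print(kind, repr(text))
```

A replacement function can then branch on the token kind instead of testing the first character of the match.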
Jeff
PS this is the test file I used:
/* ... */ xyzzy;
456 // 123
const char *mystr = "This is /*trouble*/";
/* * */
/* /* */
// /* /* */
/* // /* */
/*
* */
> > Is there some equivalent feature in Python regexps?
> cpp_pat = re.compile(r'(/\*.*?\*/)|(".*?")', re.S)
> def subfunc(match):
>     if match.group(2):
>         return match.group(2)
>     else:
>         return ''
> stripped_c_code = cpp_pat.sub(subfunc, c_code)
> ...I suppose this is what the Perl code might do, but I'm not sure,
> since trying to read it hurts my brain...
Neat! I didn't realize that re.sub could take a function as an
argument. Thanks.
Lorin