replace c-style comments with newlines (regexp)


I'm trying to use a regexp to replace multi-line C-style comments (like /* this /n */ ) with /n (newlines).
I tried something like re.sub('/\*(.*)/\*' , '/n' , file)
but it doesn't work for multiple lines.

Besides that, I want to keep all newlines as they were in the original file, so I can still use the original line numbers (I want to use line numbers as a reference for later use).
I know that will complicate things a bit more, so this is a bit less important.

Background: I'm trying to create an 'intelligent' source-code security analysis tool for C/C++, Python and PHP files, but filtering the comments seems to be the biggest problem. :(

So, if you have an answer to this, please let me know how to do this!

thanks in advance,
- Alex

Dec 21 '07 #1
On Fri, 21 Dec 2007 00:00:47 +0000, lex __ wrote:
> I'm trying to use regexp to replace multi-line c-style comments (like /*
> this /n */ ) with /n (newlines). I tried something like
> re.sub('/\*(.*)/\*' , '/n' , file) but it doesn't work for multiple
> lines.

Regexes won't cross line boundaries unless you make them multiline with
re.MULTILINE.

Also, I'm no expert on regexes, but it looks to me that your regex is
greedy. I think you need the non-greedy version, which by memory (and
completely untested) is something like this:

rx = re.compile('/\*(.*?)/\*', re.MULTILINE)

Have you considered what happens when your C code includes a string
literal containing '/*'?

"Some people, when confronted with a problem, think “I know, I’ll use
regular expressions.” Now they have two problems."
-- Jamie Zawinski, in comp.lang.emacs

--
Steven.
Dec 21 '07 #2
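
A minimal sketch of the greedy versus non-greedy point made above. It assumes the closing delimiter is written \*/ (the untested pattern in the post ends in /\*, carried over from the question), and it stays on a single line so no flag is involved. The sample string and variable names are made up for illustration.

```python
import re

source = "a /*x*/ b /*y*/ c"

# Greedy: .* runs on to the LAST "*/" on the line, so the code between
# the two comments is swallowed as well.
greedy = re.sub(r"/\*.*\*/", "", source)

# Non-greedy: .*? stops at the FIRST "*/" after each "/*".
lazy = re.sub(r"/\*.*?\*/", "", source)

print(repr(greedy))  # 'a  c'
print(repr(lazy))    # 'a  b  c'
```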
Steven D'Aprano wrote:
> On Fri, 21 Dec 2007 00:00:47 +0000, lex __ wrote:
> > I'm trying to use regexp to replace multi-line c-style comments (like /*
> > this /n */ ) with /n (newlines). I tried something like
> > re.sub('/\*(.*)/\*' , '/n' , file) but it doesn't work for multiple
> > lines.
>
> Regexes won't cross line boundaries unless you make them multiline with
> re.MULTILINE.

re.MULTILINE only affects the behaviour of ^ and $; the relevant flag is re.DOTALL:

> Also, I'm no expert on regexes, but it looks to me that your regex is
> greedy. I think you need the non-greedy version, which by memory (and

>>> re.compile(r"/\*(.*?)\*/", re.DOTALL).findall("/*a*/ /*b\nb*/ /*c/*c*/")
['a', 'b\nb', 'c/*c']
>>> def replace(match):
...     return "\n" * match.group(1).count("\n")
...
>>> re.compile(r"(/\*.*?\*/)", re.DOTALL).sub(replace, "A /*a*/ BB /*b\nb*/ CCC /*c/*c*/")
'A  BB \n CCC '

> Have you considered what happens when your C code includes a string
> literal containing '/*'?

Indeed.

Peter
Dec 21 '07 #3
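
One possible way to deal with the string-literal problem while keeping the replace() idea from the post above is to let the regex match string and character literals first and hand them back unchanged. This is only a sketch under assumptions: the name strip_block_comments is hypothetical, // line comments are kept as they are, and trigraphs, backslash-newline splices inside delimiters, preprocessor quirks and C++ raw strings are not handled.

```python
import re

STRING = r'"(?:\\.|[^"\\])*"'      # double-quoted string literal
CHAR = r"'(?:\\.|[^'\\])*'"        # character literal
LINE = r"//[^\n]*"                 # line comment, kept as-is
COMMENT = r"/\*.*?\*/"             # block comment, possibly multi-line

# Group 1: text to keep verbatim. Group 2: a block comment to blank out.
TOKEN = re.compile(f"({STRING}|{CHAR}|{LINE})|({COMMENT})", re.DOTALL)


def strip_block_comments(code):
    """Replace each /* ... */ comment by the newlines it contained."""
    def repl(match):
        if match.group(1) is not None:
            return match.group(1)              # literal or // comment
        return "\n" * match.group(2).count("\n")
    return TOKEN.sub(repl, code)


sample = 'x = "/* not a comment */"; /* real\ncomment */ y = 1;\n'
print(repr(strip_block_comments(sample)))
# 'x = "/* not a comment */"; \n y = 1;\n'
```

Because every newline inside a comment is emitted again, the line numbers of the remaining code stay the same as in the original file.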
On 2007-12-21, lex __ <co*******@hotmail.com> wrote:
> I'm trying to use regexp to replace multi-line c-style comments
> (like /* this /n */ ) with /n (newlines). I tried something
> like re.sub('/\*(.*)/\*' , '/n' , file) but it doesn't
> work for multiple lines.
>
> besides that I want to keep all newlines as they were in the
> original file, so I can still use the original linenumbers (I
> want to use linenumbers as a reference for later use.) I know
> that that will complicate things a bit more, so this is a bit
> less important.
>
> background: I'm trying to create a 'intelligent' source-code
> security analysis tool for c/c++ , python and php files, but
> filtering the comments seems to be the biggest problem. :(
>
> So, if you have an answer to this , please let me know how to
> do this!

There are free C lexers and parsers available (e.g., gcc). I
recommend them to you. Gluing a real C parser into your Python
code might be easier than writing one. Not that it's impossible
to discover C comments with your own special-purpose, simple
parser (see Exercise 1-23 in K&R _The C Programming Language 2nd
Edition_), but it's not remotely doable with a regex.

--
Neil Cerutti
Dec 21 '07 #4
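
The "special-purpose, simple parser" mentioned above could look roughly like the character-by-character scanner below, written in the spirit of the K&R exercise rather than copied from it. The function name remove_c_comments and the state names are hypothetical. The sketch assumes plain C source: it tracks string literals, character literals and // line comments, drops /* ... */ comments, and keeps every newline so line numbers survive. Trigraphs and backslash-newline splices are ignored.

```python
def remove_c_comments(text):
    """Drop /* ... */ comments but keep the newlines they contained."""
    out = []
    state = "code"  # one of: code, block, line, string, char
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        nxt = text[i + 1] if i + 1 < n else ""
        if state == "code":
            if ch == "/" and nxt == "*":
                state = "block"
                i += 2
                continue
            if ch == "/" and nxt == "/":
                state = "line"
            elif ch == '"':
                state = "string"
            elif ch == "'":
                state = "char"
            out.append(ch)
        elif state == "block":
            if ch == "*" and nxt == "/":
                state = "code"
                i += 2
                continue
            if ch == "\n":
                out.append(ch)       # preserve line structure
        elif state == "line":
            out.append(ch)           # // comments are left in place
            if ch == "\n":
                state = "code"
        else:                        # inside a string or char literal
            out.append(ch)
            if ch == "\\" and nxt:
                out.append(nxt)      # copy the escaped character untouched
                i += 2
                continue
            if (state == "string" and ch == '"') or (state == "char" and ch == "'"):
                state = "code"
        i += 1
    return "".join(out)


sample = 'a = "/*"; /* one\ntwo */ b = 2;\n'
print(repr(remove_c_comments(sample)))
# 'a = "/*"; \n b = 2;\n'
```

For a security analysis tool, a real C lexer as recommended above is still the safer choice; a scanner like this only covers the common cases.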
