
How to remove C++ comments from a cpp file?

I only want to remove the comments which begin with "//".
I tried the following, but it doesn't work.

r=re.compile(ur"//[^\r\n]+$", re.UNICODE|re.VERBOSE)
f=file.open("mycpp.cpp","r")
f=unicode(f,"utf8")
r.sub(ur"",f)

Will somebody show me the right way?
Thanks~~

Jan 26 '07 #1
Frank Potter wrote:
> I only want to remove the comments which begin with "//".
> I tried the following, but it doesn't work.
>
> r=re.compile(ur"//[^\r\n]+$", re.UNICODE|re.VERBOSE)
> f=file.open("mycpp.cpp","r")
> f=unicode(f,"utf8")
> r.sub(ur"",f)
>
> Will somebody show me the right way?
> Thanks~~

If you expect help with a problem, it would be nice if you told us what
the problem is. What error did you get?

But even without that I see lots of errors:

You must import re before you use it:
import re

Open a file with open(...), not file.open(...).

Once you open the file you must *read* the contents and operate on that:
data = f.read()

Then you ought to close the file:
f.close()

Now you can do your sub on the string in data -- but note, THIS WON'T
CHANGE data, but rather returns a new string which you must assign to
something:

new_data = r.sub(ur"", data)

Then do something with the new string.
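Put together, the steps above might look like the following minimal sketch. It assumes Python 3, where open() takes an encoding argument and the ur"" prefix no longer exists. The names strip_line_comments and strip_file are made up for illustration, and the pattern is a deliberately naive one that does not know about string literals.

```python
import re

# Naive pattern (an assumption for this sketch): everything from "//"
# to the end of the line. It will also cut "//" inside string literals.
LINE_COMMENT = re.compile(r"//[^\r\n]*")


def strip_line_comments(text):
    """Return a copy of text with //-comments removed; text is not modified."""
    return LINE_COMMENT.sub("", text)


def strip_file(src_path, dst_path, encoding="utf-8"):
    """Read src_path, strip //-comments, write the result to dst_path."""
    # Read the whole file; the with-block closes it afterwards.
    with open(src_path, "r", encoding=encoding) as f:
        data = f.read()
    # sub() returns a new string, so the result has to be assigned.
    new_data = strip_line_comments(data)
    with open(dst_path, "w", encoding=encoding) as out:
        out.write(new_data)
```

Calling strip_file("mycpp.cpp", "modified.cpp") would then write the stripped copy next to the original.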

Also I fear your regular expression is incorrect.

Cheers,
Gary Herron

Jan 26 '07 #2

On Jan 26, 5:08 pm, Gary Herron <gher...@islandtraining.com> wrote:
> [snip]

Thank you.
I'm sorry, I was in a hurry when I posted this thread.
Here is my code again:

import re

f=open("show_btchina.user.js","r").read()
f=unicode(f,"utf8")

r=re.compile(ur"//[^\r\n]+$", re.UNICODE|re.VERBOSE)
f_new=r.sub(ur"",f)

open("modified.js","w").write(f_new.encode("utf8"))

The problem is that only the last comment seems to be removed.
How can I remove all of the comments, please?

Jan 26 '07 #3
At Friday 26/1/2007 06:54, Frank Potter wrote:
> import re
> f=open("show_btchina.user.js","r").read()
> f=unicode(f,"utf8")
> r=re.compile(ur"//[^\r\n]+$", re.UNICODE|re.VERBOSE)
> f_new=r.sub(ur"",f)
> open("modified.js","w").write(f_new.encode("utf8"))
>
> And, the problem is, it seems that only the last comment is removed.
> How can I remove all of the comments, please?

Note that it's not as easy as simply deleting from // to end of line,
because those characters might be inside a string literal. But if you
can afford the risk, this is a simple way without re:

f = open("show_btchina.user.js","r")
modf = open("modified.js","w")
for line in f:
    uline = unicode(line, "utf8")
    idx = uline.find("//")
    if idx == 0:
        # the whole line is a comment: drop it
        continue
    elif idx > 0:
        # cut the trailing comment but keep the line break
        uline = uline[:idx] + '\n'
    modf.write(uline.encode("utf8"))
modf.close()
f.close()
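To illustrate the string-literal caveat, here is a rough Python 3 sketch of a character scanner that leaves "//" alone while it is inside a single- or double-quoted string. The function name strip_comments_outside_strings is hypothetical, and the sketch assumes the source has no /* ... */ block comments or JavaScript regex literals, neither of which it understands.

```python
def strip_comments_outside_strings(source):
    """Remove //-comments, leaving '//' inside quoted strings untouched."""
    out = []
    i = 0
    n = len(source)
    quote = None  # quote character of the string literal we are inside, if any
    while i < n:
        ch = source[i]
        if quote:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # keep the escaped character, so \" does not end the string
                out.append(source[i + 1])
                i += 2
                continue
            if ch == quote or ch == "\n":
                # closing quote, or an unterminated string ending at the line break
                quote = None
            i += 1
        elif ch in "\"'":
            # entering a string literal
            quote = ch
            out.append(ch)
            i += 1
        elif source.startswith("//", i):
            # skip to the end of the line, keeping the line break itself
            while i < n and source[i] not in "\r\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

A line such as var u = "http://example.com"; // home page keeps its URL and loses only the trailing comment.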
--
Gabriel Genellina
Softlab SRL



Jan 26 '07 #4
Thank you!

On Jan 26, 6:34 pm, Gabriel Genellina <gagsl...@yahoo.com.ar> wrote:
> [snip]

Jan 26 '07 #5
And using the codecs module:

import codecs

# codecs.open decodes on read and encodes on write
f = codecs.open("show_btchina.user.js","r","utf-8")
modf = codecs.open("modified.js","w","utf-8")
for line in f:
    idx = line.find(u"//")
    if idx == 0:
        # the whole line is a comment: drop it
        continue
    elif idx > 0:
        # cut the trailing comment but keep the line break
        line = line[:idx] + u'\n'
    modf.write(line)
modf.close()
f.close()
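For readers on Python 3, the same loop can be sketched with the built-in open(), which accepts an encoding argument directly, so the codecs module is not needed. The function name strip_line_comments_file is invented for this example, and like the loop above it will also cut a "//" that sits inside a string literal.

```python
def strip_line_comments_file(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path line by line, dropping //-comments."""
    with open(src_path, "r", encoding=encoding) as src, \
         open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            idx = line.find("//")
            if idx == 0:
                # the whole line is a comment: drop it
                continue
            if idx > 0:
                # cut the trailing comment but keep the line break
                line = line[:idx] + "\n"
            dst.write(line)
```

The with-block closes both files even if an error occurs part-way through.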
Gabriel Genellina wrote:
> [snip]

Jan 26 '07 #6
Laurent Rahuel wrote:
> And using the codecs module

Why would you de/encode at all?

Peter
Jan 26 '07 #7
On Jan 26, 3:54 am, "Frank Potter" <could....@gmail.com> wrote:
> I'm sorry, I was in a hurry when I posted this thread.
> Here is my code again:
>
> import re
> f=open("show_btchina.user.js","r").read()
> f=unicode(f,"utf8")
> r=re.compile(ur"//[^\r\n]+$", re.UNICODE|re.VERBOSE)
> f_new=r.sub(ur"",f)
> open("modified.js","w").write(f_new.encode("utf8"))

Here's a pyparsing version that will stay clear of '//' inside quoted
strings. (untested)

# Suppress has to be imported too, since it is used below
from pyparsing import Suppress, javaStyleComment, dblQuotedString

f=open("show_btchina.user.js","r").read()
f=unicode(f,"utf8")

commentFilter = Suppress( javaStyleComment ).ignore( dblQuotedString )
f_new= commentFilter.transformString(f)

open("modified.js","w").write(f_new.encode("utf8") )

-- Paul
Jan 26 '07 #8
"Peter Otten" <__*******@web.de> wrote in message
news:ep*************@news.t-online.com...
> Laurent Rahuel wrote:
>> And using the codecs module
>
> Why would you de/encode at all?

I'd say the opposite: why not? This is the recommended practice: decode
inputs as soon as possible, work on Unicode, encode only when you write the
output.
In this particular case, it's not necessary and you get the same results,
only because these two conditions are met:

- the encoding used is utf-8
- we're looking for '//', and no Unicode character contains '/' in its
representation using that encoding apart from '/' itself

Looking for the byte sequence '//' in data encoded with a different
encoding (like utf-16 or ucs-2) could give false positives. And looking for
other things (like '¡¡') on utf-8 could give false positives too.
The same applies if one wants to skip string literals looking for '"' and
'\\"'.
Anyway, for a toy script like this, perhaps it does not make any sense at
all - but one should be aware of the potential problems.
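A small Python 3 sketch of the UTF-16 point, using a character chosen for this example: U+2F2F is a single character, yet its UTF-16 little-endian encoding is the two bytes 0x2F 0x2F, the same bytes as ASCII "//". The variable names are illustrative only.

```python
text = "x = '\u2f2f';"  # U+2F2F is one character, not two slashes

utf16 = text.encode("utf-16-le")
utf8 = text.encode("utf-8")

# In UTF-16 the character U+2F2F is stored as the bytes 0x2F 0x2F,
# which is exactly the byte sequence b"//": a false positive.
false_positive_utf16 = b"//" in utf16

# In UTF-8 the byte 0x2F only ever stands for "/" itself.
false_positive_utf8 = b"//" in utf8

# A real "//" is missed in UTF-16 bytes, because each "/" is followed
# by a zero byte: 2F 00 2F 00.
real_comment_found = b"//" in "a // b".encode("utf-16-le")

# Working on the decoded text avoids both problems.
found_in_text = "//" in text

print(false_positive_utf16, false_positive_utf8, real_comment_found, found_in_text)
# prints: True False False False
```

So a byte-level search happens to be safe for '//' in UTF-8, but the decode-first habit is what keeps the script correct if the encoding or the search string ever changes.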

--
Gabriel Genellina
Jan 27 '07 #9
Frank Potter wrote:
> r=re.compile(ur"//[^\r\n]+$", re.UNICODE|re.VERBOSE)
> f_new=r.sub(ur"",f)

From the documentation:

re.MULTILINE
When specified [...] the pattern character "$" matches at the
end of the string and at the end of each line (immediately
preceding each newline). By default [...] "$" matches only at
the end of the string.

re.DOTALL
[...] without this flag, "." will match anything except a newline.

So a simple solution to your problem would be:

r = re.compile("//.*")
f_new = r.sub("", f)
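The effect of the flags quoted above can be checked on a small sample. This is a sketch in Python 3 syntax (raw strings instead of ur""); the sample text and variable names are made up for the demonstration.

```python
import re

src = "a = 1; // first\nb = 2; // second\nc = 3; // last"

# Original pattern: without re.MULTILINE, "$" matches only at the end of
# the whole string, so only the final comment is removed.
end_only = re.compile(r"//[^\r\n]+$")

# Same pattern with re.MULTILINE: "$" also matches before each newline.
per_line = re.compile(r"//[^\r\n]+$", re.MULTILINE)

# The shorter pattern: "." stops at a newline unless re.DOTALL is given,
# so no "$" is needed at all.
dot_star = re.compile(r"//.*")

only_last = end_only.sub("", src)
all_multiline = per_line.sub("", src)
all_dot_star = dot_star.sub("", src)

print(repr(only_last))
print(repr(all_multiline))
print(repr(all_dot_star))
```

Only the first result still contains the "first" and "second" comments, which matches the behaviour reported earlier in the thread.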
Toby
Jan 27 '07 #10
