qwweeeit wrote:
> Thank you for your suggestion, but it is too complicated for me...
> I decided to proceed in steps:
> 1. Take away all commented lines
> 2. Rebuild the multi-lines as single lines
Umm, OK, all I can say is: did you try this?
If not, save it as a module, then import it into the interpreter and try
it.
This is a dead simple module to do *exactly* what you asked for :)
Like I said, I have done this before, so I will restate: *I HAVE FAILED AT
THIS BEFORE, MANY TIMES*. Now I have a solution.
It writes to stdout by default, but can write to any file-like object if
you give it one.
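If you want the same default-to-stdout behaviour in your own code, here is a minimal sketch of the pattern (Python 3; `emit` is a hypothetical helper, not part of the module). One design note: a default written as `out=sys.stdout` is evaluated once, when the `def` runs, so a later redirection of `sys.stdout` is ignored. Defaulting to `None` and looking up `sys.stdout` inside the function avoids that.

```python
import io
import sys


def emit(text, out=None):
    """Write `text` to `out`, or to sys.stdout when no target is given.

    `out` can be anything with a write() method: a real file,
    io.StringIO, and so on.
    """
    if out is None:
        # Looked up at call time, so a redirected sys.stdout is honoured.
        out = sys.stdout
    out.write(text)


buf = io.StringIO()
emit("x = 1\n", out=buf)  # captured in memory
emit("x = 1\n")           # goes to the console
```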
It already handles continued lines, so there is no need to futz around
with a separate solution for that.
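For comparison, here is roughly what the two quoted steps look like done by hand with the standard `tokenize` module. It is a sketch in Python 3 (the module above is Python 2), and `flatten_source` is a hypothetical name. Comment tokens are dropped, and every token up to each `NEWLINE` token (which ends a logical line, whether the statement was continued inside brackets or with a backslash) is glued onto one physical line. Assumptions worth knowing: indentation is normalized to four spaces per level, and a line break inside a statement becomes a single space, so a call split right after `(` comes out as `foo( 1)`.

```python
import io
import tokenize


def flatten_source(source):
    """Drop comments and put every logical line on one physical line."""
    out = []          # finished lines
    parts = []        # pieces of the logical line being built
    depth = 0         # current indentation level
    prev_end = None   # (row, col) where the previous kept token ended
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            depth += 1
        elif tok.type == tokenize.DEDENT:
            depth -= 1
        elif tok.type == tokenize.NEWLINE:
            # NEWLINE closes a logical line, however many physical lines it used.
            if parts:
                out.append("    " * depth + "".join(parts))
            parts, prev_end = [], None
        elif tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER):
            # Comments go away; NL is a blank line or a break inside brackets.
            continue
        else:
            if prev_end is not None:
                if tok.start[0] == prev_end[0]:
                    # Same physical line: keep the original spacing.
                    parts.append(" " * (tok.start[1] - prev_end[1]))
                else:
                    # Token came from a continuation line: join with one space.
                    parts.append(" ")
            parts.append(tok.string)
            prev_end = tok.end
    return "".join(line + "\n" for line in out)


print(flatten_source("x = foo(1,  # first\n        2)\n"))
```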
Here is an example of it in action:
Py> filein = """
.... class Stripper:
....     '''python comment and whitespace stripper
....     '''
....     def __init__(self, raw):
....         ''' Store the source text & set some flags.
....         '''
....         self.raw = raw
....
....     def format(self, out=sys.stdout, comments=0,
....                spaces=1, untabify=1,eol='unix'):
....         '''Parse and send the colored source.'''
....         # Store line offsets in self.lines
....         self.lines = [0, 0]
....         pos = 0
....         # Strips the first blank line if 1
....         self.lasttoken = 1
....         self.temp = StringIO.StringIO()
....         self.spaces = spaces
....         self.comments = comments
....
....         if untabify:
....             self.raw = self.raw.expandtabs()
....         self.raw = self.raw.rstrip()+' '
....         self.out = out
.... """
Py> replacer = ReplaceParser(filein, out=sys.stdout)
Py> replacer.format()
class Stripper:
    s000001
    def __init__(self, raw):
        s000002
        self.raw = raw
    def format(self, out=sys.stdout, comments=0,
               spaces=1, untabify=1,eol=s000003):
        s000004
        # Store line offsets in self.lines
        self.lines = [0, 0]
        pos = 0
        # Strips the first blank line if 1
        self.lasttoken = 1
        self.temp = StringIO.StringIO()
        self.spaces = spaces
        self.comments = comments
        if untabify:
            self.raw = self.raw.expandtabs()
        self.raw = self.raw.rstrip()+s000005
        self.out = out
Py> replacer.StringMap
{'s000004': "'''Parse and send the colored source.'''",
's000005': "' '",
's000001': "'''python comment and whitespace stripper :)\n '''",
's000002': "''' Store the source text & set some flags.\n '''",
's000003': "'unix'"}
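The trick behind `StringMap` is to swap each string literal for a numbered placeholder name, so later passes can work on the code without ever seeing string contents. Below is a rough Python 3 re-creation of that idea with `tokenize`; `replace_strings` and `restore_strings` are hypothetical names, not the module's API, and the `s%06d` naming just mirrors the output above. Each replaced token keeps its original start/end positions, which is what lets `tokenize.untokenize` space the surrounding code as before. On Python 3.12+ f-strings are tokenized differently (not as `STRING`), so this sketch leaves them alone.

```python
import io
import itertools
import tokenize


def replace_strings(source):
    """Return (new_source, string_map) with every string literal replaced."""
    names = ("s%06d" % n for n in itertools.count(1))
    string_map = {}
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            name = next(names)
            string_map[name] = tok.string
            # Keep start/end so untokenize spaces the neighbours as before.
            tok = tok._replace(string=name)
        tokens.append(tok)
    return tokenize.untokenize(tokens), string_map


def restore_strings(source, string_map):
    """Put the original literals back in place of their placeholder names."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in string_map:
            tok = tok._replace(string=string_map[tok.string])
        tokens.append(tok)
    return tokenize.untokenize(tokens)


src = "x = 'abc' + y\nz = \"\"\"two\nlines\"\"\"\n"
stripped, mapping = replace_strings(src)
print(stripped)
print(mapping)
```

Restoring works on tokens too, so a placeholder name that happens to appear inside a comment or another string is not touched.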
You can also strip out comments with a few extra lines.
It can easily get single comments or doubles.
Add this to your __call__ method:
[snip]
            self.pos = newpos
            return
        # kill comments: return before the token is written out
        if toktype == tokenize.COMMENT:
            return
        if toktype == token.STRING:
            sname = self.StringName.next()
[snip]
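Outside a `__call__` hook, the same token-level idea works as a standalone function. A minimal Python 3 sketch, where `strip_comments` is a hypothetical name: the tokenizer reports where each real comment starts, so a `#` inside a string literal is never touched. Comment-only lines are dropped, and trailing comments are cut off together with the whitespace before them. Note that it also removes a `#!` line or a coding declaration, which you may want to keep.

```python
import io
import tokenize


def strip_comments(source):
    """Return `source` with every comment removed."""
    # Split lines the same way the tokenizer does, so row numbers line up.
    lines = io.StringIO(source).readlines()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue
        row, col = tok.start
        head = lines[row - 1][:col].rstrip()
        # Nothing but whitespace before the comment: drop the whole line.
        # Otherwise keep the code part. A row holds at most one comment.
        lines[row - 1] = head + "\n" if head else ""
    return "".join(lines)


print(strip_comments("x = 1  # set x\ny = '#kept'\n"))
```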
If you insist on writing your own, go ahead.
Let me know what your solution is; I am curious.
M.E.Farmer