Question regarding regular expression

Hello,

I'm trying to analyze some AutoLISP code with Python. The file to be
analyzed contains many functions. Each function begins with a "defun"
statement, which may or may not be preceded by one or more comment
lines beginning with ";". My goal is to export each function into a
separate file, together with its comments, if there are any. Below is
the code that I'm struggling with:

path = "C:\\AutoCAD\\LSP\\Sub.lsp"
string = file(path, 'r').read()

import re
pat = "\\;+.+\\n\\(DEFUN"
p = re.compile(pat,re.I)

iterator = p.finditer(string)
spans = [match.span() for match in iterator]

for i in range(min(15, len(spans))):
    print string[spans[i][0]:spans[i][1]]

The code above runs fine. But it only takes care of the situation in
which there is exactly one comment line above the "defun" statement.
How do I repeat the sub-pattern "\\;+.+\\n" here?
For example, if I want to repeat this pattern 0 to 10 times, I know
"\\;+.+\\n{0:10}\\(DEFUN" does not work, but I don't know where to put
"{0:10}". As a workaround, I tried
pat = "|".join(["\\;+.+\\n"*i+ "\\(DEFUN" for i in range(11)]), but it
turned out to be very slow. Any help?

Thank you.

Kelie

Apr 14 '06 #1
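
The regex question itself can also be answered directly. A quantifier applies only to the single item just before it, so the comment-line sub-pattern has to be wrapped in a group, and the bounded-repeat syntax uses a comma: {0,10}, not {0:10}. The following is a minimal sketch in Python 3 syntax (the thread's own code is Python 2). The sample text is made up for illustration, and the pattern is slightly loosened compared with the original (".*" instead of ".+", plus "^" anchors with re.MULTILINE), which is an assumption about what is wanted.

```python
import re

# Hypothetical sample standing in for the contents of the .lsp file.
sample = (
    "; first helper\n"
    "; adds two numbers\n"
    "(defun add (a b) (+ a b))\n"
    "\n"
    "(defun bare () nil)\n"
    ";; single comment\n"
    "(DEFUN shout () nil)\n"
)

# (?: ... ) groups the comment-line sub-pattern without capturing it,
# so {0,10} repeats the whole group rather than only the "\n".
# With re.MULTILINE, "^" matches at the start of every line.
pattern = re.compile(r"(?:^;.*\n){0,10}^\(defun", re.IGNORECASE | re.MULTILINE)

matches = [m.group() for m in pattern.finditer(sample)]
for text in matches:
    print(repr(text))
```

Because the whole alternative is a single pattern with one bounded repeat, it should avoid the eleven-way alternation built with "|".join, which is the likely cause of the slowness described above.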
ISTM you don't need regex here, a simple line processor will work.
Something like this (untested):

path = "C:\\AutoCAD\\LSP\\Sub.lsp"
lines = open(path).readlines()

# Find the starts of all the functions
starts = [i for i, line in enumerate(lines) if line.startswith('(DEFUN')]

# Check for leading comments
for i, start in starts:
    while start > 0 and lines[start-1].startswith(';'):
        starts[i] = start = start-1

# Now starts should be a list of line numbers for the start of each function

Kent
Apr 14 '06 #2
Kent,

Running

path = "d:/emacs files/emacsinit.txt"
lines = open(path).readlines()
# my defun lines are lowercase,
# next two lines are all on one
starts = [i for i, line in enumerate(lines) if
line.startswith('(defun')]
for i, start in starts:
    while start > 0 and lines[start-1].startswith(';'):
        starts[i] = start = start-1
print starts

I get

File "D:\Python\findlines.py", line 7, in __main__
for i, start in starts:
TypeError: unpack non-sequence

Also, I don't understand the "i for i", but I don't understand a lot of
things yet :)

thanks,

rick

Apr 14 '06 #3
On Fri, 2006-04-14 at 07:47 -0700, BartlebyScrivener wrote:
starts = [i for i, line in enumerate(lines) if
line.startswith('(defun')]
This line makes a list of integers. enumerate gives you an iterator that
yields tuples of the form (integer, object); with "i for i, line" you
unpack each tuple into "(i, line)" and keep just "i".
for i, start in starts:


Here you try to unpack the elements of the list "starts" into "(i,
start)", but as we saw above the list contains just "i", so an exception
is raised.

I don't know what you want, but...

starts = [(i, line) for i, line in enumerate(lines) if
line.startswith('(defun')]

or

starts = [x for x in enumerate(lines) if x[1].startswith('(defun')]

....may (or may not) solve your problem.

--
Felipe.

Apr 14 '06 #4
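
The unpacking behaviour described above can be reproduced in a few lines. This is only an illustrative sketch in Python 3 syntax; the sample lines are invented, and the exact wording of the TypeError differs between Python versions.

```python
# Invented sample lines standing in for a file read with readlines().
lines = ["; comment\n", "(defun a ()\n", "  nil)\n", "(defun b ()\n"]

# "i for i, line in enumerate(lines)" keeps only the index of each match,
# so starts is a plain list of integers.
starts = [i for i, line in enumerate(lines) if line.startswith("(defun")]

# Iterating with two loop variables tries to unpack each integer, which fails.
try:
    for i, start in starts:
        pass
except TypeError as exc:
    unpack_error = exc
else:
    unpack_error = None

# Wrapping the list in enumerate() supplies the (index, value) pairs instead.
pairs = list(enumerate(starts))

print(starts, pairs, type(unpack_error).__name__)
```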
Sorry, should be
for i, start in enumerate(starts):

start is a specific start line, i is the index of that start line in the
starts array (so the array can be modified in place).

Kent
Apr 14 '06 #5
That's it. Thank you! Very instructive.

Final:

path = "d:/emacs files/emacsinit.txt"
lines = open(path).readlines()
# next two lines all on one
starts = [i for i, line in enumerate(lines) if
line.startswith('(defun')]
for i, start in enumerate(starts):
    while start > 0 and lines[start-1].startswith(';'):
        starts[i] = start = start-1
print starts

Apr 14 '06 #6
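
Once the adjusted start indices are known, the original goal of exporting each function together with its comments comes down to slicing the list of lines between consecutive starts. The sketch below is one possible way to do that, written in Python 3 syntax. The helper name split_functions and the sample lines are hypothetical, and the case-insensitive "(defun" test is an assumption that covers both spellings seen in this thread.

```python
def split_functions(lines):
    """Return one text chunk per function, leading comment lines included."""
    # Indices of the lines where a function definition begins.
    starts = [i for i, line in enumerate(lines)
              if line.lower().startswith("(defun")]
    # Move each start back over the comment lines directly above it.
    for i, start in enumerate(starts):
        while start > 0 and lines[start - 1].startswith(";"):
            starts[i] = start = start - 1
    # Each chunk runs up to the next start, or to the end of the file.
    ends = starts[1:] + [len(lines)]
    return ["".join(lines[s:e]) for s, e in zip(starts, ends)]


# Invented sample standing in for open(path).readlines().
sample_lines = [
    "; util\n",
    "(defun a ()\n",
    "  1)\n",
    "(defun b ()\n",
    "  2)\n",
    "; doc for c\n",
    "; more\n",
    "(DEFUN c () 3)\n",
]

chunks = split_functions(sample_lines)
for chunk in chunks:
    print(chunk)
```

Each chunk could then be written to its own file with an ordinary open(name, "w") call; how the output files are named is left open here.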
If you don't want to hold the whole file in memory, this yields the
starts one result at a time:

def starts(source):
    prelude = None
    for number, line in enumerate(source): # read and number a line
        if line[0] == ';':
            if prelude is None:
                prelude = number # Start of commented region
            # else: this line just extends previous prelude
        else:
            if line.startswith('(defun'):
                # You could append to a result here, but yield lets
                # the first found one get out straightaway.
                if prelude is None:
                    yield number
                else:
                    yield prelude
            prelude = None

path = "d:/emacs files/emacsinit.txt"
source = open(path)
try:
    for line in starts(source):
        print line,
    # could just do: print list(starts(source))
finally:
    source.close()
print

--
-Scott David Daniels
sc***********@acm.org
Apr 14 '06 #7
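
The same streaming idea can be taken one step further so that the text of each function is produced, instead of only its starting line number. The following is a rough sketch in Python 3 syntax and not part of the code posted above. The generator name function_blocks is hypothetical, io.StringIO stands in for an open file, and lines that come before the first "(defun" are simply discarded, which is an assumption about the desired behaviour.

```python
import io


def function_blocks(source):
    """Yield each function's text, with the comment lines directly above it."""
    pending = []  # comment lines seen since the last non-comment line
    block = []    # lines of the function currently being collected
    for line in source:
        if line.startswith(";"):
            pending.append(line)
        elif line.lower().startswith("(defun"):
            if block:
                yield "".join(block)      # previous function is complete
            block = pending + [line]      # comments above belong to this one
            pending = []
        else:
            if block:
                block.extend(pending)     # comments inside a function body
                block.append(line)
            pending = []
    if block:
        yield "".join(block + pending)    # last function plus trailing comments


# Invented sample text in place of open(path).
source = io.StringIO(
    "; header\n"
    "\n"
    "; doc a\n"
    "(defun a ()\n"
    "  ; inner\n"
    "  1)\n"
    "(DEFUN b () 2)\n"
)

blocks = list(function_blocks(source))
for text in blocks:
    print(text)
```

Only the lines of one function are held in memory at a time, so this keeps the advantage of the generator approach while producing output that could be written straight to separate files.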
This is very helpful.

I wasn't the OP. I'm just learning, but I'm on the verge of making my
own file searching scripts. This will be a huge help. Thanks for
posting, and especially thanks for the comments in the code. Big help!

rick

Apr 14 '06 #8
Thanks to both of you, Kent and Scott.

Apr 15 '06 #9
