"Steven Bethard" <st************ @gmail.com> wrote in message
news:ma******** *************** *************** @python.org...
Could you do something like:
line = ' s^\\?AAA\\?01^BBB^g; #Comment '
expr = r'(^\s*)(s|tr)(.)(\\\?%s)\3(.*?)\3(.*)'
matcher = re.compile(expr % re.escape("AAA\?01"))
matcher.findall(line)
[(' ', 's', '^', '\\?AAA\\?01', 'BBB', 'g; #Comment ')]
Basically, I still use the r'' string so that I don't have to write so
many backslashes, but then I use a %s to insert the "AAA\?01" into the middle
of the expression. Looks at least a little cleaner to me.
Steve
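For anyone who wants to see what the %s substitution actually produces, here is a small sketch. It assumes Python 3 print syntax and writes the key as a raw string (same characters as "AAA\?01"); the variable name 'escaped' is only for illustration.

```python
import re

line = ' s^\\?AAA\\?01^BBB^g; #Comment '

key = r"AAA\?01"            # literal text: AAA, backslash, ?, 01
escaped = re.escape(key)    # the backslash and the ? each get a backslash in front
expr = r'(^\s*)(s|tr)(.)(\\\?%s)\3(.*?)\3(.*)' % escaped

print(escaped)              # AAA\\\?01
print(expr)                 # the finished pattern, with the escaped key in the middle

matches = re.compile(expr).findall(line)
print(matches)
```

The point of re.escape is that the key can contain regexp metacharacters (here the backslash and the ?) without anyone having to escape them by hand.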
Here's a more verbose version of Steve Bethard's suggestion. By building
up the regexp from individual parts, it is possible to give each part some
semi-meaningful name, or to attach comments to individual pieces. It also
makes it easier to maintain later. What if you had to support an additional
command besides s and tr, like 'rep'? Just change replaceCmd to read
replaceCmd = r'(s|tr|rep)'. What if you needed to support leading tabs
in addition to leading spaces? Change leadingWhite as needed. For
that matter, just giving the finished regexp the name 'replaceCmdExpr'
gives the reader more of a clue as to what the regexp's purpose is,
as the original code did with extra comments.
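As a rough sketch of the 'rep' change described above: the parts can be wrapped in a small function so the command list becomes a parameter. The function name build_replace_cmd_expr and the test line rep_line are made up for this illustration; the part names follow the ones used in this post. Python 3 print syntax is assumed.

```python
import re

def build_replace_cmd_expr(key, commands=("s", "tr")):
    """Assemble the regexp from named parts (hypothetical helper)."""
    leadingWhite = r'(^\s*)'    # \s matches tabs as well as spaces
    replaceCmd   = r'(%s)' % '|'.join(commands)
    sepChar      = r'(.)'
    findString   = r'(\\\?\\??%s)' % re.escape(key)
    sepCharRef   = r'\3'        # refers back to the char read by sepChar
    replString   = r'(.*?)'
    restOfLine   = r'(.*)'
    return (leadingWhite + replaceCmd + sepChar + findString +
            sepCharRef + replString + sepCharRef + restOfLine)

key = r'AAA\?01'
matcher = re.compile(build_replace_cmd_expr(key, commands=("s", "tr", "rep")))

# made-up input: leading tab, and the new 'rep' command
rep_line = '\trep^\\?AAA\\?01^BBB^g; #Comment '
result = matcher.findall(rep_line)
print(result)
```

Only the replaceCmd part changes; the other pieces, and the group numbering that sepCharRef depends on, stay as they were.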
I find nearly *all* regexps to be cryptic, and when I need them, I
usually assemble them in some fashion such as this. David Mertz
proposes a similar style in his very good book, "Text Processing
in Python."
(Some quibble with the practice of aligning '=' signs, but I find it to be a
helpful guide to the eye when declaring a set of related strings such as
these, assuming of course that one edits using a fixed space font.)
So why does the key get prepended with the backslashes and
question marks?
-- Paul
(I'll bet you thought I'd post a pyparsing version. :) Well, in a
certain way, I did.)
import re
line = ' s^\\?AAA\\?01^BBB^g; #Comment '
r1 = r'(^\s*)(s|tr)(.)(\\\?\\??'
key = r"AAA\?01"
r2 = r'\\??)\3(.*?)\3(.*)'
r = r1 + re.escape(key) + r2
print(re.compile(r).findall(line))
# desired regexp, from Steve Bethard's post
# r'(^\s*)(s|tr)(.)(\\\?%s)\3(.*?)\3(.*)'
# build up regexp by parts
key            = r'AAA\?01'
leadingWhite   = r'(^\s*)'
replaceCmd     = r'(s|tr)'
sepChar        = r'(.)'
# prepend \'s and ?'s, only the OP knows why...
findString     = r'(\\\?\\??%s)' % re.escape(key)
# sepCharRef references the char read by sepChar,
# to support separators other than '^'
sepCharRef     = r'\3'
replString     = r'(.*?)'
restOfLine     = r'(.*)'
replaceCmdExpr = leadingWhite + replaceCmd + \
                 sepChar + findString + sepCharRef + \
                 replString + sepCharRef + restOfLine
matcher = re.compile( replaceCmdExpr )
print(matcher.findall(line))
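This does not explain why the OP wants the prefix, but a quick check shows which strings the findString part accepts on its own. The candidate strings below are made up for the test, and re.fullmatch assumes Python 3.4 or later.

```python
import re

key = r'AAA\?01'
findString = r'(\\\?\\??%s)' % re.escape(key)
finder = re.compile(findString)

candidates = [
    '\\?AAA\\?01',      # backslash, ?, then the key
    '\\?\\AAA\\?01',    # backslash, ?, one extra backslash, then the key
    'AAA\\?01',         # the bare key with no prefix
]

# True where the whole candidate is matched by findString
accepted = [bool(finder.fullmatch(text)) for text in candidates]
for text, ok in zip(candidates, accepted):
    print(repr(text), ok)
```

So the prefix requires a literal backslash and question mark in front of the key, and tolerates one more backslash after them; the bare key by itself is rejected.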