Bytes | Software Development & Data Engineering Community

regular expressions, substituting and adding in one step?

Ok, this might look familiar. I'd like to use regular expressions to
change this line:

self.source += '<p>' + paragraph + '</p>\n\n'

to read:

self.source += '<p>%s</p>\n\n' % paragraph

Now, matching the middle part and replacing it with '%s' is easy, but
how would I add the extra string to the end of the line? Is it done all
at once, or must I make a new regex to match?

Also, I figure I'd use a group to match the word 'paragraph', and use
that group to insert the word at the end, but how will I 'retain' the
state of \1 if I use more than one regex to do this?

I'd like to do this for several lines, so I'm trying not to make it too
specific (i.e., matching the entire line, for example, and then adding
text after it, if that's possible).

So the questions are, how do you use regular expressions to add text to
the end of a line, even if you aren't matching the end of the line in
the first place? Or does that entail using separate regexes that *do* do
this? If the latter, how do I retain the value of the groups taken from
the first re?

Thanks, hope that made some sense.
May 8 '06 #1
John Salerno wrote:
So the questions are, how do you use regular expressions to add text to
the end of a line, even if you aren't matching the end of the line in
the first place? Or does that entail using separate regexes that *do* do
this? If the latter, how do I retain the value of the groups taken from
the first re?


Here's what I have so far:

-----------

import re

txt_file = open(r'C:\Python24\myscripts\re_test.txt')
new_string = re.sub(r"' \+ ([a-z]+) \+ '", '%s', txt_file.read())
new_string = re.sub(r'$', ' % paragraph', new_string)
txt_file.close()

-----------

re_test.txt contains:

self.source += '<p>' + paragraph + '</p>\n\n'

Both substitutions work, but now I just need to figure out how to
replace the hard-coded ' % paragraph' parameter with something that uses
the group taken from the first regex. I'm guessing if I don't use it at
that time, then it's lost. I suppose I could create a MatchObject and
save group(1) as a variable for later use, but that would be a lot of
extra steps, so I wanted to see if there's a way to do it all at one
time with regular expressions.
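
One way to do both steps at once is to let the match run to the end of the line, so the captured name is still available when the tail is written back. The following is only a sketch: it reuses the [a-z]+ pattern from the code above, the names CONCAT_TO_EOL and append_argument are made up for illustration, and it assumes each line holds a single variable between two quoted pieces.

```python
import re

# Group 1 is the variable between the two quoted pieces; group 2 is whatever
# follows, up to the end of the line. re.MULTILINE makes $ match at every
# line end, so each line of a multi-line text is rewritten on its own.
CONCAT_TO_EOL = re.compile(r"' \+ ([a-z]+) \+ '(.*)$", re.MULTILINE)

def append_argument(text):
    # "%s" replaces the concatenation, group 2 is put back unchanged,
    # and " % name" (taken from group 1) lands at the end of the line.
    return CONCAT_TO_EOL.sub(r"%s\2 % \1", text)

# Raw string, so the \n sequences stay literal as they would in a source file.
sample = r"self.source += '<p>' + paragraph + '</p>\n\n'"
print(append_argument(sample))
# prints: self.source += '<p>%s</p>\n\n' % paragraph
```

Because both groups belong to the same match, nothing has to be saved between two separate substitutions. A line with more than one variable would need a different approach, since group 2 swallows the rest of the line.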

Thanks.
May 8 '06 #2
John Salerno wrote:
Ok, this might look familiar. I'd like to use regular expressions to
change this line:

self.source += '<p>' + paragraph + '</p>\n\n'

to read:

self.source += '<p>%s</p>\n\n' % paragraph

John -

You've been asking for re-based responses, so I apologize in advance for
this digression. Pyparsing is an add-on Python module that can provide a
number of features beyond just text matching and parsing. Pyparsing allows
you to define callbacks (or "parse actions") that get invoked during the
parsing process, and these callbacks can modify the matched text.

Since your re approach seems to be on a fairly convergent path, I felt I
needed to come up with more demanding examples to justify a pyparsing
solution. So I contrived these additional cases:

self.source += '<p>' + paragraph + '</p>\n\n'
listItem1 = '<li>' + someText + '</li>'
listItem2 = '<li>' + someMoreText + '</li>'
self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'

The following code processes these expressions. Admittedly, it is not as
terse as your re-based code samples have been, but it may give you another
data point in your pursuit of a solution. (The pyparsing home wiki is at
http://pyparsing.wikispaces.com.)

The purpose of the intermediate classes is to convert the individual terms
of the string expression into a list of string terms, either variable
references or quoted literals. This conversion is done in the term-specific
parse actions created by makeTermParseAction. Then the overall string
expression gets its own parse action, which processes the list of term
objects, and creates the modified string expression. Two different string
expression conversion functions are shown, one generating string
interpolation expressions, and one generating "".join() expressions.

Hope this helps, or is at least mildly entertaining,
-- Paul
================
from pyparsing import *

testLines = r"""
self.source += '<p>' + paragraph + '</p>\n\n'
listItem1 = '<li>' + someText + '</li>'
listItem2 = '<li>' + someMoreText + '</li>'
self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'
"""

# define some classes to use during parsing
class StringExprTerm(object):
    def __init__(self,content):
        self.content = content

class VarRef(StringExprTerm):
    pass

class QuotedLit(StringExprTerm):
    pass

def makeTermParseAction(cls):
    # build a parse action that wraps the first matched token in cls
    def parseAction(s,l,tokens):
        return cls(tokens[0])
    return parseAction

# define parts we want to recognize as terms in a string expression
varName = Word(alphas+"_", alphanums+"_")
varName.setParseAction( makeTermParseAction( VarRef ) )
quotedString.setParseAction( removeQuotes, makeTermParseAction( QuotedLit ) )
stringTerm = varName | quotedString

# define a string expression in terms of term expressions
PLUS = Suppress("+")
EQUALS = Suppress("=")
stringExpr = EQUALS + stringTerm + ZeroOrMore( PLUS + stringTerm )

# define a parse action, to be invoked every time a string expression is found
def interpolateTerms(originalString,locn,tokens):
    out = []
    refs = []
    terms = tokens
    for term in terms:
        if isinstance(term,QuotedLit):
            out.append( term.content )
        elif isinstance(term,VarRef):
            out.append( "%s" )
            refs.append( term.content )
        else:
            print("hey! this is impossible!")

    # generate string to be interpolated, and interp operator
    outstr = "'" + "".join(out) + "' % "

    # generate interpolation argument tuple
    if len(refs) > 1:
        outstr += "(" + ",".join(refs) + ")"
    else:
        outstr += ",".join(refs)

    # return generated string (don't forget leading = sign)
    return "= " + outstr

stringExpr.setParseAction( interpolateTerms )

print("Original:", end=" ")
print(testLines)
print()
print("Modified:", end=" ")
print(stringExpr.transformString( testLines ))

# define slightly different parse action, to use list join instead of string interp
def createListJoin(originalString,locn,tokens):
    out = []
    terms = tokens
    for term in terms:
        if isinstance(term,QuotedLit):
            out.append( "'" + term.content + "'" )
        elif isinstance(term,VarRef):
            out.append( term.content )
        else:
            print("hey! this is impossible!")

    # generate list literal of the terms to be joined
    outstr = "[" + ",".join(out) + "]"

    # return generated string (don't forget leading = sign)
    return "= ''.join(" + outstr + ")"

# clear the previous parse action before installing the new one
del stringExpr.parseAction[:]
stringExpr.setParseAction( createListJoin )

print()
print("Modified (2):", end=" ")
print(stringExpr.transformString( testLines ))

================
Prints out:
Original:
self.source += '<p>' + paragraph + '</p>\n\n'
listItem1 = '<li>' + someText + '</li>'
listItem2 = '<li>' + someMoreText + '</li>'
self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'

Modified:
self.source += '<p>%s</p>\n\n' % paragraph
listItem1 = '<li>%s</li>' % someText
listItem2 = '<li>%s</li>' % someMoreText
self.source += '<ul>%s\n%s\n</ul>\n\n' % (listItem1,listItem2)

Modified (2):
self.source += ''.join(['<p>',paragraph,'</p>\n\n'])
listItem1 = ''.join(['<li>',someText,'</li>'])
listItem2 = ''.join(['<li>',someMoreText,'</li>'])
self.source += ''.join(['<ul>',listItem1,'\n',listItem2,'\n','</ul>\n\n'])
================
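
For comparison, the multi-term rewrite can be approximated with the re module alone by passing a function as the replacement argument of re.sub. This is only a sketch and not the pyparsing method shown above: the names TERM, EXPR, to_interpolation and convert are made up for illustration, and it assumes single-quoted literals without escaped quotes and plain or dotted variable names. Doubling any % inside a literal is an extra step that the pyparsing code does not take.

```python
import re

# One term of a string expression: a single-quoted literal or a
# (possibly dotted) variable name.
TERM = r"(?:'[^']*'|[A-Za-z_]\w*(?:\.\w+)*)"
# "=" followed by two or more terms joined with "+".
EXPR = re.compile(r"=\s*(" + TERM + r"(?:\s*\+\s*" + TERM + r")+)")

def to_interpolation(match):
    """Rewrite one matched concatenation as a %-interpolation expression."""
    terms = re.findall(TERM, match.group(1))
    out, refs = [], []
    for term in terms:
        if term.startswith("'"):
            # quoted literal: keep its content, doubling any literal % sign
            out.append(term[1:-1].replace("%", "%%"))
        else:
            # variable reference: placeholder now, argument later
            out.append("%s")
            refs.append(term)
    if not refs or len(refs) == len(terms):
        # only literals, or only variables: leave the text untouched
        return match.group(0)
    args = refs[0] if len(refs) == 1 else "(" + ",".join(refs) + ")"
    return "= '" + "".join(out) + "' % " + args

def convert(text):
    return EXPR.sub(to_interpolation, text)

print(convert(r"self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'"))
```

The replacement function receives the match object, so every group is available while the new text is built. The sketch does not recognize function calls, double-quoted strings, or comparison operators such as ==, which is where a real parser starts to pay off.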
May 8 '06 #3
John Salerno wrote:
Ok, this might look familiar. I'd like to use regular expressions to
change this line:

self.source += '<p>' + paragraph + '</p>\n\n'

to read:

self.source += '<p>%s</p>\n\n' % paragraph

Now, matching the middle part and replacing it with '%s' is easy, but
how would I add the extra string to the end of the line? Is it done all
at once, or must I make a new regex to match?

Also, I figure I'd use a group to match the word 'paragraph', and use
that group to insert the word at the end, but how will I 'retain' the
state of \1 if I use more than one regex to do this?


Do it all in one match / substitution using \1 to insert the value of
the paragraph group at the new location:

In [19]: test = "self.source += '<p>' + paragraph + '</p>\n\n'"

In [20]: re.sub(r"'<p>' \+ (.*?) \+ '</p>\n\n'", r"'<p>%s</p>\n\n' % \1", test)
Out[20]: "self.source += '<p>%s</p>\n\n' % paragraph"
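
The same one-step idea can be made less specific by capturing the surrounding literals too, so it is not tied to the <p> tag. This is a sketch with made-up names (CONCAT, to_format), assuming single-quoted literals with no % signs or escaped quotes inside. One thing to keep in mind: the test string in the session above contains real newline characters, while a source file read from disk contains a backslash followed by the letter n. The [^']* groups below match either form.

```python
import re

# Three groups: the text of the opening literal, the variable name, and the
# text of the closing literal. Nothing is tied to a particular tag.
CONCAT = re.compile(r"'([^']*)' \+ (\w+) \+ '([^']*)'")

def to_format(line):
    # Rebuild a single literal around "%s" and move the variable
    # to the right-hand side of the % operator.
    return CONCAT.sub(r"'\1%s\3' % \2", line)

# Raw strings keep the \n sequences literal, as in a source file.
print(to_format(r"self.source += '<p>' + paragraph + '</p>\n\n'"))
print(to_format(r"listItem1 = '<li>' + someText + '</li>'"))
```

All three groups come from one match, so \1, \2 and \3 can be rearranged freely in the replacement string.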

Kent
May 10 '06 #4
Kent Johnson wrote:
Do it all in one match / substitution using \1 to insert the value of
the paragraph group at the new location:

In [19]: test = "self.source += '<p>' + paragraph + '</p>\n\n'"

In [20]: re.sub(r"'<p>' \+ (.*?) \+ '</p>\n\n'", r"'<p>%s</p>\n\n' % \1", test)
Out[20]: "self.source += '<p>%s</p>\n\n' % paragraph"


Interesting. Thanks! I was just doing some more reading of the re
module, so now I understand sub() better. I'll give this a try too. Call
me crazy, but I'm interested in regular expressions right now. :)
May 10 '06 #5
John Salerno wrote:
Call
me crazy, but I'm interested in regular expressions right now. :)


Not crazy at all. REs are a powerful and useful tool that every
programmer should know how to use. They're just not the right tool for
every job!

Kent
May 10 '06 #6
Kent Johnson wrote:
They're just not the right tool for
every job!


Thank god for that! As easy as they've become to me (after seeming
utterly cryptic and impenetrable), they are still a little unwieldy.
Next step: learn how to write look-ahead and look-behind REs! :)
May 10 '06 #7
