Bytes IT Community

Squeezing replacements into strings

I've got a regular expression that finds certain words from a longer string.
From "Peter Bengtsson PETER, or PeTeR" it finds: 'Peter','PETER','PeTeR'.


What I then want to do is something like this:

def _ok(matchobject):
    # more complicated stuff happens here
    return 1

def _massage(word):
    return "_" + word + "_"

for match in regex.finditer(text):
    if not _ok(match):
        continue
    text = text[:match.start()] +\
           _massage(text[match.start():match.end()]) +\
           text[match.end():]

This code works and can convert something like "don't allow the fuck swear word"
to "don't allow the _fuck_ swear word".

The problem is when there is more than one match. The match.start() and
match.end() values refer to the original string, but after the first iteration
of the loop the string has changed (it gains 2 characters in length due to the
"_"s), so every later offset points at the wrong place.

How can I do this concatenation correctly?

--
Peter Bengtsson,
work www.fry-it.com
home www.peterbe.com
hobby www.issuetrackerproduct.com

Jul 19 '05 #1
3 Replies


Peter Bengtsson wrote:
I've got a regular expression that finds certain words from a longer
string.
From "Peter Bengtsson PETER, or PeTeR" it finds: 'Peter','PETER','PeTeR'.
The problem is when there is more than one match. The match.start() and
match.end() are for the original string but after the first iteration in
the loop the original string changes (it gains 2 characters in length due to the "_"s).
How can I do this concatenation correctly?


I think sub() is more appropriate than finditer() for your problem, e. g.:
>>> import re
>>> def process(match):
...     return "_%s_" % match.group(1).title()
...
>>> re.compile("(peter)", re.I).sub(process, "Peter Bengtsson PETER, or PeTeR")
'_Peter_ Bengtsson _Peter_, or _Peter_'
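
For comparison, if the finditer() loop has to stay for some reason, one way to avoid the shifting offsets is to leave the original string untouched and collect the output in pieces. This is only a sketch; the helper name massage_all and its ok/massage parameters are made up for illustration and are not from the original post.

```python
import re

def massage_all(regex, text, ok=lambda m: True, massage=lambda w: "_" + w + "_"):
    """Rebuild text piece by piece so match offsets always refer to the original."""
    pieces = []
    last = 0  # end of the previous match in the ORIGINAL string
    for match in regex.finditer(text):
        # copy the untouched text between the previous match and this one
        pieces.append(text[last:match.start()])
        word = match.group(0)
        # wrap the word only if the (hypothetical) ok() check accepts it
        pieces.append(massage(word) if ok(match) else word)
        last = match.end()
    # copy whatever follows the final match
    pieces.append(text[last:])
    return "".join(pieces)

regex = re.compile("peter", re.I)
result = massage_all(regex, "Peter Bengtsson PETER, or PeTeR")
print(result)  # _Peter_ Bengtsson _PETER_, or _PeTeR_
```

This is roughly the bookkeeping that sub() does for you, which is why sub() with a callable is the shorter route.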


Peter
Jul 19 '05 #2

As Peter Otten said, sub() is probably what you want. Try:

---------------------------------------------------
import re

def _ok(matchobject):
    # more complicated stuff happens here
    return 1

def _massage(word):
    return "_" + word + "_"

def _massage_or_not(matchobj):
    # sub() calls this once per match and inserts whatever it returns
    if not _ok(matchobj):
        return matchobj.group(0)
    else:
        word = matchobj.group(0)
        return _massage(word)

text = "don't allow the fuck swear word"

rtext = re.sub(r'fuck', _massage_or_not, text)
print(rtext)
---------------------------------------------------

No need to hassle with the changing length of the replaced string.

Best regards,
Adriano.
Jul 19 '05 #3

Peter Otten <__peter__ <at> web.de> writes:
How can I do this concatenation correctly?


I think sub() is more appropriate than finditer() for your problem, e. g.:
>>> def process(match):
...     return "_%s_" % match.group(1).title()
...
>>> re.compile("(peter)", re.I).sub(process, "Peter Bengtsson PETER, or PeTeR")
'_Peter_ Bengtsson _Peter_, or _Peter_'


Ahaa! Great. I didn't realise that I can substitute with a callable that gets
the match object. Hadn't thought of it that way.
Will try this now.

Jul 19 '05 #4
