Python Regular Expressions: re.sub(regex, replacement, subject)

Hi Folks,

I put a Regular Expression question on this list a
couple days ago. I would like to rephrase my question
as below:

In the Python re.sub(regex, replacement, subject)
function, I need the second argument, 'replacement', to be
another regular expression (not a string), so that when I
find a 'certain kind of string' in the subject, I can
replace it with 'another kind of string' (not a predefined
string). Note that the 'replacement' may depend on exactly
which string is found by the match against the first
argument, 'regex'.

Please let me know if the question is not clear.

Peace.
Vibha

=======
"Things are only impossible until they are not."

Jul 21 '05 #1
3 Replies


It's still not very clear, but my guess is you want to supply a
replacement function instead of a replacement string, e.g.:

py> help(re.sub)
Help on function sub in module sre:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.

py> def repl(match):
...     print(match.group())
...     return '46'
...
py> re.sub(r'x.*?x', repl, 'yxyyyxxyyxyy')
xyyyx
xyyx
'y4646yy'
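
To make the replacement depend on which string was actually matched, the callable can look the match up in a table. This is only a minimal sketch: the ABBREVIATIONS mapping and the expand function are hypothetical names made up for illustration, not something from the original question.

```python
import re

# Hypothetical lookup table, for illustration only.
ABBREVIATIONS = {"re": "regular expression", "py": "Python"}

def expand(match):
    # match.group(1) is the word between the angle brackets.
    word = match.group(1)
    # Unknown words fall back to the full matched text, i.e. stay unchanged.
    return ABBREVIATIONS.get(word, match.group(0))

result = re.sub(r"<(\w+)>", expand, "a <re> in <py>, not <perl>")
print(result)
```

Returning match.group(0) for the unknown case is one possible design choice; raising an error there would work just as well if unknown keys should not pass silently.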

STeVe
Jul 21 '05 #2

Do you mean 'backreferences'?

>>> re.sub(r"this(\d+)that", r"that\1this", "this12that foo13bar")
'that12this foo13bar'

Note that the replacement string r"that\1this" is not a regular expression;
it has completely different semantics, as described in the docs. (Just
guessing: are you coming from Perl? r"xxx" is not a regular expression in
Python, the way /xxx/ is in Perl. It is just an ordinary string in which
backslashes are not interpreted by the parser, e.g. r"\x" == "\\x". Using
r"" when working with the re module is not required but pretty useful,
because re has its own rules for backslash handling.)
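
A small sketch of the difference the r prefix makes for a replacement string. The variable names (raw, cooked and so on) are made up for this illustration:

```python
import re

# A raw string keeps the backslash: '\' and '1' are two characters.
raw = r"that\1this"
# Without the r prefix, "\1" is an octal escape for the single character chr(1).
cooked = "that\1this"

subject = "this12that foo13bar"
pattern = r"this(\d+)that"

# re sees the backslash and substitutes group 1.
with_raw = re.sub(pattern, raw, subject)
# re never sees a backslash here, so chr(1) is inserted literally.
with_cooked = re.sub(pattern, cooked, subject)
# Doubling the backslash by hand is equivalent to using the raw string.
with_escaped = re.sub(pattern, "that\\1this", subject)

print(repr(with_raw))
print(repr(with_cooked))
```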

For more details see the docs for re.sub():
http://docs.python.org/lib/node114.html

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
Jul 21 '05 #3

In re.sub, 'replacement' can be either a string, or a callable that
takes a single match argument and should return the replacement string.
So although replacement cannot be a regular expression, it can be
something even more powerful, a function. Here's a toy example of what
you can do that wouldn't be possible with regular expressions alone:
import re
from datetime import datetime

this_year = datetime.now().year
rx = re.compile(r'(born|graduated|hired) in (\d{4})')

def replace_year(match):
    # group(1) is the verb, group(2) is the four-digit year
    return "%s %d years ago" % (match.group(1), this_year - int(match.group(2)))

rx.sub(replace_year, 'I was born in 1979 and graduated in 1996.')
# when run in 2005:
# 'I was born 26 years ago and graduated 9 years ago.'
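
The result of the example above depends on the year in which it is run. One way to make it reproducible is to build the replacement function from an explicit reference year. This is just a sketch, and make_replacer is a hypothetical helper name, not part of the re module:

```python
import re

rx = re.compile(r'(born|graduated|hired) in (\d{4})')

def make_replacer(reference_year):
    # Return a replacement callable bound to a fixed reference year.
    def replace_year(match):
        years = reference_year - int(match.group(2))
        return "%s %d years ago" % (match.group(1), years)
    return replace_year

text = 'I was born in 1979 and graduated in 1996.'
result = rx.sub(make_replacer(2005), text)
print(result)
```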

In cases where you don't have to transform the matched string (such as
calling int() and evaluating an expression as in the example) but only
append or prepend another string, there is a simpler solution that
doesn't require writing a replacement function: backreferences.
Replacement can be a string where \1 denotes the first group of the
match, \2 the second and so on. Continuing the example, you could hide
the dates by:
rx.sub(r'\1 in ****', 'I was hired in 2001 in a company of 2001 employees.')

'I was hired in **** in a company of 2001 employees.'
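
One caveat with numbered backreferences: when the reference is followed directly by a digit, the re module also accepts the \g<1> form, which keeps the group number separate from the digits after it. The same form works with named groups. A minimal sketch; the group names 'verb' and 'year' and the sample strings are assumptions made for illustration:

```python
import re

# \g<1> is group 1 followed by a literal "00";
# r'\100' would not mean "group 1 followed by 00".
cents = re.sub(r'(\d+) USD', r'\g<1>00 cents', 'price: 5 USD')

# Named groups can be referenced the same way.
named_rx = re.compile(r'(?P<verb>born|graduated|hired) in (?P<year>\d{4})')
hidden = named_rx.sub(r'\g<verb> in ****',
                      'I was hired in 2001 in a company of 2001 employees.')

print(cents)
print(hidden)
```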

By the way, run the last example without the 'r' in front of the
replacement string and you'll see what it is there for.

HTH,

George

Jul 21 '05 #4
