Some more odd behaviour from the Regexp library

Can anyone explain why it won't give me my captured group?

In [1]: a = 'exit: gkdfjgfjdfsgdjglkghdfgkd'
In [2]: import re
In [3]: b = re.search(r'exit: (.*?)', a)
In [4]: b.group(0)
Out[4]: 'exit: '

In [5]: b.group(1)
Out[5]: ''

In [6]: b.group(2)
IndexError: no such group

Oct 20 '05 #1
4 Replies


In article <11**********************@g44g2000cwa.googlegroups.com>,
"David Veerasingam" <vd********@gmail.com> wrote:
Can anyone explain why it won't give me my captured group?

In [1]: a = 'exit: gkdfjgfjdfsgdjglkghdfgkd'
In [2]: import re
In [3]: b = re.search(r'exit: (.*?)', a)
In [4]: b.group(0)
Out[4]: 'exit: '

In [5]: b.group(1)
Out[5]: ''

In [6]: b.group(2)
IndexError: no such group


The ? tells (.*?) to match as little as possible and, because nothing follows the group in the pattern, that is nothing.
If you change it to (.*) it should do what you want.
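
For reference, here is a minimal sketch of the difference, reusing the string from the question. The variable names lazy and greedy are only illustrative and do not appear in the original post.

```python
import re

a = 'exit: gkdfjgfjdfsgdjglkghdfgkd'

# Non-greedy: nothing follows the group in the pattern, so the
# shortest possible match (zero characters) already succeeds.
lazy = re.search(r'exit: (.*?)', a)

# Greedy: the group consumes everything up to the end of the line.
greedy = re.search(r'exit: (.*)', a)

print(repr(lazy.group(1)))    # ''
print(repr(greedy.group(1)))  # 'gkdfjgfjdfsgdjglkghdfgkd'
```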

--
Doug Schwarz
dmschwarz&urgrad,rochester,edu
Make obvious changes to get real email address.
Oct 20 '05 #2

"David Veerasingam" <vd********@gmail.com> writes:
Can anyone explain why it won't give me my captured group?

In [1]: a = 'exit: gkdfjgfjdfsgdjglkghdfgkd'
In [2]: import re
In [3]: b = re.search(r'exit: (.*?)', a)
In [4]: b.group(0)
Out[4]: 'exit: '

In [5]: b.group(1)
Out[5]: ''

In [6]: b.group(2)
IndexError: no such group


It is giving you your captured group. While the * operator matches as
long a string as possible, the *? operator matches as *short* a string
as possible. Since '' matches .*?, that's all it's ever going to
capture. So b.group(1) is '', which is what it's giving you.
>>> a = 'exit: gkdfjgfjdfsgdjglkghdfgkd'
>>> import re
>>> b = re.search(r'exit: (.*)', a)
>>> b.group(0)
'exit: gkdfjgfjdfsgdjglkghdfgkd'
>>> b.group(1)
'gkdfjgfjdfsgdjglkghdfgkd'


which I suspect is what you actually want.

Of course, being the founder of SPARE, I have to point out that
a.split(': ') will get you the same two strings as the re I used
above.
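
A quick sketch of the split approach, for comparison. Strictly speaking, the first element of the split result is 'exit' rather than the whole match that b.group(0) returns, so only the second string is identical. The maxsplit argument and the names label and payload are additions for illustration, not part of the original suggestion.

```python
import re

a = 'exit: gkdfjgfjdfsgdjglkghdfgkd'

# Plain string split on the separator.
parts = a.split(': ')
print(parts)  # ['exit', 'gkdfjgfjdfsgdjglkghdfgkd']

# The second element equals the greedy capture group.
b = re.search(r'exit: (.*)', a)
print(parts[1] == b.group(1))  # True

# Splitting at most once keeps any later ': ' inside the payload.
label, payload = a.split(': ', 1)
print(label, payload)
```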
<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Oct 20 '05 #3

Mike Meyer wrote:
"David Veerasingam" <vd********@gmail.com> writes:
[...]
Of course, being the founder of SPARE, I have to point out that
a.split(': ') will get you the same two strings as the re I used
above.

Let me guess: the Society for the Prevention of Abuse of Regular
Expressions?

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Oct 20 '05 #4

Thanks for all your replies.

I guess I've always used .*? as sort of an idiom for a non-greedy
match, but I guess it only works if I specify the end point (which I
didn't in the above case).

e.g. re.search(r'exit: (.*?)$', a)
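
A small sketch of why the end point matters. With $ after the group, the non-greedy match has to expand until the end of the string is reached. The second string, c, is a made-up example (not from the thread) showing where non-greedy and greedy actually differ: when a delimiter follows the group.

```python
import re

a = 'exit: gkdfjgfjdfsgdjglkghdfgkd'

# Anchored with $, the lazy group must grow until the end of the string.
b = re.search(r'exit: (.*?)$', a)
print(b.group(1))  # gkdfjgfjdfsgdjglkghdfgkd

# Hypothetical input with a repeated delimiter.
c = 'exit: first; second; third'

# Lazy stops at the first ';', greedy backtracks from the end to the last ';'.
lazy = re.search(r'exit: (.*?);', c).group(1)
greedy = re.search(r'exit: (.*);', c).group(1)
print(lazy)    # first
print(greedy)  # first; second
```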

Thanks for pointing that out!

David

Oct 20 '05 #5
