
Make RE more clever to avoid inappropriate sre_constants.error: redefinition of group name


I want to parse

'foo@bar' or '<foo@bar>' and get the email address foo@bar.

The regex is

r'<\w+@\w+>|\w+@\w+'

Now I want to give the address a group name:

r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'

sre_constants.error: redefinition of group name 'email' as group 2; was group 1

But because I use a |, only one of the two groups named 'email' can ever take part in a match!

Any comments?

PS: I know the solution for this case is to use
r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'
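For readers who want to reproduce this, here is a minimal sketch, assuming a current Python 3 interpreter, where the exception is exposed as re.error. It compiles the duplicate-name pattern to trigger the error, then uses the conditional-group pattern from the PS with re.fullmatch so that unbalanced brackets are rejected. The helper names compile_error and extract_email are made up for this illustration.

```python
import re

# Pattern from the question: the same group name on both sides of "|".
DUPLICATE = r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'

# Workaround from the PS: require ">" only if the optional "<" matched.
CONDITIONAL = r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'


def compile_error(pattern):
    """Return the compile error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


def extract_email(text):
    """Return the address if text is 'a@b' or '<a@b>', otherwise None."""
    match = re.fullmatch(CONDITIONAL, text)
    return match.group('email') if match else None


print(compile_error(DUPLICATE))
for candidate in ('foo@bar', '<foo@bar>', '<foo@bar', 'foo@bar>'):
    print(repr(candidate), '->', extract_email(candidate))
```

Note that re.fullmatch matters here: with re.search, the conditional pattern would still find foo@bar inside '<foo@bar' by starting the match after the bracket.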

Mar 29 '07 #1
5 Replies


On Mar 29, 7:22 am, "aspineux" <aspin...@gmail.com> wrote:
> I want to parse 'foo@bar' or '<foo@bar>' and get the email address foo@bar


Regular expressions, alternation, named groups ... oh my!

It tends to get quite complex, especially if you need to reject cases
where the string contains a left bracket but not the right one, or
vice versa.

>>> import re
>>> pattern = re.compile(r'(?P<email><\w+@\w+>|(?<!<)\b\w+@\w+\b(?!>))')
>>> for email in ('foo@bar', '<foo@bar>', '<start@without_end_bracket'):
...     matched = pattern.search(email)
...     if matched is not None:
...         print(matched.group('email'))
...
foo@bar
<foo@bar>

I suggest you try some other solution (maybe pyparsing).

--
Hope this helps,
Steven

Mar 29 '07 #2

On 29 mar, 16:22, "aspineux" <aspin...@gmail.com> wrote:
> BUT because I use a | , I will get only one group named 'email' !

Then my regex is meaningful, the error is meaningless, and something
should be changed in 're'.

But maybe I'm wrong?

> Any comment ?

I'm trying to start a discussion about something that could be improved
in 're', not looking for a solution to email parsing :-)

Mar 29 '07 #3

On Mar 29, 3:22 pm, "aspineux" <aspin...@gmail.com> wrote:
> I want to parse 'foo@bar' or '<foo@bar>' and get the email address foo@bar

Use two group names, one for each alternative, and if you don't care
which one matched, do something like the following:

>>> import re
>>> s1 = 'foo@bare'
>>> s2 = '<foo@bare>'
>>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s1)
>>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
'foo@bare'
>>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s2)
>>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
'foo@bare'

- Paddy.

Mar 29 '07 #4

On 30 mar, 00:13, "Paddy" <paddy3...@googlemail.com> wrote:

> use two group names, one for each alternate form and if you are not
> concerned with whichever matched do something like the following:

The problem is the way I create this regex :-)

regex = {}
regex['email'] = r'(?P<email1>\w+@\w+)'

path = r'<%(email)s>|%(email)s' % regex

Once more, the original question is:
is it normal to get an error when the same id is used on both sides of a | ?
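One possible way to keep a single reusable fragment while still following the two-names suggestion is to leave the group name as a parameter of the fragment and fill it in once per use. This is only a sketch under that assumption; EMAIL_FRAGMENT, build_pattern and find_email are hypothetical names invented for the example, not part of 're' or of the code discussed in this thread.

```python
import re

# Hypothetical fragment template: the group name is filled in per use,
# so substituting it twice no longer repeats the same name.
EMAIL_FRAGMENT = r'(?P<%s>\w+@\w+)'


def build_pattern():
    """Build '<addr>|addr' with a distinct group name in each alternative."""
    bracketed = '<' + EMAIL_FRAGMENT % 'email_bracketed' + '>'
    bare = EMAIL_FRAGMENT % 'email_bare'
    return re.compile(bracketed + '|' + bare)


def find_email(pattern, text):
    """Return the address captured by whichever alternative matched, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    # Only one alternative takes part in a match, so at most one named
    # group holds a value; the other is None.
    for value in match.groupdict().values():
        if value is not None:
            return value
    return None


pattern = build_pattern()
for text in ('foo@bare', '<foo@bare>', 'no address here'):
    print(repr(text), '->', find_email(pattern, text))
```

The cost of this design is that callers have to coalesce the groups themselves, which is exactly the inconvenience the original question is about.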

Mar 30 '07 #5

On Mar 30, 1:44 pm, "aspineux" <aspin...@gmail.com> wrote:
> Once more, the original question is:
> is it normal to get an error when the same id is used on both sides of a | ?

Groups are numbered left to right irrespective of the expression
contents. I am quite happy with the names being merely a pseudonym for
the positional group number, and I don't see a problem with not allowing
multiple occurrences of the same group name.
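
The name-to-number relationship can be inspected directly. The following is a small sketch, assuming Python 3 and reusing the two-name pattern from earlier in the thread; it relies only on the documented groupindex and groups attributes of a compiled pattern.

```python
import re

pattern = re.compile(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)')

# groupindex maps each group name to its positional number.
name_to_number = dict(pattern.groupindex)
print(name_to_number)
print(pattern.groups)  # total number of capturing groups

match = pattern.search('foo@bare')
# The group in the alternative that did not match is None; the name and
# the number of the other group refer to the same captured text.
print(match.group(1), match.group(2))
print(match.group('email2') == match.group(2))
```

Each opening parenthesis gets its own number even when the groups sit in mutually exclusive alternatives, which is why a name would otherwise have to map to two different numbers.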
I did see an article about REs and their speed. It seems that if
Python's RE package distinguished between 'grep-style' REs and the full
set of Python REs, then there are much faster and more efficient
algorithms available for the grep-style subset.

- Paddy.

Mar 30 '07 #6
