On Mar 29, 7:22 am, "aspineux" <aspin...@gmail.com> wrote:
I want to parse
'foo@bar' or '<foo@bar>' and get the email address foo@bar
the regex is
r'<\w+@\w+>|\w+@\w+'
now, I want to give it a name
r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'
sre_constants.error: redefinition of group name 'email' as group 2; was group 1
But because I use |, only one of the two alternatives can match, so I will only ever get one group named 'email'!
Any comments?
PS: I know the solution for this case is to use
r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'
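For reference, here is a minimal sketch of how that conditional pattern behaves. The helper name extract_conditional and the use of re.fullmatch (Python 3.4+) are assumptions for illustration, not part of the original post; fullmatch is used so that stray brackets cause a rejection instead of being skipped over.

```python
import re

# (?P<lt><)? optionally captures an opening bracket;
# (?(lt)>) then requires a closing bracket only if the opening one matched.
CONDITIONAL = re.compile(r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)')

def extract_conditional(text):
    """Return the bare address, or None if the whole string does not match."""
    matched = CONDITIONAL.fullmatch(text)
    return matched.group('email') if matched else None

for candidate in ('foo@bar', '<foo@bar>', '<foo@bar', 'foo@bar>'):
    print(candidate, '->', extract_conditional(candidate))
```

Note that with search() instead of fullmatch(), a string such as '<foo@bar' would still yield a match starting after the bracket, so the anchoring matters if unbalanced input has to be rejected.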
Regular expressions, alternation, named groups ... oh my!
It tends to get quite complex, especially if you need
to reject cases where the string contains a left bracket
but not the right, or vice versa.
>>> import re
>>> # Bracketed form first; otherwise a bare address with no '<' before or '>' after.
>>> pattern = re.compile(r'(?P<email><\w+@\w+>|(?<!<)\b\w+@\w+\b(?!>))')
>>> for email in ('foo@bar', '<foo@bar>', '<start@without_end_bracket'):
...     matched = pattern.search(email)
...     if matched is not None:
...         print(matched.group('email'))
...
foo@bar
<foo@bar>
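The single named group above still returns the brackets for the second input. One possible way to get the bare address in both cases, sketched here under the assumption that two distinct group names are acceptable, is to name each alternative separately and pick whichever one matched. The names 'bracketed', 'bare', and extract_email are hypothetical and chosen only for this example.

```python
import re

# Each alternative gets its own (hypothetical) group name, which avoids
# the "redefinition of group name" error.
PATTERN = re.compile(
    r'<(?P<bracketed>\w+@\w+)>'            # address wrapped in < >
    r'|(?<!<)\b(?P<bare>\w+@\w+)\b(?!>)'   # address with no bracket on either side
)

def extract_email(text):
    """Return the address without brackets, or None if nothing matches."""
    matched = PATTERN.search(text)
    if matched is None:
        return None
    # Only one alternative can match, so exactly one of the groups is set.
    return matched.group('bracketed') or matched.group('bare')

for candidate in ('foo@bar', '<foo@bar>', '<start@without_end_bracket'):
    print(candidate, '->', extract_email(candidate))
```

The cost is one extra line to merge the two groups; in exchange the pattern stays a plain alternation and unbalanced brackets are still rejected.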
I suggest you try some other solution (maybe pyparsing).
--
Hope this helps,
Steven