regex in python
I'm trying to compile a perfectly valid regex, but get the error
message:
r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/lib/python2.3/sre.py", line 179, in compile
return _compile(pattern, flags)
File "/usr/lib/python2.3/sre.py", line 230, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
>>>
What does this mean? I know that the regex
([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*
is valid because I'm able to use it in Regex Coach. But is Python's
regex syntax different from ordinary syntax?
By the way, i'm using it to normalise strings like:
London|country/uk/region/europe/geocoord/32.3244,42,1221244
to:
London|country/uk/region/europe/geocoord/32.32,42,12
By using \1\2\4 as replace. I'm open for other suggestions to achieve
this!
-Gisle-
re: regex in python
> r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*')
...
> sre_constants.error: nothing to repeat
The error gives something away (like any good error message should).
You're attempting to repeat something that may not exist. In
this case, the culprit is the last question-mark. The item before it
(\d*)
could be empty, and thus there may be "nothing to repeat".
Simply removing the question-mark in question, making it
r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*).*'
has the desired effect (AFAIU) without the error.
-tkc
re: regex in python
In article <1148551097.266423.141230@j55g2000cwa.googlegroups.com>,
gisleyt <gisleyt@gmail.com> wrote:
>I'm trying to compile a perfectly valid regex, but get the error
>message:
>
> r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*')
>Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> File "/usr/lib/python2.3/sre.py", line 179, in compile
> return _compile(pattern, flags)
> File "/usr/lib/python2.3/sre.py", line 230, in _compile
> raise error, v # invalid expression
>sre_constants.error: nothing to repeat
>>>>
>
>What does this mean? I know that the regex
>([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*
>is valid because I'm able to use it in Regex Coach. But is Python's
>regex syntax different from ordinary syntax?
Your problem lies right near the end:
>>> import re
>>> r = re.compile(r'(\d*)?')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/local/lib/python2.4/sre.py", line 180, in compile
return _compile(pattern, flags)
File "/usr/local/lib/python2.4/sre.py", line 227, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
Since the term \d* can be matched by the empty string, what would it
mean to ask for 0 or 1 copies of the empty string? How is that
different from 17 copies of the empty string?
So:
r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*).*')
will be accepted.
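To check both variants yourself, here is a minimal sketch. It assumes a current Python 3, where the exception class is exposed as `re.error`; the helper name `try_compile` is made up for illustration. Newer Python releases may well accept the original pattern too, so the script reports what happens instead of assuming a failure.

```python
import re

# The pattern as originally posted (trailing '?' after the last group)
# and the corrected variant with that '?' removed.
ORIGINAL = r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*'
CORRECTED = r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*).*'


def try_compile(pattern):
    """Return (compiled, None) on success or (None, message) on failure."""
    try:
        return re.compile(pattern), None
    except re.error as exc:
        return None, str(exc)


for name, pat in [("original", ORIGINAL), ("corrected", CORRECTED)]:
    compiled, err = try_compile(pat)
    # Report the outcome; which patterns are rejected depends on the Python version.
    print(name, "->", "ok" if compiled is not None else "error: " + err)
```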
>By the way, i'm using it to normalise strings like:
>
>London|country/uk/region/europe/geocoord/32.3244,42,1221244
>to:
>London|country/uk/region/europe/geocoord/32.32,42,12
>
>By using \1\2\4 as replace. I'm open for other suggestions to achieve
>this!
But you're looking for a string followed by two floats and your sample
input is a string, a float, an integer, a comma and another
integer. If you actually mean the input is
London|country/uk/region/europe/geocoord/32.3244,42.1221244
and you want to convert it to:
London|country/uk/region/europe/geocoord/32.32,42.12
then the above regex will work.
--
Jim Segrave (jes@jes-2.demon.nl)
re: regex in python
On 25/05/2006 7:58 PM, gisleyt wrote:
> I'm trying to compile a perfectly valid regex, but get the error
> message:
>
> r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*')
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> File "/usr/lib/python2.3/sre.py", line 179, in compile
> return _compile(pattern, flags)
> File "/usr/lib/python2.3/sre.py", line 230, in _compile
> raise error, v # invalid expression
> sre_constants.error: nothing to repeat
>
> What does this mean? I know that the regex
> ([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*
> is valid because I'm able to use it in Regex Coach.
Say what??? From the Regex Coach website:
(1) "can be used to experiment with (Perl-compatible) regular expressions"
(2) "PCRE (which is used by projects like Python" -- once upon a time,
way back in the dream-time, when the world was young, ...
The problem is this little snippet near the end of your regex:
>>> re.compile(r'(\d*)?')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "C:\Python24\lib\sre.py", line 180, in compile
return _compile(pattern, flags)
File "C:\Python24\lib\sre.py", line 227, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
The message is a little cryptic; it should be something like "a repeat
operator has an operand which may match nothing". In other words, you
have said X? (optional occurrence of X) *BUT* X can already match a
zero-length string. X in this case is (\d*).
This is a theoretically valid regex, but it's equivalent to just plain
X, and leaves the reader (and the re implementors, obviously) wondering
whether you (a) have made a typo (b) are a member of the re
implementation quality assurance inspectorate or (c) just plain confused :-)
BTW, reading your regex was making my eyes bleed, so I did this to find
out which piece was the problem:
import re
pat0 = r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*'
# pat0 split into its pieces, so each one can be compiled on its own
pat1 = r'([^\d]*)'
pat2 = r'(\d{1,3}\.\d{0,2})?'
pat3 = r'(\d*)'
pat4 = r'(\,\d{1,3}\.\d{0,2})?'
pat5 = r'(\d*)?.*'
for k, pat in enumerate([pat1, pat2, pat3, pat4, pat5]):
    print(k + 1)  # the last number printed before the traceback is the bad piece
    re.compile(pat)
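A variation on the same idea, sketched for a current Python 3: catch the error so that the culprit is reported directly instead of read off a traceback. The function name `first_bad_piece` is made up for illustration, and whether the last piece is actually rejected depends on the Python version in use.

```python
import re


def first_bad_piece(pieces):
    """Return (1-based index, error message) of the first piece that
    fails to compile, or None if every piece compiles."""
    for index, piece in enumerate(pieces, 1):
        try:
            re.compile(piece)
        except re.error as exc:
            return index, str(exc)
    return None


pieces = [
    r'([^\d]*)',
    r'(\d{1,3}\.\d{0,2})?',
    r'(\d*)',
    r'(\,\d{1,3}\.\d{0,2})?',
    r'(\d*)?.*',
]
# Prints None when the running Python accepts all five pieces.
print(first_bad_piece(pieces))
```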
> But is Python's
> regex syntax different from ordinary syntax?
Python aims to lift itself above the ordinary :-)
>
> By the way, i'm using it to normalise strings like:
>
> London|country/uk/region/europe/geocoord/32.3244,42,1221244
> to:
> London|country/uk/region/europe/geocoord/32.32,42,12
>
> By using \1\2\4 as replace. I'm open for other suggestions to achieve
> this!
>
Well, you are just about on the right track. You need to avoid the
eye-bleed (by using VERBOSE patterns), have test data that doesn't
have typos in it, and have more test data. You may like to roll your own test
harness, in *Python*, for *Python* regexes, like the following:
C:\junk>type re_demo.py
import re
tests = [
    ["AA222.22333,444.44555FF", "AA222.22,444.44"],
    ["foo/geocoord/32.3244,42.1221244", "foo/geocoord/32.32,42.12"],  # what you meant
    ["foo/geocoord/32.3244,42,1221244", "foo/geocoord/32.32,42,12"],  # what you posted
    ]
pat0 = r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*'
patx = r"""
    ([^\d]*)                # Grp 1: zero/more non-digits
    (\d{1,3}\.\d{0,2})?     # Grp 2: 1-3 digits, a dot, 0-2 digits (optional)
    (\d*)                   # Grp 3: zero/more digits
    (\,\d{1,3}\.\d{0,2})?   # Grp 4: like grp 2 with comma in front (optional)
    (\d*)                   # Grp 5: zero/more digits
    (.*)                    # Grp 6: any old rubbish
    """
rx = re.compile(patx, re.VERBOSE)
for testin, expected in tests:
    print("\ntestin: %s" % testin)
    mobj = rx.match(testin)
    if not mobj:
        print("no match")
        continue
    for k, grp in enumerate(mobj.groups()):
        print("Group %d matched %r" % (k+1, grp))
    actual = rx.sub(r"\1\2\4", testin)
    print("expected: %r; actual: %r; same: %r" % (expected, actual, expected == actual))
C:\junk>re_demo.py
testin: AA222.22333,444.44555FF
Group 1 matched 'AA'
Group 2 matched '222.22'
Group 3 matched '333'
Group 4 matched ',444.44'
Group 5 matched '555'
Group 6 matched 'FF'
expected: 'AA222.22,444.44'; actual: 'AA222.22,444.44'; same: True
testin: foo/geocoord/32.3244,42.1221244
Group 1 matched 'foo/geocoord/'
Group 2 matched '32.32'
Group 3 matched '44'
Group 4 matched ',42.12'
Group 5 matched '21244'
Group 6 matched ''
expected: 'foo/geocoord/32.32,42.12'; actual: 'foo/geocoord/32.32,42.12'; same: True
testin: foo/geocoord/32.3244,42,1221244
Group 1 matched 'foo/geocoord/'
Group 2 matched '32.32'
Group 3 matched '44'
Group 4 matched None
Group 5 matched ''
Group 6 matched ',42,1221244'
Traceback (most recent call last):
File "C:\junk\re_demo.py", line 28, in ?
actual = rx.sub(r"\1\2\4", testin)
File "C:\Python24\lib\sre.py", line 260, in filter
return sre_parse.expand_template(template, match)
File "C:\Python24\lib\sre_parse.py", line 782, in expand_template
raise error, "unmatched group"
sre_constants.error: unmatched group
===
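The last test dies because group 4 took no part in the match, and this version of sub refuses to expand \4 for an unmatched group. One way around it, sketched here for a current Python 3, is to build the result from the match object and treat unmatched groups as empty strings. The function name `normalise` is made up for illustration. As far as I know, re.sub in Python 3.5 and later substitutes an empty string for unmatched groups on its own, so this mostly matters on older versions.

```python
import re

# Same structure as the VERBOSE pattern in the test harness.
PATTERN = re.compile(r"""
    ([^\d]*)                # Grp 1: zero/more non-digits
    (\d{1,3}\.\d{0,2})?     # Grp 2: 1-3 digits, a dot, 0-2 digits (optional)
    (\d*)                   # Grp 3: zero/more digits
    (,\d{1,3}\.\d{0,2})?    # Grp 4: like grp 2 with comma in front (optional)
    (\d*)                   # Grp 5: zero/more digits
    (.*)                    # Grp 6: any old rubbish
    """, re.VERBOSE)


def normalise(text):
    """Keep groups 1, 2 and 4; a group that did not match counts as ''."""
    mobj = PATTERN.match(text)
    if mobj is None:
        # Defensive only: every part of the pattern is optional.
        return text
    return "".join(mobj.group(n) or "" for n in (1, 2, 4))


for sample in ["foo/geocoord/32.3244,42.1221244",
               "foo/geocoord/32.3244,42,1221244"]:
    print("%r -> %r" % (sample, normalise(sample)))
```

Note that the "what you posted" input still does not produce the result asked for: the second number has no decimal point, so group 4 stays unmatched and only the first coordinate survives.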
HTH,
John