Bytes | Software Development & Data Engineering Community

regex in python

I'm trying to compile a perfectly valid regex, but get the error
message:

r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/sre.py", line 179, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.3/sre.py", line 230, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat


What does this mean? I know that the regex
([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*
is valid because I'm able to use it in Regex Coach. But is Python's
regex syntax different from an ordinary syntax?

By the way, I'm using it to normalise strings like:

London|country/uk/region/europe/geocoord/32.3244,42,1221244
to:
London|country/uk/region/europe/geocoord/32.32,42,12

I do this by using \1\2\4 as the replacement string. I'm open to other
suggestions for achieving this!
-Gisle-

May 25 '06 #1
> r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*')
> ....
> sre_constants.error: nothing to repeat


The error gives something away (like any good error message should).

You're attempting to repeat something that may match nothing. In
this case, it's the last question mark. The item before it

(\d*)

could match the empty string, and thus there is "nothing to repeat".

Simply removing the question mark in question, making it

r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*).*'

has the desired effect (AFAIU) without the error.
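
A quick check of the corrected pattern, as a sketch for a current Python 3 interpreter. The input is an assumption: it writes the second number as `42.1221244` (with a dot), which is the only reading under which both group 2 and group 4 match.

```python
import re

# tkc's corrected pattern: the "?" after the final (\d*) group is removed.
rx = re.compile(r"([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*).*")

# Assumed input: the second number written with a dot (42.1221244).
s = "London|country/uk/region/europe/geocoord/32.3244,42.1221244"

m = rx.match(s)
for i, grp in enumerate(m.groups(), start=1):
    print(i, repr(grp))

# Keep the prefix (group 1) and the two truncated numbers (groups 2 and 4).
print(rx.sub(r"\1\2\4", s, count=1))
```

Groups 3 and 5 soak up the surplus digits, which is why leaving them out of the replacement truncates each number to two decimals.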

-tkc

May 25 '06 #2
In article <11**********************@j55g2000cwa.googlegroups.com>,
gisleyt <gi*****@gmail.com> wrote:
> What does this mean? I know that the regex
> ([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*
> is valid because I'm able to use it in Regex Coach. But is Python's
> regex syntax different from an ordinary syntax?

Your problem lies right near the end:
import re
r = re.compile(r'(\d*)?')

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.4/sre.py", line 180, in compile
    return _compile(pattern, flags)
  File "/usr/local/lib/python2.4/sre.py", line 227, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

Since the term \d* can be matched by the empty string, what would it
mean to ask for 0 or 1 copies of the empty string? How is that
different from 17 copies of the empty string?

So:
r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*).*')

will be accepted.
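
Jim's point can be checked directly. The sketch below assumes a current Python 3 interpreter, which, unlike the 2.3/2.4 versions in this thread, should accept the redundant optional group. It compares plain `\d*` with `(?:\d*)?` (a non-capturing form of the same construct) on a few sample strings chosen only for illustration:

```python
import re

# Plain \d* already matches the empty string ...
plain = re.compile(r"\d*")
# ... so making it optional with "?" adds nothing. Python 2.3/2.4 rejected
# this with "nothing to repeat"; a current Python 3 is expected to compile it.
redundant = re.compile(r"(?:\d*)?")

samples = ["", "7", "2006", "12ab", "x"]
for s in samples:
    a = plain.fullmatch(s) is not None
    b = redundant.fullmatch(s) is not None
    print(repr(s), a, b)  # the two columns agree for every sample
```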

> By the way, I'm using it to normalise strings like:
>
> London|country/uk/region/europe/geocoord/32.3244,42,1221244
> to:
> London|country/uk/region/europe/geocoord/32.32,42,12
>
> I do this by using \1\2\4 as the replacement string. I'm open to other
> suggestions for achieving this!


But you're looking for a string followed by two floats, while your sample
input is a string, a float, a comma, an integer, a comma and another
integer. If you actually mean the input is

London|country/uk/region/europe/geocoord/32.3244,42.1221244

and you want to convert it to:

London|country/uk/region/europe/geocoord/32.32,42.12

then the corrected regex above will work.
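
Since the original poster asked for other suggestions: a different technique, not one proposed in the thread, is to truncate every decimal number in place instead of rebuilding the whole string from groups. A minimal sketch, assuming that only numbers containing a dot should be shortened and that truncation (not rounding) is wanted; `truncate_decimals` is a hypothetical helper name:

```python
import re

# Digits, a dot and up to two decimals (captured), then any further digits (dropped).
DECIMAL = re.compile(r"(\d+\.\d{0,2})\d*")

def truncate_decimals(text):
    """Hypothetical helper: cut every decimal number to at most two decimal places."""
    return DECIMAL.sub(r"\1", text)

print(truncate_decimals("London|country/uk/region/europe/geocoord/32.3244,42.1221244"))
print(truncate_decimals("London|country/uk/region/europe/geocoord/32.3244,42,1221244"))
```

The first call gives `.../32.32,42.12`. In the second, integers such as `42` and `1221244` have no dot and are left untouched, which again points to a typo in the posted sample.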

--
Jim Segrave (je*@jes-2.demon.nl)

May 25 '06 #3
On 25/05/2006 7:58 PM, gisleyt wrote:
> What does this mean? I know that the regex
> ([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*
> is valid because I'm able to use it in Regex Coach.

Say what??? From the Regex Coach website:
(1) "can be used to experiment with (Perl-compatible) regular expressions"
(2) "PCRE (which is used by projects like Python" -- once upon a time,
way back in the dream-time, when the world was young, ...

The problem is this little snippet near the end of your regex:
re.compile(r'(\d*)?')

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\sre.py", line 180, in compile
    return _compile(pattern, flags)
  File "C:\Python24\lib\sre.py", line 227, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

The message is a little cryptic; it should be something like "a repeat
operator has an operand which may match nothing". In other words, you
have said X? (optional occurrence of X) *BUT* X can already match a
zero-length string. X in this case is (\d*).

This is a theoretically valid regex, but it's equivalent to just plain
X, and leaves the reader (and the re implementors, obviously) wondering
whether you (a) have made a typo, (b) are a member of the re
implementation quality assurance inspectorate, or (c) are just plain confused :-)

BTW, reading your regex was making my eyes bleed, so I did this to find
out which piece was the problem:
import re
pat0 = r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*'
pat1 = r'([^\d]*)'
pat2 = r'(\d{1,3}\.\d{0,2})?'
pat3 = r'(\d*)'
pat4 = r'(\,\d{1,3}\.\d{0,2})?'
pat5 = r'(\d*)?.*'
# Compile the pieces one at a time; the last number printed before the
# traceback identifies the piece that sre rejects.
for k, pat in enumerate([pat1, pat2, pat3, pat4, pat5]):
    print k+1
    re.compile(pat)
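
The piece-by-piece search can also be automated. The sketch below is written for Python 3; the split into (body, quantifier) pairs and the helper name `matches_empty` are illustrative assumptions, not from the thread. It flags every optional piece whose body can already match the empty string, which is the kind of construct the 2.3/2.4 `sre` rejected:

```python
import re

# The original pattern split into (body, quantifier) pairs, roughly the way
# it was split by hand above. Only "?" quantifiers are checked here.
pieces = [
    (r"([^\d]*)", ""),
    (r"(\d{1,3}\.\d{0,2})", "?"),
    (r"(\d*)", ""),
    (r"(\,\d{1,3}\.\d{0,2})", "?"),
    (r"(\d*)", "?"),
    (r".*", ""),
]

def matches_empty(body):
    """True if the sub-pattern, on its own, accepts a zero-length string."""
    return re.fullmatch(body, "") is not None

for body, quant in pieces:
    if quant == "?" and matches_empty(body):
        print("redundant '?' on", body)
```

Only the fifth piece, `(\d*)`, is reported.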

> But is Python's
> regex syntax different from an ordinary syntax?

Python aims to lift itself above the ordinary :-)

> By the way, I'm using it to normalise strings like:
>
> London|country/uk/region/europe/geocoord/32.3244,42,1221244
> to:
> London|country/uk/region/europe/geocoord/32.32,42,12
>
> I do this by using \1\2\4 as the replacement string. I'm open to other
> suggestions for achieving this!


Well, you are just about on the right track. You need to avoid the
eye-bleed (by using VERBOSE patterns), and you need test data that doesn't
have typos in it, and more of it. You may like to roll your own test
harness, in *Python*, for *Python* regexes, like the following:

C:\junk>type re_demo.py
import re

tests = [
    ["AA222.22333,444.44555FF", "AA222.22,444.44"],
    ["foo/geocoord/32.3244,42.1221244", "foo/geocoord/32.32,42.12"], # what you meant
    ["foo/geocoord/32.3244,42,1221244", "foo/geocoord/32.32,42,12"], # what you posted
    ]

pat0 = r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*'
patx = r"""
    ([^\d]*)               # Grp 1: zero/more non-digits
    (\d{1,3}\.\d{0,2})?    # Grp 2: 1-3 digits, a dot, 0-2 digits (optional)
    (\d*)                  # Grp 3: zero/more digits
    (\,\d{1,3}\.\d{0,2})?  # Grp 4: like grp 2 with comma in front (optional)
    (\d*)                  # Grp 5: zero/more digits
    (.*)                   # Grp 6: any old rubbish
    """

rx = re.compile(patx, re.VERBOSE)
for testin, expected in tests:
    print "\ntestin:", testin
    mobj = rx.match(testin)
    if not mobj:
        print "no match"
        continue
    for k, grp in enumerate(mobj.groups()):
        print "Group %d matched %r" % (k+1, grp)
    actual = rx.sub(r"\1\2\4", testin)
    print "expected: %r; actual: %r; same: %r" % (
        expected, actual, expected == actual)

C:\junk>re_demo.py

testin: AA222.22333,444.44555FF
Group 1 matched 'AA'
Group 2 matched '222.22'
Group 3 matched '333'
Group 4 matched ',444.44'
Group 5 matched '555'
Group 6 matched 'FF'
expected: 'AA222.22,444.44'; actual: 'AA222.22,444.44'; same: True

testin: foo/geocoord/32.3244,42.1221244
Group 1 matched 'foo/geocoord/'
Group 2 matched '32.32'
Group 3 matched '44'
Group 4 matched ',42.12'
Group 5 matched '21244'
Group 6 matched ''
expected: 'foo/geocoord/32.32,42.12'; actual: 'foo/geocoord/32.32,42.12'; same: True

testin: foo/geocoord/32.3244,42,1221244
Group 1 matched 'foo/geocoord/'
Group 2 matched '32.32'
Group 3 matched '44'
Group 4 matched None
Group 5 matched ''
Group 6 matched ',42,1221244'
Traceback (most recent call last):
  File "C:\junk\re_demo.py", line 28, in ?
    actual = rx.sub(r"\1\2\4", testin)
  File "C:\Python24\lib\sre.py", line 260, in filter
    return sre_parse.expand_template(template, match)
  File "C:\Python24\lib\sre_parse.py", line 782, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group
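
For readers on a current interpreter: since Python 3.5, `re.sub` substitutes an empty string for a group that did not take part in the match instead of raising `unmatched group`. A sketch of the third test case under that assumption (Python 3.5 or later), reusing the VERBOSE pattern from the harness above:

```python
import re

patx = r"""
    ([^\d]*)               # Grp 1: zero/more non-digits
    (\d{1,3}\.\d{0,2})?    # Grp 2: 1-3 digits, a dot, 0-2 digits (optional)
    (\d*)                  # Grp 3: zero/more digits
    (\,\d{1,3}\.\d{0,2})?  # Grp 4: like grp 2 with comma in front (optional)
    (\d*)                  # Grp 5: zero/more digits
    (.*)                   # Grp 6: any old rubbish
"""
rx = re.compile(patx, re.VERBOSE)

testin = "foo/geocoord/32.3244,42,1221244"
print(rx.match(testin).group(4))           # None: group 4 did not take part
print(rx.sub(r"\1\2\4", testin, count=1))  # the missing group becomes ''
```

This no longer crashes, but it silently drops everything after `32.32`, so the malformed test input still needs fixing.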

===

HTH,
John
May 25 '06 #4
