regex in python

I'm trying to compile a perfectly valid regex, but get the error
message:

r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/sre.py", line 179, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.3/sre.py", line 230, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat


What does this mean? I know that the regex
([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*
is valid because I'm able to use it in Regex Coach. But is Python's
regex syntax different from ordinary regex syntax?

By the way, I'm using it to normalise strings like:

London|country/uk/region/europe/geocoord/32.3244,42,1221244
to:
London|country/uk/region/europe/geocoord/32.32,42,12

using \1\2\4 as the replacement. I'm open to other suggestions to achieve
this!
-Gisle-

May 25 '06 #1
> r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*')
> ....
> sre_constants.error: nothing to repeat


The error gives something away (like any good error message should).

You're attempting to repeat something that may match nothing. In
this case, the culprit is the last question mark. The item before it,

(\d*)

can match the empty string, so the question mark has "nothing to repeat".

Simply removing that question mark, making the pattern

r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*).*'

has the desired effect (AFAIU) without the error.

-tkc

May 25 '06 #2
In article <11**********************@j55g2000cwa.googlegroups.com>,
gisleyt <gi*****@gmail.com> wrote:
> I'm trying to compile a perfectly valid regex, but get the error
> message:
>
> r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.3/sre.py", line 179, in compile
>     return _compile(pattern, flags)
>   File "/usr/lib/python2.3/sre.py", line 230, in _compile
>     raise error, v # invalid expression
> sre_constants.error: nothing to repeat
>
> What does this mean? I know that the regex
> ([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*
> is valid because I'm able to use it in Regex Coach. But is Python's
> regex syntax different from ordinary regex syntax?

Your problem lies right near the end:
import re
r = re.compile(r'(\d*)?')

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.4/sre.py", line 180, in compile
    return _compile(pattern, flags)
  File "/usr/local/lib/python2.4/sre.py", line 227, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

Since the term \d* can be matched by the empty string, what would it
mean to ask for 0 or 1 copies of the empty string? How is that
different from 17 copies of the empty string?
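
To make the point concrete, here is a minimal sketch showing that \d* on its own already succeeds with a zero-length match when no digit is present. It is written for Python 3, which is an assumption on my part (the thread itself uses Python 2.3/2.4); the variable names are only illustrative.

```python
import re

# \d* means "zero or more digits", so it succeeds even when the
# subject string starts with a non-digit.
m = re.match(r'\d*', 'abc')

matched_text = m.group()   # the empty string
matched_span = m.span()    # (0, 0): a zero-length match at position 0

print(repr(matched_text), matched_span)
```

Since the group can already match nothing, making it optional with ? adds no new way for the overall pattern to match.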

So:
r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*).*')

will be accepted.

> By the way, I'm using it to normalise strings like:
>
> London|country/uk/region/europe/geocoord/32.3244,42,1221244
> to:
> London|country/uk/region/europe/geocoord/32.32,42,12
>
> using \1\2\4 as the replacement. I'm open to other suggestions to achieve
> this!


But you're looking for a string followed by two floats and your sample
input is a string, a float, an integer, a comma and another
integer. If you actually mean the input is

London|country/uk/region/europe/geocoord/32.3244,42.1221244

and you want to convert it to:

London|country/uk/region/europe/geocoord/32.32,42.12

then the above regex will work.
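
As a rough illustration of that claim, the sketch below applies the corrected pattern with the \1\2\4 replacement to the two-float form of the input. It assumes Python 3; the function name normalise is hypothetical and not something from the original post.

```python
import re

# Corrected pattern from this thread: the trailing '?' after the last
# (\d*) group has been removed.
PATTERN = re.compile(
    r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*).*'
)

def normalise(line):
    """Keep the text prefix and cut both coordinates to two decimals.

    Hypothetical helper name, used only for this example.
    """
    # count=1 limits the substitution to the first match, which
    # already covers the whole line.
    return PATTERN.sub(r'\1\2\4', line, count=1)

print(normalise('London|country/uk/region/europe/geocoord/32.3244,42.1221244'))
```

For the input shown, group 1 takes everything up to the first digit, groups 2 and 4 take the truncated coordinates, and groups 3 and 5 swallow the surplus digits, which are then dropped by the replacement.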

--
Jim Segrave (je*@jes-2.demon.nl)

May 25 '06 #3
On 25/05/2006 7:58 PM, gisleyt wrote:
> I'm trying to compile a perfectly valid regex, but get the error
> message:
>
> r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.3/sre.py", line 179, in compile
>     return _compile(pattern, flags)
>   File "/usr/lib/python2.3/sre.py", line 230, in _compile
>     raise error, v # invalid expression
> sre_constants.error: nothing to repeat
>
> What does this mean? I know that the regex
> ([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*
> is valid because I'm able to use it in Regex Coach.

Say what??? From the Regex Coach website:
(1) "can be used to experiment with (Perl-compatible) regular expressions"
(2) "PCRE (which is used by projects like Python" -- once upon a time,
way back in the dream-time, when the world was young, ...

The problem is this little snippet near the end of your regex:
re.compile(r'(\d*)?')

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\sre.py", line 180, in compile
    return _compile(pattern, flags)
  File "C:\Python24\lib\sre.py", line 227, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

The message is a little cryptic; it should say something like "a repeat
operator has an operand which may match nothing". In other words, you
have said X? (optional occurrence of X) *BUT* X can already match a
zero-length string. X in this case is (\d*).

This is a theoretically valid regex, but it's equivalent to just plain
X, and leaves the reader (and the re implementors, obviously) wondering
whether you (a) have made a typo, (b) are a member of the re
implementation quality assurance inspectorate, or (c) are just plain confused :-)

BTW, reading your regex was making my eyes bleed, so I did this to find
out which piece was the problem:
import re
pat0 = r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*'
pat1 = r'([^\d]*)'
pat2 = r'(\d{1,3}\.\d{0,2})?'
pat3 = r'(\d*)'
pat4 = r'(\,\d{1,3}\.\d{0,2})?'
pat5 = r'(\d*)?.*'
# Compile each piece on its own; the number printed just before the
# traceback identifies the piece that fails.
for k, pat in enumerate([pat1, pat2, pat3, pat4, pat5]):
    print k+1
    re.compile(pat)
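
The same piece-by-piece idea can be sketched for Python 3 (an assumption; the loop above is Python 2) so that every piece is tried and failures are collected instead of stopping at the first traceback. The helper name failing_pieces is hypothetical.

```python
import re

def failing_pieces(pieces):
    """Return (index, piece, message) for every piece that fails to compile.

    Hypothetical helper, used only for this example.
    """
    failures = []
    for index, piece in enumerate(pieces, start=1):
        try:
            re.compile(piece)
        except re.error as exc:
            failures.append((index, piece, str(exc)))
    return failures

pieces = [
    r'([^\d]*)',
    r'(\d{1,3}\.\d{0,2})?',
    r'(\d*)',
    r'(\,\d{1,3}\.\d{0,2})?',
    r'(\d*)?.*',
]

for index, piece, message in failing_pieces(pieces):
    print("piece %d %r failed: %s" % (index, piece, message))
```

Whether the last piece is rejected depends on the interpreter: the error in this thread comes from Python 2.3/2.4, and a newer re module may accept the pattern, in which case the loop prints nothing.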

> But is Python's
> regex syntax different from ordinary regex syntax?

Python aims to lift itself above the ordinary :-)

> By the way, I'm using it to normalise strings like:
>
> London|country/uk/region/europe/geocoord/32.3244,42,1221244
> to:
> London|country/uk/region/europe/geocoord/32.32,42,12
>
> using \1\2\4 as the replacement. I'm open to other suggestions to achieve
> this!


Well, you are just about on the right track. You need to avoid the
eye-bleed (by using VERBOSE patterns), to have test data without typos
in it, and to have more test data. You may like to roll your own test
harness, in *Python*, for *Python* regexes, like the following:

C:\junk>type re_demo.py
import re

tests = [
    ["AA222.22333,444.44555FF", "AA222.22,444.44"],
    ["foo/geocoord/32.3244,42.1221244", "foo/geocoord/32.32,42.12"], # what you meant
    ["foo/geocoord/32.3244,42,1221244", "foo/geocoord/32.32,42,12"], # what you posted
    ]

pat0 = r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*'
patx = r"""
    ([^\d]*)               # Grp 1: zero/more non-digits
    (\d{1,3}\.\d{0,2})?    # Grp 2: 1-3 digits, a dot, 0-2 digits (optional)
    (\d*)                  # Grp 3: zero/more digits
    (\,\d{1,3}\.\d{0,2})?  # Grp 4: like grp 2 with comma in front (optional)
    (\d*)                  # Grp 5: zero/more digits
    (.*)                   # Grp 6: any old rubbish
    """

rx = re.compile(patx, re.VERBOSE)
for testin, expected in tests:
    print "\ntestin:", testin
    mobj = rx.match(testin)
    if not mobj:
        print "no match"
        continue
    for k, grp in enumerate(mobj.groups()):
        print "Group %d matched %r" % (k+1, grp)
    actual = rx.sub(r"\1\2\4", testin)
    print "expected: %r; actual: %r; same: %r" % (expected, actual, expected == actual)

C:\junk>re_demo.py

testin: AA222.22333,444.44555FF
Group 1 matched 'AA'
Group 2 matched '222.22'
Group 3 matched '333'
Group 4 matched ',444.44'
Group 5 matched '555'
Group 6 matched 'FF'
expected: 'AA222.22,444.44'; actual: 'AA222.22,444.44'; same: True

testin: foo/geocoord/32.3244,42.1221244
Group 1 matched 'foo/geocoord/'
Group 2 matched '32.32'
Group 3 matched '44'
Group 4 matched ',42.12'
Group 5 matched '21244'
Group 6 matched ''
expected: 'foo/geocoord/32.32,42.12'; actual: 'foo/geocoord/32.32,42.12'; same: True

testin: foo/geocoord/32.3244,42,1221244
Group 1 matched 'foo/geocoord/'
Group 2 matched '32.32'
Group 3 matched '44'
Group 4 matched None
Group 5 matched ''
Group 6 matched ',42,1221244'
Traceback (most recent call last):
  File "C:\junk\re_demo.py", line 28, in ?
    actual = rx.sub(r"\1\2\4", testin)
  File "C:\Python24\lib\sre.py", line 260, in filter
    return sre_parse.expand_template(template, match)
  File "C:\Python24\lib\sre_parse.py", line 782, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group

===
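
One possible way around the "unmatched group" failure, sketched here for Python 3 (an assumption; the run above is Python 2.4), is to build the result from the match object and treat any group that did not take part in the match as empty. The name normalise_safe is hypothetical. As far as I know, Python 3.5 and later already substitute an empty string for an unmatched group in sub(), so the explicit handling below matters mostly for older interpreters.

```python
import re

RX = re.compile(r"""
    ([^\d]*)               # Grp 1: zero/more non-digits
    (\d{1,3}\.\d{0,2})?    # Grp 2: 1-3 digits, a dot, 0-2 digits (optional)
    (\d*)                  # Grp 3: zero/more digits
    (\,\d{1,3}\.\d{0,2})?  # Grp 4: like grp 2 with comma in front (optional)
    (\d*)                  # Grp 5: zero/more digits
    (.*)                   # Grp 6: any old rubbish
    """, re.VERBOSE)

def normalise_safe(text):
    """Return groups 1, 2 and 4 joined, with unmatched groups as ''.

    Hypothetical helper, used only for this example.
    """
    mobj = RX.match(text)
    if mobj is None:
        # Defensive only: every part of the pattern is optional.
        return text
    # group() yields None for an optional group that did not participate.
    prefix, first, second = mobj.group(1, 2, 4)
    return prefix + (first or "") + (second or "")

for sample in ["foo/geocoord/32.3244,42.1221244",
               "foo/geocoord/32.3244,42,1221244"]:
    print("%r -> %r" % (sample, normalise_safe(sample)))
```

This only removes the exception. For the "what you posted" input, group 4 still does not match, so the second coordinate is dropped rather than truncated; the pattern itself would have to change to handle that form.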

HTH,
John
May 25 '06 #4
