Freeze problem with Regular Expression

Hi All,
the following regular expression match seems to enter an infinite
loop:

################
import re
text = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una '
re.findall('[^A-Z|0-9]*((?:[0-9]*[A-Z]+[0-9|a-z|\-]*)+\s*[a-z]*\s*(?:[0-9]*[A-Z]+[0-9|a-z|\-]*\s*)*)([^A-Z]*)$', text)
#################

No problem with perl with the same expression:

#################
$s = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una ';
$s =~ /[^A-Z|0-9]*((?:[0-9]*[A-Z]+[0-9|a-z|\-]*)+\s*[a-z]*\s*(?:[0-9]*[A-Z]+[0-9|a-z|\-]*\s*)*)([^A-Z]*)$/;
print $1;
#################

I have Python 2.5.2 on Ubuntu 8.04.
Any ideas?
Thanks!

--
Kirk
Jun 27 '08 #1
On 25 June, 17:20, Kirk <nore...@yahoo.com> wrote:
> the following regular expression matching seems to enter an infinite
> loop:
> [...]

what are you trying to do?
Jun 27 '08 #2
-----Original Message-----
From: py*****@python.org [mailto:python-li*****@python.org] On Behalf Of Kirk
Sent: Wednesday, June 25, 2008 11:20 AM
To: py*****@python.org
Subject: Freeze problem with Regular Expression

> the following regular expression matching seems to enter an infinite
> loop:
> [...]
> I've python 2.5.2 on Ubuntu 8.04.
> any idea?

It locks up on 2.5.2 on Windows also. Probably too much recursion going
on.
What's with the |'s in [0-9|a-z|\-]? Inside a character class, '|' is a
literal character, not an 'or' operator. I think you meant to say either
'[0-9a-z\-]' or '[0-9a-z\-|]'.
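
The point about '|' inside square brackets can be checked directly. This is a minimal sketch; the names with_pipes, without_pipes and sample are made up for the illustration and do not come from the original post.

```python
import re

# '|' inside [...] is an ordinary character, so this class also accepts a pipe.
with_pipes = re.compile(r"[0-9|a-z|\-]+")
# The same class without the stray pipes.
without_pipes = re.compile(r"[0-9a-z\-]+")

sample = "a|b"
print(with_pipes.fullmatch(sample) is not None)     # True
print(without_pipes.fullmatch(sample) is not None)  # False
```

Listing the pipe twice is harmless but redundant, since a character class is a set of characters.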

Jun 27 '08 #3
On Wednesday 25 June 2008 18:40:08, cirfu wrote:
> On 25 June, 17:20, Kirk <nore...@yahoo.com> wrote:
> > the following regular expression matching seems to enter an infinite
> > loop:
> > [...]
>
> what are you trying to do?

This is indeed the right question.

Whatever the implementation or language, something like that may happen to
work, but I doubt you'll find anyone who can tell you whether it *should*
work or not; my brain-embedded parser goes into an infinite loop on it too...

That said, "[0-9|a-z|\-]" is strange in itself: a pipe (|) between square
brackets is the literal character '|', so there is no reason for it to appear
twice.

Very complicated regexps are always evil, and two- or three-stage filtering
is likely to do the job with good, or at least better, readability.

But once more, what are you trying to do? It is not even clear that regexp
matching is the best tool for it.

--
_____________

Maric Michaud
Jun 27 '08 #4
On Jun 26, 1:20 am, Kirk <nore...@yahoo.com> wrote:
> the following regular expression matching seems to enter an infinite
> loop:
> [...]

[expletives deleted]

> I've python 2.5.2 on Ubuntu 8.04.
> any idea?

Several problems:
(1) lose the vertical bars (as advised by others)
(2) ALWAYS use a raw string for regexes; your \s* will match on
lower-case 's', not on spaces
(3) why are you using findall on a pattern that ends in "$"?
(4) using non-verbose regexes of that length means you haven't got a
petrol drum's hope in hell of understanding what's going on
(5) too many variable-length patterns; it will take a finite (but very
long) time to evaluate
(6) as remarked by others, you haven't said what you are trying to do;
what it actually is doing doesn't look sensible (see below).

The following code is the result of fixing problems 1, 2, 3 and 4:

C:\junk>type infinitere.py
import re
text = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una '
regex0 = r"""
    [^A-Z0-9]*              # match leading space
    (
        (?:
            [0-9]*          # match nothing
            [A-Z]+          # match "MSX"
            [0-9a-z\-]*     # match nothing
        )+                  # match "MSX"
        \s*                 # match " "
        [a-z]*              # match nothing
        \s*                 # match nothing
        (?:
            [0-9]*
            [A-Z]+
            [0-9a-z\-]*
            \s*
        )*                  # match "INTERNATIONAL HOLDINGS ITALIA "
    )
    ([^A-Z]*)               # match "srl (di seguito "
"""
regex1 = regex0 + "$"
for rxno, rx in enumerate([regex0, regex1]):
    mobj = re.compile(rx, re.VERBOSE).match(text)
    if mobj:
        print("%d %r" % (rxno, mobj.groups()))
    else:
        print("%d failed" % rxno)

C:\junk>infinitere.py
0 ('MSX INTERNATIONAL HOLDINGS ITALIA ', 'srl (di seguito ')
### taking a long time, interrupted

HTH,
John
Jun 27 '08 #5
On Jun 26, 8:29 am, John Machin <sjmac...@lexicon.net> wrote:
> (2) ALWAYS use a raw string for regexes; your \s* will match on
> lower-case 's', not on spaces

and should have written:

(2) ALWAYS use a raw string for regexes. <<<=== Big fat full stop
aka period.

but he was at the time only half-way through the first cup of coffee
for the day :-)
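
A small sketch of why the raw-string habit matters even though \s happens to survive in an ordinary string literal: \b does not. The variable names plain and raw are made up for this example.

```python
import re

# In an ordinary string literal '\b' is a single backspace character;
# in a raw string it stays backslash + 'b', which re reads as a word boundary.
plain = '\bcat\b'
raw = r'\bcat\b'

print(len(plain), len(raw))              # 5 7
print(re.findall(plain, 'cat concat'))   # []
print(re.findall(raw, 'cat concat'))     # ['cat']
```

Sequences such as \s and \d are not string escapes, so Python passes them through unchanged, but relying on that means remembering which escapes are which; a raw string avoids the question.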
Jun 27 '08 #6
On 25 Jun 2008 15:20:04 GMT, Kirk <no*****@yahoo.com> wrote:
> the following regular expression matching seems to enter an infinite
> loop:
> [...]
> I've python 2.5.2 on Ubuntu 8.04.
> any idea?

If it will help some smarter person identify the problem, it can
be simplified to this:

re.findall('[^X]*((?:0*X+0*)+\s*a*\s*(?:0*X+0*\s*)*)([^X]*)$',
           "XXXXXXXXXXXXXXXXX (X")

This doesn't actually hang, it just takes a long time. The
time taken increases quickly as the chain of X's gets longer.
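
The growth can be measured with a short timing loop. This is only a sketch: PATTERN is the simplified pattern above written as a raw string, time_findall and timings are names invented here, and the chain lengths are kept small on purpose so the loop finishes quickly. Absolute times depend on the machine and Python version.

```python
import re
import time

# The simplified pattern from above, written as a raw string.
PATTERN = re.compile(r'[^X]*((?:0*X+0*)+\s*a*\s*(?:0*X+0*\s*)*)([^X]*)$')

def time_findall(n):
    """Time PATTERN.findall on a chain of n X's followed by ' (X'."""
    subject = 'X' * n + ' (X'
    start = time.perf_counter()
    result = PATTERN.findall(subject)
    return time.perf_counter() - start, result

timings = []
for n in range(8, 17, 2):   # small chains only; longer ones get slow fast
    elapsed, result = time_findall(n)
    timings.append((n, elapsed))
    print(n, round(elapsed, 4), result)
```

The nested repetition lets the engine split a run of X's between iterations in many different ways, and a failing match has to try them all, which is consistent with the slowdown described above.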

HTH

--
To email me, substitute nowhere->spamcop, invalid->net.
Jun 27 '08 #7
On Wed, 25 Jun 2008 15:29:38 -0700, John Machin wrote:
> Several problems:

Ciao John (and all participating in this thread),
first of all, I'm sorry for the delay, but I was away on business.

> (1) lose the vertical bars (as advised by others) (2) ALWAYS use a raw
> string for regexes; your \s* will match on lower-case 's', not on
> spaces

right! thanks!

> (3) why are you using findall on a pattern that ends in "$"?

Yes, you are right, I started with a different need and then it changed
over time...

> (6) as remarked by others, you haven't said what you are trying to do;

I reply here to all of you about such point: that's not important,
although I appreciate very much your suggestions!
My point was 'something that works in Perl, has problems in Python'.
In this respect, I thank Peter for his analysis.
Probably Perl has a different pattern-matching algorithm.

Thanks again to all of you!

Bye!

--
Kirk
Jun 30 '08 #8
On Jul 1, 12:45 am, Kirk <nore...@yahoo.com> wrote:
> On Wed, 25 Jun 2008 15:29:38 -0700, John Machin wrote:
> > (6) as remarked by others, you haven't said what you are trying to do;
>
> I reply here to all of you about such point: that's not important,
> although I appreciate very much your suggestions!
> My point was 'something that works in Perl, has problems in Python'.

It *is* important; our point was 'you didn't define "works", and it
was near-impossible (without transcribing your regex into verbose
mode) to guess at what you suppose it might do sometimes'.
Jun 30 '08 #9
On Mon, 30 Jun 2008 13:43:22 -0700, John Machin wrote:
>> I reply here to all of you about such point: that's not important,
>> although I appreciate very much your suggestions! My point was
>> 'something that works in Perl, has problems in Python'.
>
> It *is* important; our point was 'you didn't define "works", and it was

ok...

> near-impossible (without transcribing your regex into verbose mode) to
> guess at what you suppose it might do sometimes'.

fine: it's supposed to terminate! :-)

Do you think that hanging is an *admissible* behavior? Couldn't we learn
something from the Perl implementation?

This is my point.

Bye

--
Kirk
Jul 1 '08 #10
