Python RegExp - Python

davidchipping

I have a string which I wish to match using RE, however when I run my
comparison (using the match method) on my machine it never returns,
using the CPU fully.

The code is (very simply):

------------------------------
import re

buffer = r"#1 1 550 111 SYNC_PEER RES
<YP>syncpeers=(id=54325432;add=10." \

"0.0.1;port=89;up=89),(id=97899;add=10.0.0.1;port= 543;up=543)," \
"(id=54325432;add=10.0.0.1;port=89;up=8"

p = re.compile(r'#(?P<tran>[0-9]+\s)(?P<sess>[0-9]+\s)' \
+ '(?P<ndto>[0-9]+\s)(?P<ndfr>[0-9]+\s)' \
+ '(?P<cmd>[a-zA-Z_]+\s)(?P<dir>[a-zA-Z_]+)' \
+ '(?P<parm>(\s<[a-zA-Z]+>(\s*[a-zA-Z]+=[a-zA' \
+ '-Z0-9.,=();]+\s*)+<[a-zA-Z]+>)*)#(?P<left>.*$)')

print "starting the match"
rst = (p.match (buffer) != None)
print "finishing the match"
--------------------------------

Should this every actually happen? Regardless of whether the statment
matches or not, surely it should do this?

Jul 18 '05 #1

Subscribe Post Reply

1808

Fredrik Lundh

da***********@gmail.com wrote:

I have a string which I wish to match using RE, however when I run my
comparison (using the match method) on my machine it never returns,
using the CPU fully.

The code is (very simply):

------------------------------
import re

buffer = r"#1 1 550 111 SYNC_PEER RES
<YP>syncpeers=(id=54325432;add=10." \

"0.0.1;port=89;up=89),(id=97899;add=10.0.0.1;port= 543;up=543)," \
"(id=54325432;add=10.0.0.1;port=89;up=8"

p = re.compile(r'#(?P<tran>[0-9]+\s)(?P<sess>[0-9]+\s)' \
+ '(?P<ndto>[0-9]+\s)(?P<ndfr>[0-9]+\s)' \
+ '(?P<cmd>[a-zA-Z_]+\s)(?P<dir>[a-zA-Z_]+)' \
+ '(?P<parm>(\s<[a-zA-Z]+>(\s*[a-zA-Z]+=[a-zA' \
+ '-Z0-9.,=();]+\s*)+<[a-zA-Z]+>)*)#(?P<left>.*$)')

print "starting the match"
rst = (p.match (buffer) != None)
print "finishing the match"
--------------------------------

Should this every actually happen? Regardless of whether the statment
matches or not, surely it should do this?

surely the following should never be slow?

n = len(s)
for a in range(n):
for b in range(n):
for c in range(n):
for d in range(n):
for e in range(n):
... etc ...

(your RE contains several nested and/or related repeat operators,
which forces the poor engine to test zillions of combinations before
giving up. see Friedl's RE book for more information on this)

I suggest using a real parser for this task (or using the RE engine as
a tokenizer, and doing the rest in code; see
http://effbot.org/zone/xml-scanner.htm for more on this)

</F>

Jul 18 '05 #2

Jeff Epler

On my machine the program finishes in 30 seconds. (it's a 1.5GHz machine)
If the 'parm' group is removed, or if the buffer is shortened, the time
is reduced considerably.

There are "pathological cases" for regular expressions which can take
quite a long time. In the case of your expression, it's happening for
the group 'parm'. I think, but don't know, that each time a candidate
for 'parm' is found, the following '#' (or maybe the second '<'?) is not
found, and it backtracks to try to match 'parm' in a different way,
which involves considering many different combinations (basically, each
'name=' could be the start of a new instance of the first parenthsized
subgroup of <parm>, or it could be part of the character class that
includes [a-zA-Z=])

You may wish to consider using other approaches for parsing this text
than regular expressions.

Jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFCQK+2Jd01MZaTXX0RApByAJ4hIU1RVFWwdOeX28RLwQ H1rSk5awCeJXrE
mhBvK2/gMBghI/VamQfVFYc=
=8+da
-----END PGP SIGNATURE-----

Jul 18 '05 #3

David Chipping

Jeff,

Thanks very much for that, I didn't even consider the option of it
finishing, considering I'm using a much slower machine it was running
for over 2 minutes when I just killed it! I think I get the rest now.

Cheers again,

-David
On Tue, 22 Mar 2005 17:52:22 -0600, Jeff Epler <je****@unpythonic.net> wrote:

On my machine the program finishes in 30 seconds. (it's a 1.5GHz machine)
If the 'parm' group is removed, or if the buffer is shortened, the time
is reduced considerably.

There are "pathological cases" for regular expressions which can take
quite a long time. In the case of your expression, it's happening for
the group 'parm'. I think, but don't know, that each time a candidate
for 'parm' is found, the following '#' (or maybe the second '<'?) is not
found, and it backtracks to try to match 'parm' in a different way,
which involves considering many different combinations (basically, each
'name=' could be the start of a new instance of the first parenthsized
subgroup of <parm>, or it could be part of the character class that
includes [a-zA-Z=])

You may wish to consider using other approaches for parsing this text
than regular expressions.

Jeff

Jul 18 '05 #4

D H

da***********@gmail.com wrote:

I have a string which I wish to match using RE, however when I run my
comparison (using the match method) on my machine it never returns,
using the CPU fully.

In your case it may be simpler to just split the string into groups.
You don't even need regular expressions or a parser.

buff = r"#1 1 550 111 SYNC_PEER RES <YP>syncpeers=(id=54325432;add=10." \
"0.0.1;port=89;up=89),(id=97899;add=10.0.0.1;port= 543;up=543)," \
"(id=54325432;add=10.0.0.1;port=89;up=8)"

tran, sess, ndto, ndf, cmd, dirr, rest = buff.split()

eq = rest.find("=")
parmname = rest[0:eq]
parms = rest[eq+1:].split(",")

for parm in parms:
parmitems = parm[1:-1].split(";")
for item in parmitems:
name, val = item.split("=")
print name, val
print "---"

Jul 18 '05 #5

Similar topics

Python versus VB

by: Simon Foster | last post by:

I am meeting with a potential client next week to discuss starting a new project. His preference is for VB whereas I would prefer Python for obvious reasons ;-) Could anybody come up with a...

Python

python tool for ultraedit

by: Mark S Pryor | last post by:

dear c.l.p, I'm posting a Py script to change the wordfile from inside the UE editor. Define this into the adv. tool menu. I found 100 posts in groups.google discussing Python and UltraEdit....

Python

Why Python?

by: Todd7 | last post by:

I am looking at learning Python, but I would like to know what its strengths and weaknesses are before I invest much time in learning it. My first thought is what is it good for programming,...

Python

IDE

by: Thomas Lindgaard | last post by:

Hello I am probably going to start a war now... but so be it :) I just want to hear what all you guys who eat pythons for breakfast use for python coding. Currently I use Kate, but I would...

Python

Help needed: cryptic perl regular expression in python syntax

by: pekka niiranen | last post by:

Hi there, I have perl script that uses dynamically constructed regular in this way: ------perl code starts ---- $result ""; $key = AAA\?01; $key = quotemeta $key; $line = " ...

Python

Python's regular expression?

by: Davy | last post by:

Hi all, I am a C/C++/Perl user and want to switch to Python (I found Python is more similar to C). Does Python support robust regular expression like Perl? And Python and Perl's File...

Python

Python sqlite and regex.

by: Julien ARNOUX | last post by:

Hi, I'd like to use regular expressions in sqlite query, I using apsw module but it doesn't work...Can you help me ? My script: import apsw import re path = 'db/db.db3'

Python

unexpected behaviour for python regexp: caret symbol almost useless?

by: conan | last post by:

This regexp '<widget class=".*" id=".*">' works well with 'grep' for matching lines of the kind <widget class="GtkWindow" id="window1"> on a XML .glade file However that's not true for the...

Python

How to get the "longest possible" match with Python's RE module?

by: Licheng Fang | last post by:

Basically, the problem is this: 'do' Python's NFA regexp engine trys only the first option, and happily rests on that. There's another example: 'oneself' The Python regular expression...

Python

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server