catastrophic regexp, help!

cirfu

pat = re.compile("(\w* *)*")
this matches all sentences.
if fed the string "are you crazy? i am" it will return "are you
crazy".

i want to find, in a big string, a sentence containing Zlatan
Ibrahimovic and some other text.
ie return the first sentence containing the name Zlatan Ibrahimovic.
patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
should do this according to regexcoach but it seems to send my
computer to 100% CPU and the program cannot be closed.

Jun 27 '08 #1


Maric Michaud

On Wednesday 11 June 2008 06:20:14, cirfu wrote:

> [...]
> patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
> should do this according to regexcoach but it seems to send my
> computer into 100%CPU-power and not closable.

This kind of regexp is quite often harmful. It is perfectly valid, and if you
give it enough time it will return, but it checks too many things to be practical.

Read it sequentially to make sense of it: for each sequence of word + space,
trying the longest first, does the string 'zlatan' follow?

'this is zlatan example.'
compare with 'this is zlatan example', 'z'=='.', false
compare with 'this is zlatan ', 'z'=='e', false
compare with 'this is zlatan', 'z'==' ', false
compare with 'this is ', "zlatan"=="zlatan", true
compare with 'this is', 'z'==' ', false
compare with 'this ', 'z'=='i', false
compare with 'this', 'z'==' ', false
...

ouch !
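The blow-up described above can be measured with a small sketch. The helper name `time_failed_search` and the test strings are made up for illustration; the pattern is the one from the question. Timings will vary by machine, so the numbers matter less than how fast they grow as the input gets longer.

```python
import re
import time

# The nested-quantifier pattern from the question.
nested = re.compile(r"(\w* *)* zlatan ibrahimovic (\w* *)*")

def time_failed_search(n):
    """Return the seconds spent failing to match a run of n letters."""
    text = "a" * n  # no space and no name, so the search has to fail
    start = time.perf_counter()
    result = nested.search(text)
    elapsed = time.perf_counter() - start
    assert result is None
    return elapsed

if __name__ == "__main__":
    # Keep n small: the number of ways to split the letters between
    # the inner \w* and the outer (...)* grows very quickly with n.
    for n in (8, 12, 16):
        print(n, "letters:", round(time_failed_search(n), 4), "s")
```

On a text that does contain the name the pattern returns quickly; it is the failing case, where every split has to be tried, that hurts.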

The simpler your regexes are, the better: two short regexes are better
than one big one, etc.
Don't do premature optimization (especially with regexps).

In [161]: s="""pat = re.compile("(\w* *)*")
this matches all sentences.
if fed the string "are you crazy? i am" it will return "are you
crazy".
i want to find a in a big string a sentence containing Zlatan
Ibrahimovic and some other text.
ie return the first sentence containing the name Zlatan Ibrahimovic.
patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
should do this according to regexcoach but it seems to send my
computer into 100%CPU-power and not closable.
"""

In [172]: list(e[0] for e in re.findall("((\w+\s*)+)", s, re.M) if
re.findall('zlatan\s+ibrahimovic', e[0], re.I))
Out[172]:
['i want to find a in a big string a sentence containing Zlatan\nIbrahimovic
and some other text',
'ie return the first sentence containing the name Zlatan Ibrahimovic',
'zlatan ibrahimovic ']

--
_____________

Maric Michaud

Jun 27 '08 #2

Maric Michaud

On Wednesday 11 June 2008 09:08:53, Maric Michaud wrote:

> 'this is zlatan example.'
> compare with 'this is zlatan example', 'z'=='.', false
> compare with 'this is zlatan ', 'z'=='e', false
> compare with 'this is zlatan', 'z'==' ', false
> compare with 'this is ', "zlatan"=="zlatan", true

Ah no! It stops here, but it would have continued through the entire string,
down to the empty string, if the string did not contain zlatan at all.

> compare with 'this is', 'z'==' ', false
> compare with 'this ', 'z'=='i', false
> compare with 'this', 'z'==' ', false


Jun 27 '08 #3

Chris

On Jun 11, 6:20 am, cirfu <circularf...@yahoo.se> wrote:

> [...]
> i want to find a in a big string a sentence containing Zlatan
> Ibrahimovic and some other text.
> ie return the first sentence containing the name Zlatan Ibrahimovic.

Maybe something like this would be of use...

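def sentence_locator(s, sub):
    # Count the case-insensitive occurrences of sub in s.
    cnt = s.upper().count(sub.upper())
    if not cnt:
        return None
    tmp = []
    idx = -1
    while cnt:
        # Position of the next occurrence of sub.
        idx = s.upper().find(sub.upper(), (idx+1))
        a = -1
        while True:
            # Walk the periods before idx; a ends up on the last one.
            b = s.find('.', (a+1), idx)
            if b == -1:
                # No more periods before idx: look for the closing one.
                b = s.find('.', idx)
                if b == -1:
                    tmp.append(s[a+1:])
                    break
                tmp.append(s[a+1:b+1])
                break
            a = b
        cnt -= 1
    return tmp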

Jun 27 '08 #4

TheSaint

On 12:20, Wednesday 11 June 2008, cirfu wrote:

> patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")

I think you shouldn't put anything around the phrase you want to find.

patzln = re.compile(r'.*(zlatan ibrahimovic){1,1}.*')

This should do it for you, unless you are searching at a specific position.

On the other hand, I'd like to understand how I can substitute a variable
inside a pattern.

If I do:
import os, re
EOL = os.linesep

re_EOL = re.compile(r'[?P<EOL>\s+2\t]')

for line in open('myfile', 'r').readlines():
    print(re_EOL.sub('', line))

Will it remove tabs, spaces and end-of-line?
It's doing but no EOL :(
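
One possible way to put a variable inside a pattern is plain string concatenation plus `re.escape`, sketched below. This is only an illustration: the names `re_ws` and `strip_whitespace` are invented here, and `EOL` is set to `"\n"` on the assumption that the file is read in text mode, where line endings normally arrive as `"\n"` whatever `os.linesep` is on the platform.

```python
import re

EOL = "\n"  # assumption: lines read in text mode end with "\n"

# Build the pattern from the variable; re.escape protects against any
# regex metacharacters in the interpolated text.
re_ws = re.compile(r"[ \t]+|" + re.escape(EOL))

def strip_whitespace(line):
    """Remove spaces, tabs and the end-of-line marker from line."""
    return re_ws.sub("", line)

print(repr(strip_whitespace("a \tb c\n")))
```

Note that inside square brackets `?P<EOL>` is not a named group; the brackets define a set of single characters. Since `\s` already covers spaces, tabs and newlines, `re.sub(r"\s+", "", line)` may be all that is needed.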

--
Mailsweeper Home : http://it.geocities.com/call_me_not_now/index.html

Jun 27 '08 #5

cirfu

On 11 June, 17:04, TheSaint <fc14301...@icqmail.com> wrote:

> patzln = re.compile(r'.*(zlatan ibrahimovic){1,1}.*')
>
> this should do it for you. Unless searching into a special position.
> [...]

it returns all the sentences. i just want the one containing zlatan
ibrahimovic.
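
A minimal sketch of one way to get only the first matching sentence: split the text into sentences with one short regex, then test each sentence with another. The function name `first_sentence_containing` is hypothetical, and the sentence rule (split after `.`, `?` or `!` followed by whitespace) is a deliberately naive assumption that will trip over abbreviations.

```python
import re

def first_sentence_containing(text, phrase):
    """Return the first sentence of text that contains phrase, or None."""
    # Allow any run of whitespace (including a newline) between the words.
    words = [re.escape(w) for w in phrase.split()]
    needle = re.compile(r"\s+".join(words), re.IGNORECASE)
    # Naive sentence split: break after ., ? or ! followed by whitespace.
    for sentence in re.split(r"(?<=[.?!])\s+", text):
        if needle.search(sentence):
            return sentence.strip()
    return None
```

Called as `first_sentence_containing(big_string, "zlatan ibrahimovic")`, it stops at the first hit instead of collecting every sentence, and neither pattern contains a quantifier nested inside another quantifier.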

Jun 27 '08 #6

cirfu

On 11 June, 10:25, Chris <cwi...@gmail.com> wrote:

> Maybe something like this would be of use...
>
> def sentence_locator(s, sub):
> [...]

yes, seems very unpythonic though :)
must be a simpler way that isnt slow as hell.

Jun 27 '08 #7

alfasub000

On Jun 11, 11:07 pm, cirfu <circularf...@yahoo.se> wrote:

> yes, seems very unpythonic though :)
> must be a simpler way that isnt slow as hell.

Why wouldn't you use character classes instead of groups? i.e.:

pat = re.compile(r'([ \w]*Zlatan Ibrahimovic[ \w]*)')
sentence = pat.search(text).groups()

As has been mentioned earlier, certain evil combinations of regular
expressions and groups will cause python's regular expression engine
to go (righteously) crazy as they require the internal state machine
to branch out exponentially.
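
As a rough illustration of the character-class idea, the sketch below wraps a flat pattern in a helper. The names `flat` and `find_fragment` are made up, and the `re.IGNORECASE` flag is an addition so that lower-case input also matches. With a single class and no nested repetition, each character can be consumed in only one way, so a failed search should cost polynomial rather than exponential time.

```python
import re

# One flat character class instead of a repeated group of repeats.
flat = re.compile(r"[ \w]*zlatan ibrahimovic[ \w]*", re.IGNORECASE)

def find_fragment(text):
    """Return the run of words and spaces around the name, or None."""
    m = flat.search(text)
    return m.group(0).strip() if m else None
```

The match stops at the first character that is neither a space nor a word character, so a sentence with a comma in it comes back only in part.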

Jun 27 '08 #8
