Bytes | Software Development & Data Engineering Community

catastrophic regexp, help!

pat = re.compile("(\w* *)*")
This matches all sentences: if fed the string "are you crazy? i am", it will return "are you crazy".

I want to find, in a big string, a sentence containing Zlatan
Ibrahimovic and some other text,
i.e. return the first sentence containing the name Zlatan Ibrahimovic.
patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
should do this according to The Regex Coach, but it seems to send my
computer to 100% CPU, and I can't close it.

Jun 27 '08 #1
On Wednesday 11 June 2008 06:20:14, cirfu wrote:
patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
This kind of regexp is often harmful: while perfectly valid, it checks too many
possibilities to be practical, as you'll see if you wait for it to return.

Read it sequentially to make sense of it: for each sequence of word + space,
trying the longest first, does the string 'zlatan' follow?

'this is zlatan example.'
compare with 'this is zlatan example', 'z'=='.', false
compare with 'this is zlatan ', 'z'=='e', false
compare with 'this is zlatan', 'z'==' ', false
compare with 'this is ', 'zlatan'=='zlatan', true
compare with 'this is', 'z'==' ', false
compare with 'this ', 'z'=='i', false
compare with 'this', 'z'==' ', false
...

Ouch!
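The walk-through above only tries prefixes of decreasing length, which by itself would be slow but not catastrophic. The real blowup comes from the nested quantifier: inside (\w* *)*, one iteration's \w* can stop partway through a word and let the next iteration continue it, so the same prefix can be matched in many different ways, and a failing match tries them all. The sketch below is a simplified combinatorial model of that, not a trace of Python's re engine; `ways_to_split` is a hypothetical helper name.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ways_to_split(n):
    """Count the ways to cut a run of n word characters into non-empty pieces.

    Simplified model of (\\w* *)*: each iteration's \\w* may stop after any
    number of letters and let the next iteration continue the same word, so
    a failing match may have to try every such cut before giving up.
    """
    if n == 0:
        return 1
    # The first piece takes k letters (k = 1..n); the remainder is cut recursively.
    return sum(ways_to_split(n - k) for k in range(1, n + 1))


for n in (5, 10, 20, 30):
    print(n, ways_to_split(n))  # doubles with every extra letter: 2 ** (n - 1)
```

Every extra letter doubles the count, and that is per word; a whole sentence multiplies the counts of its words together.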

The simpler your regexes, the better; two short regexes are better than
one big one, etc.
Don't do premature optimization (especially with regexps).
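The "two short regexes" advice can be sketched as one regex that cuts the text into sentences and a second that searches each piece. This is a minimal sketch, not Maric's code: `SENTENCE_END`, `NAME`, `first_sentence_with` and the sample text are hypothetical, and it assumes a sentence ends at '.', '?' or '!' (so abbreviations like "e.g." will split a sentence).

```python
import re

# Regex 1: sentence terminators (a naive, assumed rule).
SENTENCE_END = re.compile(r"[.?!]")
# Regex 2: the name, allowing any whitespace (including a newline) between the parts.
NAME = re.compile(r"zlatan\s+ibrahimovic", re.IGNORECASE)


def first_sentence_with(text, name_pattern=NAME):
    """Return the first sentence of `text` that `name_pattern` finds, or None."""
    for sentence in SENTENCE_END.split(text):
        # A plain search inside one short sentence; no nested quantifiers anywhere.
        if name_pattern.search(sentence):
            return sentence.strip()
    return None


text = ("Are you crazy? I am. The striker Zlatan\n"
        "Ibrahimovic scored twice. Then it rained.")
print(first_sentence_with(text))
```

Neither regex contains a repeated group, so each runs in time roughly proportional to the length of the text.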

In [161]: s="""pat = re.compile("(\w* *)*")
this matches all sentences.
if fed the string "are you crazy? i am" it will return "are you
crazy".
i want to find a in a big string a sentence containing Zlatan
Ibrahimovic and some other text.
ie return the first sentence containing the name Zlatan Ibrahimovic.
patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
should do this according to regexcoach but it seems to send my
computer into 100%CPU-power and not closable.
"""

In [172]: list(e[0] for e in re.findall(r"((\w+\s*)+)", s, re.M)
              if re.search(r'zlatan\s+ibrahimovic', e[0], re.I))
Out[172]:
['i want to find a in a big string a sentence containing Zlatan\nIbrahimovic and some other text',
 'ie return the first sentence containing the name Zlatan Ibrahimovic',
 'zlatan ibrahimovic ']

--
_____________

Maric Michaud
Jun 27 '08 #2
On Wednesday 11 June 2008 09:08:53, Maric Michaud wrote:
'this is zlatan example.'
compare with 'this is zlatan example', 'z'=='.', false
compare with 'this is zlatan ', 'z'=='e', false
compare with 'this is zlatan', 'z'==' ', false
compare with 'this is ', 'zlatan'=='zlatan', true
Ah no! It stops here, but it would have continued through the entire string,
down to the empty string, if the string didn't contain 'zlatan' at all.
compare with 'this is', 'z'==' ', false
compare with 'this ', 'z'=='i', false
compare with 'this', 'z'==' ', false


--
_____________

Maric Michaud
Jun 27 '08 #3
On Jun 11, 6:20 am, cirfu <circularf...@yahoo.se> wrote:
i.e. return the first sentence containing the name Zlatan Ibrahimovic.
Maybe something like this would be of use...

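def sentence_locator(s, sub):
    # Case-insensitive: compare upper-cased copies of both strings.
    cnt = s.upper().count(sub.upper())
    if not cnt:
        return None
    tmp = []
    idx = -1
    while cnt:
        # Position of the next occurrence of sub.
        idx = s.upper().find(sub.upper(), (idx+1))
        a = -1
        while True:
            # Walk forward through the '.'s that come before the match...
            b = s.find('.', (a+1), idx)
            if b == -1:
                # ...then take everything up to the first '.' after it
                # (or to the end of the string if there is none).
                b = s.find('.', idx)
                if b == -1:
                    tmp.append(s[a+1:])
                    break
                tmp.append(s[a+1:b+1])
                break
            a = b
        cnt -= 1
    return tmp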
Jun 27 '08 #4
On Wednesday 11 June 2008 12:20, cirfu wrote:
patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
I don't think you should put anything around the phrase you want to find.

patzln = re.compile(r'.*(zlatan ibrahimovic){1,1}.*')

This should do it for you, unless you are searching at a specific position.

On the other hand, I'd like to understand how I can substitute a variable
inside a pattern.

If I do:
import os, re
EOL = os.linesep

re_EOL = re.compile(r'[?P<EOL>\s+2\t]')

for line in open('myfile', 'r').readlines():
    print(re_EOL.sub('', line))

Will it remove tabs, spaces and end-of-line characters?
It removes the tabs and spaces, but not the EOL :(
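On the question of putting a variable into a pattern: a pattern is just a string, so it can be built with ordinary concatenation, with `re.escape()` making the variable's characters literal. A minimal sketch, where `make_stripper` and the sample strings are hypothetical:

```python
import os
import re

EOL = os.linesep  # the platform line terminator: '\n' on POSIX, '\r\n' on Windows


def make_stripper(chars):
    """Compile a pattern matching any one character from `chars`.

    The variable is spliced in with plain string concatenation; re.escape()
    makes characters such as ']', '-' or '\\' literal inside the class.
    """
    return re.compile("[" + re.escape(chars) + "]")


re_ws = make_stripper(" \t" + EOL)
print(repr(re_ws.sub("", "a b\tc" + EOL)))  # 'abc'
```

One caveat when applying this to lines read from a file: in Python 3, a file opened in text mode translates line endings to '\n' by default, so the lines end in '\n' rather than `os.linesep`; a plain `re.compile(r"\s+")` is often all that is needed.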

--
Mailsweeper Home : http://it.geocities.com/call_me_not_now/index.html
Jun 27 '08 #5
On 11 June, 17:04, TheSaint <fc14301...@icqmail.com> wrote:
patzln = re.compile(r'.*(zlatan ibrahimovic){1,1}.*')


It returns all the sentences. I just want the one containing Zlatan
Ibrahimovic.
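The reason is that '.' matches any character except a newline, so the `.*` on either side of the name runs to both ends of the line, sweeping in every sentence on it. A sketch of the difference, assuming `re.IGNORECASE` is added (the suggested pattern omits it, so it would not match the capitalised name) and using a made-up sample string:

```python
import re

text = "Are you crazy? I am. Zlatan Ibrahimovic scored twice. Then it rained."

# The suggested pattern: '.*' extends to the start and end of the line.
whole_line = re.compile(r".*(zlatan ibrahimovic){1,1}.*", re.IGNORECASE)

# Bounding the context with a class that excludes sentence terminators
# keeps the match inside one sentence (assumed to end at '.', '?' or '!').
one_sentence = re.compile(r"[^.?!]*zlatan ibrahimovic[^.?!]*[.?!]?", re.IGNORECASE)

print(whole_line.search(text).group(0))
print(one_sentence.search(text).group(0).strip())
```

The first search returns the entire line; the second returns just "Zlatan Ibrahimovic scored twice.".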
Jun 27 '08 #6
On 11 June, 10:25, Chris <cwi...@gmail.com> wrote:
Maybe something like this would be of use...
def sentence_locator(s, sub):

Yes, though it seems very unpythonic :)
There must be a simpler way that isn't slow as hell.
Jun 27 '08 #7
On Jun 11, 11:07 pm, cirfu <circularf...@yahoo.se> wrote:
Yes, though it seems very unpythonic :)
There must be a simpler way that isn't slow as hell.
Why wouldn't you use character classes instead of groups? i.e.:

pat = re.compile(r'([ \w]*Zlatan Ibrahimovic[ \w]*)')
sentence = pat.search(text).group(1)

As has been mentioned earlier, certain evil combinations of nested
quantifiers and groups, such as (\w* *)*, will cause Python's regular
expression engine to go (righteously) crazy, as they force its
backtracking matcher to try an exponentially growing number of paths.
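Why the character class helps: (\w* *)* and [\w ]* accept exactly the same strings (runs of word characters and spaces), but the group can carve a word up in many ways while the class consumes it in only one. A small sketch; the variable names and sample text are made up, and re.IGNORECASE is an added assumption:

```python
import re

# Same set of strings, very different numbers of ways to match them.
nested = re.compile(r"(\w* *)*")   # can cover "abc" as abc, ab+c, a+bc or a+b+c
flat = re.compile(r"[\w ]*")       # can cover "abc" in only one way

for s in ("are you crazy", "zlatan ibrahimovic", "are you crazy?"):
    print(repr(s), bool(nested.fullmatch(s)), bool(flat.fullmatch(s)))

# Only the flat form is safe to embed in a larger search.
finder = re.compile(r"[\w ]*zlatan ibrahimovic[\w ]*", re.IGNORECASE)
text = "Are you crazy? I am. The striker Zlatan Ibrahimovic scored twice."
print(finder.search(text).group(0).strip())

# A longer input with no match still fails promptly: the worst case is
# quadratic in the input length, not exponential.
print(finder.search("word " * 200))
```

The failing `nested.fullmatch` above is kept deliberately short; on a long sentence the same call is what hangs.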
Jun 27 '08 #8

