catastrophic regexp, help!

pat = re.compile("(\w* *)*")
This matches all sentences.
If fed the string "are you crazy? i am" it will return "are you
crazy".

I want to find, in a big string, a sentence containing Zlatan
Ibrahimovic and some other text,
i.e. return the first sentence containing the name Zlatan Ibrahimovic.
patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
This should do it according to regexcoach, but it seems to send my
computer to 100% CPU and the program cannot be closed.

Jun 27 '08 #1
This kind of regexp is quite often harmful. It is perfectly valid, and it would
return if you gave it the time, but it checks too many things to be practical.

Read it sequentially to make sense of it: for each sequence of word + space,
trying the longest first, does the string 'zlatan' follow?

'this is zlatan example.'
compare with 'this is zlatan example', 'z'=='.', false
compare with 'this is zlatan ', 'z'=='e', false
compare with 'this is zlatan', 'z'==' ', false
compare with 'this is ', "zlatan"=="zlatan", true
compare with 'this is', 'z'==' ', false
compare with 'this ', 'z'=='i', false
compare with 'this', 'z'==' ', false
...

ouch !

The simpler your regexes are, the better; two short regexes are better
than one big one, etc.
Don't do premature optimization (especially with regexps).

In [161]: s="""pat = re.compile("(\w* *)*")
this matches all sentences.
if fed the string "are you crazy? i am" it will return "are you
crazy".
i want to find a in a big string a sentence containing Zlatan
Ibrahimovic and some other text.
ie return the first sentence containing the name Zlatan Ibrahimovic.
patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
should do this according to regexcoach but it seems to send my
computer into 100%CPU-power and not closable.
"""

In [172]: list(e[0] for e in re.findall("((\w+\s*)+)", s, re.M) if
          re.findall('zlatan\s+ibrahimovic', e[0], re.I))
Out[172]:
['i want to find a in a big string a sentence containing Zlatan\nIbrahimovic and some other text',
 'ie return the first sentence containing the name Zlatan Ibrahimovic',
 'zlatan ibrahimovic ']
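
The "two short regexes rather than one big one" advice can be sketched as a small function: one simple pattern splits the text into sentences, a second one tests each sentence for the name. This is only an illustration, not the poster's code. The sentence delimiter (`.`, `!` or `?` followed by whitespace) and the name `first_sentence_with` are assumptions.

```python
import re

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def first_sentence_with(text, name):
    """Return the first sentence containing `name` (ignoring case), or None."""
    # Allow any run of whitespace, including a line break, between the words.
    words = [re.escape(word) for word in name.split()]
    name_re = re.compile(r"\s+".join(words), re.IGNORECASE)
    for sentence in SENTENCE_END.split(text):
        if name_re.search(sentence):
            return sentence.strip()
    return None


if __name__ == "__main__":
    sample = ("Are you crazy? I am. We saw Zlatan\nIbrahimovic score twice. "
              "Then zlatan ibrahimovic left.")
    print(first_sentence_with(sample, "Zlatan Ibrahimovic"))
```

Neither pattern nests one quantifier inside another, so a sentence that does not contain the name is rejected quickly.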

--
_____________

Maric Michaud
Jun 27 '08 #2
On Wednesday 11 June 2008 09:08:53, Maric Michaud wrote:
'this is zlatan example.'
compare with 'this is zlatan example', 'z'=='.', false
compare with 'this is zlatan ', 'z'=='e', false
compare with 'this is zlatan', 'z'==' ', false
compare with 'this is ', "zlatan"=="zlatan", true
Ah no! It stops here, but it would have continued through the entire string, down
to the empty string, if the string didn't contain zlatan at all.
compare with 'this is', 'z'==' ', false
compare with 'this ', 'z'=='i', false
compare with 'this', 'z'==' ', false
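
The walk-through above can be reproduced with a few lines of Python. This is a deliberately simplified model of the reading given in the post, not what the regex engine literally does (the real engine also tries many different ways of splitting the same prefix, which is where the blow-up comes from). The function name `walk_prefixes` is made up for the illustration.

```python
import re


def walk_prefixes(text, needle):
    """Yield (prefix, follows) pairs, longest prefix first.

    Simplified model: a candidate prefix ends wherever a run of word
    characters or a run of spaces ends, plus the empty prefix.
    """
    ends = {0} | {m.end() for m in re.finditer(r"\w+| +", text)}
    for end in sorted(ends, reverse=True):
        follows = text.startswith(needle, end)
        yield text[:end], follows
        if follows:
            return  # stop at the first success, as noted in the correction


if __name__ == "__main__":
    for prefix, follows in walk_prefixes("this is zlatan example.", "zlatan"):
        print(repr(prefix), follows)
```

With a text that does not contain the needle, the loop runs all the way down to the empty prefix before giving up.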


--
_____________

Maric Michaud
Jun 27 '08 #3
Maybe something like this would be of use...

def sentence_locator(s, sub):
    cnt = s.upper().count(sub.upper())
    if not cnt:
        return None
    tmp = []
    idx = -1
    while cnt:
        idx = s.upper().find(sub.upper(), (idx+1))
        a = -1
        while True:
            b = s.find('.', (a+1), idx)
            if b == -1:
                b = s.find('.', idx)
                if b == -1:
                    tmp.append(s[a+1:])
                    break
                tmp.append(s[a+1:b+1])
                break
            a = b
        cnt -= 1
    return tmp
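
For comparison, the same index-based idea can be written more compactly when only the first matching sentence is wanted. This is a sketch under assumptions, not a drop-in replacement for the function above: `sentence_around` is a hypothetical name, only `.` counts as a sentence end, and the text is assumed to keep its length under `lower()` (true for plain ASCII).

```python
def sentence_around(text, sub):
    """Return the first '.'-delimited sentence containing `sub`, or None.

    The comparison ignores case; surrounding whitespace is stripped.
    """
    idx = text.lower().find(sub.lower())
    if idx == -1:
        return None
    start = text.rfind('.', 0, idx) + 1    # 0 when there is no earlier '.'
    end = text.find('.', idx + len(sub))   # -1 when there is no later '.'
    sentence = text[start:] if end == -1 else text[start:end + 1]
    return sentence.strip()


if __name__ == "__main__":
    print(sentence_around("Are you crazy. We saw Zlatan Ibrahimovic score. Bye.",
                          "zlatan ibrahimovic"))
```

It uses only `str.find` and `str.rfind`, so there is no regex backtracking involved at all.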
Jun 27 '08 #4
On Wednesday 11 June 2008, 12:20, cirfu wrote:
patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
I think you shouldn't put anything around the phrase you want to find.

patzln = re.compile(r'.*(zlatan ibrahimovic){1,1}.*')

This should do it for you, unless you are searching at a particular position.

On the other hand, I'd like to understand how I can substitute a variable
inside a pattern.

If I do:
import os, re
EOL = os.linesep

re_EOL = re.compile(r'[?P<EOL>\s+2\t]')

for line in open('myfile', 'r').readlines():
    print(re_EOL.sub('', line))

will it remove tabs, spaces and end-of-line?
It does, but not the EOL :(
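
On substituting a variable inside a pattern: one common approach is to build the pattern string from the variable and pass it through `re.escape`, since `?P<EOL>` inside square brackets is just a set of literal characters and does not refer to the variable. A minimal sketch, assuming the goal is to strip spaces, tabs and the platform line ending; the names `WHITESPACE_OR_EOL` and `strip_blanks` are made up for the illustration.

```python
import os
import re

EOL = os.linesep

# Build the pattern from the variable; re.escape guards against any
# regex metacharacters the variable might contain.
WHITESPACE_OR_EOL = re.compile(r"(?:[ \t]|" + re.escape(EOL) + r")+")


def strip_blanks(line):
    """Remove spaces, tabs and the platform line ending from one line."""
    return WHITESPACE_OR_EOL.sub("", line)


if __name__ == "__main__":
    # print() appends its own newline, so the printed output still ends
    # with one even though the line ending was removed from the string.
    print(repr(strip_blanks("a\tb c" + EOL)))
```

That trailing newline from `print` is one possible reason the end-of-line appears to survive in the loop above, since `\s` in the original character class already matches newline characters.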

--
Mailsweeper Home : http://it.geocities.com/call_me_not_now/index.html
Jun 27 '08 #5
It returns all the sentences. I just want the one containing Zlatan
Ibrahimovic.
Jun 27 '08 #6
Yes, but it seems very unpythonic though :)
There must be a simpler way that isn't slow as hell.
Jun 27 '08 #7
Why wouldn't you use character classes instead of groups? I.e.:

pat = re.compile(r'([ \w]*Zlatan Ibrahimovic[ \w]*)')
sentence = pat.search(text).groups()

As has been mentioned earlier, certain evil combinations of regular
expressions and groups will cause Python's regular expression engine
to go (righteously) crazy, as they require the internal state machine
to branch out exponentially.
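
The difference can be measured with a rough timing sketch. It compares the nested-quantifier prefix from the original post with a character-class version on input that cannot match. The helper `time_search` and the test strings are assumptions for illustration, and the actual timings depend on the machine and Python version.

```python
import re
import time

# Nested quantifiers, as in the original post: there are many ways to
# split the same run of characters, so a failing search backtracks heavily.
NESTED = re.compile(r"(\w* *)* zlatan ibrahimovic")
# Character class: a single quantifier, so a failing search only gives
# characters back one at a time instead of trying every possible split.
CHAR_CLASS = re.compile(r"[ \w]*zlatan ibrahimovic")


def time_search(pattern, text):
    """Return (match object or None, elapsed seconds) for one search."""
    start = time.perf_counter()
    found = pattern.search(text)
    return found, time.perf_counter() - start


if __name__ == "__main__":
    for n in (10, 14, 18):
        text = "a" * n  # the name is absent, so both searches must fail
        _, slow = time_search(NESTED, text)
        _, fast = time_search(CHAR_CLASS, text)
        print(f"n={n:2d}  nested={slow:.4f}s  char class={fast:.6f}s")
```

Keep `n` small when trying this: the time for the nested pattern grows very quickly with each extra character.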
Jun 27 '08 #8
