By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,222 Members | 2,354 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,222 IT Pros & Developers. It's quick & easy.

regex searching hangs

P: n/a
Could someone explains why the following code hangs (Python 2.3.3)?

--- CODE STARTS ---
import re

p=re.compile(r'\S+(?:\.\S+)+\.com')

t='......................................'
p.search(t)
--- CODE ENDS ---

From the syntax and semantics of regex, the entire t should be
consumed by the pattern '\S+(?:\.\S+)+' and the search() call should
return None. Is search() doing a depth-first search and can't pulling
itself out? It doesn't seem right to me, or I'm missing something?

BTW, I don't think any regex matching should ever hang in any
circumstances - correct me if I'm wrong: is halting problem a problem
here?

Thanks!
Jul 18 '05 #1
Share this Question
Share on Google+
2 Replies


P: n/a

"Fortepianissimo" <fo*************@yahoo.com.tw> wrote in message
news:ef**************************@posting.google.c om...
Could someone explains why the following code hangs (Python 2.3.3)?

--- CODE STARTS ---
import re

p=re.compile(r'\S+(?:\.\S+)+\.com')

t='......................................'
p.search(t)
--- CODE ENDS ---


The search doesn't hang, it just takes a long time to compute - the time it
takes almost doubles for each dot you add to t.
If you replace \S in the expression with [^ \t\n\r\f\v.] (ie. no whitespace
and no dots) you'll get the computation time you expect.
Jul 18 '05 #2

P: n/a
fo*************@yahoo.com.tw (Fortepianissimo) wrote in message news:<ef**************************@posting.google. com>...
Could someone explains why the following code hangs (Python 2.3.3)?

--- CODE STARTS ---
import re

p=re.compile(r'\S+(?:\.\S+)+\.com')

t='......................................'
p.search(t)
--- CODE ENDS ---

From the syntax and semantics of regex, the entire t should be
consumed by the pattern '\S+(?:\.\S+)+' and the search() call should
return None. Is search() doing a depth-first search and can't pulling
itself out? It doesn't seem right to me, or I'm missing something?

BTW, I don't think any regex matching should ever hang in any
circumstances - correct me if I'm wrong: is halting problem a problem
here?

Thanks!

It doesn't halt, it just seems you found an exponential time search,
searching at length i seems to take about 1.6 times the time it takes
to search at length i-1.

import re
import time
p=re.compile(r'\S+(?:\.\S+)+\.com')
t = '.'
for i in range(100):
start = time.time()
p.search(t*i)
print i, time.time() - start

How should you improve it? I don't know, I can't even read the regexp
:)
Jul 18 '05 #3

This discussion thread is closed

Replies have been disabled for this discussion.