Sorry for being too brief!
I was talking about a function which 'counts' the number
of occurrences, using a string method vs. a regexp.
I wrote the code for the regexp search as well as the string
search and tested it on a rather large file (800 KB) for
occurrences of a certain word. I find that the string search
is at least 2 times faster than the one with regexp, excluding
the time for the re.compile() call. This is particularly
noticeable when the file becomes quite large and the word is
spread out.
I also thought the regexp would beat the string search hands
down, and I am surprised that the result is the other way around.
Here is the code. Note that I am using the 'count' methods that
count the number of occurrences rather than the 'find' methods.
# Test to find out whether string search in a piece of data
# is faster than regexp search.
# Result: string search is much faster when there are
# many occurrences of the substring.
import time
def strsearch1(s, substr):
    t1 = time.time()
    print 'Count 1 =>', s.count(substr)
    t2 = time.time()
    print 'Searching using string, Time taken =>', t2 - t1
def strsearch2(s, substr):
    import re
    r = re.compile(substr, re.IGNORECASE)
    t1 = time.time()
    print 'Count 2 =>', len(r.findall(s))
    t2 = time.time()
    print 'Searching using regexp, Time taken =>', t2 - t1
f = open("test.html", "r")
data = f.read()
f.close()
strsearch1(data, "Miriam")
strsearch2(data, "Miriam")
# Output here...
D:\Programming\python>python strsearch.py
Count 1 => 45
Searching using string, Time taken => 0.0599999427795
Count 2 => 45
Searching using regexp, Time taken => 0.110000014305
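A note on the timing method: wrapping time.time() around a single run is quite noisy; Python 2.3 also ships a timeit module that averages many runs. A rough sketch of the same comparison, using made-up sample data in place of test.html (the actual numbers will of course differ per machine):

```python
import timeit

# Made-up sample text standing in for the 800 KB test.html file.
data = "some filler text with the word Miriam in it " * 1000

# The setup string builds the data and the compiled pattern once,
# so the compile cost is excluded, as in the original test.
setup = "import re; data = %r; pat = re.compile('Miriam', re.IGNORECASE)" % data

t_str = timeit.Timer("data.count('Miriam')", setup).timeit(200)
t_re = timeit.Timer("len(pat.findall(data))", setup).timeit(200)

print("string count  : %.4f sec" % t_str)
print("regexp findall: %.4f sec" % t_re)
```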
Test was done on a Windows 98 machine using Python 2.3, with
248 MB RAM and an Intel 1.7 GHz CPU.
I was thinking of using regexp searches in my code, but this convinces
me to stick with the good old string search.
Thanks for the replies.
-Anand
Duncan Booth <du****@NOSPAMrcp.co.uk> wrote in message news:<Xn***************************@127.0.0.1>...
> py*******@Hotpop.com (Anand Pillai) wrote in
> news:84*************************@posting.google.com:
> > To search a word in a group of words, say a paragraph or a web page,
> > would a string search or a regexp search be faster?
> > The string search would of course be,
> > if str.find(substr) != -1:
> >     domything()
> > And the regexp search assuming no case restriction would be,
> > strre = re.compile(substr, re.IGNORECASE)
> > m = strre.search(str)
> > if m:
> >     domything()
> > I was about to do a test, then I thought someone here might have
> > some data on this already.
> Yes. The answer is 'it all depends'.
> Things it depends on include:
>
> Your two bits of code do different things, one is case sensitive, one
> ignores case. Which did you need?
>
> How long is the string you are searching? How long is the substring?
> Is the substring the same every time, or are you always searching for
> different strings? Can the substring contain characters with special
> meanings for regular expressions?
>
> The regular expression code has a startup penalty since it has to compile
> the regular expression at least once, however the actual searching may be
> faster than the naive str.find. If the time spent doing the search is
> sufficiently long compared with the time doing the compile, the regular
> expression may win out.
>
> Bottom line: write the code so it is as clean and maintainable as possible.
> Only worry about optimising this if you have timed it and know that your
> searches are a bottleneck.
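One more caveat related to the point about special characters above: if the substring can contain regexp metacharacters, re.compile() silently changes its meaning unless the substring is passed through re.escape() first. A small sketch with a made-up search term:

```python
import re

substr = "v1.0 (beta)"  # made-up search term with the metacharacters . ( )
text = "Download v1.0 (beta) or v2.0 (stable) here."

raw = re.compile(substr, re.IGNORECASE)
safe = re.compile(re.escape(substr), re.IGNORECASE)

# The unescaped pattern treats '.' as "any character" and the parens
# as a group, so it no longer matches the literal text.
print(raw.search(text))           # -> None
print(safe.search(text).group())  # -> v1.0 (beta)
```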