By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,319 Members | 1,963 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 435,319 IT Pros & Developers. It's quick & easy.

string search function

P: n/a
Hi,
I have written a function that searches a text string for various
words. The text is searched using a boolean 'and' or a boolean 'or' of
the input list of search terms.

Since I need to use this function for many long strings and many
search words, I would like to use as efficient a method as possible.

Are there improvements that can be made to the code below? Are there
better alternatives?
I am currently using Python 2.3.

Thanks.

-g
-----

def bsearch(fterms, stext, btype='AND'):
if btype == 'AND': # boolean 'and' search
found = True
for f in fterms:
if f not in stext:
found = False
break
else: # boolean 'or' search
found = False
for f in fterms:
if f in stext:
found = True
break

return found

if __name__ == '__main__':
stext = "hello to you all"
terms = ['hello', 'goodbye']
btype = 'OR'

if bsearch(terms, stext, btype):
print 'found'
else:
print 'not found'
Jul 18 '05 #1
Share this Question
Share on Google+
2 Replies


P: n/a
On 23 Jul 2004, gyromagnetic wrote:
Are there improvements that can be made to the code below? Are there
better alternatives?


There are two ways I can think of:

1)

You can replace the 'or' search with a regular expression:

import re

def bsearch(fterms, stext, btype='AND'):
if btype == 'AND': # boolean 'and' search
...
else: # boolean 'or' search
found = bool(re.search(fterms.join('|'),stext))

return found

Assuming Python's regexps are somewhat optimized, this should be about the
fastest search you can get with arbitrary strings. Unfortunately, there's
no direct 'and' equivalent with regexps (it can be done, but it's helluva
ugly). If you want to optimize 'and', you'll need to use whatever
advanced algorithms our friends at Google use ;)

2)

Assuming your search terms don't contain whitespace, and you want to
compare only entire words (not substrings), replace the string search with
a list search, simply by adding the line 'stext = stext.split()' to the
top of the search function. This will greatly speed up the search by
avoiding checking partial words. The same effect can be achieved with the
regexp above by appending '\s' to the beginning and end of it.

Continuing along this line (whole words, no whitespace), you can replace
the lists with sets for an even greater speedup. The resulting code would
look like this (with a bit of code stolen from sets.py):

from sets import Set

def ispartialsubset(self, other):
"""Report whether another set contains at least one member of this set."""
self._binary_sanity_check(other)
for elt in ifilter(other._data.has_key, self):
return True
return False

Set.ispartialsubset = ispartialsubset

def bsearch(fterms, stext, btype='AND'):
stext = stext.split()
if btype == 'AND': # boolean 'and' search
found = Set(fterms).issubset(Set(stext))
else: # boolean 'or' search
found = Set(fterms).ispartialsubset(Set(stext))
return found

If the same stext or fterms is used multiple times, you could move the
..split() and/or the Set() transformations outside of the function, so they
only have to be done once.

Again, this method (using sets) only works if your words contain no
whitespace, and you're not interested in matching parts of words. It's
/much/ faster than a string search, though.

Hope this helps!

Jul 18 '05 #2

P: n/a
You should take a look at regular expressions (re)
functions. I think you will find that they are
much more efficient and can handle things your
routines can't handle.

http://www.amk.ca/python/howto/regex/

HTH,
Larry Bates
Syscon, Inc.

"gyromagnetic" <gy**********@excite.com> wrote in message
news:46**************************@posting.google.c om...
Hi,
I have written a function that searches a text string for various
words. The text is searched using a boolean 'and' or a boolean 'or' of
the input list of search terms.

Since I need to use this function for many long strings and many
search words, I would like to use as efficient a method as possible.

Are there improvements that can be made to the code below? Are there
better alternatives?
I am currently using Python 2.3.

Thanks.

-g
-----

def bsearch(fterms, stext, btype='AND'):
if btype == 'AND': # boolean 'and' search
found = True
for f in fterms:
if f not in stext:
found = False
break
else: # boolean 'or' search
found = False
for f in fterms:
if f in stext:
found = True
break

return found

if __name__ == '__main__':
stext = "hello to you all"
terms = ['hello', 'goodbye']
btype = 'OR'

if bsearch(terms, stext, btype):
print 'found'
else:
print 'not found'

Jul 18 '05 #3

This discussion thread is closed

Replies have been disabled for this discussion.