By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
446,285 Members | 1,647 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 446,285 IT Pros & Developers. It's quick & easy.

Efficient way of testing for substring being one of a set?

P: n/a
What's the neatest and/or most efficient way of testing if one of a
set of strings (contained in a dictionary, list or similar) is a
sub-string of a given string?

I.e. I have a string delivered into my program and I want to see if
any of a set of strings is a substring of the string I have been
given. It's quite OK to stop at the first one found. Ideally the
strings being searched through will be the keys of a dictionary but
this isn't a necessity, they can just be in a list if it could be done
more efficiently using a list.
Is this the best one can do (ignoring the likelihood that I've got
some syntax wrong) :-

# l is the list
# str is the incoming string
answer = ""
for x in l:
if str.find(x) < 0:
continue
answer = x
--
Chris Green
Apr 3 '08 #1
Share this Question
Share on Google+
9 Replies


P: n/a
On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
What's the neatest and/or most efficient way of testing if one of a
set of strings (contained in a dictionary, list or similar) is a
sub-string of a given string?

I.e. I have a string delivered into my program and I want to see if
any of a set of strings is a substring of the string I have been
given. It's quite OK to stop at the first one found. Ideally the
strings being searched through will be the keys of a dictionary but
this isn't a necessity, they can just be in a list if it could be done
more efficiently using a list.

Is this the best one can do (ignoring the likelihood that I've got
some syntax wrong) :-

# l is the list
# str is the incoming string
answer = ""
for x in l:
if str.find(x) < 0:
continue
answer = x
I'd not use 'l' (confused with '1') or 'str' (a standard module) as
variable names. Your code checks every string in the list even when
it's found one... you can reverse the test and break when the first
one is found. Using 'in' rather than testing the return value of find
is nicer as a substring test. Finally, using the 'else' clause lets
you make it clear that answer is set to the empty string when no match
is found.

for answer in l:
if str in answer: break
else:
answer = ''

--
Paul Hankin
Apr 3 '08 #2

P: n/a
Paul Hankin <pa*********@gmail.comwrote:
On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
What's the neatest and/or most efficient way of testing if one of a
set of strings (contained in a dictionary, list or similar) is a
sub-string of a given string?

I.e. I have a string delivered into my program and I want to see if
any of a set of strings is a substring of the string I have been
given. It's quite OK to stop at the first one found. Ideally the
strings being searched through will be the keys of a dictionary but
this isn't a necessity, they can just be in a list if it could be done
more efficiently using a list.

Is this the best one can do (ignoring the likelihood that I've got
some syntax wrong) :-

# l is the list
# str is the incoming string
answer = ""
for x in l:
if str.find(x) < 0:
continue
answer = x

I'd not use 'l' (confused with '1') or 'str' (a standard module) as
variable names.
Neither would I, it was just thrown together to show how I was
thinking.
Your code checks every string in the list even when
it's found one... you can reverse the test and break when the first
one is found. Using 'in' rather than testing the return value of find
is nicer as a substring test. Finally, using the 'else' clause lets
you make it clear that answer is set to the empty string when no match
is found.

for answer in l:
if str in answer: break
else:
answer = ''
OK, that does improve things somewhat, thanks.

--
Chris Green
Apr 3 '08 #3

P: n/a
Ant
On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
What's the neatest and/or most efficient way of testing if one of a
A different approach:
>>words = ["he", "sh", "bla"]
name = "blah"
True in (word in name for word in words)
True
>>name = "bling"
True in (word in name for word in words)
False

Perhaps not as obvious or readable as Jeff's example, but is
essentially doing the same thing using generator syntax.
Apr 3 '08 #4

P: n/a
On Apr 3, 8:44*am, Ant <ant...@gmail.comwrote:
On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
What's the neatest and/or most efficient way of testing if one of a

A different approach:
>words = ["he", "sh", "bla"]
name = "blah"
True in (word in name for word in words)

True
>name = "bling"
True in (word in name for word in words)

False

Perhaps not as obvious or readable as Jeff's example, but is
essentially doing the same thing using generator syntax.
That's pretty :)
Apr 3 '08 #5

P: n/a
On Apr 3, 1:37 pm, tinn...@isbd.co.uk wrote:
What's the neatest and/or most efficient way of testing if one of a
set of strings (contained in a dictionary, list or similar) is a
sub-string of a given string?
[...]
You could use the Aho-Corasick algorithm <http://en.wikipedia.org/wiki/
Aho-Corasick_algorithm>.
I don't know if there's a Python implementation yet.

Dennis Benzinger
Apr 3 '08 #6

P: n/a
Dennis Benzinger:
You could use the Aho-Corasick algorithm <http://en.wikipedia.org/wiki/
Aho-Corasick_algorithm>.
I don't know if there's a Python implementation yet.
http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/

Bye,
bearophile
Apr 3 '08 #7

P: n/a
On Apr 3, 9:00 am, Jeff <jeffo...@gmail.comwrote:
On Apr 3, 8:44 am, Ant <ant...@gmail.comwrote:
On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
What's the neatest and/or most efficient way of testing if one of a
A different approach:
>>words = ["he", "sh", "bla"]
>>name = "blah"
>>True in (word in name for word in words)
True
>>name = "bling"
>>True in (word in name for word in words)
False
Perhaps not as obvious or readable as Jeff's example, but is
essentially doing the same thing using generator syntax.

That's pretty :)
It's even prettier in 2.5:

any(word in name for word in words)

George
Apr 3 '08 #8

P: n/a
Ant
On Apr 3, 2:27 pm, George Sakkis <george.sak...@gmail.comwrote:
....
It's even prettier in 2.5:

any(word in name for word in words)

George
And arguably the most readable yet!
Apr 3 '08 #9

P: n/a
be************@lycos.com wrote:
Dennis Benzinger:
>You could use the Aho-Corasick algorithm <http://en.wikipedia.org/wiki/
Aho-Corasick_algorithm>.
I don't know if there's a Python implementation yet.

http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/
http://nicolas.lehuen.com/download/pytst/ can do it as well.
Apr 3 '08 #10

This discussion thread is closed

Replies have been disabled for this discussion.