473,396 Members | 1,894 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

Efficient way of testing for substring being one of a set?

What's the neatest and/or most efficient way of testing if one of a
set of strings (contained in a dictionary, list or similar) is a
sub-string of a given string?

I.e. I have a string delivered into my program and I want to see if
any of a set of strings is a substring of the string I have been
given. It's quite OK to stop at the first one found. Ideally the
strings being searched through will be the keys of a dictionary but
this isn't a necessity, they can just be in a list if it could be done
more efficiently using a list.
Is this the best one can do (ignoring the likelihood that I've got
some syntax wrong) :-

# l is the list
# str is the incoming string
answer = ""
for x in l:
if str.find(x) < 0:
continue
answer = x
--
Chris Green
Apr 3 '08 #1
9 2510
On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
What's the neatest and/or most efficient way of testing if one of a
set of strings (contained in a dictionary, list or similar) is a
sub-string of a given string?

I.e. I have a string delivered into my program and I want to see if
any of a set of strings is a substring of the string I have been
given. It's quite OK to stop at the first one found. Ideally the
strings being searched through will be the keys of a dictionary but
this isn't a necessity, they can just be in a list if it could be done
more efficiently using a list.

Is this the best one can do (ignoring the likelihood that I've got
some syntax wrong) :-

# l is the list
# str is the incoming string
answer = ""
for x in l:
if str.find(x) < 0:
continue
answer = x
I'd not use 'l' (confused with '1') or 'str' (a standard module) as
variable names. Your code checks every string in the list even when
it's found one... you can reverse the test and break when the first
one is found. Using 'in' rather than testing the return value of find
is nicer as a substring test. Finally, using the 'else' clause lets
you make it clear that answer is set to the empty string when no match
is found.

for answer in l:
if str in answer: break
else:
answer = ''

--
Paul Hankin
Apr 3 '08 #2
Paul Hankin <pa*********@gmail.comwrote:
On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
What's the neatest and/or most efficient way of testing if one of a
set of strings (contained in a dictionary, list or similar) is a
sub-string of a given string?

I.e. I have a string delivered into my program and I want to see if
any of a set of strings is a substring of the string I have been
given. It's quite OK to stop at the first one found. Ideally the
strings being searched through will be the keys of a dictionary but
this isn't a necessity, they can just be in a list if it could be done
more efficiently using a list.

Is this the best one can do (ignoring the likelihood that I've got
some syntax wrong) :-

# l is the list
# str is the incoming string
answer = ""
for x in l:
if str.find(x) < 0:
continue
answer = x

I'd not use 'l' (confused with '1') or 'str' (a standard module) as
variable names.
Neither would I, it was just thrown together to show how I was
thinking.
Your code checks every string in the list even when
it's found one... you can reverse the test and break when the first
one is found. Using 'in' rather than testing the return value of find
is nicer as a substring test. Finally, using the 'else' clause lets
you make it clear that answer is set to the empty string when no match
is found.

for answer in l:
if str in answer: break
else:
answer = ''
OK, that does improve things somewhat, thanks.

--
Chris Green
Apr 3 '08 #3
Ant
On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
What's the neatest and/or most efficient way of testing if one of a
A different approach:
>>words = ["he", "sh", "bla"]
name = "blah"
True in (word in name for word in words)
True
>>name = "bling"
True in (word in name for word in words)
False

Perhaps not as obvious or readable as Jeff's example, but is
essentially doing the same thing using generator syntax.
Apr 3 '08 #4
On Apr 3, 8:44*am, Ant <ant...@gmail.comwrote:
On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
What's the neatest and/or most efficient way of testing if one of a

A different approach:
>words = ["he", "sh", "bla"]
name = "blah"
True in (word in name for word in words)

True
>name = "bling"
True in (word in name for word in words)

False

Perhaps not as obvious or readable as Jeff's example, but is
essentially doing the same thing using generator syntax.
That's pretty :)
Apr 3 '08 #5
On Apr 3, 1:37 pm, tinn...@isbd.co.uk wrote:
What's the neatest and/or most efficient way of testing if one of a
set of strings (contained in a dictionary, list or similar) is a
sub-string of a given string?
[...]
You could use the Aho-Corasick algorithm <http://en.wikipedia.org/wiki/
Aho-Corasick_algorithm>.
I don't know if there's a Python implementation yet.

Dennis Benzinger
Apr 3 '08 #6
Dennis Benzinger:
You could use the Aho-Corasick algorithm <http://en.wikipedia.org/wiki/
Aho-Corasick_algorithm>.
I don't know if there's a Python implementation yet.
http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/

Bye,
bearophile
Apr 3 '08 #7
On Apr 3, 9:00 am, Jeff <jeffo...@gmail.comwrote:
On Apr 3, 8:44 am, Ant <ant...@gmail.comwrote:
On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
What's the neatest and/or most efficient way of testing if one of a
A different approach:
>>words = ["he", "sh", "bla"]
>>name = "blah"
>>True in (word in name for word in words)
True
>>name = "bling"
>>True in (word in name for word in words)
False
Perhaps not as obvious or readable as Jeff's example, but is
essentially doing the same thing using generator syntax.

That's pretty :)
It's even prettier in 2.5:

any(word in name for word in words)

George
Apr 3 '08 #8
Ant
On Apr 3, 2:27 pm, George Sakkis <george.sak...@gmail.comwrote:
....
It's even prettier in 2.5:

any(word in name for word in words)

George
And arguably the most readable yet!
Apr 3 '08 #9
be************@lycos.com wrote:
Dennis Benzinger:
>You could use the Aho-Corasick algorithm <http://en.wikipedia.org/wiki/
Aho-Corasick_algorithm>.
I don't know if there's a Python implementation yet.

http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/
http://nicolas.lehuen.com/download/pytst/ can do it as well.
Apr 3 '08 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
by: junky_fellow | last post by:
Can anyone suggest some efficient way to search a substring in a text file. For eg. suppose i want to search the pattern "abc" in a text file, then which algorithm should i use. I just want some...
8
by: Jami Bradley | last post by:
Hi, I'm looking for an efficient way to do this, because I know it will be heavily used :-) I have a fixed width string and I need to substitute a substring of characters with new values. I...
11
by: Darren Anderson | last post by:
I have a function that I've tried using in an if then statement and I've found that no matter how much reworking I do with the code, the expected result is incorrect. the code: If Not...
0
by: Ankit Aneja | last post by:
string comm="CONTSCAN E:\\projects backup\\ankitclam backup\\Clamtest\\testing\\hello.txt\r\n" int x=comm.Length; x=x-7; string path; path=comm.Substring(9,x); MessageBox.Show(path); it...
29
by: Ajay | last post by:
Hi all,Could anybody tell me the most efficient method to find a substr in a string.
4
by: Steven Blair | last post by:
Hi, I get passed a string like these examples: 3.1 4.23 1.1 5 I need to seperate the values into two integer variables.
4
by: Jason | last post by:
I have a string that looks like this? ihelloworld_zzz_yyy In a single vb.net line, I want to remove the frist character and anything after and including the first "_" resulting in : ...
11
by: VK | last post by:
In the continuation of the discussion at "Making Site Opaque -- This Strategy Feasible?" and my comment at http://groups.google.com/group/comp.lang.javascript/msg/b515a4408680e8e2 I have...
3
by: =?Utf-8?B?anAybXNmdA==?= | last post by:
Two part question: 1. Is Regex more efficient than manually comparing values using Substring? 2. I've never created a Regex expression. How would I use regex to do the equivalent of what I...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.