Efficient way of testing for substring being one of a set?

tinnews

What's the neatest and/or most efficient way of testing if one of a
set of strings (contained in a dictionary, list or similar) is a
sub-string of a given string?

I.e. I have a string delivered into my program and I want to see if
any of a set of strings is a substring of the string I have been
given. It's quite OK to stop at the first one found. Ideally the
strings being searched through will be the keys of a dictionary but
this isn't a necessity, they can just be in a list if it could be done
more efficiently using a list.
Is this the best one can do (ignoring the likelihood that I've got
some syntax wrong) :-

# l is the list
# str is the incoming string
answer = ""
for x in l:
if str.find(x) < 0:
continue
answer = x
--
Chris Green

Apr 3 '08 #1

Subscribe Post Reply

2510

Paul Hankin

On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:

What's the neatest and/or most efficient way of testing if one of a
set of strings (contained in a dictionary, list or similar) is a
sub-string of a given string?

I.e. I have a string delivered into my program and I want to see if
any of a set of strings is a substring of the string I have been
given. It's quite OK to stop at the first one found. Ideally the
strings being searched through will be the keys of a dictionary but
this isn't a necessity, they can just be in a list if it could be done
more efficiently using a list.

Is this the best one can do (ignoring the likelihood that I've got
some syntax wrong) :-

# l is the list
# str is the incoming string
answer = ""
for x in l:
if str.find(x) < 0:
continue
answer = x

I'd not use 'l' (confused with '1') or 'str' (a standard module) as
variable names. Your code checks every string in the list even when
it's found one... you can reverse the test and break when the first
one is found. Using 'in' rather than testing the return value of find
is nicer as a substring test. Finally, using the 'else' clause lets
you make it clear that answer is set to the empty string when no match
is found.

for answer in l:
if str in answer: break
else:
answer = ''

--
Paul Hankin

Apr 3 '08 #2

tinnews

Paul Hankin <pa*********@gmail.comwrote:

On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
What's the neatest and/or most efficient way of testing if one of a
set of strings (contained in a dictionary, list or similar) is a
sub-string of a given string?

I.e. I have a string delivered into my program and I want to see if
any of a set of strings is a substring of the string I have been
given. It's quite OK to stop at the first one found. Ideally the
strings being searched through will be the keys of a dictionary but
this isn't a necessity, they can just be in a list if it could be done
more efficiently using a list.

Is this the best one can do (ignoring the likelihood that I've got
some syntax wrong) :-

# l is the list
# str is the incoming string
answer = ""
for x in l:
if str.find(x) < 0:
continue
answer = x

I'd not use 'l' (confused with '1') or 'str' (a standard module) as
variable names.

Neither would I, it was just thrown together to show how I was
thinking.

Your code checks every string in the list even when
it's found one... you can reverse the test and break when the first
one is found. Using 'in' rather than testing the return value of find
is nicer as a substring test. Finally, using the 'else' clause lets
you make it clear that answer is set to the empty string when no match
is found.

for answer in l:
if str in answer: break
else:
answer = ''

OK, that does improve things somewhat, thanks.

--
Chris Green

Apr 3 '08 #3

Ant

On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:

What's the neatest and/or most efficient way of testing if one of a

A different approach:

>>words = ["he", "sh", "bla"]
name = "blah"
True in (word in name for word in words)

True

>>name = "bling"
True in (word in name for word in words)

False

Perhaps not as obvious or readable as Jeff's example, but is
essentially doing the same thing using generator syntax.

Apr 3 '08 #4

Jeff

On Apr 3, 8:44*am, Ant <ant...@gmail.comwrote:

On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:

What's the neatest and/or most efficient way of testing if one of a

A different approach:

>words = ["he", "sh", "bla"]
name = "blah"
True in (word in name for word in words)

True

>name = "bling"
True in (word in name for word in words)

False

Perhaps not as obvious or readable as Jeff's example, but is
essentially doing the same thing using generator syntax.

That's pretty :)

Apr 3 '08 #5

Dennis.Benzinger

On Apr 3, 1:37 pm, tinn...@isbd.co.uk wrote:

What's the neatest and/or most efficient way of testing if one of a
set of strings (contained in a dictionary, list or similar) is a
sub-string of a given string?
[...]

You could use the Aho-Corasick algorithm <http://en.wikipedia.org/wiki/
Aho-Corasick_algorithm>.
I don't know if there's a Python implementation yet.

Dennis Benzinger

Apr 3 '08 #6

bearophileHUGS

Dennis Benzinger:

You could use the Aho-Corasick algorithm <http://en.wikipedia.org/wiki/
Aho-Corasick_algorithm>.
I don't know if there's a Python implementation yet.

http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/

Bye,
bearophile

Apr 3 '08 #7

George Sakkis

On Apr 3, 9:00 am, Jeff <jeffo...@gmail.comwrote:

On Apr 3, 8:44 am, Ant <ant...@gmail.comwrote:

On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:

What's the neatest and/or most efficient way of testing if one of a

A different approach:

>>words = ["he", "sh", "bla"]
>>name = "blah"
>>True in (word in name for word in words)

True

>>name = "bling"
>>True in (word in name for word in words)

False

Perhaps not as obvious or readable as Jeff's example, but is
essentially doing the same thing using generator syntax.

That's pretty :)

It's even prettier in 2.5:

any(word in name for word in words)

George

Apr 3 '08 #8

Ant

On Apr 3, 2:27 pm, George Sakkis <george.sak...@gmail.comwrote:
....

It's even prettier in 2.5:

any(word in name for word in words)

George

And arguably the most readable yet!

Apr 3 '08 #9

Pop User

be************@lycos.com wrote:

Dennis Benzinger:
>You could use the Aho-Corasick algorithm <http://en.wikipedia.org/wiki/
Aho-Corasick_algorithm>.
I don't know if there's a Python implementation yet.

http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/

http://nicolas.lehuen.com/download/pytst/ can do it as well.

Apr 3 '08 #10

Similar topics

searching a substring

by: junky_fellow | last post by:

Can anyone suggest some efficient way to search a substring in a text file. For eg. suppose i want to search the pattern "abc" in a text file, then which algorithm should i use. I just want some...

C / C++

Efficient fixed width string substition puzzle

by: Jami Bradley | last post by:

Hi, I'm looking for an efficient way to do this, because I know it will be heavily used :-) I have a fixed width string and I need to substitute a substring of characters with new values. I...

C# / C Sharp

Bug: string.substring.trim = "" (boolean expression issue)

by: Darren Anderson | last post by:

I have a function that I've tried using in an if then statement and I've found that no matter how much reworking I do with the code, the expected result is incorrect. the code: If Not...

Visual Basic .NET

extracting substring

by: Ankit Aneja | last post by:

string comm="CONTSCAN E:\\projects backup\\ankitclam backup\\Clamtest\\testing\\hello.txt\r\n" int x=comm.Length; x=x-7; string path; path=comm.Substring(9,x); MessageBox.Show(path); it...

C# / C Sharp

progrm to find a substring in a string

by: Ajay | last post by:

Hi all,Could anybody tell me the most efficient method to find a substr in a string.

C / C++

Efficient way to do this

by: Steven Blair | last post by:

Hi, I get passed a string like these examples: 3.1 4.23 1.1 5 I need to seperate the values into two integer variables.

C# / C Sharp

single line string maninuplation substring, trim, indexof ?

by: Jason | last post by:

I have a string that looks like this? ihelloworld_zzz_yyy In a single vb.net line, I want to remove the frist character and anything after and including the first "_" resulting in : ...

Visual Basic .NET

TransModal modal dialog project : beta testing

by: VK | last post by:

In the continuation of the discussion at "Making Site Opaque -- This Strategy Feasible?" and my comment at http://groups.google.com/group/comp.lang.javascript/msg/b515a4408680e8e2 I have...

Javascript

Regex Usage, or use Substring?

by: =?Utf-8?B?anAybXNmdA==?= | last post by:

Two part question: 1. Is Regex more efficient than manually comparing values using Substring? 2. I've never created a Regex expression. How would I use regex to do the equivalent of what I...

C# / C Sharp

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice