473,397 Members | 2,099 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,397 software developers and data experts.

finding all sublists of list2 that are identical to list1

Hello,

the function given below returns all indexes of list2 where a sublist
of list2 that is identical to list1 begins.

As I will need this function quite often, I would like to know if more
experienced programmers would agree with the way I defined the
function:

- Is there a more efficient way to do it? (Apart from building
automata.)
- Is there a more elegant/Pythonic way to write the function?

Klaus
def sublist(list1, list2):
if list1 == list2:
return [0]
elif len(list1) > len(list2):
return []
else:
result = []
shift = 0
i = 0
while shift <= len(list2)-len(list1):
if list1[i] == list2[shift+i]:
if i == len(list1)-1:
result.append(shift)
i = 0
shift += 1
else:
i += 1
else:
i = 0
shift += 1
return result
Jul 18 '05 #1
4 1924
> - Is there a more efficient way to do it? (Apart from building
automata.)
Look into the shift-and algorithm for string-search- that should be much
faster. Its not trivial so I can't reproduce it right here out of my head,
but google helps.
- Is there a more elegant/Pythonic way to write the function?


Yes. You should work much more with slicing, and compare whole sublists like
this:

-------snip-----------
hs = [0, 1, 5, 3, 2, 4, 5, 3, 2]
ne = [5, 3, 2]

def sublist(needle, haystack):
hs_size = len(haystack)
n_size = len(needle)

if n_size > hs_size:
return []
elif hs_size == n_size and needle == haystack:
return []
else:
res = []
offset = 0
sublist = haystack[offset:offset + n_size]
while len(sublist) == n_size:
if sublist == needle:
res.append(offset)
offset += 1
sublist = haystack[offset:offset + n_size]
return res

if __name__ == "__main__":
print sublist(ne, hs)

-------snip-----------

--
Regards,

Diez B. Roggisch
Jul 18 '05 #2
On 9 Feb 2004 06:59:35 -0800, kl************@yahoo.de (Klaus Neuner)
wrote:
Hello,

the function given below returns all indexes of list2 where a sublist
of list2 that is identical to list1 begins.

As I will need this function quite often, I would like to know if more
experienced programmers would agree with the way I defined the
function:

- Is there a more efficient way to do it? (Apart from building
automata.)
- Is there a more elegant/Pythonic way to write the function?

Klaus
def sublist(list1, list2):
if list1 == list2:
return [0]
elif len(list1) > len(list2):
return []
else:
result = []
shift = 0
i = 0
while shift <= len(list2)-len(list1):
if list1[i] == list2[shift+i]:
if i == len(list1)-1:
result.append(shift)
i = 0
shift += 1
else:
i += 1
else:
i = 0
shift += 1
return result


I don't know about it being Pythonic, but list comprehensions are
always fun. This one's 10-20% faster (based on timeit results) than
your code:

def sublist(list1, list2):
len1 = len(list1)
rnge = xrange(len(list2))
return [i for i in rnge if list2[i:len1+i] == list1]

If you need this for a specific purpose, you might be able to
transform the data and try a different technique which would be
executed internally at the "C" level. For example, if your list is
always integers, you might be able to join them into a tab-delimited
string and scan the string with a regular expression (unless you
consider that building automata).

The list doesn't have to be integers for the regex idea to be
used--you just have to be sure the values in list1 contain no regex
metacharacters, and that list1 and list2 don't contain your delimiter
(tab in this example). If list1 contains regex characters, you need
to escape them with a backslash first.

It's possible that if you know something about the size of list1, you
could use this information also, but without more details, I'm
shooting in the dark.

--dang
Jul 18 '05 #3
> def sublist(list1, list2):
len1 = len(list1)
rnge = xrange(len(list2))
return [i for i in rnge if list2[i:len1+i] == list1]
Nice one - looks like I optimized too much.
If you need this for a specific purpose, you might be able to
transform the data and try a different technique which would be
executed internally at the "C" level. For example, if your list is
always integers, you might be able to join them into a tab-delimited
string and scan the string with a regular expression (unless you
consider that building automata).
You don't gain anything by that - it would still perform the search inO(n). But your idea makes sense when the created "strings" would then be

used in a better perforiming string searching algorithm like shift-and.

--
Regards,

Diez B. Roggisch
Jul 18 '05 #4
> def sublist(list1, list2):
len1 = len(list1)
rnge = xrange(len(list2))
return [i for i in rnge if list2[i:len1+i] == list1]
Thanks.
but without more details, I'm
shooting in the dark.


Actually, I want to leave open what list1 and list2 may contain. I
want to define a class Matcher, like so:

class Matcher(object):

def match(self, a, b):
if a == b:
return True
else:
return False

def all_matches(self, list1, list2):
len1 = len(list1)
rnge = xrange(len(list2))
return [i for i in rnge if self.match(list2[i:len1+i], list1)]

I use the function "match" instead of "==" here, because I want to
make subclasses that use a different concept of equivalence. One type
of equivalence
could be "equality in the first element".

class FirstElementMatcher(Matcher):

def match(self, a, b):
if a[0] == b[0]:
return True
else:
return False
Jul 18 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Ponder | last post by:
Okay here's the scenario: I have two big lists of numbers. One number on each line (they're telephone numbers if that helps.) What I want to do, is have VB search through list2.txt for numbers...
3
by: Raseliarison nirinA | last post by:
hi all, i found an unanswered question at http://www.faqts.com/knowledge_base/index.phtml/fid/538 with possible response below. i've tried to send it at faqt.python but can't figure out how to...
17
by: Adam H. Peterson | last post by:
Is there a standard way to find the minimum value for a data type? I'm thinking along the lines of std::numeric_limits<T>::min(). But that doesn't work for floating point types. I can't use...
2
by: Ginu | last post by:
Hi, the task is to identify semantically identical elements where some additional attributes do not match. The XSL-transformation should find a node NAME which @id attribute matches to another...
0
by: Jorge | last post by:
Hi Jy In the OleDbDataAdapter.SelectCommand.CommandText = "Select from Table where FieldName like '% " & variable & "%'" Kind Regards Jorge >-----Original Message-----
14
by: kelvin.jones | last post by:
Hi, if I had the ID of an input element, how can I find the input's FORM in javascript? Basically, given a input's dom id, I want to insert something in the onSubmit of the Form that that input...
5
by: Manish | last post by:
This is the print_r() for a variable $categories. $categories :: Array ( =Array ( =Array (
275
by: Astley Le Jasper | last post by:
Sorry for the numpty question ... How do you find the reference name of an object? So if i have this bob = modulename.objectname() how do i find that the name is 'bob'
1
by: cincisuzie | last post by:
I'm a beginner, so I will try and convey my questions as best as possible. I have a database of approximately 20,000 companies - I will call this list LIST1, and I purchased a list to with...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.