By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,813 Members | 1,121 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,813 IT Pros & Developers. It's quick & easy.

finding all sublists of list2 that are identical to list1

P: n/a
Hello,

the function given below returns all indexes of list2 where a sublist
of list2 that is identical to list1 begins.

As I will need this function quite often, I would like to know if more
experienced programmers would agree with the way I defined the
function:

- Is there a more efficient way to do it? (Apart from building
automata.)
- Is there a more elegant/Pythonic way to write the function?

Klaus
def sublist(list1, list2):
if list1 == list2:
return [0]
elif len(list1) > len(list2):
return []
else:
result = []
shift = 0
i = 0
while shift <= len(list2)-len(list1):
if list1[i] == list2[shift+i]:
if i == len(list1)-1:
result.append(shift)
i = 0
shift += 1
else:
i += 1
else:
i = 0
shift += 1
return result
Jul 18 '05 #1
Share this Question
Share on Google+
4 Replies


P: n/a
> - Is there a more efficient way to do it? (Apart from building
automata.)
Look into the shift-and algorithm for string-search- that should be much
faster. Its not trivial so I can't reproduce it right here out of my head,
but google helps.
- Is there a more elegant/Pythonic way to write the function?


Yes. You should work much more with slicing, and compare whole sublists like
this:

-------snip-----------
hs = [0, 1, 5, 3, 2, 4, 5, 3, 2]
ne = [5, 3, 2]

def sublist(needle, haystack):
hs_size = len(haystack)
n_size = len(needle)

if n_size > hs_size:
return []
elif hs_size == n_size and needle == haystack:
return []
else:
res = []
offset = 0
sublist = haystack[offset:offset + n_size]
while len(sublist) == n_size:
if sublist == needle:
res.append(offset)
offset += 1
sublist = haystack[offset:offset + n_size]
return res

if __name__ == "__main__":
print sublist(ne, hs)

-------snip-----------

--
Regards,

Diez B. Roggisch
Jul 18 '05 #2

P: n/a
On 9 Feb 2004 06:59:35 -0800, kl************@yahoo.de (Klaus Neuner)
wrote:
Hello,

the function given below returns all indexes of list2 where a sublist
of list2 that is identical to list1 begins.

As I will need this function quite often, I would like to know if more
experienced programmers would agree with the way I defined the
function:

- Is there a more efficient way to do it? (Apart from building
automata.)
- Is there a more elegant/Pythonic way to write the function?

Klaus
def sublist(list1, list2):
if list1 == list2:
return [0]
elif len(list1) > len(list2):
return []
else:
result = []
shift = 0
i = 0
while shift <= len(list2)-len(list1):
if list1[i] == list2[shift+i]:
if i == len(list1)-1:
result.append(shift)
i = 0
shift += 1
else:
i += 1
else:
i = 0
shift += 1
return result


I don't know about it being Pythonic, but list comprehensions are
always fun. This one's 10-20% faster (based on timeit results) than
your code:

def sublist(list1, list2):
len1 = len(list1)
rnge = xrange(len(list2))
return [i for i in rnge if list2[i:len1+i] == list1]

If you need this for a specific purpose, you might be able to
transform the data and try a different technique which would be
executed internally at the "C" level. For example, if your list is
always integers, you might be able to join them into a tab-delimited
string and scan the string with a regular expression (unless you
consider that building automata).

The list doesn't have to be integers for the regex idea to be
used--you just have to be sure the values in list1 contain no regex
metacharacters, and that list1 and list2 don't contain your delimiter
(tab in this example). If list1 contains regex characters, you need
to escape them with a backslash first.

It's possible that if you know something about the size of list1, you
could use this information also, but without more details, I'm
shooting in the dark.

--dang
Jul 18 '05 #3

P: n/a
> def sublist(list1, list2):
len1 = len(list1)
rnge = xrange(len(list2))
return [i for i in rnge if list2[i:len1+i] == list1]
Nice one - looks like I optimized too much.
If you need this for a specific purpose, you might be able to
transform the data and try a different technique which would be
executed internally at the "C" level. For example, if your list is
always integers, you might be able to join them into a tab-delimited
string and scan the string with a regular expression (unless you
consider that building automata).
You don't gain anything by that - it would still perform the search inO(n). But your idea makes sense when the created "strings" would then be

used in a better perforiming string searching algorithm like shift-and.

--
Regards,

Diez B. Roggisch
Jul 18 '05 #4

P: n/a
> def sublist(list1, list2):
len1 = len(list1)
rnge = xrange(len(list2))
return [i for i in rnge if list2[i:len1+i] == list1]
Thanks.
but without more details, I'm
shooting in the dark.


Actually, I want to leave open what list1 and list2 may contain. I
want to define a class Matcher, like so:

class Matcher(object):

def match(self, a, b):
if a == b:
return True
else:
return False

def all_matches(self, list1, list2):
len1 = len(list1)
rnge = xrange(len(list2))
return [i for i in rnge if self.match(list2[i:len1+i], list1)]

I use the function "match" instead of "==" here, because I want to
make subclasses that use a different concept of equivalence. One type
of equivalence
could be "equality in the first element".

class FirstElementMatcher(Matcher):

def match(self, a, b):
if a[0] == b[0]:
return True
else:
return False
Jul 18 '05 #5

This discussion thread is closed

Replies have been disabled for this discussion.