443,813 Members | 1,121 Online
Need help? Post your question and get tips & solutions from a community of 443,813 IT Pros & Developers. It's quick & easy.

# finding all sublists of list2 that are identical to list1

 P: n/a Hello, the function given below returns all indexes of list2 where a sublist of list2 that is identical to list1 begins. As I will need this function quite often, I would like to know if more experienced programmers would agree with the way I defined the function: - Is there a more efficient way to do it? (Apart from building automata.) - Is there a more elegant/Pythonic way to write the function? Klaus def sublist(list1, list2): if list1 == list2: return [0] elif len(list1) > len(list2): return [] else: result = [] shift = 0 i = 0 while shift <= len(list2)-len(list1): if list1[i] == list2[shift+i]: if i == len(list1)-1: result.append(shift) i = 0 shift += 1 else: i += 1 else: i = 0 shift += 1 return result Jul 18 '05 #1
4 Replies

 P: n/a > - Is there a more efficient way to do it? (Apart from building automata.) Look into the shift-and algorithm for string-search- that should be much faster. Its not trivial so I can't reproduce it right here out of my head, but google helps. - Is there a more elegant/Pythonic way to write the function? Yes. You should work much more with slicing, and compare whole sublists like this: -------snip----------- hs = [0, 1, 5, 3, 2, 4, 5, 3, 2] ne = [5, 3, 2] def sublist(needle, haystack): hs_size = len(haystack) n_size = len(needle) if n_size > hs_size: return [] elif hs_size == n_size and needle == haystack: return [] else: res = [] offset = 0 sublist = haystack[offset:offset + n_size] while len(sublist) == n_size: if sublist == needle: res.append(offset) offset += 1 sublist = haystack[offset:offset + n_size] return res if __name__ == "__main__": print sublist(ne, hs) -------snip----------- -- Regards, Diez B. Roggisch Jul 18 '05 #2

 P: n/a On 9 Feb 2004 06:59:35 -0800, kl************@yahoo.de (Klaus Neuner) wrote: Hello,the function given below returns all indexes of list2 where a sublistof list2 that is identical to list1 begins.As I will need this function quite often, I would like to know if moreexperienced programmers would agree with the way I defined thefunction:- Is there a more efficient way to do it? (Apart from buildingautomata.)- Is there a more elegant/Pythonic way to write the function?Klaus def sublist(list1, list2): if list1 == list2: return [0] elif len(list1) > len(list2): return [] else: result = [] shift = 0 i = 0 while shift <= len(list2)-len(list1): if list1[i] == list2[shift+i]: if i == len(list1)-1: result.append(shift) i = 0 shift += 1 else: i += 1 else: i = 0 shift += 1 return result I don't know about it being Pythonic, but list comprehensions are always fun. This one's 10-20% faster (based on timeit results) than your code: def sublist(list1, list2): len1 = len(list1) rnge = xrange(len(list2)) return [i for i in rnge if list2[i:len1+i] == list1] If you need this for a specific purpose, you might be able to transform the data and try a different technique which would be executed internally at the "C" level. For example, if your list is always integers, you might be able to join them into a tab-delimited string and scan the string with a regular expression (unless you consider that building automata). The list doesn't have to be integers for the regex idea to be used--you just have to be sure the values in list1 contain no regex metacharacters, and that list1 and list2 don't contain your delimiter (tab in this example). If list1 contains regex characters, you need to escape them with a backslash first. It's possible that if you know something about the size of list1, you could use this information also, but without more details, I'm shooting in the dark. --dang Jul 18 '05 #3

 P: n/a > def sublist(list1, list2): len1 = len(list1) rnge = xrange(len(list2)) return [i for i in rnge if list2[i:len1+i] == list1] Nice one - looks like I optimized too much. If you need this for a specific purpose, you might be able to transform the data and try a different technique which would be executed internally at the "C" level. For example, if your list is always integers, you might be able to join them into a tab-delimited string and scan the string with a regular expression (unless you consider that building automata). You don't gain anything by that - it would still perform the search inO(n). But your idea makes sense when the created "strings" would then be used in a better perforiming string searching algorithm like shift-and. -- Regards, Diez B. Roggisch Jul 18 '05 #4

 P: n/a > def sublist(list1, list2): len1 = len(list1) rnge = xrange(len(list2)) return [i for i in rnge if list2[i:len1+i] == list1] Thanks. but without more details, I'm shooting in the dark. Actually, I want to leave open what list1 and list2 may contain. I want to define a class Matcher, like so: class Matcher(object): def match(self, a, b): if a == b: return True else: return False def all_matches(self, list1, list2): len1 = len(list1) rnge = xrange(len(list2)) return [i for i in rnge if self.match(list2[i:len1+i], list1)] I use the function "match" instead of "==" here, because I want to make subclasses that use a different concept of equivalence. One type of equivalence could be "equality in the first element". class FirstElementMatcher(Matcher): def match(self, a, b): if a[0] == b[0]: return True else: return False Jul 18 '05 #5

### This discussion thread is closed

Replies have been disabled for this discussion.