464,812 Members | 1,083 Online
Need help? Post your question and get tips & solutions from a community of 464,812 IT Pros & Developers. It's quick & easy.

# Intersection of lists/sets -- with a catch

 P: n/a Hello All, I find myself in this situation from time to time: I want to compare two lists of arbitrary objects and (1) find those unique to the first list, (2) find those unique to the second list, (3) find those that overlap. But here is the catch: comparison is not straight-forward. For example, I will want to compare 2 objects based on a set of common attributes. These two objects need not be members of the same class, etc. A function might help to illustrate: def test_elements(element1, element2): """ Returns bool. """ # any evaluation can follow return (element1.att_a == element2.att_a) and \ (element1.att_b == element2.att_b) So my very naieve solution usually bears resemblance to the following code (made inelegant by the irritating necessity to keep track of a counter): for some_element in some_list: i = 0 while i < len(another_list): another_element = another_list[i] overlap = test_elements(some_element, another_element) if overlap: store_overlap(some_element, another_element) # speed-up, same as i+=1 another_list.pop(i) break else: i += 1 if not overlap: store_unique_some(some_element) After this loop, "identical" elements (according to the evaluation criteria) from the two lists are stored somewhere, elements unique to 'some_list' are stored somewhere else, and 'another_list' has only elements originally unique to itself according to the evaluation criteria remaining. Of course this code assumes that both original lists contained only unique elements within themselves and were previously sorted based on the evaluation criteria (to speed-up the loop). So, one improvement I can think of is to use an iterator for the inner loop: for some_element in some_list: for another_element in another_list: overlap = test_elements(some_element, another_element) if overlap: store_overlap(some_element,another_element) # big problem here -- iterator gets lost another_list.remove(another_element) break if not overlap: unique_some.append(some_element) But the problem is obvious, the iterator becomes out of sync with 'another_list'. An example can illustrate: py> alist = [1,2,3,4] py> for el in alist: .... print el .... if el == 3: .... alist.remove(el) .... 1 2 3 py> alist [1, 2, 4] So, a question would be how one might "correct" the iterator in this situation. Inspecting dir(iter) provides no clues as to the identity of an iterator's internal counter. Its probably obvious to everyone that this type of task seems perfect for sets. However, it does not seem that sets can be used in the following way, using a hypothetical "comparator" function. The "comparator" would be analagous to a function passed to the list.sort() method. Such a device would crush the previous code to the following very straight-forward statements: some_set = Set(some_list, comparator=test_elements) another_set = Set(another_list, comparator=test_elements) overlaps = some_set.intersection(another_set) unique_some = some_set.difference(another_set) unique_another = another_set.difference(some_set) I am under the personal opinion that such a modification to the set type would make it vastly more flexible, if it does not already have this ability. Any thoughts on how I might accomplish either technique or any thoughts on how to make my code more straightforward would be greatly appreciated. James -- James Stroud UCLA-DOE Institute for Genomics and Proteomics Box 951570 Los Angeles, CA 90095 http://www.jamesstroud.com/ Oct 18 '05 #1