473,480 Members | 1,755 Online
Bytes | Software Development & Data Engineering Community
Create Post

Home Posts Topics Members FAQ

Intersection of lists/sets -- with a catch

Hello All,

I find myself in this situation from time to time: I want to compare two lists
of arbitrary objects and (1) find those unique to the first list, (2) find
those unique to the second list, (3) find those that overlap. But here is the
catch: comparison is not straight-forward. For example, I will want to
compare 2 objects based on a set of common attributes. These two objects need
not be members of the same class, etc. A function might help to illustrate:

def test_elements(element1, element2):
"""
Returns bool.
"""
# any evaluation can follow
return (element1.att_a == element2.att_a) and \
(element1.att_b == element2.att_b)

So my very naieve solution usually bears resemblance to the following code
(made inelegant by the irritating necessity to keep track of a counter):

for some_element in some_list:
i = 0
while i < len(another_list):
another_element = another_list[i]
overlap = test_elements(some_element, another_element)
if overlap:
store_overlap(some_element, another_element)
# speed-up, same as i+=1
another_list.pop(i)
break
else:
i += 1
if not overlap:
store_unique_some(some_element)

After this loop, "identical" elements (according to the evaluation criteria)
from the two lists are stored somewhere, elements unique to 'some_list' are
stored somewhere else, and 'another_list' has only elements originally unique
to itself according to the evaluation criteria remaining. Of course this code
assumes that both original lists contained only unique elements within
themselves and were previously sorted based on the evaluation criteria (to
speed-up the loop).

So, one improvement I can think of is to use an iterator for the inner loop:

for some_element in some_list:
for another_element in another_list:
overlap = test_elements(some_element, another_element)
if overlap:
store_overlap(some_element,another_element)
# big problem here -- iterator gets lost
another_list.remove(another_element)
break
if not overlap:
unique_some.append(some_element)

But the problem is obvious, the iterator becomes out of sync with
'another_list'. An example can illustrate:

py> alist = [1,2,3,4]
py> for el in alist:
.... print el
.... if el == 3:
.... alist.remove(el)
....
1
2
3
py> alist
[1, 2, 4]

So, a question would be how one might "correct" the iterator in this
situation. Inspecting dir(iter) provides no clues as to the identity of an
iterator's internal counter.

Its probably obvious to everyone that this type of task seems perfect for
sets. However, it does not seem that sets can be used in the following way,
using a hypothetical "comparator" function. The "comparator" would be
analagous to a function passed to the list.sort() method. Such a device would
crush the previous code to the following very straight-forward statements:

some_set = Set(some_list, comparator=test_elements)
another_set = Set(another_list, comparator=test_elements)
overlaps = some_set.intersection(another_set)
unique_some = some_set.difference(another_set)
unique_another = another_set.difference(some_set)

I am under the personal opinion that such a modification to the set type would
make it vastly more flexible, if it does not already have this ability.

Any thoughts on how I might accomplish either technique or any thoughts on how
to make my code more straightforward would be greatly appreciated.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Oct 18 '05 #1
2 2508

James Stroud wrote:
Hello All,

I find myself in this situation from time to time: I want to compare two lists
of arbitrary objects and (1) find those unique to the first list, (2) find
those unique to the second list, (3) find those that overlap. But here is the
catch: comparison is not straight-forward. For example, I will want to
compare 2 objects based on a set of common attributes. These two objects need
not be members of the same class, etc. A function might help to illustrate:

def test_elements(element1, element2):
"""
Returns bool.
"""
# any evaluation can follow
return (element1.att_a == element2.att_a) and \
(element1.att_b == element2.att_b)

[snip]
Its probably obvious to everyone that this type of task seems perfect for
sets. However, it does not seem that sets can be used in the following way,
using a hypothetical "comparator" function. The "comparator" would be
analagous to a function passed to the list.sort() method. Such a device would
crush the previous code to the following very straight-forward statements:

some_set = Set(some_list, comparator=test_elements)
another_set = Set(another_list, comparator=test_elements)
overlaps = some_set.intersection(another_set)
unique_some = some_set.difference(another_set)
unique_another = another_set.difference(some_set)

I am under the personal opinion that such a modification to the set type would
make it vastly more flexible, if it does not already have this ability.

Any thoughts on how I might accomplish either technique or any thoughts on how
to make my code more straightforward would be greatly appreciated.

Howabout something like this (untested):

class CmpProxy(object):
def __init__(self,obj):
self.obj = obj
def __eq__(self,other):
return (self.obj.att_a == other.obj.att_b
and self.obj.att_b == other.obj.att_b)
def __hash__(self):
return hash((self.obj.att_a,self.obj.att_b))

set_a = set(CmpProxy(x) for x in list_a)
set_b = set(CmpProxy(y) for y in list_b)
overlaps = [ z.obj for z in set_a.intersection(set.b) ]
Carl Banks

Oct 18 '05 #2
"Carl Banks" <in**********@aerojockey.com> wrote:
Howabout something like this (untested):

class CmpProxy(object):
def __init__(self,obj):
self.obj = obj
def __eq__(self,other):
return (self.obj.att_a == other.obj.att_b
and self.obj.att_b == other.obj.att_b)
def __hash__(self):
return hash((self.obj.att_a,self.obj.att_b))

set_a = set(CmpProxy(x) for x in list_a)
set_b = set(CmpProxy(y) for y in list_b)
overlaps = [ z.obj for z in set_a.intersection(set.b) ]


Or more generally:

class Comparable(object):
def __init__(self, obj, key):
self.obj,self._key = obj,key
def __hash__(self):
return hash(self._key(self.obj))
def __eq__(self, other):
return self._key(self.obj) == self._key(other.obj)
if __name__ == '__main__':
some_list = ["hello", "world"]
another_list = ["HeLlO", "word"]
key = str.lower
some_set = set(Comparable(x,key) for x in some_list)
another_set = set(Comparable(x,key) for x in another_list)
print "overlaps:", [x.obj for x in some_set.intersection(another_set)]
print "unique_some:", [x.obj for x in some_set.difference(another_set)]
print "unique_another:", [x.obj for x in another_set.difference(some_set)]
George
Oct 18 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
80953
by: Mickel Grönroos | last post by:
Hi! Are there any standard list methods for getting the intersection and difference of two lists? (The union is easy ("list1.extend(list2)"), unless you want it to contain unique values.) ...
5
9160
by: Antoine Logean | last post by:
Hi, What is the easiest way to get the intersection of two strings in python (a kind a "and" operator) ? ex: string_1 =...
17
2746
by: Gordon Williams | last post by:
Hi, I have to lists that I need to find the common numbers (2nd rounded to nearest integral) and I am wondering if there is a more efficient way of doing it. >>> a= >>> b= >>> ...
7
2183
by: les_ander | last post by:
Hi, I have 2 lists of tuples that look like: E1= and E2=. In this tuple, the ordering does not matter, i.e. (u,v) is the same as (v,u). What I want to do is the following: given 2 list of...
3
5487
by: ryu | last post by:
Hi, May I know how to do an intersection of sets using C#? Where the number of sets will only be known during runtime. Many Thanks
3
2582
by: Suresh Jeevanandam | last post by:
I have a list of sets in variable lsets . Now I want to find the intersection of all the sets. r = lsets for s in r: r = r & s Is there any other shorter way?
2
2287
by: mkppk | last post by:
I have kind of strange change I'd like to make to the sets.Set() intersection() method.. Normally, intersection would return items in both s1 and s2 like with something like this: ...
11
8533
by: Prateek | last post by:
I have 3 variable length lists of sets. I need to find the common elements in each list (across sets) really really quickly. Here is some sample code: # Doesn't make sense to union the sets -...
0
7044
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
6908
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
7087
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
6944
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
1
4782
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
2995
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
0
2985
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
0
1300
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...
0
182
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.