473,804 Members | 3,762 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Intersection of lists/sets -- with a catch

Hello All,

I find myself in this situation from time to time: I want to compare two lists
of arbitrary objects and (1) find those unique to the first list, (2) find
those unique to the second list, (3) find those that overlap. But here is the
catch: comparison is not straight-forward. For example, I will want to
compare 2 objects based on a set of common attributes. These two objects need
not be members of the same class, etc. A function might help to illustrate:

def test_elements(e lement1, element2):
"""
Returns bool.
"""
# any evaluation can follow
return (element1.att_a == element2.att_a) and \
(element1.att_b == element2.att_b)

So my very naieve solution usually bears resemblance to the following code
(made inelegant by the irritating necessity to keep track of a counter):

for some_element in some_list:
i = 0
while i < len(another_lis t):
another_element = another_list[i]
overlap = test_elements(s ome_element, another_element )
if overlap:
store_overlap(s ome_element, another_element )
# speed-up, same as i+=1
another_list.po p(i)
break
else:
i += 1
if not overlap:
store_unique_so me(some_element )

After this loop, "identical" elements (according to the evaluation criteria)
from the two lists are stored somewhere, elements unique to 'some_list' are
stored somewhere else, and 'another_list' has only elements originally unique
to itself according to the evaluation criteria remaining. Of course this code
assumes that both original lists contained only unique elements within
themselves and were previously sorted based on the evaluation criteria (to
speed-up the loop).

So, one improvement I can think of is to use an iterator for the inner loop:

for some_element in some_list:
for another_element in another_list:
overlap = test_elements(s ome_element, another_element )
if overlap:
store_overlap(s ome_element,ano ther_element)
# big problem here -- iterator gets lost
another_list.re move(another_el ement)
break
if not overlap:
unique_some.app end(some_elemen t)

But the problem is obvious, the iterator becomes out of sync with
'another_list'. An example can illustrate:

py> alist = [1,2,3,4]
py> for el in alist:
.... print el
.... if el == 3:
.... alist.remove(el )
....
1
2
3
py> alist
[1, 2, 4]

So, a question would be how one might "correct" the iterator in this
situation. Inspecting dir(iter) provides no clues as to the identity of an
iterator's internal counter.

Its probably obvious to everyone that this type of task seems perfect for
sets. However, it does not seem that sets can be used in the following way,
using a hypothetical "comparator " function. The "comparator " would be
analagous to a function passed to the list.sort() method. Such a device would
crush the previous code to the following very straight-forward statements:

some_set = Set(some_list, comparator=test _elements)
another_set = Set(another_lis t, comparator=test _elements)
overlaps = some_set.inters ection(another_ set)
unique_some = some_set.differ ence(another_se t)
unique_another = another_set.dif ference(some_se t)

I am under the personal opinion that such a modification to the set type would
make it vastly more flexible, if it does not already have this ability.

Any thoughts on how I might accomplish either technique or any thoughts on how
to make my code more straightforward would be greatly appreciated.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Oct 18 '05 #1
2 2529

James Stroud wrote:
Hello All,

I find myself in this situation from time to time: I want to compare two lists
of arbitrary objects and (1) find those unique to the first list, (2) find
those unique to the second list, (3) find those that overlap. But here is the
catch: comparison is not straight-forward. For example, I will want to
compare 2 objects based on a set of common attributes. These two objects need
not be members of the same class, etc. A function might help to illustrate:

def test_elements(e lement1, element2):
"""
Returns bool.
"""
# any evaluation can follow
return (element1.att_a == element2.att_a) and \
(element1.att_b == element2.att_b)

[snip]
Its probably obvious to everyone that this type of task seems perfect for
sets. However, it does not seem that sets can be used in the following way,
using a hypothetical "comparator " function. The "comparator " would be
analagous to a function passed to the list.sort() method. Such a device would
crush the previous code to the following very straight-forward statements:

some_set = Set(some_list, comparator=test _elements)
another_set = Set(another_lis t, comparator=test _elements)
overlaps = some_set.inters ection(another_ set)
unique_some = some_set.differ ence(another_se t)
unique_another = another_set.dif ference(some_se t)

I am under the personal opinion that such a modification to the set type would
make it vastly more flexible, if it does not already have this ability.

Any thoughts on how I might accomplish either technique or any thoughts on how
to make my code more straightforward would be greatly appreciated.

Howabout something like this (untested):

class CmpProxy(object ):
def __init__(self,o bj):
self.obj = obj
def __eq__(self,oth er):
return (self.obj.att_a == other.obj.att_b
and self.obj.att_b == other.obj.att_b )
def __hash__(self):
return hash((self.obj. att_a,self.obj. att_b))

set_a = set(CmpProxy(x) for x in list_a)
set_b = set(CmpProxy(y) for y in list_b)
overlaps = [ z.obj for z in set_a.intersect ion(set.b) ]
Carl Banks

Oct 18 '05 #2
"Carl Banks" <in**********@a erojockey.com> wrote:
Howabout something like this (untested):

class CmpProxy(object ):
def __init__(self,o bj):
self.obj = obj
def __eq__(self,oth er):
return (self.obj.att_a == other.obj.att_b
and self.obj.att_b == other.obj.att_b )
def __hash__(self):
return hash((self.obj. att_a,self.obj. att_b))

set_a = set(CmpProxy(x) for x in list_a)
set_b = set(CmpProxy(y) for y in list_b)
overlaps = [ z.obj for z in set_a.intersect ion(set.b) ]


Or more generally:

class Comparable(obje ct):
def __init__(self, obj, key):
self.obj,self._ key = obj,key
def __hash__(self):
return hash(self._key( self.obj))
def __eq__(self, other):
return self._key(self. obj) == self._key(other .obj)
if __name__ == '__main__':
some_list = ["hello", "world"]
another_list = ["HeLlO", "word"]
key = str.lower
some_set = set(Comparable( x,key) for x in some_list)
another_set = set(Comparable( x,key) for x in another_list)
print "overlaps:" , [x.obj for x in some_set.inters ection(another_ set)]
print "unique_som e:", [x.obj for x in some_set.differ ence(another_se t)]
print "unique_another :", [x.obj for x in another_set.dif ference(some_se t)]
George
Oct 18 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
80988
by: Mickel Grönroos | last post by:
Hi! Are there any standard list methods for getting the intersection and difference of two lists? (The union is easy ("list1.extend(list2)"), unless you want it to contain unique values.) Here is what I would like to have: list1 = list2 =
5
9174
by: Antoine Logean | last post by:
Hi, What is the easiest way to get the intersection of two strings in python (a kind a "and" operator) ? ex: string_1 = "the_car_of_my_fried_is_bigger_as_mine_but_my_girlfriend_is_more_beautifull" string_2 = "my_girlfriend_is_more_beautifull_and_has_blue_eyes"
17
2788
by: Gordon Williams | last post by:
Hi, I have to lists that I need to find the common numbers (2nd rounded to nearest integral) and I am wondering if there is a more efficient way of doing it. >>> a= >>> b= >>> (l,round(m))]
7
2221
by: les_ander | last post by:
Hi, I have 2 lists of tuples that look like: E1= and E2=. In this tuple, the ordering does not matter, i.e. (u,v) is the same as (v,u). What I want to do is the following: given 2 list of tuples, E1 and E2, I want to create another list with tuples that are common to both. So in the above example I would like
3
5508
by: ryu | last post by:
Hi, May I know how to do an intersection of sets using C#? Where the number of sets will only be known during runtime. Many Thanks
3
2607
by: Suresh Jeevanandam | last post by:
I have a list of sets in variable lsets . Now I want to find the intersection of all the sets. r = lsets for s in r: r = r & s Is there any other shorter way?
2
2321
by: mkppk | last post by:
I have kind of strange change I'd like to make to the sets.Set() intersection() method.. Normally, intersection would return items in both s1 and s2 like with something like this: s1.intersection(s2) I want the item matching to be a bit "looser".. that is, items in s2 that match to just the beginning of items in s1 would be included in the result of intersection().
11
8572
by: Prateek | last post by:
I have 3 variable length lists of sets. I need to find the common elements in each list (across sets) really really quickly. Here is some sample code: # Doesn't make sense to union the sets - we're going to do intersections later anyway l1 = reduce(operator.add, list(x) for x in l1) l2 = reduce(operator.add, list(x) for x in l2) l3 = reduce(operator.add, list(x) for x in l3)
0
9706
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9579
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10571
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10326
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
10075
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
6851
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5520
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5651
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4295
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.