How to subclass sets.Set() to change intersection() behavior?

I have a kind of strange change I'd like to make to the sets.Set()
intersection() method..

Normally, intersection would return items in both s1 and s2 with
something like this: s1.intersection(s2)

I want the item matching to be a bit "looser".. that is, items in s2
that match just the beginning of items in s1 would be included in
the result of intersection().

I do not know how intersection() is implemented, so I just kinda
guessed it might have something to do with how it compares set
elements, probably using __eq__ or __cmp__. So I thought if I overrode
these methods, maybe that would magically affect the way intersection
works.. so far, no luck =(

Please take a look at the little example script to try to illustrate
what I would like to happen when using my subclass.. Is my approach
totally wrong, or is there a better way to accomplish this? I am trying
to avoid running through nested loops of lists (see final example).

P.S.
- the lists I am working with are small, like 1-10 items each
- actually, I'm not so concerned with the items in the resulting set, I just
want to know that the two sets have at least one item "in common"
- would welcome any other suggestions that would be FAST


import sets

# the way set intersection normally works
s1 = sets.Set(['macys', 'installment', 'oil', 'beans'])
s2 = sets.Set(['macy', 'oil', 'inst', 'coffee'])

# prints Set(['oil']), as expected..
print s1.intersection(s2)

# my subclass, mySet - I don't know how to affect the .intersection() method
# my best guess was to change the __eq__ or maybe the __cmp__ methods??
# for now, mySet does nothing special at all but call the functions from sets.Set
class mySet(sets.Set):

    def __init__(self, iterable=None):
        sets.Set.__init__(self, iterable)

    def __eq__(self, other):
        # maybe something here?
        return sets.Set.__eq__(self, other)

    def __cmp__(self, other):
        # or maybe something here?
        return sets.Set.__cmp__(self, other)

# the same sets used in previous example
s3 = mySet(['macys', 'installment', 'oil', 'beans'])
s4 = mySet(['macy', 'oil', 'inst', 'coffee'])

# and, the same result: mySet(['oil'])
print s3.intersection(s4)

#******************************************************************************
# THE RESULT I WOULD LIKE TO GET WOULD LOOK LIKE THIS
# because I want items of s4 to match to the beginning of items in s3
# actually I am not so concerned with the result of intersection, just
# want to know that there was at least one item in common between the two sets..
#
# mySet(['macy', 'inst', 'oil'])
#******************************************************************************

# this is the list implementation I am trying to avoid because I am
# under the impression using set would be faster..(??)
# please let me know if I am wrong about that assumption

L1 = ['macys', 'installment', 'oil', 'beans']
L2 = ['macy', 'oil', 'inst', 'coffee']

L3 = []
for x in L1:
    for y in L2:
        if x.startswith(y):
            L3.append(y)

# prints ['macy', 'inst', 'oil']
print L3

Dec 13 '06 #1
[mkppk]
> I have kind of strange change I'd like to make to the sets.Set()
> intersection() method..
>
> Normally, intersection would return items in both s1 and s2 like with
> something like this: s1.intersection(s2)
> . . .
> - the lists I am working with are small, like 1-10 items each

from sets import Set
from itertools import ifilter

class mySet(Set):
    def isDisjoint(self, other):
        if len(self) <= len(other):
            little, big = self, other
        else:
            little, big = other, self
        for elem in ifilter(big._data.has_key, little):
            return False
        return True

p = mySet('abc')
q = mySet('def')
r = mySet('cde')
print p.isDisjoint(q)
print r.isDisjoint(q)
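
For readers on a newer Python, the same check can be sketched without a subclass, assuming the built-in set type and its isdisjoint() method are available. Like the isDisjoint() method above, it tests exact membership only, not prefix matching.

```python
# Python 3 sketch: the built-in set type already offers isdisjoint().
p = set('abc')
q = set('def')
r = set('cde')

print(p.isdisjoint(q))  # True: p and q share no elements
print(r.isdisjoint(q))  # False: 'd' and 'e' are in both
```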

Hope something like this works for you.
Raymond

Dec 13 '06 #2
At Tuesday 12/12/2006 23:23, mkppk wrote:
> I have kind of strange change I'd like to make to the sets.Set()
> intersection() method..
>
> Normally, intersection would return items in both s1 and s2 like with
> something like this: s1.intersection(s2)
>
> I want the item matching to be a bit "looser".. that is, items in s2
> that match to just the beginning of items in s1 would be included in
> the result of intersection().
>
> I do not know how intersection() is implemented, so I just kinda
> guessed it might have something to do with how it compares set
> elements, probably using __eq__ or __cmp__. SO, I though if I override
> these methods, maybe magically that would affect the way intersection
> works.. so far, no luck =(

You got it the wrong way around... Those methods are used to compare two
sets, not to compare their elements.
You don't have to modify the set's behavior; instead, you should modify how
the set elements compare themselves (a set finds its elements through their
__hash__ and __eq__). That is, you should inherit from
str and implement some "fuzzy comparison" logic.
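
A small Python 3 sketch can make this point visible. Loud is a hypothetical str subclass, not part of any library; it only records which of its own methods get called while two built-in sets are built and intersected. The set's own __eq__ is never involved in matching elements.

```python
class Loud(str):
    """Hypothetical str subclass that logs which hooks a set calls on it."""
    calls = []

    def __hash__(self):
        Loud.calls.append('hash')
        return str.__hash__(self)

    def __eq__(self, other):
        Loud.calls.append('eq')
        return str.__eq__(self, other)


# Building the sets hashes every element.
s1 = {Loud('oil'), Loud('beans')}
s2 = {Loud('oil'), Loud('coffee')}

# Intersecting them compares elements whose hashes are equal.
common = s1 & s2

print(common)
print(sorted(set(Loud.calls)))
```

Both hooks show up in the log, so the element class is the place to change matching behavior.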
>- the lists I am working with are small, like 1-10 items each
For such small lists, perhaps the best way is to iterate along both
lists, like in your example. But replace x.startswith(y) with
x[:len(y)]==y which is faster. Also, don't you have to test the other
way too? y.startswith(x)
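
Since only a yes/no answer is needed, the nested loop can stop at the first hit. A minimal Python 3 sketch, assuming a prefix match in either direction counts; share_prefix is a made-up helper name:

```python
def share_prefix(items_a, items_b):
    """Return True if any string in one list is a prefix of a string in the other."""
    return any(
        a.startswith(b) or b.startswith(a)
        for a in items_a
        for b in items_b
    )


L1 = ['macys', 'installment', 'oil', 'beans']
L2 = ['macy', 'oil', 'inst', 'coffee']

print(share_prefix(L1, L2))                  # True: 'macys' starts with 'macy'
print(share_prefix(['beans'], ['coffee']))   # False
```

any() short-circuits, so with lists of 1-10 items this does at most 100 cheap comparisons. Note that an empty string is a prefix of everything, so filter those out first if they can occur.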
> # this is the list implementation I am trying to avoid because I am
> # under the impression using set would be faster..(??)
> # please let me know if I am wrong about that assumption
>
> L1 = ['macys', 'installment', 'oil', 'beans']
> L2 = ['macy', 'oil', 'inst', 'coffee']
>
> L3 = []
> for x in L1:
>     for y in L2:
>         if x.startswith(y):
>             L3.append(y)
>
> # prints ['macy', 'inst', 'oil']
> print L3

You can use the timeit module to measure performance.
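
A rough sketch of such a measurement, written for Python 3 and comparing the two prefix tests mentioned above. The sample strings and the repeat count are arbitrary choices, and the numbers will differ between machines and Python versions, so treat the result as a hint rather than a verdict.

```python
import timeit

# Shared setup: one candidate string and one prefix to test against it.
setup = "x = 'installment'; y = 'inst'"

t_startswith = timeit.timeit("x.startswith(y)", setup=setup, number=100000)
t_slice = timeit.timeit("x[:len(y)] == y", setup=setup, number=100000)

print("startswith: %.4f s" % t_startswith)
print("slice:      %.4f s" % t_slice)
```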

Just for fun -because I don't think it would be better for small sets
as you have- this is an implementation of a "fuzzy string" class which
only compares its first character.

class fuzzystr(str):

    """A fuzzy string. Only takes its first character into account
    when comparing.
    That is, fuzzystr('abc') == fuzzystr('add')"""

    def __cmp__(self, other):
        if not isinstance(other, basestring):
            return -1  # always < any other thing
        if not self: return len(other) and -1 or 0
        if not other: return 1
        return cmp(self[0], other[0])

    def __eq__(self, other): return self.__cmp__(other) == 0
    def __ne__(self, other): return self.__cmp__(other) != 0
    def __lt__(self, other): return self.__cmp__(other) < 0
    def __le__(self, other): return self.__cmp__(other) <= 0
    def __gt__(self, other): return self.__cmp__(other) > 0
    def __ge__(self, other): return self.__cmp__(other) >= 0

    def __hash__(self):
        # This must hold for all instances: x==y => hash(x)==hash(y)
        if self: return hash(self[0])
        return hash('')

try: set
except NameError: from sets import Set as set

s1 = set(map(fuzzystr, ['macys', 'installment', 'oil', 'beans']))
s2 = set(map(fuzzystr, ['macy', 'oil', 'inst', 'coffee']))
assert s1.intersection(s2) == set(map(fuzzystr, ['macy', 'inst', 'oil']))
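
The class above relies on __cmp__ and basestring, which exist only in Python 2. A rough Python 3 equivalent might look like the sketch below; FuzzyStr is an illustrative name, and only equality and hashing are overridden because that is all a set needs.

```python
class FuzzyStr(str):
    """Python 3 sketch: two strings are equal when their first characters match."""

    def __eq__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        # Slicing yields plain str objects, so this does not recurse.
        return self[:1] == other[:1]

    def __ne__(self, other):
        # str defines its own __ne__, so it must be overridden as well.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must hash equally: hash the first character only.
        return hash(self[:1])


s1 = {FuzzyStr(s) for s in ['macys', 'installment', 'oil', 'beans']}
s2 = {FuzzyStr(s) for s in ['macy', 'oil', 'inst', 'coffee']}
common = s1 & s2

print(len(common))  # 3: one element each for 'm', 'i' and 'o'
```

Keep in mind that this matches on the first character only, so 'inst' would also "match" 'ice'. True prefix matching is not an equivalence relation, which is why it cannot be expressed through __hash__ and __eq__ in general.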
--
Gabriel Genellina
Softlab SRL

Dec 13 '06 #3
