How to subclass sets.Set() to change intersection() behavior?

mkppk

I have a kind of strange change I'd like to make to the sets.Set()
intersection() method..

Normally, intersection would return items in both s1 and s2 like with
something like this: s1.intersection(s2)

I want the item matching to be a bit "looser".. that is, items in s2
that match to just the beginning of items in s1 would be included in
the result of intersection().

I do not know how intersection() is implemented, so I just kinda
guessed it might have something to do with how it compares set
elements, probably using __eq__ or __cmp__. So, I thought if I override
these methods, maybe magically that would affect the way intersection
works.. so far, no luck =(

Please take a look at the little example script to try to illustrate
what I would like to happen when using my subclass.. Is my approach
totally wrong, or is there a better way to accomplish this? I am trying
to avoid running through nested loops of lists (see final example).

P.S.
- the lists I am working with are small, like 1-10 items each
- actually, not so concerned with the items in the resulting set, just
want to know that the two sets have at least one item "in common"
- would welcome any other suggestions that would be FAST

import sets

# the way set intersection normally works
s1=sets.Set(['macys','installment','oil','beans'])
s2=sets.Set(['macy','oil','inst','coffee'])

# prints Set(['oil']), as expected..
print s1.intersection(s2)

# my subclass, mySet - I don't know how to affect the .intersection() method
# my best guess was to change the __eq__ or maybe the __cmp__ methods??
# for now, mySet does nothing special at all but call the functions from sets.Set
class mySet(sets.Set):

    def __init__(self, iterable=None):
        sets.Set.__init__(self, iterable)

    def __eq__(self, other):
        # maybe something here?
        return sets.Set.__eq__(self, other)

    def __cmp__(self, other):
        # or maybe something here?
        return sets.Set.__cmp__(self, other)

# the same sets used in previous example
s3=mySet(['macys','installment','oil','beans'])
s4=mySet(['macy','oil','inst','coffee'])

# and, the same result: mySet(['oil'])
print s3.intersection(s4)

#****************************************************************************
# THE RESULT I WOULD LIKE TO GET WOULD LOOK LIKE THIS
# because I want items of s4 to match to the beginning of items in s3
# actually I am not so concerned with the result of intersection, just
# want to know that there was at least one item in common between the two sets..
#
# mySet(['macy','inst','oil'])
#****************************************************************************

# this is the list implementation I am trying to avoid because I am
# under the impression using set would be faster..(??)
# please let me know if I am wrong about that assumption

L1=['macys','installment','oil','beans']
L2=['macy','oil','inst','coffee']

L3=[]
for x in L1:
    for y in L2:
        if x.startswith(y):
            L3.append(y)

# prints ['macy', 'inst', 'oil']
print L3

Dec 13 '06 #1


Raymond Hettinger

[mkppk]

>I have a kind of strange change I'd like to make to the sets.Set()
>intersection() method..
>
>Normally, intersection would return items in both s1 and s2 like with
>something like this: s1.intersection(s2)
>
>. . .
>
>- the lists I am working with are small, like 1-10 items each

from sets import Set
from itertools import ifilter

class mySet(Set):
    def isDisjoint(self, other):
        # iterate over the smaller set, look up in the bigger one
        if len(self) <= len(other):
            little, big = self, other
        else:
            little, big = other, self
        # the first shared element proves the sets are not disjoint
        for elem in ifilter(big._data.has_key, little):
            return False
        return True

p = mySet('abc')
q = mySet('def')
r = mySet('cde')
print p.isDisjoint(q)   # prints True
print r.isDisjoint(q)   # prints False
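
For readers on Python 2.6 or later (including Python 3), here is a minimal sketch of the same check, assuming the built-in set type instead of sets.Set. The built-in set has an isdisjoint() method, so no subclass is needed. The variable names mirror the example above; p_q_disjoint and r_q_disjoint are illustrative names only. Like the code above, it tests exact membership, not prefix matches.

```python
# Sketch for Python 2.6+/3: built-in sets provide isdisjoint().
p = set('abc')
q = set('def')
r = set('cde')

p_q_disjoint = p.isdisjoint(q)   # p and q share no elements
r_q_disjoint = r.isdisjoint(q)   # r and q share 'd' and 'e'

print(p_q_disjoint)
print(r_q_disjoint)
```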

Hope something like this works for you.
Raymond

Dec 13 '06 #2

Gabriel Genellina

At Tuesday 12/12/2006 23:23, mkppk wrote:

>I have a kind of strange change I'd like to make to the sets.Set()
>intersection() method..
>
>Normally, intersection would return items in both s1 and s2 like with
>something like this: s1.intersection(s2)
>
>I want the item matching to be a bit "looser".. that is, items in s2
>that match to just the beginning of items in s1 would be included in
>the result of intersection().
>
>I do not know how intersection() is implemented, so I just kinda
>guessed it might have something to do with how it compares set
>elements, probably using __eq__ or __cmp__. So, I thought if I override
>these methods, maybe magically that would affect the way intersection
>works.. so far, no luck =(

You got it the wrong way around... Those methods are used to compare two
sets, not to compare their elements.
You don't have to modify the set's behavior; instead, you should modify how
the set elements compare themselves. That is, you should inherit from
str and implement some "fuzzy comparison" logic.

>- the lists I am working with are small, like 1-10 items each

For such small lists, perhaps the best way is to iterate along both
lists, like in your example. But replace x.startswith(y) with
x[:len(y)]==y, which can be faster. Also, don't you have to test the other
way too? y.startswith(x)
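
A minimal sketch of that nested iteration, written for Python 3 and testing the prefix in both directions. It only answers "is there at least one item in common", which is what the original question asks for. The helper names prefix_match and have_common_item are hypothetical, not from the thread.

```python
def prefix_match(a, b):
    """True if either string is a prefix of the other."""
    return a.startswith(b) or b.startswith(a)

def have_common_item(items1, items2):
    """True as soon as one pair matches; any() stops at the first hit."""
    return any(prefix_match(x, y) for x in items1 for y in items2)

L1 = ['macys', 'installment', 'oil', 'beans']
L2 = ['macy', 'oil', 'inst', 'coffee']

found = have_common_item(L1, L2)
none_found = have_common_item(['beans'], ['coffee', 'tea'])
print(found, none_found)
```

With 1-10 items per list this is at most 100 cheap comparisons, and it usually stops earlier because any() short-circuits.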

># this is the list implementation I am trying to avoid because I am
># under the impression using set would be faster..(??)
># please let me know if I am wrong about that assumption
>
>L1=['macys','installment','oil','beans']
>L2=['macy','oil','inst','coffee']
>
>L3=[]
>for x in L1:
>    for y in L2:
>        if x.startswith(y):
>            L3.append(y)
>
># prints ['macy', 'inst', 'oil']
>print L3

You can use the timeit module to measure performance.
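
As a rough sketch of such a measurement (Python 3 syntax; the sample strings and the function names with_startswith and with_slice are illustrative assumptions), the two prefix tests can be timed side by side. The numbers depend on the interpreter version and machine, so no expected timings are shown.

```python
import timeit

x, y = 'installment', 'inst'

def with_startswith():
    return x.startswith(y)

def with_slice():
    return x[:len(y)] == y

# Both forms must agree before their timings are worth comparing.
same_result = with_startswith() == with_slice()

t_startswith = timeit.timeit(with_startswith, number=100000)
t_slice = timeit.timeit(with_slice, number=100000)
print('startswith: %.4fs  slice: %.4fs' % (t_startswith, t_slice))
```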

Just for fun (I don't think it would be better for sets as small
as yours), here is an implementation of a "fuzzystring" class which
only compares its first character.

class fuzzystr(str):

    """A fuzzy string. Only takes its first character into account
    when comparing.
    That is, fuzzystr('abc')==fuzzystr('add')"""

    def __cmp__(self, other):
        # always < any other thing
        if not isinstance(other, basestring): return -1
        if not self: return len(other) and -1 or 0
        if not other: return 1
        return cmp(self[0], other[0])

    def __eq__(self, other): return self.__cmp__(other)==0
    def __ne__(self, other): return self.__cmp__(other)!=0
    def __lt__(self, other): return self.__cmp__(other)<0
    def __le__(self, other): return self.__cmp__(other)<=0
    def __gt__(self, other): return self.__cmp__(other)>0
    def __ge__(self, other): return self.__cmp__(other)>=0

    def __hash__(self):
        # This must hold for all instances: x==y => hash(x)==hash(y)
        if self: return hash(self[0])
        return hash('')

try: set
except NameError: from sets import Set as set

s1=set(map(fuzzystr,['macys','installment','oil','beans']))
s2=set(map(fuzzystr,['macy','oil','inst','coffee']))
assert s1.intersection(s2) == set(map(fuzzystr,['macy','inst','oil']))
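
The class above relies on __cmp__ and basestring, which no longer exist in Python 3. Below is a hedged sketch of the same idea for Python 3; the name FuzzyStr is hypothetical. Only __eq__, __ne__ and __hash__ are defined, since those are what set operations need. Note that first-character matching works here because it is a true equivalence relation. General prefix matching is not transitive ('a' is a prefix of both 'ab' and 'ac', yet those two do not match each other), so it cannot be expressed through __eq__ and __hash__ this way.

```python
class FuzzyStr(str):
    """Python 3 sketch: equality and hash use only the first character."""

    def __eq__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        # Slicing yields plain str objects, so this is an ordinary comparison.
        return self[:1] == other[:1]

    def __ne__(self, other):
        # str supplies its own __ne__, so it has to be overridden as well.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must hash equal: x == y implies hash(x) == hash(y).
        return hash(self[:1])

s1 = set(map(FuzzyStr, ['macys', 'installment', 'oil', 'beans']))
s2 = set(map(FuzzyStr, ['macy', 'oil', 'inst', 'coffee']))

common = s1 & s2
first_chars = sorted(item[:1] for item in common)
print(first_chars)
```

Which of two "equal" strings ends up in the result (for example 'macy' or 'macys') is an implementation detail of the set type, so only the first characters are inspected here.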
--
Gabriel Genellina
Softlab SRL


Dec 13 '06 #3
