473,839 Members | 1,392 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Feedback on Sets, and Partitions

This post has two parts. First is my feedback on sets. (Hello? Last
summer called, they want their discussion thread back...) Second is some
questions about my implementation of a partition function for sets.

Part 1.
---
From: Raymond Hettinger (vz******@veriz on.net)
Subject: Py2.3: Feedback on Sets
Newsgroups: comp.lang.pytho n
Date: 2003-08-11 23:02:18 PST

I've gotten lots of feedback on the itertools module
but have not heard a peep about the new sets module.

* Are you overjoyed/outraged by the choice of | and &
as set operators (instead of + and *)?
I like '|' and '&', because union and intersection are more like or and
and than they are like addition and multiplication.
* Is the support for sets of sets necessary for your work
and, if so, then is the implementation sufficiently
powerful?
Sets would be far less interesting and useful if we could not have sets
of sets. The implementation is sufficient, but not entirely intuitive. Of
course, there is no distinction between immutable and mutable sets in
normal set theory. I could live with only immutable sets, but it is nice
being able to use:
myset.add(x)

instead of:
newmyset = myset | Set([x])

I would like (I think) syntactic support where curly-braces could replace
the class constructor.

Another unintutive feature is that there can be multiple sets with the
same members. I.e., you can have sets A and B such that "A == B" is true
and "A is B" is false. I would like more information on the reasons for
this choice and whether there would be anything to gain or lose by having
"A == B" entail "A is B".
* Is there a compelling need for additional set methods like
Set.powerset() and Set.isdisjoint( s) or are the current
offerings sufficient?
I don't know if the need is compelling, but I consider the implementation
incomplete without powerset() and some other functions. However, please
see my discussion of a partition function below.
* Does the performance meet your expectations?
I don't know. For memory usage, I did some tests and it looks like a set
that is the singleton of an integer takes about 200 bytes, maybe a little
less. On the one hand that isn't bad, on the other, it's kind of big. 50
bytes would be great. Does or could Set use __slots__?
As for speed, see below.
* Do you care that sets can only contain hashable elements?
Is this required for efficiently preventing duplicate elements or at
least for having lookups be O(1)? Then I can live with it. In general
though I would think that the more types of things that are eligible to
be elements, the better.
* How about the design constraint that the argument to most
set methods must be another Set (as opposed to any iterable)?
Fine with me. It is not obvious in the first place what you should expect
the union of a set and a tuple to be (in set theory, not Python).
* Are the docs clear? Can you suggest improvements?
5.13.1 Set Objects. The table of operations could list proper subset as
well as subset.
Also, I would be happy to see as much information as possible about O()
properties of sets. (Explicit, not "same as a dictionary".) Maybe that
doesn't belong in the documentation proper.
* Are sets helpful in your daily work or does the need arise
only rarely?


I am strongly in favor of moving sets into the core language with
supporting syntax and a C implementation.

Part 2
---
What is true of the performance of a partition function for the most part
should be true of a powerset, combination, and permutation (for ordered
subsets) function.

In general, I don't know if a partition function has much use when you're
dealing with sets with moderately large numbers of members. The problem
with these functions is that their size grows quickly. I have ruled out
making one giant set of all the partitions because it is too big (1
million sets uses 200MB memory). Instead I have a generator function that
yields a different partition on each iteration. The problem then is that
it takes too long. I fear that, despite what I said above about including
them in a complete implementation of sets for Python, there may not be
any practical use for these functions except in limited domains with
small numbers of elements.

So my question is for advice on improving my implementation (by a
constant factor would be great, by a factor of n would be amazing,
although maybe impossible), and also general advice on the best
performance I could ever hope for with these sorts of functions. Below
are some numbers, and then the code.

For those who don't know, a partition P of a set A is a set of mutually
exclusive and jointly exhaustive subsets of A. In other words, every
member of A is a member of exactly one member of P.

As I learned when working on this implementation, the number of
partitions for a set of size n is the Bell Number of n. The formula for
computing the Bell Number isn't really relevant at the moment, but here
are the Bell Numbers for 0,...,14:
1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, 678570, 4213597,
27644437, 190899322

This means that a set with 3 elements has 5 partitions. Here is an
example:
Partitions({a,b ,c}) =
{
{
{a,b,c}
}
{
{a,b},{c}
}
{
{a,c},{b}
}
{
{b,c},{a}
}
{
{a},{b},{c}
}
}

This shows one big set of all the partitions, where each partition is a
set of sets of elements. The function below, being a generator, does not
return one big set of all partitions, but a different partition on each
iteration.

Now, my computer is a 1.8GHz Athlon with 512MB RAM (PC133 unfortunately).
I use Fedora Core 1 Linux and Python 2.3 RPM from python.org. For a
benchmark, the following code takes about 10 seconds to run:

i = 10000000 #ten million
while i > 0:
i -= 1

So here are some times for generating the partitions of sets of
increasing sizes:

1, <1s
2, <1s
3, <1s
4, <1s
5, <1s
6, <1s
7, <1s
8, <1s
9, 1.6s
10, 9s
11, 53s
12, 5m25s

For my current project, I need to generate partitions of sets up to size
25. I am rather hopeless it will ever be feasible, and I have begun
looking for a different approach. Thanks for any replies!

#begin code

from sets import Set, ImmutableSet

#def singleton(s):
# return Set([s])

def partitions(s):
if len(s) <= 1:
yield Set([s])
else:
x = Set([iter(s).next()])
for t in partitions(s - x):
yield t | Set([x])
for u in t:
yield (t - Set([u])) | Set([u | x])
s = Set("abcdefghij ") #10 elements, which is 9s runtime
count = 0
for p in partitions(s):
count += 1

print count
Jul 18 '05 #1
7 2225
In article <ma************ *************** ***********@pyt hon.org>,
"Steve" <hu****@fea.s t> wrote:
For my current project, I need to generate partitions of sets up to size
25. I am rather hopeless it will ever be feasible, and I have begun
looking for a different approach. Thanks for any replies!


There are 463859033222999 9353 partitions of an 25 item set.
<http://www.research.at t.com/projects/OEIS?Anum=A0001 10>

So, I think generating them all is a little out of the question.

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #2
In article <ep************ *************** *@news.service. uci.edu>,
David Eppstein <ep******@ics.u ci.edu> wrote:
In article <ma************ *************** ***********@pyt hon.org>,
"Steve" <hu****@fea.s t> wrote:
For my current project, I need to generate partitions of sets up to size
25. I am rather hopeless it will ever be feasible, and I have begun
looking for a different approach. Thanks for any replies!


There are 463859033222999 9353 partitions of an 25 item set.
<http://www.research.at t.com/projects/OEIS?Anum=A0001 10>

So, I think generating them all is a little out of the question.


BTW, here is some code for generating these numbers. I think it's a
little interesting that one can write the obvious one-line definition
for factorials, forgetting the base case, and then make it work anyway
via the magic of memoization. Something to think about in connection
with PEP 318...
def memoize(recurre nce, base_case={}):
memo = {}
for k,rk in base_case.items ():
if not isinstance(k,tu ple):
k = (k,)
memo[k] = rk

def memoized(*args) :
if args not in memo:
memo[args] = recurrence(*arg s)
return memo[args]

return memoized

def factorial(n):
return factorial(n-1)*n

factorial = memoize(factori al, base_case={0:1} )

def choose(n,k):
return factorial(n) // factorial(k) // factorial(n-k)

def bell(n):
return sum([choose(n-1,k)*bell(n-1-k) for k in range(n)])

bell = memoize(bell, base_case={0:1} )

print "There are",bell(25)," partitions of a 25 item set."

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #3
"Steve" <hu****@fea.s t> wrote in message news:<ma******* *************** *************** *@python.org>.. .
This post has two parts. First is my feedback on sets. (Hello? Last
summer called, they want their discussion thread back...) Second is some
questions about my implementation of a partition function for sets.

Part 1.
--- ..... snipped Part 2
--- ..... snipped For my current project, I need to generate partitions of sets up to size
25. I am rather hopeless it will ever be feasible, and I have begun
looking for a different approach. Thanks for any replies!


For Part 2, Google on restricted growth functions if you are
interested in non-recursive generation of set partitions of n objects
with k classes.

I have an implementation done in python, not using sets but rather an
array of integers which can be used for indexing, for processing
equivalences and set partitions (See
http://www.geocities.com/eadorio/equiv.pdf).

Regards,

Ernie
Jul 18 '05 #4
David Eppstein <ep******@ics.u ci.edu> wrote:
BTW, here is some code for generating these numbers. I think it's a
little interesting that one can write the obvious one-line definition
for factorials, forgetting the base case, and then make it work anyway
via the magic of memoization. Something to think about in connection
with PEP 318...


[snip some interesting code]

Here's a more tedious way to do the same, but this code is more
"graphic" in that it shows how one can build up a triangle of numbers
line by line according to some mutation prescription.

Depending on the kind of prescription one ends up with pascals
triangle or the triangle for bell numbers.

def pascal_mutate(L ):
R = [L[i] + L[i+1] for i in xrange(len(L)-1)]
return [1] + R + [1]

def stirling_mutate (L):
R = [L[i] + (i+2)*L[i+1] for i in xrange(len(L)-1)]
return [1] + R + [1]

def triangle(mutato r):
L = [1]
while 1:
yield L
L = mutator(L)

def listify(gen):
cache = []
def func(n):
while len(cache) <= n:
cache.append(ge n.next())
return cache[n]
return func

def triangle_functi on_generator(mu tator):
T = listify(triangl e(mutator))
def func(n,k):
return T(n)[k]
return func

pascal = triangle_functi on_generator(pa scal_mutate)
stirling_raw = triangle_functi on_generator(st irling_mutate)

def noverk(n,k):
return reduce(lambda a,b: a*(n-b)/(b+1),range(k), 1)

def stirling(n,k):
return stirling_raw(n-1,k-1)

def bell(n):
return sum([stirling(n,i) for i in range(1,n+1)])

def test():
n,k = 10,4
assert pascal(n,k) == noverk(n,k)
print bell(25)

if __name__=='__ma in__':
test()

#prints 463859033222999 9353

Anton

Jul 18 '05 #5
"Steve" <hu****@fea.s t> wrote in message news:<ma******* *************** *************** *@python.org>.. .
Another unintutive feature is that there can be multiple sets with the
same members. I.e., you can have sets A and B such that "A == B" is true
and "A is B" is false. I would like more information on the reasons for
this choice and whether there would be anything to gain or lose by having
"A == B" entail "A is B".


I can't speak for the designers, but I think there's two problems with
A == B => A is B. One, none of the other sequences behave that way.
Two (and this is the real killer) I think it would be tough to
implement.
A = Set([1, 2, 3])
B = Set([1, 2])
#A is B clearly false, as is A==B
B.add(3)
#Now A==B is true

After every alteration of a set, such as the add() above, you'd have
to do an exhaustive comparison of every other set for equality. Then
you'd do something like B = A to drop the old B set and set up another
reference to A. Then if you altered B and not A, you'd have to do
something like copy A, make the change, and assign the result to B.

Problem with that is, it breaks the case where you want A and B to be
pointing to a single set which is modified so that A and B change.
The most common case would be passing a Set to a subroutine.

So I just don't think it's feasible.
Jul 18 '05 #6
> I could live with only immutable sets, but it is nice
being able to use:
myset.add(x)

instead of:
newmyset = myset | Set([x])
What about

myset |= Set([x])

Presumably that would work analogously to += for strings.
Another unintutive feature is that there can be multiple sets with the
same members. I.e., you can have sets A and B such that "A == B" is true
and "A is B" is false.


Why is that any more unintuitive than the fact that there can be multiple
integers with the same value?
For example:

x = 12345
y = 12344+1

After these two statements, most Python implementations will evaluat "x is
y" as false, but of course 1==y is true.
Jul 18 '05 #7
ea*****@yahoo.c om (Ernie) wrote:
I have an implementation done in python, not using sets but rather an
array of integers which can be used for indexing, for processing
equivalences and set partitions (See
http://www.geocities.com/eadorio/equiv.pdf).


Thanks.

Here's something I wrote in 2000. I just did some very superficial
inspection of the code and adapted a few names that would not be ok
anymore, now that we have sets, but very likely I would do things very
differently now.

It might start off other implementations though so I'm providing it
below here.

# SetPart.py : return a set divided into subsets. The possible ways
# this can be done are indexed. Algorithms are adapted from the book
# "Combinator ial Algorithms..." (1999) by Donald Kreher and
# Douglas Stinson.
# See also: http://www.math.mtu.edu/~kreher/cages.html
# Translated and adapted for Python by Anton Vredegoor:
# an***@vredegoor .doge.nl 19 jun 2000
# last update: april 2004

class SetPart:

def __init__(self, aset):
self.aset = aset
self.m = len(self.aset)
self.d = self.Generalize dRGF(self.m)
self.count = self.d[self.m][0]

def __getitem__(sel f, index):
# Partition number index
if index > self.count - 1:
raise IndexError
rgf = self.UnrankRGF( self.m, index)
result = self.RGFToSetPa rt(self.m, rgf, self.aset)
return result[1:]

## ** Algorithm 3.12
def RGFToSetPart(se lf, m, b, aset):
# Transform a restricted growth function list into a partition
A = []
n = 1
for i in range(1,m+1):
if b[i] > n:
n = b[i]
for i in range(n+1):
A.append([])
for i in range(1,m+1):
A[b[i]].append(aset[i-1])
return A

## ** Algorithm 3.14
def GeneralizedRGF( self, m):
# Compute the values d[i.j]
d = [[1L] * (m+1) for i in range(m+1)]
for i in range(1,m+1):
for j in range(0,m-i+1):
d[i][j] = j * d[i-1][j] + d[i-1][j+1]
return d

##** Algorithm 3.16
def UnrankRGF(self, m, r):
# Unrank a restricted grow function
f = [1] * (m+1)
j = 1
d = self.d
for i in range(2,m+1):
v = d[m-i][j]
cr = j*v
if cr <= r:
f[i] = j + 1
r -= cr
j += 1
else:
f[i] = r / v + 1
r %= v
return f

def test():
aset = range(5)
P = SetPart(aset)
print P.count
for x in P:
print x

if __name__ == '__main__':
test()

Anton
Jul 18 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

21
2505
by: Raymond Hettinger | last post by:
I've gotten lots of feedback on the itertools module but have not heard a peep about the new sets module. * Are you overjoyed/outraged by the choice of | and & as set operators (instead of + and *)? * Is the support for sets of sets necessary for your work and, if so, then is the implementation sufficiently powerful?
5
2010
by: Raymond Hettinger | last post by:
For Py2.4, I'm working on a C implementation of Sets.py with the possibility of having a set() type as a builtin. There is a question as whether the current design for set of sets has been useful. To support sets-of-sets, the current design includes an automatic conversion to an immutable set class. The original use case was for implementing graph algorithms. Has anyone used sets-of-sets to solve any non-toy problems? Has anyone...
5
4313
by: Nick J Chackowsky | last post by:
Wrote a python script to find the partitions of an integer (a list of all the ways you can express n as a sum of integers). For example, the partitions of 5 are 5 4+1 3+2 2+2+1 3+1+1 2+1+1+1 1+1+1+1+1
1
4193
by: Kenneth McDonald | last post by:
I'm working on the 0.8 release of my 'rex' module, and would appreciate feedback, suggestions, and criticism as I work towards finalizing the API and feature sets. rex is a module intended to make regular expressions easier to create and use (and in my experience as a regular expression user, it makes them MUCH easier to create and use.) I'm still working on formal documentation, and in any case, such documentation isn't necessarily the...
3
2475
by: jcgeorge | last post by:
I have a Windows DPF (v8.2.2) environment. 2 Nodes 5 Partitions Server1 - Cat (0) Data (1) Data (2) Server2 - Data (3) Data (4) I want to use block-based IO, but I do not want the same size block area of the bufferpool in each partition, I want a smaller value in the catalog partition - because the bufferpool size is much smaller.
8
1653
by: arunrocks | last post by:
Hi I am having a requirement to create a db in 2 out of 8 partitiones. I have the following doubts. 1. should I create a new instance in 2 partitions alone (the present instance spans 8 nodes) 2. or is there a way to create the db in 2 out of 8. If I have to create a new instance, (its a BCU from IBM) I;d be happy if someone would link a material (1st time I am working in a partitioned env) Should I execute db2icrt in each of the...
13
31652
by: jm.suresh | last post by:
It is not possible to index set objects. That is OK. But, what if I want to find some element from the Set. from sets import Set s = Set( range(12 ) if I do pop, that particular element gets removed. I do not want to remove the element, but get some element from the Set.
1
6556
by: JosAH | last post by:
Greetings, Introduction This week I'll write a bit about generics (those funny angular brackets). I need an example and decided to use sets and some of their operations. This weeks' article discusses one set class and two interesting operations on the set: combinations and set partitioning. First a description of the algorithms is given and then we'll have some fun with the generic implementation of them.
1
27813
Nepomuk
by: Nepomuk | last post by:
In most modern distributions of Linux and Unix (at least ones with graphical environments like KDE, Gnome or XFCE), you get to mount your partitions easily from the desktop. In some cases however, it isn't that easy. Note, that all of these commands have to be run in a terminal as root user. To open a terminal (sometimes called a console), either press ALT+F2 and type xterm (alternatives could be konsole or terminal) or press CRTL+ALT+F1 to...
0
9855
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9697
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10908
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10587
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
9426
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7829
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5867
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4487
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
3136
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.