473,396 Members | 1,998 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

Feedback on Sets, and Partitions

This post has two parts. First is my feedback on sets. (Hello? Last
summer called, they want their discussion thread back...) Second is some
questions about my implementation of a partition function for sets.

Part 1.
---
From: Raymond Hettinger (vz******@verizon.net)
Subject: Py2.3: Feedback on Sets
Newsgroups: comp.lang.python
Date: 2003-08-11 23:02:18 PST

I've gotten lots of feedback on the itertools module
but have not heard a peep about the new sets module.

* Are you overjoyed/outraged by the choice of | and &
as set operators (instead of + and *)?
I like '|' and '&', because union and intersection are more like or and
and than they are like addition and multiplication.
* Is the support for sets of sets necessary for your work
and, if so, then is the implementation sufficiently
powerful?
Sets would be far less interesting and useful if we could not have sets
of sets. The implementation is sufficient, but not entirely intuitive. Of
course, there is no distinction between immutable and mutable sets in
normal set theory. I could live with only immutable sets, but it is nice
being able to use:
myset.add(x)

instead of:
newmyset = myset | Set([x])

I would like (I think) syntactic support where curly-braces could replace
the class constructor.

Another unintutive feature is that there can be multiple sets with the
same members. I.e., you can have sets A and B such that "A == B" is true
and "A is B" is false. I would like more information on the reasons for
this choice and whether there would be anything to gain or lose by having
"A == B" entail "A is B".
* Is there a compelling need for additional set methods like
Set.powerset() and Set.isdisjoint(s) or are the current
offerings sufficient?
I don't know if the need is compelling, but I consider the implementation
incomplete without powerset() and some other functions. However, please
see my discussion of a partition function below.
* Does the performance meet your expectations?
I don't know. For memory usage, I did some tests and it looks like a set
that is the singleton of an integer takes about 200 bytes, maybe a little
less. On the one hand that isn't bad, on the other, it's kind of big. 50
bytes would be great. Does or could Set use __slots__?
As for speed, see below.
* Do you care that sets can only contain hashable elements?
Is this required for efficiently preventing duplicate elements or at
least for having lookups be O(1)? Then I can live with it. In general
though I would think that the more types of things that are eligible to
be elements, the better.
* How about the design constraint that the argument to most
set methods must be another Set (as opposed to any iterable)?
Fine with me. It is not obvious in the first place what you should expect
the union of a set and a tuple to be (in set theory, not Python).
* Are the docs clear? Can you suggest improvements?
5.13.1 Set Objects. The table of operations could list proper subset as
well as subset.
Also, I would be happy to see as much information as possible about O()
properties of sets. (Explicit, not "same as a dictionary".) Maybe that
doesn't belong in the documentation proper.
* Are sets helpful in your daily work or does the need arise
only rarely?


I am strongly in favor of moving sets into the core language with
supporting syntax and a C implementation.

Part 2
---
What is true of the performance of a partition function for the most part
should be true of a powerset, combination, and permutation (for ordered
subsets) function.

In general, I don't know if a partition function has much use when you're
dealing with sets with moderately large numbers of members. The problem
with these functions is that their size grows quickly. I have ruled out
making one giant set of all the partitions because it is too big (1
million sets uses 200MB memory). Instead I have a generator function that
yields a different partition on each iteration. The problem then is that
it takes too long. I fear that, despite what I said above about including
them in a complete implementation of sets for Python, there may not be
any practical use for these functions except in limited domains with
small numbers of elements.

So my question is for advice on improving my implementation (by a
constant factor would be great, by a factor of n would be amazing,
although maybe impossible), and also general advice on the best
performance I could ever hope for with these sorts of functions. Below
are some numbers, and then the code.

For those who don't know, a partition P of a set A is a set of mutually
exclusive and jointly exhaustive subsets of A. In other words, every
member of A is a member of exactly one member of P.

As I learned when working on this implementation, the number of
partitions for a set of size n is the Bell Number of n. The formula for
computing the Bell Number isn't really relevant at the moment, but here
are the Bell Numbers for 0,...,14:
1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, 678570, 4213597,
27644437, 190899322

This means that a set with 3 elements has 5 partitions. Here is an
example:
Partitions({a,b,c}) =
{
{
{a,b,c}
}
{
{a,b},{c}
}
{
{a,c},{b}
}
{
{b,c},{a}
}
{
{a},{b},{c}
}
}

This shows one big set of all the partitions, where each partition is a
set of sets of elements. The function below, being a generator, does not
return one big set of all partitions, but a different partition on each
iteration.

Now, my computer is a 1.8GHz Athlon with 512MB RAM (PC133 unfortunately).
I use Fedora Core 1 Linux and Python 2.3 RPM from python.org. For a
benchmark, the following code takes about 10 seconds to run:

i = 10000000 #ten million
while i > 0:
i -= 1

So here are some times for generating the partitions of sets of
increasing sizes:

1, <1s
2, <1s
3, <1s
4, <1s
5, <1s
6, <1s
7, <1s
8, <1s
9, 1.6s
10, 9s
11, 53s
12, 5m25s

For my current project, I need to generate partitions of sets up to size
25. I am rather hopeless it will ever be feasible, and I have begun
looking for a different approach. Thanks for any replies!

#begin code

from sets import Set, ImmutableSet

#def singleton(s):
# return Set([s])

def partitions(s):
if len(s) <= 1:
yield Set([s])
else:
x = Set([iter(s).next()])
for t in partitions(s - x):
yield t | Set([x])
for u in t:
yield (t - Set([u])) | Set([u | x])
s = Set("abcdefghij") #10 elements, which is 9s runtime
count = 0
for p in partitions(s):
count += 1

print count
Jul 18 '05 #1
7 2189
In article <ma**************************************@python.o rg>,
"Steve" <hu****@fea.st> wrote:
For my current project, I need to generate partitions of sets up to size
25. I am rather hopeless it will ever be feasible, and I have begun
looking for a different approach. Thanks for any replies!


There are 4638590332229999353 partitions of an 25 item set.
<http://www.research.att.com/projects/OEIS?Anum=A000110>

So, I think generating them all is a little out of the question.

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #2
In article <ep****************************@news.service.uci.e du>,
David Eppstein <ep******@ics.uci.edu> wrote:
In article <ma**************************************@python.o rg>,
"Steve" <hu****@fea.st> wrote:
For my current project, I need to generate partitions of sets up to size
25. I am rather hopeless it will ever be feasible, and I have begun
looking for a different approach. Thanks for any replies!


There are 4638590332229999353 partitions of an 25 item set.
<http://www.research.att.com/projects/OEIS?Anum=A000110>

So, I think generating them all is a little out of the question.


BTW, here is some code for generating these numbers. I think it's a
little interesting that one can write the obvious one-line definition
for factorials, forgetting the base case, and then make it work anyway
via the magic of memoization. Something to think about in connection
with PEP 318...
def memoize(recurrence, base_case={}):
memo = {}
for k,rk in base_case.items():
if not isinstance(k,tuple):
k = (k,)
memo[k] = rk

def memoized(*args):
if args not in memo:
memo[args] = recurrence(*args)
return memo[args]

return memoized

def factorial(n):
return factorial(n-1)*n

factorial = memoize(factorial, base_case={0:1})

def choose(n,k):
return factorial(n) // factorial(k) // factorial(n-k)

def bell(n):
return sum([choose(n-1,k)*bell(n-1-k) for k in range(n)])

bell = memoize(bell, base_case={0:1})

print "There are",bell(25),"partitions of a 25 item set."

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #3
"Steve" <hu****@fea.st> wrote in message news:<ma**************************************@pyt hon.org>...
This post has two parts. First is my feedback on sets. (Hello? Last
summer called, they want their discussion thread back...) Second is some
questions about my implementation of a partition function for sets.

Part 1.
--- ..... snipped Part 2
--- ..... snipped For my current project, I need to generate partitions of sets up to size
25. I am rather hopeless it will ever be feasible, and I have begun
looking for a different approach. Thanks for any replies!


For Part 2, Google on restricted growth functions if you are
interested in non-recursive generation of set partitions of n objects
with k classes.

I have an implementation done in python, not using sets but rather an
array of integers which can be used for indexing, for processing
equivalences and set partitions (See
http://www.geocities.com/eadorio/equiv.pdf).

Regards,

Ernie
Jul 18 '05 #4
David Eppstein <ep******@ics.uci.edu> wrote:
BTW, here is some code for generating these numbers. I think it's a
little interesting that one can write the obvious one-line definition
for factorials, forgetting the base case, and then make it work anyway
via the magic of memoization. Something to think about in connection
with PEP 318...


[snip some interesting code]

Here's a more tedious way to do the same, but this code is more
"graphic" in that it shows how one can build up a triangle of numbers
line by line according to some mutation prescription.

Depending on the kind of prescription one ends up with pascals
triangle or the triangle for bell numbers.

def pascal_mutate(L):
R = [L[i] + L[i+1] for i in xrange(len(L)-1)]
return [1] + R + [1]

def stirling_mutate(L):
R = [L[i] + (i+2)*L[i+1] for i in xrange(len(L)-1)]
return [1] + R + [1]

def triangle(mutator):
L = [1]
while 1:
yield L
L = mutator(L)

def listify(gen):
cache = []
def func(n):
while len(cache) <= n:
cache.append(gen.next())
return cache[n]
return func

def triangle_function_generator(mutator):
T = listify(triangle(mutator))
def func(n,k):
return T(n)[k]
return func

pascal = triangle_function_generator(pascal_mutate)
stirling_raw = triangle_function_generator(stirling_mutate)

def noverk(n,k):
return reduce(lambda a,b: a*(n-b)/(b+1),range(k),1)

def stirling(n,k):
return stirling_raw(n-1,k-1)

def bell(n):
return sum([stirling(n,i) for i in range(1,n+1)])

def test():
n,k = 10,4
assert pascal(n,k) == noverk(n,k)
print bell(25)

if __name__=='__main__':
test()

#prints 4638590332229999353

Anton

Jul 18 '05 #5
"Steve" <hu****@fea.st> wrote in message news:<ma**************************************@pyt hon.org>...
Another unintutive feature is that there can be multiple sets with the
same members. I.e., you can have sets A and B such that "A == B" is true
and "A is B" is false. I would like more information on the reasons for
this choice and whether there would be anything to gain or lose by having
"A == B" entail "A is B".


I can't speak for the designers, but I think there's two problems with
A == B => A is B. One, none of the other sequences behave that way.
Two (and this is the real killer) I think it would be tough to
implement.
A = Set([1, 2, 3])
B = Set([1, 2])
#A is B clearly false, as is A==B
B.add(3)
#Now A==B is true

After every alteration of a set, such as the add() above, you'd have
to do an exhaustive comparison of every other set for equality. Then
you'd do something like B = A to drop the old B set and set up another
reference to A. Then if you altered B and not A, you'd have to do
something like copy A, make the change, and assign the result to B.

Problem with that is, it breaks the case where you want A and B to be
pointing to a single set which is modified so that A and B change.
The most common case would be passing a Set to a subroutine.

So I just don't think it's feasible.
Jul 18 '05 #6
> I could live with only immutable sets, but it is nice
being able to use:
myset.add(x)

instead of:
newmyset = myset | Set([x])
What about

myset |= Set([x])

Presumably that would work analogously to += for strings.
Another unintutive feature is that there can be multiple sets with the
same members. I.e., you can have sets A and B such that "A == B" is true
and "A is B" is false.


Why is that any more unintuitive than the fact that there can be multiple
integers with the same value?
For example:

x = 12345
y = 12344+1

After these two statements, most Python implementations will evaluat "x is
y" as false, but of course 1==y is true.
Jul 18 '05 #7
ea*****@yahoo.com (Ernie) wrote:
I have an implementation done in python, not using sets but rather an
array of integers which can be used for indexing, for processing
equivalences and set partitions (See
http://www.geocities.com/eadorio/equiv.pdf).


Thanks.

Here's something I wrote in 2000. I just did some very superficial
inspection of the code and adapted a few names that would not be ok
anymore, now that we have sets, but very likely I would do things very
differently now.

It might start off other implementations though so I'm providing it
below here.

# SetPart.py : return a set divided into subsets. The possible ways
# this can be done are indexed. Algorithms are adapted from the book
# "Combinatorial Algorithms..." (1999) by Donald Kreher and
# Douglas Stinson.
# See also: http://www.math.mtu.edu/~kreher/cages.html
# Translated and adapted for Python by Anton Vredegoor:
# an***@vredegoor.doge.nl 19 jun 2000
# last update: april 2004

class SetPart:

def __init__(self, aset):
self.aset = aset
self.m = len(self.aset)
self.d = self.GeneralizedRGF(self.m)
self.count = self.d[self.m][0]

def __getitem__(self, index):
# Partition number index
if index > self.count - 1:
raise IndexError
rgf = self.UnrankRGF(self.m, index)
result = self.RGFToSetPart(self.m, rgf, self.aset)
return result[1:]

## ** Algorithm 3.12
def RGFToSetPart(self, m, b, aset):
# Transform a restricted growth function list into a partition
A = []
n = 1
for i in range(1,m+1):
if b[i] > n:
n = b[i]
for i in range(n+1):
A.append([])
for i in range(1,m+1):
A[b[i]].append(aset[i-1])
return A

## ** Algorithm 3.14
def GeneralizedRGF(self, m):
# Compute the values d[i.j]
d = [[1L] * (m+1) for i in range(m+1)]
for i in range(1,m+1):
for j in range(0,m-i+1):
d[i][j] = j * d[i-1][j] + d[i-1][j+1]
return d

##** Algorithm 3.16
def UnrankRGF(self, m, r):
# Unrank a restricted grow function
f = [1] * (m+1)
j = 1
d = self.d
for i in range(2,m+1):
v = d[m-i][j]
cr = j*v
if cr <= r:
f[i] = j + 1
r -= cr
j += 1
else:
f[i] = r / v + 1
r %= v
return f

def test():
aset = range(5)
P = SetPart(aset)
print P.count
for x in P:
print x

if __name__ == '__main__':
test()

Anton
Jul 18 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

21
by: Raymond Hettinger | last post by:
I've gotten lots of feedback on the itertools module but have not heard a peep about the new sets module. * Are you overjoyed/outraged by the choice of | and & as set operators (instead of + and...
5
by: Raymond Hettinger | last post by:
For Py2.4, I'm working on a C implementation of Sets.py with the possibility of having a set() type as a builtin. There is a question as whether the current design for set of sets has been useful....
5
by: Nick J Chackowsky | last post by:
Wrote a python script to find the partitions of an integer (a list of all the ways you can express n as a sum of integers). For example, the partitions of 5 are 5 4+1 3+2 2+2+1 3+1+1 2+1+1+1...
1
by: Kenneth McDonald | last post by:
I'm working on the 0.8 release of my 'rex' module, and would appreciate feedback, suggestions, and criticism as I work towards finalizing the API and feature sets. rex is a module intended to make...
3
by: jcgeorge | last post by:
I have a Windows DPF (v8.2.2) environment. 2 Nodes 5 Partitions Server1 - Cat (0) Data (1) Data (2) Server2 - Data (3) Data (4) I want to use block-based IO, but I do not want the same size...
8
by: arunrocks | last post by:
Hi I am having a requirement to create a db in 2 out of 8 partitiones. I have the following doubts. 1. should I create a new instance in 2 partitions alone (the present instance spans 8 nodes)...
13
by: jm.suresh | last post by:
It is not possible to index set objects. That is OK. But, what if I want to find some element from the Set. from sets import Set s = Set( range(12 ) if I do pop, that particular element gets...
1
by: JosAH | last post by:
Greetings, Introduction This week I'll write a bit about generics (those funny angular brackets). I need an example and decided to use sets and some of their operations. This weeks' article...
1
Nepomuk
by: Nepomuk | last post by:
In most modern distributions of Linux and Unix (at least ones with graphical environments like KDE, Gnome or XFCE), you get to mount your partitions easily from the desktop. In some cases however, it...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.