This post has two parts. First is my feedback on sets. (Hello? Last summer called, they want their discussion thread back...) Second is some questions about my implementation of a partition function for sets.

Part 1.

---

From: Raymond Hettinger (vz******@verizon.net)
Subject: Py2.3: Feedback on Sets
Newsgroups: comp.lang.python
Date: 2003-08-11 23:02:18 PST

I've gotten lots of feedback on the itertools module but have not heard a peep about the new sets module.

* Are you overjoyed/outraged by the choice of | and & as set operators (instead of + and *)?

I like '|' and '&', because union and intersection are more like 'or' and 'and' than they are like addition and multiplication.

* Is the support for sets of sets necessary for your work and, if so, then is the implementation sufficiently powerful?

Sets would be far less interesting and useful if we could not have sets of sets. The implementation is sufficient, but not entirely intuitive. Of course, there is no distinction between immutable and mutable sets in normal set theory. I could live with only immutable sets, but it is nice being able to use:

    myset.add(x)

instead of:

    newmyset = myset | Set([x])

I would like (I think) syntactic support where curly braces could replace the class constructor.

Another unintuitive feature is that there can be multiple sets with the same members, i.e., you can have sets A and B such that "A == B" is true and "A is B" is false. I would like more information on the reasons for this choice and whether there would be anything to gain or lose by having "A == B" entail "A is B".
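To illustrate the tradeoff (a sketch using the builtin set type of later Pythons; the variable names are mine, and I assume sets.Set compares the same way):

```python
# Equal sets need not be the same object: equality is by membership,
# identity is by object.
a = set([1])
b = set([1])
assert a == b

# If "a == b" entailed "a is b", mutating a would have to mutate b
# too; so mutable sets could not intern equal values without
# surprising aliasing.
a.add(2)
assert a != b
```

This suggests that at least for mutable sets, equality entailing identity would be hard to reconcile with in-place mutation.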

* Is there a compelling need for additional set methods like Set.powerset() and Set.isdisjoint(s) or are the current offerings sufficient?

I don't know if the need is compelling, but I consider the implementation incomplete without powerset() and some other functions. However, please see my discussion of a partition function below.

* Does the performance meet your expectations?

I don't know. For memory usage, I did some tests and it looks like a set that is the singleton of an integer takes about 200 bytes, maybe a little less. On the one hand that isn't bad; on the other, it's kind of big. 50 bytes would be great. Does or could Set use __slots__?

As for speed, see below.

* Do you care that sets can only contain hashable elements?

Is this required for efficiently preventing duplicate elements, or at least for having lookups be O(1)? Then I can live with it. In general, though, I would think that the more types of things that are eligible to be elements, the better.

* How about the design constraint that the argument to most set methods must be another Set (as opposed to any iterable)?

Fine with me. It is not obvious in the first place what you should expect the union of a set and a tuple to be (in set theory, not Python).

* Are the docs clear? Can you suggest improvements?

5.13.1 Set Objects: the table of operations could list proper subset as well as subset.

Also, I would be happy to see as much information as possible about the O() properties of sets (explicit, not "same as a dictionary"). Maybe that doesn't belong in the documentation proper.

* Are sets helpful in your daily work or does the need arise only rarely?

I am strongly in favor of moving sets into the core language with supporting syntax and a C implementation.

Part 2.

---

For the most part, what is true of the performance of a partition function should also be true of powerset, combination, and permutation (for ordered subsets) functions.

In general, I don't know if a partition function has much use when you're dealing with sets with moderately large numbers of members. The problem with these functions is that their output grows very quickly. I have ruled out building one giant set of all the partitions because it is too big (1 million sets uses 200MB of memory). Instead I have a generator function that yields a different partition on each iteration. The problem then is that it takes too long. I fear that, despite what I said above about including them in a complete implementation of sets for Python, there may not be any practical use for these functions except in limited domains with small numbers of elements.

So my question is a request for advice on improving my implementation (a constant factor would be great; a factor of n would be amazing, although maybe impossible), and also for general advice on the best performance I could ever hope for with these sorts of functions. Below are some numbers, and then the code.

For those who don't know, a partition P of a set A is a set of mutually exclusive and jointly exhaustive subsets of A. In other words, every member of A is a member of exactly one member of P.
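As a sanity check on that definition, here is a minimal sketch (plain Python, with blocks as lists rather than Set objects; the function name is mine) of what it requires:

```python
def is_partition(blocks, universe):
    # Every element of `universe` must occur in exactly one block,
    # and no block may be empty.
    seen = []
    for b in blocks:
        if not b:
            return False
        seen.extend(b)
    # Duplicates across blocks make `seen` longer than `universe`,
    # so this comparison also catches overlapping blocks.
    return sorted(seen) == sorted(universe)
```

For example, is_partition([['a','b'], ['c']], ['a','b','c']) holds, while a pair of overlapping blocks does not.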

As I learned when working on this implementation, the number of partitions of a set of size n is the Bell number of n. The formula for computing the Bell numbers isn't really relevant at the moment, but here are the Bell numbers for n = 0, ..., 14:

    1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, 678570, 4213597, 27644437, 190899322
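For what it's worth, these numbers can be reproduced without any explicit formula via the Bell triangle (a sketch in plain Python; the function name is mine):

```python
def bell_numbers(n):
    # Bell triangle: each row begins with the previous row's last
    # entry; each later entry is its left neighbor plus the entry
    # above that neighbor.  B(k) is the first entry of row k.
    bells = [1]          # B(0)
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for entry in row:
            new_row.append(new_row[-1] + entry)
        row = new_row
        bells.append(row[0])
    return bells
```

Calling bell_numbers(15) yields the fifteen values listed above, ending in 190899322.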

This means that a set with 3 elements has 5 partitions. Here is an example:

Partitions({a,b,c}) =
{
    { {a,b,c} },
    { {a,b}, {c} },
    { {a,c}, {b} },
    { {b,c}, {a} },
    { {a}, {b}, {c} }
}

This shows one big set of all the partitions, where each partition is a set of sets of elements. The function below, being a generator, does not return one big set of all partitions, but a different partition on each iteration.

Now, my computer is a 1.8GHz Athlon with 512MB RAM (PC133, unfortunately). I use Fedora Core 1 Linux and the Python 2.3 RPM from python.org. For a benchmark, the following code takes about 10 seconds to run:

    i = 10000000  # ten million
    while i > 0:
        i -= 1

So here are some times for generating the partitions of sets of increasing sizes:

    1,  <1s
    2,  <1s
    3,  <1s
    4,  <1s
    5,  <1s
    6,  <1s
    7,  <1s
    8,  <1s
    9,  1.6s
    10, 9s
    11, 53s
    12, 5m25s

For my current project, I need to generate partitions of sets of up to size 25. I am rather hopeless that it will ever be feasible, and I have begun looking for a different approach. Thanks for any replies!

#begin code
from sets import Set, ImmutableSet

#def singleton(s):
#    return Set([s])

def partitions(s):
    if len(s) <= 1:
        # A set with zero or one element has exactly one partition.
        yield Set([s])
    else:
        # Remove one element x, partition the rest, then either add
        # {x} as its own block or merge x into each existing block.
        x = Set([iter(s).next()])
        for t in partitions(s - x):
            yield t | Set([x])
            for u in t:
                yield (t - Set([u])) | Set([u | x])

s = Set("abcdefghij") #10 elements, which is 9s runtime
count = 0
for p in partitions(s):
    count += 1
print count
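One constant-factor idea I have been considering (a sketch, not a drop-in replacement; the function name is mine): represent blocks as plain lists and place each element either into an existing block or into a new one, which avoids constructing Set/ImmutableSet objects in the inner loop.

```python
def list_partitions(elems):
    # Yield each partition of the list `elems` exactly once, as a
    # list of blocks (lists).  The first element is placed either
    # into one of the blocks of a partition of the rest, or into a
    # new singleton block.
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for p in list_partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [p[i] + [first]] + p[i + 1:]
        yield p + [[first]]
```

The counts still follow the Bell numbers (5 partitions for 3 elements, 52 for 5), but the per-partition overhead is just list slicing and concatenation; converting a chosen partition back into sets of sets can be done on demand.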