472,958 Members | 2,722 Online

# [perl-python] exercise: partition a list by equivalence

here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge(\$pairings) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

For those of you unfamiliar with math lingoes, partition a list means
to create sublists, based on some criteria. In our case, the input list
comes in the form of pairs, and its members are the union of all
members in the pairs. The criterion for partition in our case is that
of equivalence, specified in the pairs input.

Try to write a Python code for this. In the Python code, the input
should be a list of couples. (for Perlers, sketch out the algorithm on
paper and try to code in Python directly.)

I'll post Perl & Python code tomorrow.

==This is brought to you by the Perl-Python community.==
== http://xahlee.org/perl-python/python.html ==

Xah
xa*@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

Jul 18 '05 #1
41 3442
>For those of you unfamiliar with math lingoes, partition a list means
to create sublists, based on some criteria.

Typical moronic mathematicians with their exclusionary lingoes...why
can't they just say "create sublists based on some criteria" like
normal people?

- alex23

Jul 18 '05 #2
Xah Lee wrote:
here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge(\$pairings) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

Almost a joke:

from numarray import *

def merge(*pairs):
m, M = min(flattened), max(flattened)

d = M - m + 1
matrix = zeros((d, d), type = Bool)

for x, y in pairs:
X, Y = x - m, y - m

matrix[X, X] = 1
matrix[X, Y] = 1
matrix[Y, X] = 1
matrix[Y, Y] = 1

while True:
next = greater(dot(matrix, matrix), 0)
if alltrue(ravel(next == matrix)):
break
matrix = next

results = []
for i in range(d):
eqls, = nonzero(matrix[i])
if eqls.size():
if i == eqls[0]:
results.append(tuple(x + m for x in eqls))

return results

Disclaimer: I'm not an expert in numarray and suppose the something can
be dramatically imporved.
Jul 18 '05 #3
On Thu, Feb 17, 2005 at 03:46:20PM -0800, Xah Lee wrote:
here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge(\$pairings) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

in Python:

def merge(pairings):
rev = {}
res = []
for pairing in pairings:
first, second = pairing
has_first = first in rev
has_second = second in rev
if has_first and has_second:
rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy)
rev[first] = rev[second] = copy
return filter(None, res)

and in Perl:

sub merge(\$)
{
my %rev = ();
my @res = ();
my (\$pairing, \$first, \$second, \$has_first, \$has_second);
foreach \$pairing (@{\$_[0]})
{
(\$first, \$second) = @\$pairing;
\$has_first = exists \$rev{\$first};
\$has_second = exists \$rev{\$second};
if (\$has_first and \$has_second)
{
push @{\$rev{\$first}}, @{\$rev{\$second}};
@{\$rev{\$second}} = ();
\$rev{\$second} = \$rev{\$first};
}
elsif (exists \$rev{\$first})
{
push @{\$rev{\$first}}, \$second;
\$rev{\$second} = \$rev{\$first};
}
elsif (exists \$rev{\$second})
{
push @{\$rev{\$second}}, \$first;
\$rev{\$first} = \$rev{\$second};
}
else
{
my @copy = (\$first, \$second);
push @res, \@copy;
\$rev{\$first} = \$rev{\$second} = \@copy;
}
}
return [grep @\$_, @res];
}

although in Perl you wouldn't define it to take a reference to a list
(as you did), but rather a list, and you wouldn't define it to return
a reference to a list, but rather a list in list context (and possibly
a reference in scalar context).

Both versions are (IMHO) pretty clear (when the docstring/pod is
included), and O(n) because dict/hash lookup and list
appending/pushing is O(1) in both languages. Both versions can
probably be tweaked for speed quite a bit, but I don't *think* there's
a better-than-O(n) algorithm for this.

Note that the Python version assumes that the pairs' elements are
hashable; your example used numbers, so I thought it was pretty safe
assumption. The Perl version has no such restriction.
--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
Noble cosa es, aún para un anciano, el aprender.
-- Sófocles. (497-406 a.C.) Filósofo griego.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFCFhXjgPqu395ykGsRAus5AJ9iCSbGsyP3QHZy/whP195GSVNIwwCcDyqf
hIA+oWaLBHdSGUi7t79Wfx8=
=BlC7
-----END PGP SIGNATURE-----

Jul 18 '05 #4

John Lenton wrote:
On Thu, Feb 17, 2005 at 03:46:20PM -0800, Xah Lee wrote:
here's another interesting algorithmic exercise, again from part of a larger program in the previous series.

Here's the original Perl documentation:

=pod

merge(\$pairings) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut
in Python:

def merge(pairings):
rev = {}
res = []
for pairing in pairings:
first, second = pairing
has_first = first in rev
has_second = second in rev

Not robust in the face of input like:
[[1,1]]
or
[[1,2], [1,2]]
or
[[1,2], [2,1]]
or
[[1,2], [2,3], [3,1]]

needs "if first == second: continue" here
if has_first and has_second:
needs "if rev[first] == rev[second]: continue" here
rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy)
My reaction to the "magic" by which res grows was "omigod that's the
most horrible thing I've seen for quite a while" but there was worse to
come :-)
rev[first] = rev[second] = copy
return filter(None, res)

and in Perl:
[snip]

Both versions are (IMHO) pretty clear (when the docstring/pod is
included), and O(n) because dict/hash lookup and list
appending/pushing is O(1) in both languages. Both versions can
probably be tweaked for speed quite a bit, but I don't *think* there's a better-than-O(n) algorithm for this.
List appending is O(1) only in the amortised sense. Extending is not
O(1) in any sense. Neither is the list comparison that is necessary for
robustness (using your data structure, that is).

You don't need to think. This problem has been extensively picked over
by folk who are a lot smarter than us, starting from 30 years ago.
Google for "disjoint set" and "union-find". One gets the impression
that the best possible algorithm would be slightly *worse* than O(n).

Note that the Python version assumes that the pairs' elements are
hashable; your example used numbers, so I thought it was pretty safe
assumption. The Perl version has no such restriction.

In the real world, the objects under comparison (images, fingerprints,
customer records, etc) may not themselves be hashable and comparison
for equality may be expensive but a unique identifier would be assigned
to each object and that ID would of course be cheaply hashable and
comparable.

Jul 18 '05 #5
On Fri, Feb 18, 2005 at 03:21:10PM -0800, John Machin wrote:
Not robust in the face of input like:
[[1,1]]
or
[[1,2], [1,2]]
or
[[1,2], [2,1]]
or
[[1,2], [2,3], [3,1]]

needs "if first == second: continue" here
if has_first and has_second:
needs "if rev[first] == rev[second]: continue" here

an 'is' is enough, and better.
rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy)

My reaction to the "magic" by which res grows was "omigod that's the
most horrible thing I've seen for quite a while" but there was worse to
come :-)

what is magic about it? is it really that horrible?
List appending is O(1) only in the amortised sense. Extending is not
O(1) in any sense. Neither is the list comparison that is necessary for
robustness (using your data structure, that is).

You don't need to think. This problem has been extensively picked over
by folk who are a lot smarter than us, starting from 30 years ago.
Google for "disjoint set" and "union-find". One gets the impression
that the best possible algorithm would be slightly *worse* than O(n).

Of course! I'd forgotten clean about union-find. And yes, it's
O(n*log(n)) union and find, and my implementation is O(n**2) for
union, O(1) for find; I'm pleased that, in spite of having forgotten
about union-find, I "reinvented" (heh) something that is better than
the "naive" implementation we saw in class.

Now I'm even more worried about your dismissal of this as magic and
horrible.

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
Why don't you fix your little problem... and light this candle?
-- Alan Shepherd, the first man into space, Gemini program

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFCFoBIgPqu395ykGsRApUIAKCnNmctznqq1KTfZhi7Dl 8YpjPnbQCguo2E
z/Ss9rWC64h9vuDrJM//Ge8=
=rVAU
-----END PGP SIGNATURE-----

Jul 18 '05 #6
Hello John,

Try your python code on this example:
merge([[1,2], [3,4], [1,2], [5,3]])

The result given by your function is:
[[3, 4, 5]]

Sorry...

To Xah: next time you propose an exercise, write some UNIT TESTS!!!
Then people will be able to test if there answers are correct or not.

Cyril

On Fri, 18 Feb 2005 13:20:51 -0300, John Lenton <jo**@grulic.org.ar> wrote:
On Thu, Feb 17, 2005 at 03:46:20PM -0800, Xah Lee wrote:
here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge(\$pairings) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

in Python:

def merge(pairings):
rev = {}
res = []
for pairing in pairings:
first, second = pairing
has_first = first in rev
has_second = second in rev
if has_first and has_second:
rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy)
rev[first] = rev[second] = copy
return filter(None, res)

and in Perl:

sub merge(\$)
{
my %rev = ();
my @res = ();
my (\$pairing, \$first, \$second, \$has_first, \$has_second);
foreach \$pairing (@{\$_[0]})
{
(\$first, \$second) = @\$pairing;
\$has_first = exists \$rev{\$first};
\$has_second = exists \$rev{\$second};
if (\$has_first and \$has_second)
{
push @{\$rev{\$first}}, @{\$rev{\$second}};
@{\$rev{\$second}} = ();
\$rev{\$second} = \$rev{\$first};
}
elsif (exists \$rev{\$first})
{
push @{\$rev{\$first}}, \$second;
\$rev{\$second} = \$rev{\$first};
}
elsif (exists \$rev{\$second})
{
push @{\$rev{\$second}}, \$first;
\$rev{\$first} = \$rev{\$second};
}
else
{
my @copy = (\$first, \$second);
push @res, \@copy;
\$rev{\$first} = \$rev{\$second} = \@copy;
}
}
return [grep @\$_, @res];
}

although in Perl you wouldn't define it to take a reference to a list
(as you did), but rather a list, and you wouldn't define it to return
a reference to a list, but rather a list in list context (and possibly
a reference in scalar context).

Both versions are (IMHO) pretty clear (when the docstring/pod is
included), and O(n) because dict/hash lookup and list
appending/pushing is O(1) in both languages. Both versions can
probably be tweaked for speed quite a bit, but I don't *think* there's
a better-than-O(n) algorithm for this.

Note that the Python version assumes that the pairs' elements are
hashable; your example used numbers, so I thought it was pretty safe
assumption. The Perl version has no such restriction.

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
Noble cosa es, aún para un anciano, el aprender.
-- Sófocles. (497-406 a.C.) Filósofo griego.

--
http://mail.python.org/mailman/listinfo/python-list

Jul 18 '05 #7

John Lenton wrote:
On Fri, Feb 18, 2005 at 03:21:10PM -0800, John Machin wrote:
Not robust in the face of input like:
[[1,1]]
or
[[1,2], [1,2]]
or
[[1,2], [2,1]]
or
[[1,2], [2,3], [3,1]]

needs "if first == second: continue" here
if has_first and has_second:

needs "if rev[first] == rev[second]: continue" here

an 'is' is enough, and better.

Good point. You're redeeming yourself :-)
rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy)

My reaction to the "magic" by which res grows was "omigod that's the most horrible thing I've seen for quite a while" but there was worse to come :-)

what is magic about it? is it really that horrible?

Try explaining to the newbies over on the tutor list how despite "res"
only ever *explicitly* having little bits like [3, 4] appended to it,
it (or more properly the thing to which it refers) is actually
festering and growing and being mutated under the surface until at the
finale it bursts out dripping slime just like the creature from the
black lagoon ...

Jul 18 '05 #8
Xah Lee wrote:
merge(\$pairings) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

Not sure how efficient this is, but I decided to take advantage of the
operations provided by sets:

def merge(pairs):
pairs = set(tuple(sorted(p)) for p in pairings)
merged = []
# Each loop will result in a new, complete sublist in the result.
while pairs:
p = set(pairings.pop())
remove = set([])
for pair in pairs:
pairSet = set(pair)
if pairSet & p:
p |= pairSet
pairs -= remove
merged.append(list(p))
return merged

--
Brian Beck
Jul 18 '05 #9
Brian Beck wrote:
[code]

Whoops, that should say:

def merge(pairs):
pairs = set(tuple(sorted(p)) for p in pairs)
merged = []
# Each loop will result in a new, complete sublist in the result.
while pairs:
p = set(pairs.pop())
remove = set([])
for pair in pairs:
pairSet = set(pair)
if pairSet & p:
p |= pairSet
pairs -= remove
merged.append(list(p))
return merged

--
Brian Beck
Jul 18 '05 #10
Brian Beck wrote:
Brian Beck wrote:
> [code]

Ah heck, nevermind... it worked for my small test cases but the
algorithm is just wrong.

--
Brian Beck
Jul 18 '05 #11
"John Machin" <sj******@lexicon.net> wrote:
You don't need to think. This problem has been extensively picked over
by folk who are a lot smarter than us, starting from 30 years ago.
Google for "disjoint set" and "union-find". One gets the impression
that the best possible algorithm would be slightly *worse* than O(n).

It can be solved by union-find

import UnionFind
import sets

def merge(pairs):
uf = UnionFind.UnionFind()
items = sets.Set()
for a,b in pairs:
uf.union(a,b)
for a in items:

If you do all the unions before all the finds, as in this algorithm,
union-find is O(n).

However it might be easier to treat the input pairs as the edges of a
graph and apply a depth-first-search based graph connected components
algorithm.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
Jul 18 '05 #12
Well, it looks like David posted a good solution, but I haven't tested
it (I'm assuming his works fine). I fixed mine up anyway... it actually
works now. If you're not using 2.4 you'll have to import sets.Set as set.

def merge(pairList):
pairList = set(tuple(sorted(p)) for p in pairList)
# Sort & set to remove duplicates, tuple to make hashable
merged = []
removePairs = set([])

# Each loop will result in a new, complete sublist in the result
while pairList:
if removePairs:
removePairs = set([])
else:
subList = set(pairList.pop()) # Start a new sublist

for pair in pairList:
pairSet = set(pair)
# True when pairSet and subList share at least one element
if pairSet & subList:
subList |= pairSet # Merge pair with subList
removePairs.add(pair) # Mark pair for removal

if removePairs:
pairList -= removePairs
else:
merged.append(list(subList))

return merged

--
Brian Beck
Jul 18 '05 #13
On Fri, Feb 18, 2005 at 04:52:46PM -0800, John Machin wrote:
needs "if rev[first] == rev[second]: continue" here

an 'is' is enough, and better.

Good point. You're redeeming yourself :-)

this, together with you saying that it is hard to explain, makes me
think that you aren't comfortable thinking of lists as mutable
objects.

what is magic about it? is it really that horrible?

Try explaining to the newbies over on the tutor list how despite "res"
only ever *explicitly* having little bits like [3, 4] appended to it,
it (or more properly the thing to which it refers) is actually
festering and growing and being mutated under the surface until at the
finale it bursts out dripping slime just like the creature from the
black lagoon ...

understanding why that works, and why it is 'is' and not '==', are
both part of the same thing. Lists are mutable, and you can mutate
them, and they mutate. Unless you actually write code that uses the
fact you will forget it, and it will bite you. Of course, don't use it
just for the heck of it, but that creature you dismiss as a
slime-dripping mutation is actually quite useful.

While I'm at being unpolite, do you really think this code was harder
to understand than the code posted by anton, using numarray?

And, of course, if this code were for anything non-throw-awayable,
there would've been quite a bit of explaining going on between those
lines of code.

Ok, now back to being polite :)

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
Hay más días que longanizas.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFCFq8rgPqu395ykGsRAnTNAJ4rtCpfoFUYJYLpQ6WmvH bmTLSsYgCeJzRE
dXMUU9lYxyECNtld9JjcdeA=
=2pzJ
-----END PGP SIGNATURE-----

Jul 18 '05 #14
In article <ep****************************@news.service.uci.e du>,
David Eppstein <ep******@ics.uci.edu> wrote:
It can be solved by union-find
Here's a cleaned up version, after I added a proper iter() method to the
UnionFind data structure:

import UnionFind

def merge(pairs):
uf = UnionFind.UnionFind()
for a,b in pairs:
uf.union(a,b)
components = {}
for a in uf:
components.setdefault(uf[a],[]).append(a)
return components.values()
If you do all the unions before all the finds, as in this algorithm,
union-find is O(n).
That was a little too sloppy. It is possible to solve the union find
problem, with all unions before all finds, in time O(n). However the
usual union find data structure takes more time, O(n alpha(n)).
However it might be easier to treat the input pairs as the edges of a
graph and apply a depth-first-search based graph connected components
algorithm.

This would be O(n), though.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
Jul 18 '05 #15

John Lenton wrote:
On Fri, Feb 18, 2005 at 04:52:46PM -0800, John Machin wrote:
> needs "if rev[first] == rev[second]: continue" here

an 'is' is enough, and better.
Good point. You're redeeming yourself :-)

this, together with you saying that it is hard to explain, makes me
think that you aren't comfortable thinking of lists as mutable
objects.

How so? There is no connection between is/== and mutability. Let me
amplify: The point about 'is' is a good one, and aids your redemption
work.

what is magic about it? is it really that horrible?
Try explaining to the newbies over on the tutor list how despite "res" only ever *explicitly* having little bits like [3, 4] appended to it, it (or more properly the thing to which it refers) is actually
festering and growing and being mutated under the surface until at the finale it bursts out dripping slime just like the creature from the
black lagoon ...

understanding why that works, and why it is 'is' and not '==', are
both part of the same thing.

What same thing is that?
Lists are mutable, and you can mutate
them, and they mutate. Unless you actually write code that uses the
fact you will forget it, and it will bite you. Of course, don't use it just for the heck of it, but that creature you dismiss as a
slime-dripping mutation is actually quite useful.
You are confusing mutability of lists (without which they would be
called tuples!) with my point that the 'res' list was having its
contents fiddled with implicitly through other "pointers".

While I'm at being unpolite, do you really think this code was harder
to understand than the code posted by anton, using numarray?
I only read it as far as the bit where it creating a matrix of size
O(N**2) -- in my app N can be over a million so I lost interest
rapidly.

And, of course, if this code were for anything non-throw-awayable,
there would've been quite a bit of explaining going on between those
lines of code.
Not of course, but of necessity.
Ok, now back to being polite :)

Welcome back.

Jul 18 '05 #16
On Fri, Feb 18, 2005 at 09:57:59PM -0800, John Machin wrote:

this, together with you saying that it is hard to explain, makes me
think that you aren't comfortable thinking of lists as mutable
objects.
How so? There is no connection between is/== and mutability. Let me
amplify: The point about 'is' is a good one, and aids your redemption
work.

hey, it worked for all the test cases provided by the customer! :) and
then some; I hadn't thought of checking for cycles nor repetetitions.
[snip]

You are confusing mutability of lists (without which they would be
called tuples!) with my point that the 'res' list was having its
contents fiddled with implicitly through other "pointers".

umm... nope, see, well, hair-splitting and all that, there is this
list that holds the partitions; the partitions are lists of
elements. There is a reverse mapping that, for each element, holds the
partition that element is in. So basically you go down the list of
pairings, modifying the partitions as you go. I'm certain if I had
commented the function appropriately, we wouldn't be having this
discussion :)

Let me see if I can remedy that:

def merge(pairings):
"""
merge(pairings) takes a list of pairs, each pair indicates
the sameness of the two indexes. Returns a partitioned list
of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the
same. Therefore 1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not
specified.)
"""

# the list of partitions, or partitioned list.
# (each partition is a list of elements)
res = []

# reverse mapping (element -> partition it is in)
rev = {}
for pairing in pairings:
first, second = pairing
has_first = first in rev
has_second = second in rev
if has_first and has_second:
# both elements already in a partition...
if rev[first] is rev[second]:
# both elements in same partition, nothing to do
continue
# joining the two partitions:
# - grow one partition with the other one
rev[first].extend(rev[second])
# - clear the other one
rev[second][:] = []
# update reverse mapping
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
# update reverse mapping
rev[second] = rev[first]
elif has_second:
# ditto
rev[second].append(first)
rev[first] = rev[second]
else:
# create partition from scratch
if first == second:
new = [first]
else:
new = [first, second]
# add it to list of partitions
res.append(new)
# update reverse mapping
rev[first] = rev[second] = new
# remove empty partitions
return filter(None, res)

hrmph, I should hit the sack. Sorry if this is still ugly, I'm too
tired to tell.

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
Todo bicho que camina va a parar cuando se canse.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFCFulQgPqu395ykGsRAn9kAJ9bTmJYwo5L90AE3mUiQ3 0MINy0hwCZARCA
npF5v2EOF8fnMT8vS5AGed4=
=ELjA
-----END PGP SIGNATURE-----

Jul 18 '05 #17
here's the answer to the partition by equivalence exercise.

-------------------------------------------

# Perl code

sub merge(\$) {
my @pairings = @{\$_[0]};

my @interm; # array of hashs

# chop the first value of @pairings into @interm
\$interm[0]={\$pairings[0][0]=>'x'}; \${interm[0]}{\$pairings[0][1]}='x';
shift @pairings;

N1: for my \$aPair (@pairings) {
for my \$aGroup (@interm) {
if (exists \${\$aGroup}{\$aPair->[0]})
{\${\$aGroup}{\$aPair->[1]}='x'; next N1}
if (exists \${\$aGroup}{\$aPair->[1]})
{\${\$aGroup}{\$aPair->[0]}='x'; next N1}
}
push @interm, {\$aPair->[0]=>'x'}; \${interm[-1]}{\$aPair->[1]}='x';
}

my @fin = shift @interm;

N2: for my \$group (@interm) {
for my \$newcoup (@fin) {
foreach my \$k (keys %\$group) {
if (exists \${\$newcoup}{\$k}) {map { \${\$newcoup}{\$_}='x'}
(keys %\$group); next N2;}
}
}
push @fin, \$group;
}
return map {[keys (%\$_)]} @fin;
}

-----------------------------------------------
# Here's a direct translation of the Perl code above into python:
©def merge(pairings): # pairings is a list of couples. e.g.
[(9,2),(7,6),...]
© # interm is a list of groups. Each group is a list that hold
© # equivalent numbers. interm stands for interim result. Each
group
© # is a dictionary. Keys are numbers, values are all dummy
© # 'x'. Dictionary is used for ease of dealing with duplicates or
© # move first pair of pairings into interm as the first group
pairings[0]
© # go thru pairings. For each pair, check if it is in any group in
© # interm. If any part of pair is in a group, then add the other
© # part into that group. If no part of the pair is in any group,
© # then add this pair into interm as a new group.
break
break
interm[-1][aPair[1]]='x'
© # now make another pass of the groups in interm, because some
pair
© # that may connect two groups (i.e. with one element in one
group,
© # and second element in another group), yet the pair is simply
© # consumed by a group.
© # This pass will check if there are any element in any other
© # group, if so, such two groups will be unioned. In this pass, we
© # move things from interm into fin. fin==final.
© for kk in group.keys(): newcoup[kk]='x';
© # now turn the dictionaries into lists for return value
© for group in fin: result.append(group.keys())
-------------------
I wrote this (Perl) program in 2003-09, and now basically forgot all
about the internals. The original Perl code does not have inline
comments, nor public consumable documentation as this is my own
project. In the process of translation and the publication and
algorithm i used as i go thru the lines. I was thinking of a quick
brainless translation word-for-word, but that turned out not possible
as i run into problems.

(While i'm learning Python, i run into frustrations with the Python
Documentation. (although it has far more quality than Perl
documentations). The frustrations with documentations will be appended

The translation problem i run into is this. In Perl, there's a GOTO
construct where in a loop one can say "break XYZ" to jump to a
arbitrary outer loop labeled XYZ. Python has "break" but does not
provide a GOTO jump as in Perl. In the process, i have to actually
figure out (for the first time for me) how loops with GOTO jumps can be
translated to alternative structure. This turned out not to be too
hard. For a GOTO jump to a far outer group, one can use multiple breaks
at the end of each loop, possibly in addiction adding a "else"
clause to the different levels of the loops. (Python language can have
a "else" clause for "for" loops. It is executed when the loop
completes. (as opposed to when a break inside jumped out))

Here is a loop with GOTO, translated into Python without:

N1: for my \$aPair (@pairings) {
for my \$aGroup (@interm) {
if (exists \${\$aGroup}{\$aPair->[0]})
{\${\$aGroup}{\$aPair->[1]}='x'; next N1}
if (exists \${\$aGroup}{\$aPair->[1]})
{\${\$aGroup}{\$aPair->[0]}='x'; next N1}
}
push @interm, {\$aPair->[0]=>'x'}; \${interm[-1]}{\$aPair->[1]}='x';
}
break
break
interm[-1][aPair[1]]='x'

Below is another such trans-structure, distinct from the above.

N2: for my \$group (@interm) {
for my \$newcoup (@fin) {
foreach my \$k (keys %\$group) {
if (exists \${\$newcoup}{\$k}) {map { \${\$newcoup}{\$_}='x'}
(keys %\$group); next N2;}
}
}
push @fin, \$group;
}
© for kk in group.keys(): newcoup[kk]='x';

The Python version above has not been tested much, but i suppose it is
fine since it is basically a copy of the Perl code. The Perl version is
fairly reliable, as it went thru the gauntlet of special cases testing
and i've used it over the year in the larger program... however no any
formal proof or exhaustive machine testing has been done. Later i might
write some program to auto-test them... but that'd be another day. The
Python version can also use some clean up, or even rewrite as a program
in the Python language proper. Addenda or Errata will be added on this
page.

PS all together there are some 3 or so solutions posted on the
newsgroups. (one by private email) I will have to filter them out and
study them.

Any interesting or important Addenda or Errata will be emailed out
later.
In addition to being archived here:
http://xahlee.org/perl-python/partition_by_equiv.html

Xah
xa*@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

Jul 18 '05 #18
Xah Lee wrote:
here's the answer to the partition by equivalence exercise.

Your Python solution is, as expected, wrong (try with
[[10, 8], [7, 3], [1, 7], [5, 4], [2, 2], [3, 8], [7, 10], [2, 3], [6,
10], [3, 2]]
for example).

The solution by John Lenton is wrong, too.

The solution by Brian Beck delivers the correct result for most input,
but for some input lists like
[[3, 3], [8, 7], [3, 2], [8, 5], [5, 6], [6, 3], [10, 8], [8, 10], [4,
10], [10, 2]]
it chokes and returns the empty list.

My solution (which may not be the fastest or most effective, but till
now is the shortest <wink> and it works):

def merge(pairings):
sets = {}
for x1, x2 in pairings:
newset = (sets.get(x1, frozenset([x1]))
| sets.get(x2, frozenset([x2])))
for i in newset:
sets[i] = newset

return[list(aset) for aset in set(sets.itervalues())]
Reinhold
Jul 18 '05 #19
David Eppstein:
However it might be easier to treat the input pairs as the edges of a
graph and apply a depth-first-search based graph connected components
algorithm. || This would be O(n), though.

Is the DFS the best one here (instead of BFS)?

With the graph implementation that I've just shown here it becomes:

.. import graph
.. def merge(l):
.. g = graph.Graph()
.. return g.connectedComponents()
.. print merge( [ [1,2], [2,4], [5,6] ] )

I presume the complexity is O(n+a); n (the nodes) is proportional to
the number of pairs, and
a (the arcs) depends on the "intricacy" of the input pairs. For a
complete graph, this can
become slow (but there is a high probably that all the pairs are in the
same connected component).
Bye,
Bearophile

Jul 18 '05 #20
Bearophile:
I presume the complexity is O(n+a); n (the nodes)
is proportional to the number of pairs, and a
(the arcs) depends on the "intricacy" of the input pairs.

Opps... n (the number of nodes) is the number of different numbers
contained in the pairs :-]

Bearophile

Jul 18 '05 #21
be************@lycos.com wrote:
David Eppstein:
However it might be easier to treat the input pairs as the edges of a
graph and apply a depth-first-search based graph connected components
algorithm. || This would be O(n), though.
Is the DFS the best one here (instead of BFS)?

I'm not sure it makes much difference.
With the graph implementation that I've just shown here it becomes:

. import graph
. def merge(l):
. g = graph.Graph()
. return g.connectedComponents()
. print merge( [ [1,2], [2,4], [5,6] ] )

I presume the complexity is O(n+a); n (the nodes) is proportional to
the number of pairs, and
a (the arcs) depends on the "intricacy" of the input pairs. For a
complete graph, this can
become slow (but there is a high probably that all the pairs are in the
same connected component).

It's still linear in the input size, which is the best you could hope
for.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
Jul 18 '05 #22
The GOTO statement from Perl has been messed up.

This block:
© for kk in group.keys(): newcoup[kk]='x';

should be:

comlete code is at:
http://xahlee.org/perl-python/partition_by_equiv.html

Xah
xa*@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

Jul 18 '05 #23
Reinhold Birkenfeld wrote:
def merge(pairings):
sets = {}
for x1, x2 in pairings:
newset = (sets.get(x1, frozenset([x1]))
| sets.get(x2, frozenset([x2])))
for i in newset:
sets[i] = newset

return[list(aset) for aset in set(sets.itervalues())]

Looks good. I used it as inspiration for this new one, which takes less
time for large datasets, and especially for those where a good number of
merges are expected (at least that looks to be the pattern; with some
large datasets this takes less than a quarter of the time as the one
above). I'm sure it can be improved still -- yours is still faster in
many cases.

def merge2(pairings):
elements = {}
sets = []
for x1, x2 in pairings:
i = [elements.get(x1, -1), elements.get(x2, -1)]
i.sort()
if i[1] == -1:
i[1] = len(sets)
sets.append(set([x1, x2]))
elif i[0] == -1:
sets[i[1]] |= set([x1, x2])
elif i[0] == i[1]:
continue
else:
sets[i[1]] |= sets[i[0]]
sets[i[0]].clear()

for x in sets[i[1]]:
elements[x] = i[1]

return[list(s) for s in sets if s]

# Comparison
import profile
import random

# Play with these values
xMax, nPairs = 1000, 5000

l = [[random.randint(0, xMax), random.randint(0, xMax)] for i in
range(nPairs)]

print 'merge2:'
profile.run('merge2(l)') # Mine

print 'merge:'
profile.run('merge(l)') # Reinhold's

--
Brian Beck
Jul 18 '05 #24

Reinhold Birkenfeld wrote:
Xah Lee wrote:
here's the answer to the partition by equivalence exercise.
Your Python solution is, as expected, wrong (try with
[[10, 8], [7, 3], [1, 7], [5, 4], [2, 2], [3, 8], [7, 10], [2, 3],

[6, 10], [3, 2]]
for example).

The solution by John Lenton is wrong, too.

The solution by Brian Beck delivers the correct result for most input, but for some input lists like
[[3, 3], [8, 7], [3, 2], [8, 5], [5, 6], [6, 3], [10, 8], [8, 10], [4, 10], [10, 2]]
it chokes and returns the empty list.

My solution (which may not be the fastest or most effective, but till
now is the shortest <wink> and it works):

def merge(pairings):
sets = {}
for x1, x2 in pairings:
newset = (sets.get(x1, frozenset([x1]))
| sets.get(x2, frozenset([x2])))
for i in newset:
sets[i] = newset

return[list(aset) for aset in set(sets.itervalues())]
Reinhold

FWIW, here's a brief UAT report:

Appears to work: Reinhold, David, Xah (v2)
Has bug(s): John L (v*), Brian (v*)
Incomprehensible: Xah (v*)

'Come back after lunch' award goes to Xah v2, which at a glance appears
to be O(N**4) -- dictionary.update() inside 3-deep nest of 'for'
statements.

Here's a small input that busts all non-working versions:

input: [[1, 2], [3, 4], [2, 3], [4, 5]]
merge_RB -> [[1, 2, 3, 4, 5]]
merge_DE -> [[1, 2, 3, 4, 5]]
merge_JL -> [[1, 2, 3, 4], [5]]
merge_JL2 -> [[1, 2, 3, 4], [5]]
merge_BB -> []
merge_XL -> [[1, 2, 3, 4, 5], [3, 4, 5]]
merge_XL2 -> [[1, 2, 3, 4, 5]]

Jul 18 '05 #25
an interesting problem so developed now is to write a function that
generate test cases for the purpose of testing performance. (just for
fun)

the design of this function could be interesting. We want to be able to
give parameters in this function so as to spit out all possible screw
test cases. First of all, a range of n (some millions) numbers. Then, a
fraction that specifies the percentage of these number are equivalent.
1 being all equivalent. 0 being all "distinct" (having only one
equivalent member (since the input comes in pairs)). Then we want to
have a parameter that says something like the sizes of the equivalence
groups. Suppose 50% of number are equal among themselves (i.e. have
more than one equivalent member). 1 would be all in one group. 0 would
mean all partitions have size 3 or 2. (there are more to it here...
since this is a distribution) ... Then, we need to turn this range of
integers into pairs. That's something more to think about.

So with this function at hand, we'll be able to say for sure which code
performs better (and under what type of input)

the Order notion in computing mathematics is fairly useless for finer
details.

PS it is not trivial to design this pair generating function. I don't
know if the above is on the right track, but essentially we want to
categorize the type of inputs according to the mathematical operational
performance aspect of partition by equivalence, and distill them into
parameters.

another func to write is one that canonicalize the output. Such as
sorting. So that all results can be compared simply by = in Python.

failing to design a elaborate pair_gen, we can just have pairs of
random numbers. But exactly what nature is such input... is more to

(in my original application, each number represent a computer file,
there are up to tens of thousands of files, and much less than 0.1% is
same as another, and for files that are same, each equivalent group
number no more than 10 or so.)

Xah
xa*@xahlee.org
http://xahlee.org/PageTwo_dir/more.html
John Machin wrote:
FWIW, here's a brief UAT report:

Appears to work: Reinhold, David, Xah...
Has bug(s): ...

Jul 18 '05 #26
John Machin>FWIW, here's a brief UAT report:

Here is something else for you.
Note: for more correct comparisons, for the following tests I've
disabled the use of Psyco in Graph (usually enabled, if present).
This looks faster than merge2 for this (quite sparse) random pairs
test:

.. import graph # http://www.fantascienza.net/animalia/graph.zip
.. def merge4(l):
.. g = graph.Graph()
.. for n1,n2 in l:
.. return g.connectedComponents()

present in Graph, but addArcs2 isn't present):

.. def merge5(l):
.. g = graph.Graph()
.. return g.connectedComponents()
If you like to see the actual code, without using Graph, I've made a
very stripped down version of addArcs2 and connectedComponents:

.. from collections import deque
.. from itertools import chain
..
.. def merge6(l):
.. # graph creation
.. gi = {}
.. go = {}
.. for n1,n2 in l:
.. if n1 not in go:
.. go[n1] = {}
.. gi[n1] = {}
.. if n2 not in go:
.. go[n2] = {}
.. gi[n2] = {}
.. go[n1][n2] = 0
.. gi[n2][n1] = 0
.. # Connected components:
.. result = []
.. visited = set()
.. for root in go.iterkeys():
.. if root not in visited:
.. component = [root]
.. Q = deque() # A queue
.. Q.append(root)
.. while Q:
.. n1 = Q.popleft()
.. for n2 in chain( go[n1].iterkeys(), gi[n1].iterkeys()
):
.. if n2 not in visited:
.. Q.append(n2)
.. component.append(n2)
.. result.append(component)
.. return result
At the end of the tests I've added this to be sure the results are
always the same:

r2 = set( frozenset(e) for e in merge2(l) )
r4 = set( frozenset(e) for e in merge4(l) )
r5 = set( frozenset(e) for e in merge5(l) )
r6 = set( frozenset(e) for e in merge6(l) )
assert r2 == r4
assert r2 == r6

For xMax, nPairs = 1000, 5000:
merge2: 15750 function calls in 0.939 CPU seconds
merge4: 11029 function calls in 0.560 CPU seconds
merge5: 6030 function calls in 0.303 CPU seconds
merge6: 6011 function calls in 0.297 CPU seconds

For xMax, nPairs = 2000, 10000:
merge2: 31503 function calls in 2.505 CPU seconds
merge6: 12011 function calls in 0.639 CPU seconds

Timings using clock() (much more reliable than the profilers!), for
xMax, nPairs = 2000, 10000:
merge2: 1.222
merge6: 0.201

Probably merge6 can be improved, reducing memory used...

Bear hugs,
Bearophile

Jul 18 '05 #27
On 19 Feb 2005 17:56:27 -0800, be************@lycos.com wrote:
John Machin>FWIW, here's a brief UAT report:

Here is something else for you.
Note: for more correct comparisons, for the following tests I've
disabled the use of Psyco in Graph (usually enabled, if present).
This looks faster than merge2 for this (quite sparse) random pairs
test:
[snip]

Timings using clock() (much more reliable than the profilers!), for
xMax, nPairs = 2000, 10000:
merge2: 1.222
merge6: 0.201

Probably merge6 can be improved, reducing memory used...

Perhaps I'm missing a posting of yours -- what are merge2 and merge4?
What is "this random pairs test"? What is xMax (nPairs isn't hard to
guess), and how are you generating your input data?

FWIW, my offering is attached.

Jul 18 '05 #28
when i try to run the following program, Python complains about some
global name frozenset is not defined. Is set some new facility in
Python 2.4?

© for x1, x2 in pairings:
© return[list(aset) for aset in set(sets.itervalues())]

it would be nice if the two working programs do not use some package.
This problem shouldn't need to.

Xah
xa*@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

Jul 18 '05 #29

Xah Lee wrote:
when i try to run the following program, Python complains about some
global name frozenset is not defined. Is set some new facility in
Python 2.4?
http://www.python.org/doc/2.3/whatsnew/
http://www.python.org/doc/2.4/whatsnew/whatsnew24.html

You must be running 2.3. If you persist in running 2.3, put the
following at the top of the file:

import sys
PY_VERSION = sys.version_info[:2]
if PY_VERSION == (2,3):
from sets import Set as set
from sets import ImmutableSet as frozenset

That will get Reinhold's gadget working under 2.3. The bearophile's
function uses collections.deque which is built-in to 2.4.
it would be nice if the two working programs do not use some package.
This problem shouldn't need to.

Look at the newsgroup again. By my count there are apparently-working
versions from SIX people: Reinhold, bear.+, Brian Beck, David Eppstein,
yourself, and me. Only David's uses stuff that's not in the Python 2.4
off-the-shelf distribution.

Jul 18 '05 #30
John Machin:
Perhaps I'm missing a posting of yours -- what are merge2 and merge4? What is "this random pairs test"? What is xMax (nPairs isn't hard to
guess), and how are you generating your input data?<

merge2 and this random pairs test comes from the post by Brian Beck.
merge4 is the first in my last post. And merge3 isn't included (nor
tested) because it uses the original (slow) addArcs.
This is the simple test generator taken from his post:

xMax = 3000
nPairs = xMax*5
l = [[randint(0,xMax), randint(0,xMax)] for i in xrange(nPairs)]
For such high number of nPairs (arcs), the resulting graph usuall has
one big connected component, and few tiny debris... To produce a more
divided graph, nPairs have to be much smaller, like xMax*1.5.

FWIW, my offering is attached. <
I'm sorry, I don't see the attach... (just the different title).
John Machin:Only David's uses stuff that's not in the Python 2.4 off-the-shelf

distribution.<

Well, using already mede functions and classes is good, it avoids you
to reinvent things each time. Some of my versions too use my graph
library (for a problem like this I usually don't create stand alone
versions like merge6 and merge7). And my original point was adding one
graph data structure to the standard Python library :-]
I haven't tested the speed of David Eppstein version...

Anyway, this is a simplified version of merge6, that uses an indirect
graph g, instead of the directed graph, and avoids the use of chain:

.. from collections import deque
..
.. def merge7(l):
.. # Undirect graph g creation:
.. g = {}
.. for n1,n2 in l:
.. if n1 not in g: g[n1] = {n2:0}
.. else: g[n1][n2] = 0
.. if n2 not in g: g[n2] = {n1:0}
.. else: g[n2][n1] = 0
.. # Connected components:
.. result = []
.. visited = set()
.. for root in g.iterkeys():
.. if root not in visited:
.. component = [root]
.. Q = deque() # A queue
.. Q.append(root)
.. while Q:
.. n1 = Q.popleft()
.. for n2 in g[n1].iterkeys():
.. if n2 not in visited:
.. Q.append(n2)
.. component.append(n2)
.. result.append(component)
.. return result
It's a bit shorter and a little faster than merge6, memory used is
similar (there is only one dictionary, so there is only one
ammortization overhead of the dictionary, but it contains the sum of
both arcs, so the ammortization can be twice as big, so... memory used
can be the same).
(Usually BFS and DFS memory requirements are a little different; this
implementation uses BFS, and I think it uses a bit less memory than
DFS, but for this problem/test the difference isn't significative.)

Bear hugs,
Bearophile

Jul 18 '05 #31
> From: "Xah Lee" <xa*@xahlee.org>

The GOTO statement from Perl has been messed up.

hey, I didn't do it!

This block:

what do the funny little "Â©"s stand for?

Eric Pederson
http://www.songzilla.blogspot.com
:::::::::::::::::::::::::::::::::::
domainNot="@something.com"
domainIs=domainNot.replace("s","z")
ePrefix="".join([chr(ord(x)+1) for x in "do"])
mailMeAt=ePrefix+domainIs
:::::::::::::::::::::::::::::::::::

Jul 18 '05 #32
On 20 Feb 2005 03:31:33 -0800, be************@lycos.com wrote:
John Machin:

FWIW, my offering is attached. <

I'm sorry, I don't see the attach... (just the different title).

Here it is. I'll reply to the rest tomorrow. It's way past sleep-time
here.

!
!class _Stopper:
! pass
!
!def merge_JM(pairings):
! parent = {}
! size = {}
! # 1. x not in parent => x is unknown
! # 2. parent[x] -> root of cluster containing x
! # 3. As a consequence of (2), parent[root] is root
! # 4. link[x] chains the members of the cluster; 1st is root, 2nd
! # 5. link[last_member] is _stopper
! # 6. size[root] is the number of members of the cluster
! parentget = parent.get
! _stopper = _Stopper()
! for inxa, inxb in pairings:
! roota = parentget(inxa, _stopper)
! rootb = parentget(inxb, _stopper)
! if roota is not _stopper:
! if rootb is not _stopper:
! if roota is rootb:
! continue
! if size[rootb] > size[roota]:
! newroot = rootb
! joiner = roota
! else:
! newroot = roota
! joiner = rootb
! size[newroot] += size[joiner]
! del size[joiner]
! parent[joiner] = newroot
! tail = joiner
! while link[tail] is not _stopper:
! parent[tail] = newroot
! else:
! size[roota] += 1
! parent[inxb] = roota
! else:
! if rootb is not _stopper:
! size[rootb] += 1
! parent[inxa] = rootb
! elif inxa is inxb:
! parent[inxa] = inxa
! size[inxa] = 1
! else:
! parent[inxa] = inxa
! parent[inxb] = inxa
! size[inxa] = 2
! olist = []
! for root in size.iterkeys():
! family = [root]
! while cousin is not _stopper:
! family.append(cousin)
! olist.append(family)
! return olist
!
Jul 18 '05 #33
Reinhold Birkenfeld wrote:
My solution (which may not be the fastest or most effective, but till
now is the shortest <wink> and it works):

def merge(pairings):
sets = {}
for x1, x2 in pairings:
newset = (sets.get(x1, frozenset([x1]))
| sets.get(x2, frozenset([x2])))
for i in newset:
sets[i] = newset

return[list(aset) for aset in set(sets.itervalues())]

A recursive solution (around twice as fast as the above, though very
slow still...)

def merge_rb2(pairings):
def merge(sets):
res_sets = []
for tset in sets:
for item in tset:
for rset in res_sets:
if item in rset:
rset.update(item for item in tset)
break
else:
continue
break
else:
res_sets.append(set(tset))
if len(res_sets) == len(sets):
return res_sets
else:
return merge(res_sets)

return[list(s) for s in merge([set(pair) for pair in pairings])]

Another one:

def merge_rb3(pairings):
dic = {}
for x1, x2 in pairings:
dic.setdefault(x1, []).append(x2)
dic.setdefault(x2, []).append(x1)

def red(k, l):
l.append(k)
sub = dic[k]
for i in dic[k]:
if i not in l:
red(i, l)
del dic[k]

res = []
while len(dic):
rl = []
red(iter(dic).next(), rl)
res.append(rl)

return res

Reinhold
Jul 18 '05 #34

Reinhold Birkenfeld wrote:
Reinhold Birkenfeld wrote:
My solution (which may not be the fastest or most effective, but till now is the shortest <wink> and it works):

[snip RB]
A recursive solution (around twice as fast as the above, though very
slow still...)
[snip RB2]
Another one:

[snip RB3]

Dunno what data you are using for timing, but my tests suggest that RB
is fast enough, RB3 is slightly faster, but RB2 is a real dog and
appears to be quadratic [hint: it has that same for-for-for-update
signature found in phase 2 of Xah's effort]. Not only that, but it
seems to be somewhat irregular. Below are some results on trivial test
data:

prototype input: python -m timeit -s "import
merge;n=3000;inp=((x,x+1)for x in xrange(0,n,2))"
"merge.merge_RB3(inp)"
100000 loops, best of 3: 3.98 usec per loop

n=3000;RB2 64.9 msec per loop
n=3000;RB3 3.98 usec per loop
n=3000;RB 3.06 usec per loop
n=3000;JM3 2.73 usec per loop

n=1000;RB2 4.92 usec per loop
n=1250;RB2 5.34 usec per loop
n=1500;RB2 20.4 usec per loop
n=1750;RB2 22.1 msec per loop
n=2000;RB2 28.8 msec per loop
n=2500;RB2 44.9 msec per loop
n=3000;RB2 64.9 msec per loop

[background: Python 2.4, Win2000, 1.4GHz Athlon chip, 768Mb memory]

Here's a function to generate some serious timing data:
!def mktimedata(n,lev):
! res = []
! d = 1
! for k in range(lev):
! res.extend(zip(xrange(0, n, 2*d), xrange(d, n, 2*d)))
! d *= 2
! return res

Try that with n=3000 and lev=5 ...

Cheers,
John

Jul 18 '05 #35

Eric Pederson wrote:
From: "Xah Lee" <xa*@xahlee.org>

what do the funny little "©"s stand for?

....>>> import unicodedata as ucd; ucd.name('©'.decode('cp1252'))

Xah is asserting his right to be recognised as the author of his
artistic creations, line by line.

Jul 18 '05 #36
John Machin wrote:
Xah is asserting his right to be recognised as the author of his
artistic creations, line by line.

Not very well, however, since his usage doesn't constitute a valid

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
Wyrd has swept all my kin / all the brave chiefs away! / Now I must
Jul 18 '05 #37
John Machin wrote:
Reinhold Birkenfeld wrote:
Reinhold Birkenfeld wrote:
> My solution (which may not be the fastest or most effective, but till > now is the shortest <wink> and it works):

[snip RB]

A recursive solution (around twice as fast as the above, though very
slow still...)

[snip RB2]

Another one:

[snip RB3]

Dunno what data you are using for timing, but my tests suggest that RB
is fast enough, RB3 is slightly faster, but RB2 is a real dog and
appears to be quadratic [hint: it has that same for-for-for-update
signature found in phase 2 of Xah's effort]. Not only that, but it
seems to be somewhat irregular. Below are some results on trivial test
data:

[snip]

Yes, I don't know exactly how I timed this, and I just posted the
solutions to show that there are very different solutions possible. They
are surely not using the best algorithms, as bearophile's function showed.

Reinhold

Jul 18 '05 #38
"John Machin" <sj******@lexicon.net> wrote:
it would be nice if the two working programs do not use some package.
This problem shouldn't need to.

Look at the newsgroup again. By my count there are apparently-working
versions from SIX people: Reinhold, bear.+, Brian Beck, David Eppstein,
yourself, and me. Only David's uses stuff that's not in the Python 2.4
off-the-shelf distribution.

Where "stuff" means a single 62-line file that I linked to in my post.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
Jul 18 '05 #39
I started to collect i believe the 4 or so solutions by different
people... but seems it's gonna take some an hour or more... So far the
only other one i've run and find alright is Reinhold Birkenfeld's
original. Others worth noting i'm aware of is David Epsteinn, improved
versions from Reinhold Birkenfeld, the one using graphs by bearophile
....

since many of you have already collected and tested these... i wonder
if anyone'd be interested to put together all the (working) code in a
single message? (or even a webpage.)

thanks.

Xah
xa*@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

Jul 18 '05 #40
Please consider my submission also (Python 2.3-compatible).

-- Paul McGuire

.. import sets
..
.. input = [[1, 2], [3, 4], [2, 3], [4, 5]]
.. input = [[1, 2], [3, 4], [4, 5]]
.. input = [[1, 2],[2,1], [3, 4], [4, 5],[2,2],[2,3],[6,6]]
..
.. def merge(pairings):
.. ret = []
.. for a,b in pairings:
.. s1 = None
.. s2 = None
.. for s in ret:
.. if a in s or b in s:
.. if s1 is None:
.. s1 = s
.. else:
.. s2 = s
.. break
.. else:
.. for s in ret:
.. if a in s:
.. break
.. elif b in s:
.. break
.. else:
.. ret.append(sets.Set([a,b]))
.. continue
.. ret.remove(s1)
.. ret.remove(s2)
.. ret.append(s1.union(s2))
..
.. return[list(s) for s in ret]
..
.. print merge(input)

Jul 18 '05 #41
A slightly better version, only walks the set of cumulative list of
sets once per pairing.

-- Paul

.. import sets
..
.. input = [[1, 2], [3, 4], [2, 3], [4, 5]]
.. input = [[1, 2], [3, 4], [4, 5]]
.. input = [[1, 2],[2,1], [3, 4], [4, 5],[2,2],[2,3],[6,6]]
..
..def merge(pairings):
.. ret = []
.. for a,b in pairings:
.. aset = None
.. bset = None
.. for s in ret:
.. if not aset and a in s:
.. aset = s
.. if not bset and b in s:
.. bset = s
.. if aset and bset:
.. break
.. else:
.. if aset:
.. elif bset:
.. else:
.. ret.append(sets.Set([a,b]))
.. continue
.. if aset is not bset:
.. ret.remove(aset)
.. ret.remove(bset)
.. ret.append(aset.union(bset))
..
.. return[list(s) for s in ret]
..
..print merge(input)

Jul 18 '05 #42

This thread has been closed and replies have been disabled. Please start a new discussion.