473,883 Members | 1,604 Online

# [perl-python] exercise: partition a list by equivalence

here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge(\$pairings ) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

For those of you unfamiliar with math lingoes, partition a list means
to create sublists, based on some criteria. In our case, the input list
comes in the form of pairs, and its members are the union of all
members in the pairs. The criterion for partition in our case is that
of equivalence, specified in the pairs input.

Try to write a Python code for this. In the Python code, the input
should be a list of couples. (for Perlers, sketch out the algorithm on
paper and try to code in Python directly.)

I'll post Perl & Python code tomorrow.

==This is brought to you by the Perl-Python community.==
== http://xahlee.org/perl-python/python.html ==

Xah
xa*@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

Jul 18 '05 #1
41 3572
>For those of you unfamiliar with math lingoes, partition a list means
to create sublists, based on some criteria.

Typical moronic mathematicians with their exclusionary lingoes...why
can't they just say "create sublists based on some criteria" like
normal people?

- alex23

Jul 18 '05 #2
Xah Lee wrote:
here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge(\$pairings ) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

Almost a joke:

from numarray import *

def merge(*pairs):
flattened = reduce(tuple.__ add__, pairs, tuple())
m, M = min(flattened), max(flattened)

d = M - m + 1
matrix = zeros((d, d), type = Bool)

for x, y in pairs:
X, Y = x - m, y - m

matrix[X, X] = 1
matrix[X, Y] = 1
matrix[Y, X] = 1
matrix[Y, Y] = 1

while True:
next = greater(dot(mat rix, matrix), 0)
if alltrue(ravel(n ext == matrix)):
break
matrix = next

results = []
for i in range(d):
eqls, = nonzero(matrix[i])
if eqls.size():
if i == eqls[0]:
results.append( tuple(x + m for x in eqls))

return results

Disclaimer: I'm not an expert in numarray and suppose the something can
be dramatically imporved.
Jul 18 '05 #3
On Thu, Feb 17, 2005 at 03:46:20PM -0800, Xah Lee wrote:
here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge(\$pairings ) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

in Python:

def merge(pairings) :
rev = {}
res = []
for pairing in pairings:
first, second = pairing
has_first = first in rev
has_second = second in rev
if has_first and has_second:
rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy )
rev[first] = rev[second] = copy
return filter(None, res)

and in Perl:

sub merge(\$)
{
my %rev = ();
my @res = ();
my (\$pairing, \$first, \$second, \$has_first, \$has_second);
foreach \$pairing (@{\$_[0]})
{
(\$first, \$second) = @\$pairing;
\$has_first = exists \$rev{\$first};
\$has_second = exists \$rev{\$second};
if (\$has_first and \$has_second)
{
push @{\$rev{\$first}} , @{\$rev{\$second} };
@{\$rev{\$second} } = ();
\$rev{\$second} = \$rev{\$first};
}
elsif (exists \$rev{\$first})
{
push @{\$rev{\$first}} , \$second;
\$rev{\$second} = \$rev{\$first};
}
elsif (exists \$rev{\$second})
{
push @{\$rev{\$second} }, \$first;
\$rev{\$first} = \$rev{\$second};
}
else
{
my @copy = (\$first, \$second);
push @res, \@copy;
\$rev{\$first} = \$rev{\$second} = \@copy;
}
}
return [grep @\$_, @res];
}

although in Perl you wouldn't define it to take a reference to a list
(as you did), but rather a list, and you wouldn't define it to return
a reference to a list, but rather a list in list context (and possibly
a reference in scalar context).

Both versions are (IMHO) pretty clear (when the docstring/pod is
included), and O(n) because dict/hash lookup and list
appending/pushing is O(1) in both languages. Both versions can
probably be tweaked for speed quite a bit, but I don't *think* there's
a better-than-O(n) algorithm for this.

Note that the Python version assumes that the pairs' elements are
hashable; your example used numbers, so I thought it was pretty safe
assumption. The Perl version has no such restriction.
--
John Lenton (jo**@grulic.or g.ar) -- Random fortune:
Noble cosa es, aún para un anciano, el aprender.
-- Sófocles. (497-406 a.C.) Filósofo griego.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFCFhXjgPq u395ykGsRAus5AJ 9iCSbGsyP3QHZy/whP195GSVNIwwCc Dyqf
hIA+oWaLBHdSGUi 7t79Wfx8=
=BlC7
-----END PGP SIGNATURE-----

Jul 18 '05 #4

John Lenton wrote:
On Thu, Feb 17, 2005 at 03:46:20PM -0800, Xah Lee wrote:
here's another interesting algorithmic exercise, again from part of a larger program in the previous series.

Here's the original Perl documentation:

=pod

merge(\$pairings ) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut
in Python:

def merge(pairings) :
rev = {}
res = []
for pairing in pairings:
first, second = pairing
has_first = first in rev
has_second = second in rev

Not robust in the face of input like:
[[1,1]]
or
[[1,2], [1,2]]
or
[[1,2], [2,1]]
or
[[1,2], [2,3], [3,1]]

needs "if first == second: continue" here
if has_first and has_second:
needs "if rev[first] == rev[second]: continue" here
rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy )
My reaction to the "magic" by which res grows was "omigod that's the
most horrible thing I've seen for quite a while" but there was worse to
come :-)
rev[first] = rev[second] = copy
return filter(None, res)

and in Perl:
[snip]

Both versions are (IMHO) pretty clear (when the docstring/pod is
included), and O(n) because dict/hash lookup and list
appending/pushing is O(1) in both languages. Both versions can
probably be tweaked for speed quite a bit, but I don't *think* there's a better-than-O(n) algorithm for this.
List appending is O(1) only in the amortised sense. Extending is not
O(1) in any sense. Neither is the list comparison that is necessary for
robustness (using your data structure, that is).

You don't need to think. This problem has been extensively picked over
by folk who are a lot smarter than us, starting from 30 years ago.
Google for "disjoint set" and "union-find". One gets the impression
that the best possible algorithm would be slightly *worse* than O(n).

Note that the Python version assumes that the pairs' elements are
hashable; your example used numbers, so I thought it was pretty safe
assumption. The Perl version has no such restriction.

In the real world, the objects under comparison (images, fingerprints,
customer records, etc) may not themselves be hashable and comparison
for equality may be expensive but a unique identifier would be assigned
to each object and that ID would of course be cheaply hashable and
comparable.

Jul 18 '05 #5
On Fri, Feb 18, 2005 at 03:21:10PM -0800, John Machin wrote:
Not robust in the face of input like:
[[1,1]]
or
[[1,2], [1,2]]
or
[[1,2], [2,1]]
or
[[1,2], [2,3], [3,1]]

needs "if first == second: continue" here
if has_first and has_second:
needs "if rev[first] == rev[second]: continue" here

an 'is' is enough, and better.
rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy )

My reaction to the "magic" by which res grows was "omigod that's the
most horrible thing I've seen for quite a while" but there was worse to
come :-)

what is magic about it? is it really that horrible?
List appending is O(1) only in the amortised sense. Extending is not
O(1) in any sense. Neither is the list comparison that is necessary for
robustness (using your data structure, that is).

You don't need to think. This problem has been extensively picked over
by folk who are a lot smarter than us, starting from 30 years ago.
Google for "disjoint set" and "union-find". One gets the impression
that the best possible algorithm would be slightly *worse* than O(n).

Of course! I'd forgotten clean about union-find. And yes, it's
O(n*log(n)) union and find, and my implementation is O(n**2) for
union, O(1) for find; I'm pleased that, in spite of having forgotten
about union-find, I "reinvented " (heh) something that is better than
the "naive" implementation we saw in class.

Now I'm even more worried about your dismissal of this as magic and
horrible.

--
John Lenton (jo**@grulic.or g.ar) -- Random fortune:
Why don't you fix your little problem... and light this candle?
-- Alan Shepherd, the first man into space, Gemini program

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFCFoBIgPq u395ykGsRApUIAK CnNmctznqq1KTfZ hi7Dl8YpjPnbQCg uo2E
z/Ss9rWC64h9vuDrJ M//Ge8=
=rVAU
-----END PGP SIGNATURE-----

Jul 18 '05 #6
Hello John,

Try your python code on this example:
merge([[1,2], [3,4], [1,2], [5,3]])

The result given by your function is:
[[3, 4, 5]]

Sorry...

To Xah: next time you propose an exercise, write some UNIT TESTS!!!
Then people will be able to test if there answers are correct or not.

Cyril

On Fri, 18 Feb 2005 13:20:51 -0300, John Lenton <jo**@grulic.or g.ar> wrote:
On Thu, Feb 17, 2005 at 03:46:20PM -0800, Xah Lee wrote:
here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge(\$pairings ) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

in Python:

def merge(pairings) :
rev = {}
res = []
for pairing in pairings:
first, second = pairing
has_first = first in rev
has_second = second in rev
if has_first and has_second:
rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy )
rev[first] = rev[second] = copy
return filter(None, res)

and in Perl:

sub merge(\$)
{
my %rev = ();
my @res = ();
my (\$pairing, \$first, \$second, \$has_first, \$has_second);
foreach \$pairing (@{\$_[0]})
{
(\$first, \$second) = @\$pairing;
\$has_first = exists \$rev{\$first};
\$has_second = exists \$rev{\$second};
if (\$has_first and \$has_second)
{
push @{\$rev{\$first}} , @{\$rev{\$second} };
@{\$rev{\$second} } = ();
\$rev{\$second} = \$rev{\$first};
}
elsif (exists \$rev{\$first})
{
push @{\$rev{\$first}} , \$second;
\$rev{\$second} = \$rev{\$first};
}
elsif (exists \$rev{\$second})
{
push @{\$rev{\$second} }, \$first;
\$rev{\$first} = \$rev{\$second};
}
else
{
my @copy = (\$first, \$second);
push @res, \@copy;
\$rev{\$first} = \$rev{\$second} = \@copy;
}
}
return [grep @\$_, @res];
}

although in Perl you wouldn't define it to take a reference to a list
(as you did), but rather a list, and you wouldn't define it to return
a reference to a list, but rather a list in list context (and possibly
a reference in scalar context).

Both versions are (IMHO) pretty clear (when the docstring/pod is
included), and O(n) because dict/hash lookup and list
appending/pushing is O(1) in both languages. Both versions can
probably be tweaked for speed quite a bit, but I don't *think* there's
a better-than-O(n) algorithm for this.

Note that the Python version assumes that the pairs' elements are
hashable; your example used numbers, so I thought it was pretty safe
assumption. The Perl version has no such restriction.

--
John Lenton (jo**@grulic.or g.ar) -- Random fortune:
Noble cosa es, aún para un anciano, el aprender.
-- Sófocles. (497-406 a.C.) Filósofo griego.

--
http://mail.python.org/mailman/listinfo/python-list

Jul 18 '05 #7

John Lenton wrote:
On Fri, Feb 18, 2005 at 03:21:10PM -0800, John Machin wrote:
Not robust in the face of input like:
[[1,1]]
or
[[1,2], [1,2]]
or
[[1,2], [2,1]]
or
[[1,2], [2,3], [3,1]]

needs "if first == second: continue" here
if has_first and has_second:

needs "if rev[first] == rev[second]: continue" here

an 'is' is enough, and better.

Good point. You're redeeming yourself :-)
rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy )

My reaction to the "magic" by which res grows was "omigod that's the most horrible thing I've seen for quite a while" but there was worse to come :-)

what is magic about it? is it really that horrible?

Try explaining to the newbies over on the tutor list how despite "res"
only ever *explicitly* having little bits like [3, 4] appended to it,
it (or more properly the thing to which it refers) is actually
festering and growing and being mutated under the surface until at the
finale it bursts out dripping slime just like the creature from the
black lagoon ...

Jul 18 '05 #8
Xah Lee wrote:
merge(\$pairings ) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

Not sure how efficient this is, but I decided to take advantage of the
operations provided by sets:

def merge(pairs):
pairs = set(tuple(sorte d(p)) for p in pairings)
merged = []
# Each loop will result in a new, complete sublist in the result.
while pairs:
p = set(pairings.po p())
remove = set([])
for pair in pairs:
pairSet = set(pair)
if pairSet & p:
p |= pairSet
pairs -= remove
merged.append(l ist(p))
return merged

--
Brian Beck
Jul 18 '05 #9
Brian Beck wrote:
Brian Beck wrote:
> [code]

Ah heck, nevermind... it worked for my small test cases but the
algorithm is just wrong.

--
Brian Beck
Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

### Similar topics

 4 8140 by: Mark Wilson CPU | last post by: This must be easy, but I'm missing something... I want to execute a Perl script, and capture ALL its output into a PHP variable. Here are my 2 files: ------------------------------------- test3.pl ------------------------------------- print "PERL Hello from Perl! (plain print)
\n"; print STDERR "PERL This is text sent to STDERR
\n"; \$output="PERL Some output: