By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
439,957 Members | 1,960 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 439,957 IT Pros & Developers. It's quick & easy.

Would there be support for a more general cmp/__cmp__

P: n/a
I was wondering how people would feel if the cmp function and
the __cmp__ method would be a bit more generalised.

The problem now is that the cmp protocol has no way to
indicate two objects are incomparable, they are not
equal but neither is one less or greater than the other.

So I thought that either cmp could return None in this
case or throw a specific exception. People writing a
__cmp__ method could do the same.

--
Antoon Pardon
Oct 20 '05 #1
Share this Question
Share on Google+
39 Replies


P: n/a
Antoon Pardon wrote:
The problem now is that the cmp protocol has no way to
indicate two objects are incomparable, they are not
equal but neither is one less or greater than the other.


If that is the case then you implement __eq__ and __ne__ to return
True/False and make cmp throw an exception. I don't see any need to extend
the cmp protocol.
Oct 20 '05 #2

P: n/a
Op 2005-10-20, Duncan Booth schreef <du**********@invalid.invalid>:
Antoon Pardon wrote:
The problem now is that the cmp protocol has no way to
indicate two objects are incomparable, they are not
equal but neither is one less or greater than the other.


If that is the case then you implement __eq__ and __ne__ to return
True/False and make cmp throw an exception. I don't see any need to extend
the cmp protocol.


That is not sufficient. There are domains where an order exists, but not
a total order. So for some couples of elements it is possible to say
one is less than the other, but not for all such couples.

Your solution wouldn't allow for such a case.

--
Antoon Pardon
Oct 20 '05 #3

P: n/a
Antoon Pardon wrote:
Op 2005-10-20, Duncan Booth schreef <du**********@invalid.invalid>:
Antoon Pardon wrote:
The problem now is that the cmp protocol has no way to
indicate two objects are incomparable, they are not
equal but neither is one less or greater than the other.


If that is the case then you implement __eq__ and __ne__ to return
True/False and make cmp throw an exception. I don't see any need to
extend the cmp protocol.


That is not sufficient. There are domains where an order exists, but
not a total order. So for some couples of elements it is possible to
say one is less than the other, but not for all such couples.

Your solution wouldn't allow for such a case.


I don't see why not. If an ordering exists then __cmp__ returns a result
otherwise it throws an exception. So long as you define __eq__ and __ne__,
__cmp__ is never called when checking for equality.
Oct 20 '05 #4

P: n/a
Antoon Pardon wrote:
Op 2005-10-20, Duncan Booth schreef <du**********@invalid.invalid>:
Antoon Pardon wrote:

The problem now is that the cmp protocol has no way to
indicate two objects are incomparable, they are not
equal but neither is one less or greater than the other.


If that is the case then you implement __eq__ and __ne__ to return
True/False and make cmp throw an exception. I don't see any need to extend
the cmp protocol.

That is not sufficient. There are domains where an order exists, but not
a total order. So for some couples of elements it is possible to say
one is less than the other, but not for all such couples.

Your solution wouldn't allow for such a case.


Wouldn't the full power of rich comparisons be enough? See section 3.3.1
in Python Reference Manual:

http://www.python.org/doc/2.4.2/ref/customization.html

See methods __lt__, __le__, __eq__, __ne__, __gt__, __ge__.

/MiO
Oct 20 '05 #5

P: n/a
Op 2005-10-20, Mikael Olofsson schreef <mi****@isy.liu.se>:
Antoon Pardon wrote:
Op 2005-10-20, Duncan Booth schreef <du**********@invalid.invalid>:
Antoon Pardon wrote:
The problem now is that the cmp protocol has no way to
indicate two objects are incomparable, they are not
equal but neither is one less or greater than the other.

If that is the case then you implement __eq__ and __ne__ to return
True/False and make cmp throw an exception. I don't see any need to extend
the cmp protocol.

That is not sufficient. There are domains where an order exists, but not
a total order. So for some couples of elements it is possible to say
one is less than the other, but not for all such couples.

Your solution wouldn't allow for such a case.


Wouldn't the full power of rich comparisons be enough? See section 3.3.1
in Python Reference Manual:

http://www.python.org/doc/2.4.2/ref/customization.html

See methods __lt__, __le__, __eq__, __ne__, __gt__, __ge__.


Sure, but it would also be rather cumbersome. It is not uncommon
that a general compare function which could give you one of
four results: one_less_two, one_equal_two, one_greater_two or
one_unequal_two, wouldn't differ much from each of those six
comparison methods. So you either write about six times the
same code or you have to do something like the following:

class partial_order:

def __compare(self, other):
...
return ...

def __lt__(self, other):
return self.__compare(other) == one_less_two

def __le__(self, other):
return self.__compare(other) in [one_less_two, one_equal_two]

def __eq__(self, other):
return self.__compare(other) == one_equal_two

def __ne__(self, other):
return self.__compare(other) == one_unequal_two

def __gt__(self, other):
return self.__compare(other) == one_greater_two

def __ge__(self, other):
return self.__compare(other) in [one_greater_two, one_equale_two]
And then you still wouldn't get a usefull result from:

delta = cmp(e1, e2)

--
Antoon Pardon
Oct 20 '05 #6

P: n/a
Antoon Pardon wrote:
I was wondering how people would feel if the cmp function and
the __cmp__ method would be a bit more generalised.

The problem now is that the cmp protocol has no way to
indicate two objects are incomparable, they are not
equal but neither is one less or greater than the other.

So I thought that either cmp could return None in this
case or throw a specific exception. People writing a
__cmp__ method could do the same.

The current behaviour is, of course, by design: """The operators <, >,
==, >=, <=, and != compare the values of two objects. The objects need
not have the same type. If both are numbers, they are converted to a
common type. Otherwise, objects of different types always compare
unequal, and are ordered consistently but arbitrarily."""

Personally I'm still not convinced that your requirement reflects a
substantial use case (but then I'm getting used to that ;-). Just
because an ordering is partial that doesn't mean that two instances of a
class shouldn't be compared.

What would you have Python do when the programmer tries to perform an
invalid comparison (i.e. what are the exact semantics imposed when
__cmp__() returns None/raises an exception)?

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Oct 20 '05 #7

P: n/a
Op 2005-10-20, Steve Holden schreef <st***@holdenweb.com>:
Antoon Pardon wrote:
I was wondering how people would feel if the cmp function and
the __cmp__ method would be a bit more generalised.

The problem now is that the cmp protocol has no way to
indicate two objects are incomparable, they are not
equal but neither is one less or greater than the other.

So I thought that either cmp could return None in this
case or throw a specific exception. People writing a
__cmp__ method could do the same.
The current behaviour is, of course, by design: """The operators <, >,
==, >=, <=, and != compare the values of two objects. The objects need
not have the same type. If both are numbers, they are converted to a
common type. Otherwise, objects of different types always compare
unequal, and are ordered consistently but arbitrarily."""

Personally I'm still not convinced that your requirement reflects a
substantial use case (but then I'm getting used to that ;-). Just
because an ordering is partial that doesn't mean that two instances of a
class shouldn't be compared.


I'm not suggesting that they shouldn't be compared.
What would you have Python do when the programmer tries to perform an
invalid comparison (i.e. what are the exact semantics imposed when
__cmp__() returns None/raises an exception)?


My appologies, I should have been more complete.

What I want is a way to say that all of the following are False:

a < b, a <= b, a == b, a > b, a >= b

and that only the following is True:

a != b
So if a coder writes one of the comparisons and as a result python
calls the __cmp__ method on one of the terms which would return
either None or raise an exceptions I would like it to reflect
the above behaviour.

--
Antoon Pardon
Oct 20 '05 #8

P: n/a
On Thursday 20 October 2005 11:53, Steve Holden wrote:
Personally I'm still not convinced that your requirement reflects a
substantial use case (but then I'm getting used to that ;-). Just
because an ordering is partial that doesn't mean that two instances of a
class shouldn't be compared.
C++ has a cmp template function which can be implemented to define a total
ordering for one type. This can be reliably used by implementations of
ordered tree containers (for example) so that this container template class
can only be instantiated for holding types that provide this comparison
template function.

One disadvantage is that these template container classes can only hold one
type.

ZODB's BTrees work in a similar way but use the regular python comparison
function, and the lack of a guarantee of a total ordering can be a liability.
Described here in 2002, but I think same is true today:
http://mail.zope.org/pipermail/zodb-...ry/002304.html

A BTree might contain two objects that are incomparable. That is, they raise
an exception when compared. However the user will not know this if by
coincidence the BTree implementation has never needed to compare these two
objects.

Now removing a third object from the container such that these two
incomparable objects are adjacent in the BTree. There is a possibility that
an exception might be raised when *reading* content from the BTree, since the
BTree implementation now might need to compare this pair of objects.
What would you have Python do when the programmer tries to perform an
invalid comparison (i.e. what are the exact semantics imposed when
__cmp__() returns None/raises an exception)?


Its too late to handle this by the time a specific comparison method of an
individual object is being called. If you need a total ordering across a
domain of objects then you need to involve some representation of that domain
as a whole.
--
Toby Dickenson
Oct 20 '05 #9

P: n/a
Antoon Pardon wrote:
Op 2005-10-20, Steve Holden schreef <st***@holdenweb.com>:
Antoon Pardon wrote:
I was wondering how people would feel if the cmp function and
the __cmp__ method would be a bit more generalised.

The problem now is that the cmp protocol has no way to
indicate two objects are incomparable, they are not
equal but neither is one less or greater than the other.

So I thought that either cmp could return None in this
case or throw a specific exception. People writing a
__cmp__ method could do the same.

The current behaviour is, of course, by design: """The operators <, >,
==, >=, <=, and != compare the values of two objects. The objects need
not have the same type. If both are numbers, they are converted to a
common type. Otherwise, objects of different types always compare
unequal, and are ordered consistently but arbitrarily."""

Personally I'm still not convinced that your requirement reflects a
substantial use case (but then I'm getting used to that ;-). Just
because an ordering is partial that doesn't mean that two instances of a
class shouldn't be compared.

I'm not suggesting that they shouldn't be compared.

Sorry, I thought that was the meaning of "The problem now is that the
cmp protocol has no way to indicate two objects are incomparable".
What would you have Python do when the programmer tries to perform an
invalid comparison (i.e. what are the exact semantics imposed when
__cmp__() returns None/raises an exception)?

My appologies, I should have been more complete.

What I want is a way to say that all of the following are False:

a < b, a <= b, a == b, a > b, a >= b

and that only the following is True:

a != b
So if a coder writes one of the comparisons and as a result python
calls the __cmp__ method on one of the terms which would return
either None or raise an exceptions I would like it to reflect
the above behaviour.

Ergo: use rich comparisons.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Oct 20 '05 #10

P: n/a
[Toby Dickenson]
...
ZODB's BTrees work in a similar way but use the regular python comparison
function, and the lack of a guarantee of a total ordering can be a liability.
Described here in 2002, but I think same is true today:
http://mail.zope.org/pipermail/zodb-...ry/002304.html


There's a long discussion of this in the ZODB Programming Guide,
subsection "5.3.1 Total Ordering and Persistence" on this page:

http://www.zope.org/Wikis/ZODB/Front...ide/node6.html

That talks about more than the referenced ZODB thread covered.

Persistence adds more twists, such as that the total ordering among
keys must remain the same across time (including across Python
releases; for example, how None compares to objects of other types has
changed across releases).
Oct 20 '05 #11

P: n/a
Op 2005-10-20, Steve Holden schreef <st***@holdenweb.com>:
Antoon Pardon wrote:

Ergo: use rich comparisons.


rich comparosons can only solve the problem partly.

Python does have the cmp function and that can give
totaly inconsistent results even when the order
defined by the rich comparison functions is consistent
but only partial.

I think it is a problem if cmp gives me the following
results.
a1 = foo(1)
a2 = foo(2)
b1 = foo(1)
b2 = foo(2)
cmp(a1,b1) 0 cmp(a2,b2) 0 cmp(a1,a2) 1 cmp(b1,a2) 1 cmp(a1,b2)

-1

It would be better if cmp would give an indication it
can't compare two objects instead of giving incorrect
and inconsistent results.

--
Antoon Pardon
Oct 21 '05 #12

P: n/a
Antoon Pardon wrote:
It would be better if cmp would give an indication it
can't compare two objects instead of giving incorrect
and inconsistent results.


If two objects aren't totally comparable, then using 'cmp' on them is
ill-defined to begin with. The Standard Thing To Do is throw an
exception; see the Highly Obscure Case of the Complex Numbers.
1 == 1j False1 != 1j True1 < 1j Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: cannot compare complex numbers using <, <=, >, >=cmp(1j,1j) 0cmp(1,1j) Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: cannot compare complex numbers using <, <=, >, >=

So using the well-known case of complex numbers, the semantics are
already well-defined.
class Incomparable: .... def __cmp__(self,other):
.... raise TypeError("cannot compare Incomparables using <,
<=, >, >=")
.... def __eq__(self,other):
.... return self is other
.... def __neq__(self,other):
.... return self is not other a1 = Incomparable()
a2 = Incomparable()
a1 == a1 1 # I'm running on python 2.2.3 at the moment, so hence the 1. a2 == a2 1 a1 == a2 0 a1 < a2 Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 3, in __cmp__
TypeError: cannot compare Incomparables using <, <=, >, >= cmp(a1,a2)

Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 3, in __cmp__
TypeError: cannot compare Incomparables using <, <=, >, >=
So your use-case is already well-defined, and rather simple. Make
__cmp__ raise an exception of your choice, and define rich comparators
only for the comparisons that are supported. If, as you say in another
post, "some" pairs in D cross D are comparable on an operator but not
all of them (and further that this graph is not transitive), then your
_ONLY_ choices, no matter your implementation, is to return some
possibly inconsistent result (a < b == 1, b < c == 1, a < c == 0) or
raise an exception for unapplicable comparisons.

This isn't a Python problem; this is a problem from formal mathematics.

Personally, I'd favor the "raise an exception" case, which is exactly
what will happen if you define rich comparisons and let cmp throw an
exception. Operators that assume comparable objects, like sort, also
almost always assume a total order -- inconsistent operations can give
weird results, while throwing an exception will at least (usually) give
some sort of error that can be caught.

Another poster mentioned the B-tree example, and that isn't solvable in
this case -- B-trees assume a total order, but by their nature aren't
explicit about checking for it; inserting a "partial plus exception"
order might result in tree failure at weird times. An inconsistent
order, however, is even worse -- it would corrupt the tree at the same
times.
Oct 21 '05 #13

P: n/a
Op 2005-10-21, Christopher Subich schreef <cs****************@spam.subich.block.com>:
Antoon Pardon wrote:
It would be better if cmp would give an indication it
can't compare two objects instead of giving incorrect
and inconsistent results.
If two objects aren't totally comparable, then using 'cmp' on them is
ill-defined to begin with.


Where is that documented?
The Standard Thing To Do is throw an
exception; see the Highly Obscure Case of the Complex Numbers.
1 == 1j False1 != 1j True1 < 1j Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: cannot compare complex numbers using <, <=, >, >=
I would say this is the wrong answer. The right answer should
be False IMO.

Especially in the following case do I think the TypeError is
the wrong answer:
3 + 0j < 2 + 0j Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: cannot compare complex numbers using <, <=, >, >=
Look at sets:
set([1]) <= set([2]) False set([2]) <= set([1]) False
cmp(1j,1j) 0cmp(1,1j) Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: cannot compare complex numbers using <, <=, >, >=

So using the well-known case of complex numbers, the semantics are
already well-defined.


I think one can argue against this, but I'm willing to leave this
as it is for complex numbers. But I think the situation with sets
should be trated differently than currently.
class Incomparable: ... def __cmp__(self,other):
... raise TypeError("cannot compare Incomparables using <,
<=, >, >=")
... def __eq__(self,other):
... return self is other
... def __neq__(self,other):
... return self is not other a1 = Incomparable()
a2 = Incomparable()
a1 == a1 1 # I'm running on python 2.2.3 at the moment, so hence the 1. a2 == a2 1 a1 == a2 0 a1 < a2 Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 3, in __cmp__
TypeError: cannot compare Incomparables using <, <=, >, >= cmp(a1,a2) Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 3, in __cmp__
TypeError: cannot compare Incomparables using <, <=, >, >=
So your use-case is already well-defined, and rather simple. Make
__cmp__ raise an exception of your choice,


That is not well defined. It doesn't allow to distinghuish between
something went wrong in __cmp__ and and answer that indicates the
only usefull thing you can say is that they are unequal.

Besides I find the following result inconsistent:
set([1]) <= set([1,2]) True cmp(set([1]), set([1,2])) Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: cannot compare sets using cmp()

The first set is smaller than the second so acoording to the
documentation, I should get a negative number as a result.
and define rich comparators
only for the comparisons that are supported. If, as you say in another
post, "some" pairs in D cross D are comparable on an operator but not
all of them (and further that this graph is not transitive),
I was talking about partial orders, partial orders are transitive
and python still doesn't handle them right.
then your
_ONLY_ choices, no matter your implementation, is to return some
possibly inconsistent result (a < b == 1, b < c == 1, a < c == 0) or
raise an exception for unapplicable comparisons.

This isn't a Python problem; this is a problem from formal mathematics.
I think it is python's problem if python gives the wrong results.

when a < b then cmp(a,b) should give a negative number as a result.
Python sometimes does not!

when not a < b then cmp(a,b) should not give a negative number as
a result. Python sometimes does!

I think those are python problems.
Personally, I'd favor the "raise an exception" case, which is exactly
what will happen if you define rich comparisons and let cmp throw an
exception.
But I don't want cmp to raise an exception when the two elements are
comparable.

That cmp(set([1]), set([2])) would raise an exception, I can understand,
although I would prefer it returned None, but why should
cmp(set([1]), set([1,2])) raise an exception. There is a perfectly
well defined answer for this.
Operators that assume comparable objects, like sort, also
almost always assume a total order -- inconsistent operations can give
weird results, while throwing an exception will at least (usually) give
some sort of error that can be caught.
No it wont. the sort method will happily sort a list of sets.
lst [set([1]), set([2])] lst.sort()


I agree that it would have been better if an exception had been in
raised in the sort method here.
Another poster mentioned the B-tree example, and that isn't solvable in
this case -- B-trees assume a total order, but by their nature aren't
explicit about checking for it; inserting a "partial plus exception"
order might result in tree failure at weird times. An inconsistent
order, however, is even worse -- it would corrupt the tree at the same
times.


Yes, and it is almost unavoidable now in python. There is no
documentation about handling these cases. So the most likely
scenario is that the person writing the partial ordered class
will use rich comparisons and leave it at that. In that case
using such a class in a tree will certainly result in a corrupt
tree. But even if __cmp__ raises an exceptions at the appropiate
time, that is no good if the implementation of the tree won't
be affected by it because it uses the comparasons operators instead
of the cmp function.

That is why I would suggest the following:

Define a new exception: UnequalException.

If two elements are just unequal whithout one being smaller
or greater than the other, this exception should be raised
by cmp and programmers can do so in __cmp__.

If __cmp__ is used to solve one of the comparisons, then
this exception should result in 'False' except for !=

If the rich comparisons are used to give a result for cmp(a,b)
then the following should happen.

test if a <= b

if so test a == b to decide between a negative number
and zero

if not test a > b to decide between a positive number
and raising an exception.

This would allow for people implementing tree's or sort
algorithms to detect when the data used is not appropiate
and throw an exception instead of giving meaningless results
or corrupt data structures.

--
Antoon Pardon
Oct 24 '05 #14

P: n/a
Antoon Pardon wrote:
Op 2005-10-21, Christopher Subich schreef <cs****************@spam.subich.block.com>:
Antoon Pardon wrote:
It would be better if cmp would give an indication it
can't compare two objects instead of giving incorrect
and inconsistent results.
If two objects aren't totally comparable, then using 'cmp' on them is
ill-defined to begin with.

Where is that documented?

The Standard Thing To Do is throw an
exception; see the Highly Obscure Case of the Complex Numbers.

>1 == 1j


False
>1 != 1j


True
>1 < 1j


Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: cannot compare complex numbers using <, <=, >, >=

I would say this is the wrong answer. The right answer should
be False IMO.

Well in that case it's time to declare Principia Mathematica obsolete.
The real numbers are a two-dimensional field, and you are proposing to
define an ordering for it.

This is a snail that really shoudn't be salted at all.
Especially in the following case do I think the TypeError is
the wrong answer:

3 + 0j < 2 + 0j
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: cannot compare complex numbers using <, <=, >, >=
Here you seem to be asking for special-case code to detect complex
numbers that are on the real line. Mathematically speaking this is an
infintesimal subset of the complex plane, so you are expecting a lot.
And presumably next you would argue that "since (3+0j) < (2+0j) returns
False so should (3+1j) < (2+2j)".

I think you should find something more productive to do with our time.
And mine ;-)
Look at sets:

set([1]) <= set([2])
False
set([2]) <= set([1])

False

Set orderingd are explicitly documented as being based on proper
subsetting. This is an abuse of the operators to make subset tests more
convenient rather than a definition of an ordering.

[...]
The rest of your post does highlight one or two inconsistencies, but I
don't frankly see yor proposed solutions making anything better.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Oct 24 '05 #15

P: n/a
Op 2005-10-24, Steve Holden schreef <st***@holdenweb.com>:
Antoon Pardon wrote:
>set([1]) <= set([2])


False
>set([2]) <= set([1])


False

Set orderingd are explicitly documented as being based on proper
subsetting. This is an abuse of the operators to make subset tests more
convenient rather than a definition of an ordering.


It *is* a definition of an ordering.

For something to be an ordering it has to be anti symmetric and transitive.

The subset relationships on sets conform to these conditions so it is a (partial)
ordering. Check your mathematic books, Why you would think this is abuse is beyond me.

--
Antoon Pardon
Oct 24 '05 #16

P: n/a
On Mon, 24 Oct 2005 09:23:58 +0000, Antoon Pardon wrote:
Op 2005-10-21, Christopher Subich schreef <cs****************@spam.subich.block.com>:
Antoon Pardon wrote:
It would be better if cmp would give an indication it
can't compare two objects instead of giving incorrect
and inconsistent results.
If two objects aren't totally comparable, then using 'cmp' on them is
ill-defined to begin with.


Where is that documented?


Surely that is so obvious, it doesn't need documenting? If comparisons
aren't defined between two things ("which is longer, water or fire?") then
comparing them is ill-defined.

On second thoughts... no, I don't believe it is so obvious. People are
used to thinking about comparisons in the context of numbers and strings,
and it won't hurt to remind them that not every pair of objects can be
compared.
The Standard Thing To Do is throw an
exception; see the Highly Obscure Case of the Complex Numbers.
>>>1 == 1j

False
>>>1 != 1j

True
>>>1 < 1j

Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: cannot compare complex numbers using <, <=, >, >=


I would say this is the wrong answer. The right answer should
be False IMO.


I'm afraid that mathematically you are wrong. Greater and less than is not
defined for complex numbers.

I think I see where you are coming from: you are reading "1 < 1j" as "is
one less than one*j", the answer to which is "no". That's a sensible
answer to a sensible question, except for two problems:

(1) you will upset the mathematical purists, who argue that the English
sentence is not quite the same as the mathematical relation; and

(2) you will upset the programmers, since your suggestion would create a
whole class of gotchas and bugs, where people wrongly assume that since
1<1j is False, 1>1j should be True.

This is one case where practicality *is* purity.

Especially in the following case do I think the TypeError is the wrong
answer:
3 + 0j < 2 + 0j Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: cannot compare complex numbers using <, <=, >, >=
That's an interesting case. A mathematical purist might argue that
ordering is not defined for complex numbers, even if the imaginary
component is zero. A *different* purist might argue:

3 = 3+0j
2 = 2+0j

then since 3 > 2 it should also be true that 3+0j > 2+0j.

I can see merit in both arguments, but I'm inclined to go with the second
argument, if and only if the imaginary component is exactly zero. I take
comfort in the fact that mathematicians allow continued fractions with
zero terms even though dividing by zero is verboten.

Look at sets:
set([1]) <= set([2]) False set([2]) <= set([1]) False
No, sorry, you are making a mistake. In the context of sets, the <
operator doesn't mean "less than", it means "is a subset of". It is
correct to say neither "set([1]) is a subset of set([2])" nor vice versa
is true. Both are false.

By analogy, one can ask, "is the cat inside the box?" and get the answer
"No", but this does not imply that therefore the box must be inside the
cat.
[snip]
So your use-case is already well-defined, and rather simple. Make
__cmp__ raise an exception of your choice,


That is not well defined. It doesn't allow to distinghuish between
something went wrong in __cmp__ and and answer that indicates the only
usefull thing you can say is that they are unequal.


Agreed there: there should be a distinct exception for trying to compare
incomparable objects, rather than ValueError or TypeError.

Besides I find the following result inconsistent:
set([1]) <= set([1,2]) True cmp(set([1]), set([1,2])) Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: cannot compare sets using cmp()

The first set is smaller than the second so acoording to the
documentation, I should get a negative number as a result.


No, the first expression tests whether set([1]) is a subset of set([1,2]),
not whether it is "smaller". Smaller and bigger aren't defined for sets.
Informally, we can say that one set is smaller than another if it has
fewer members, but that's only one possible definition of "smaller" out of
many.

Think, for example, of the set of all US military forces. That has one
member. Compare it to the set of all US presidents whose surname is or was
Bush. That set has two members. Do you think it is sensible definition of
"smaller" to say that the set of US military forces is smaller than the
set of Presidents called Bush?
[snip]
then your
_ONLY_ choices, no matter your implementation, is to return some
possibly inconsistent result (a < b == 1, b < c == 1, a < c == 0) or
raise an exception for unapplicable comparisons.

This isn't a Python problem; this is a problem from formal mathematics.


I think it is python's problem if python gives the wrong results.

when a < b then cmp(a,b) should give a negative number as a result.
Python sometimes does not!

when not a < b then cmp(a,b) should not give a negative number as a
result. Python sometimes does!

I think those are python problems.


I think you need to give examples of where < or > *defined as comparisons*
returns the wrong results. The examples from sets do not count, because
the operators aren't being used for comparisons.

Personally, I'd favor the "raise an exception" case, which is exactly
what will happen if you define rich comparisons and let cmp throw an
exception.


But I don't want cmp to raise an exception when the two elements are
comparable.

That cmp(set([1]), set([2])) would raise an exception, I can understand,
although I would prefer it returned None, but why should cmp(set([1]),
set([1,2])) raise an exception. There is a perfectly well defined answer
for this.


Alas, there is not. First you have to define what "less than" and "greater
than" means for sets, and there is no well-defined answer to that. They
certainly don't mean "is a subset" and "is a superset". Nor is it sensible
to define them in terms of the number of elements:

def lt(set1, set2):
"""Defines less than for sets in terms of the number of
elements."""
return len(set1) < len(set2)

def gt(set1, set2):
"""Defines greater than for sets in terms of the number of
elements."""
return len(set1) > len(set2)

py> lt( set([1]), set([1, 2]) )
True
py> gt( set([1]), set([1, 2]) )
False

So far so good. But now:

py> lt( set([1]), set([2]) )
False
py> gt( set([1]), set([2]) )
False
py> set([1]) == set([2])
False

So set([1]) is neither less than, nor greater than, nor equal to set([2]).

Operators that assume comparable objects, like sort, also almost always
assume a total order -- inconsistent operations can give weird results,
while throwing an exception will at least (usually) give some sort of
error that can be caught.


No it wont. the sort method will happily sort a list of sets.
lst [set([1]), set([2])] lst.sort()


That's an accident of implementation. Sorting nows uses only <, rather
than __cmp__, and since < (subset) for sets is defined, sort blindly goes
and sorts the list.

However, look at this:

py> from sets import Set as set
py> L1 = [set([1]), set([2]), set([3])]
py> L2 = [set([2]), set([3]), set([1])]

Notice that L1 and L2 contain the same elements, just in different orders.
Sorting those two lists should give the same order, correct?

py> L1.sort()
py> L2.sort()
py> L1
[Set([1]), Set([2]), Set([3])]
py> L2
[Set([2]), Set([3]), Set([1])]

Should, but doesn't. Oops. That's a bug.

Personally, I argue that sorting is something that you do to lists, and
that all lists should be sortable regardless of whether the objects within
them have a total order or not. If not, the list should impose a sensible
ordering on them, such as (perhaps) lexicographical order ("dictionary
order").

Look at it this way: books do not have a total ordering. What does it mean
to say that Tolstoy's "War and Peace" is less than Tom Clancy's "The Hunt
For Red October"? Are we comparing by title? Author? Contents? Cost of the
book, dimensions of the book, number of pages, some subjective idea of
quality, average word length, maximum sentence length, publication date,
number of protagonists, or what?

And yet libraries somehow find no difficulty in sorting shelves of
hundreds or thousands of books.

So in my opinion, sorting is a red-herring. In an ideal world (Python 3
perhaps?) sorting will be divorced from comparisons, or at least seeing
other people. But that still leaves the problem of comparisons.

--
Steven.

Oct 24 '05 #17

P: n/a
Op 2005-10-24, Steven D'Aprano schreef <st***@REMOVETHIScyber.com.au>:
On Mon, 24 Oct 2005 09:23:58 +0000, Antoon Pardon wrote:
Op 2005-10-21, Christopher Subich schreef <cs****************@spam.subich.block.com>:
Antoon Pardon wrote:
It would be better if cmp would give an indication it
can't compare two objects instead of giving incorrect
and inconsistent results.

If two objects aren't totally comparable, then using 'cmp' on them is
ill-defined to begin with.
Where is that documented?


Surely that is so obvious, it doesn't need documenting? If comparisons
aren't defined between two things ("which is longer, water or fire?") then
comparing them is ill-defined.

On second thoughts... no, I don't believe it is so obvious. People are
used to thinking about comparisons in the context of numbers and strings,
and it won't hurt to remind them that not every pair of objects can be
compared.


I also think there is the problem that people aren't used to partial
ordering. There is an ordering over sets, it is just not a total
ordering. But that some pairs are uncomparable (meaning that neither
one is smaller or greater) doesn't imply that comparing them is
ill defined.
The Standard Thing To Do is throw an
exception; see the Highly Obscure Case of the Complex Numbers.

>>>1 == 1j
False
>>>1 != 1j
True
>>>1 < 1j
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: cannot compare complex numbers using <, <=, >, >=


I would say this is the wrong answer. The right answer should
be False IMO.


I'm afraid that mathematically you are wrong. Greater and less than is not
defined for complex numbers.

I think I see where you are coming from: you are reading "1 < 1j" as "is
one less than one*j", the answer to which is "no". That's a sensible
answer to a sensible question, except for two problems:

(1) you will upset the mathematical purists, who argue that the English
sentence is not quite the same as the mathematical relation; and

(2) you will upset the programmers, since your suggestion would create a
whole class of gotchas and bugs, where people wrongly assume that since
1<1j is False, 1>1j should be True.

This is one case where practicality *is* purity.


Well it is a wrong assumption is general. There is nothing impure about
partial ordering.
Look at sets:
> set([1]) <= set([2])

False
> set([2]) <= set([1])

False


No, sorry, you are making a mistake. In the context of sets, the <
operator doesn't mean "less than", it means "is a subset of". It is
correct to say neither "set([1]) is a subset of set([2])" nor vice versa
is true. Both are false.


That is IMO irrelevant. The subset relationship is an ordering and as
such has all characteristics of other orderings like "less than",
except that the subset relationship is partial and the less than
relationship is total. How it is called "subset" vs "less than" is
IMO irrelevant. It is about mathematical characteristics.
By analogy, one can ask, "is the cat inside the box?" and get the answer
"No", but this does not imply that therefore the box must be inside the
cat.
Bad analogy, this doesn't define a mathematical ordering, the subset
relationship does.
> set([1]) <= set([1,2])

True
> cmp(set([1]), set([1,2]))

Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: cannot compare sets using cmp()

The first set is smaller than the second so acoording to the
documentation, I should get a negative number as a result.


No, the first expression tests whether set([1]) is a subset of set([1,2]),
not whether it is "smaller".


This is quibling with words. Both test whether one comes before the
other is a certain order.
Smaller and bigger aren't defined for sets.
It is not about smaller or bigger, it is about the general concept
of (mathematical) order. Whether you call this order, subset or
less than is not important.
then your
_ONLY_ choices, no matter your implementation, is to return some
possibly inconsistent result (a < b == 1, b < c == 1, a < c == 0) or
raise an exception for unapplicable comparisons.

This isn't a Python problem; this is a problem from formal mathematics.


I think it is python's problem if python gives the wrong results.

when a < b then cmp(a,b) should give a negative number as a result.
Python sometimes does not!

when not a < b then cmp(a,b) should not give a negative number as a
result. Python sometimes does!

I think those are python problems.


I think you need to give examples of where < or > *defined as comparisons*
returns the wrong results. The examples from sets do not count, because
the operators aren't being used for comparisons.


IMO they count, because the subset relationship is a mathematical order
relation just as less than is one.
Personally, I'd favor the "raise an exception" case, which is exactly
what will happen if you define rich comparisons and let cmp throw an
exception.


But I don't want cmp to raise an exception when the two elements are
comparable.

That cmp(set([1]), set([2])) would raise an exception, I can understand,
although I would prefer it returned None, but why should cmp(set([1]),
set([1,2])) raise an exception. There is a perfectly well defined answer
for this.


Alas, there is not. First you have to define what "less than" and "greater
than" means for sets, and there is no well-defined answer to that. They
certainly don't mean "is a subset" and "is a superset". Nor is it sensible
to define them in terms of the number of elements:


Look I have to define this for whatever class of objects I want to use
anyway. On what grounds do you argue that using partial orderings
shouldn't work.

Wether you want to call being a subset "less than" or not is not the
issue. If I need it for my program I can certainly define a set class
where "less than" will be like the subset relationship or I can define
other classes where an ordering exists but not a total one.

IMO there is nothing wrong with using the order symbols python already
has, to use ordering on other objects..
def lt(set1, set2):
"""Defines less than for sets in terms of the number of
elements."""
return len(set1) < len(set2)

def gt(set1, set2):
"""Defines greater than for sets in terms of the number of
elements."""
return len(set1) > len(set2)

py> lt( set([1]), set([1, 2]) )
True
py> gt( set([1]), set([1, 2]) )
False

So far so good. But now:

py> lt( set([1]), set([2]) )
False
py> gt( set([1]), set([2]) )
False
py> set([1]) == set([2])
False

So set([1]) is neither less than, nor greater than, nor equal to set([2]).
So? It seems you never have encounted a partial ordering before. What
you have illustrated here is not special among (partial) orderings.
And the world is full of partial orderings. I think it is wrong to
shoehorn everything in a total ordering and that is why I think
python should give programmers a way in handling partial orderings.
Operators that assume comparable objects, like sort, also almost always
assume a total order -- inconsistent operations can give weird results,
while throwing an exception will at least (usually) give some sort of
error that can be caught.


No it wont. the sort method will happily sort a list of sets.
> lst

[set([1]), set([2])]
> lst.sort()


That's an accident of implementation. Sorting nows uses only <, rather
than __cmp__, and since < (subset) for sets is defined, sort blindly goes
and sorts the list.

However, look at this:

py> from sets import Set as set
py> L1 = [set([1]), set([2]), set([3])]
py> L2 = [set([2]), set([3]), set([1])]

Notice that L1 and L2 contain the same elements, just in different orders.
Sorting those two lists should give the same order, correct?


No. Since we don't have a total ordering, sorting doesn't has to give
the same result. For as far as sorting is defined on this kind of
object, the only thing I would expect is that after a sort the
following condition holds.

for all i,j if i < j then not L[i] > L[j]
py> L1.sort()
py> L2.sort()
py> L1
[Set([1]), Set([2]), Set([3])]
py> L2
[Set([2]), Set([3]), Set([1])]

Should, but doesn't. Oops. That's a bug.

Personally, I argue that sorting is something that you do to lists, and
that all lists should be sortable regardless of whether the objects within
them have a total order or not. If not, the list should impose a sensible
ordering on them, such as (perhaps) lexicographical order ("dictionary
order").

Look at it this way: books do not have a total ordering. What does it mean
to say that Tolstoy's "War and Peace" is less than Tom Clancy's "The Hunt
For Red October"? Are we comparing by title? Author? Contents? Cost of the
book, dimensions of the book, number of pages, some subjective idea of
quality, average word length, maximum sentence length, publication date,
number of protagonists, or what?

And yet libraries somehow find no difficulty in sorting shelves of
hundreds or thousands of books.


But will all libraries sort the shelves in the same way. As far as I
understand a book gets somekind of code in a library and books are
sorted according to these codes. Now everybody sorts these codes the
same way, but the attibution of the code may differ from library to
library.

--
Antoon Pardon
Oct 24 '05 #18

P: n/a
On Monday 24 October 2005 04:33, Steve Holden wrote:
Well in that case it's time to declare Principia Mathematica obsolete.
The real numbers are a two-dimensional field, and you are proposing to
define an ordering for it.


I wasn't a math major, but don't you mean "the complex numbers are a two
dimensional field"? If you mean real numbers, please do explain.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Oct 24 '05 #19

P: n/a
James Stroud wrote:
On Monday 24 October 2005 04:33, Steve Holden wrote:
Well in that case it's time to declare Principia Mathematica obsolete.
The real numbers are a two-dimensional field, and you are proposing to
define an ordering for it.

I wasn't a math major, but don't you mean "the complex numbers are a two
dimensional field"?

Correct.
If you mean real numbers, please do explain.


regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Oct 24 '05 #20

P: n/a
Antoon Pardon wrote:
It *is* a definition of an ordering.

For something to be an ordering it has to be anti symmetric and transitive.

The subset relationships on sets conform to these conditions so it is a (partial)
ordering. Check your mathematic books, Why you would think this is abuse is beyond me


Which is exactly why a < b on sets returns True xor False, but cmp(a,b)
throws an exception.

a <COMPARE> b is a local comparison, asking only for the relationship
between two elements. In some bases, like the complex numbers, some
comparisons are ill-defined.; in others, like sets, they're well-defined
but don't give a total ordering.

cmp(a,b) asks for their relative rankings in some total ordering. For a
space that does not have a total ordering, cmp(a,b) is meaningless at
best and dangerous at worst. It /should/ throw an exception when the
results of cmp aren't well-defined, consistent, antisymmetric, and
transitive.
Oct 25 '05 #21

P: n/a
Antoon Pardon wrote:
I also think there is the problem that people aren't used to partial
ordering. There is an ordering over sets, it is just not a total
ordering. But that some pairs are uncomparable (meaning that neither
one is smaller or greater) doesn't imply that comparing them is
ill defined.
It depends on your definition of "comparison." Admittedly, <, =, !=,
and > can be defined for a partial ordering, but classical mathematics,
as understood by most people (even programmers), assumes that unless a
== b, a > b or a < b.

The comparisons, as defined this way, are done on a totally ordered set,
yes. But since most comparisons ever done -are- done on a totally
ordered set (namely the integers or reals), it's entirely fair to say
that "a programmer's expectation" is that comparisons should work more
or less like a totally ordered list.
With that caevat in mind, comparison on a partially ordered domain /is/
ill-defined; it can give inconsistent results from the "a < b or a > b
or a == b" rule.

Well it is a wrong assumption is general. There is nothing impure about
partial ordering.
Impure? Probably not. Useless from many perspective when "comparisons"
are needed, to the point where it's probably safer to throw an exception
than define results.
That is IMO irrelevant. The subset relationship is an ordering and as
such has all characteristics of other orderings like "less than",
except that the subset relationship is partial and the less than
relationship is total. How it is called "subset" vs "less than" is
IMO irrelevant. It is about mathematical characteristics.
Still accident. < wouldn't be used for sets if we had a subset symbol
on the standard keyboard, APL fans notwithstanding. Simce programmers
familiar with Python understand that < on sets isn't a "real" comparison
(i.e. provide a total order), they don't expect unique, consistent
results from something like sort (or a tree, for that matter).

By analogy, one can ask, "is the cat inside the box?" and get the answer
"No", but this does not imply that therefore the box must be inside the
cat.

Bad analogy, this doesn't define a mathematical ordering, the subset
relationship does.


Yes, it does. Consider "in" as a mathematical operator:

For the set (box, cat-in-box)

box in box: False
box in cat-in-box: False
cat-in-box in box: True
cat-in-box in cat-in-box: False

For the set (box, smart-cat) # cat's too smart to get in the box

box in box: False
box in smart-cat: False
smart-cat in box: False
smart-cat in smart-cat: False

In both these cases, the "in" operator is irreflexive, asymmetric, and
transitive (extend to mouse-in-cat if you really care about transitive),
so "in" is a partial order sans equality. A useless one, but a partial
order nonetheless.

Notice that L1 and L2 contain the same elements, just in different orders.
Sorting those two lists should give the same order, correct?

No. Since we don't have a total ordering, sorting doesn't has to give
the same result. For as far as sorting is defined on this kind of
object, the only thing I would expect is that after a sort the
following condition holds.

for all i,j if i < j then not L[i] > L[j]


Highly formal, aren't you? Again, common programming allows for "not a b" == "a <= b", so your condition becomes the more familiar:
for all i,j in len(list), if i < j then L[i] <= L[j]

This condition is also assumed implicitly by many other assumed "rules"
of sorted lists, namely their uniqueness:

"If L is a list of unique keys, then sort(L) is a unique list of keys in
sorted order."

Yes, this assumes that L has a total order. Big whoop -- this is a
matter of programming practiciality rather than mathematical purity.
Personally, I argue that sorting is something that you do to lists, and
that all lists should be sortable regardless of whether the objects within
them have a total order or not. If not, the list should impose a sensible
ordering on them, such as (perhaps) lexicographical order ("dictionary
order").


To reply to the grandparent poster here: that's not always easy to do.
A "sensible order" isn't always easy to define on a set where a partial
order exists, if it includes that order already. Adding
comparison-pairs to a partially ordered set (where incomparable elements
throw an exception, rather than just return False which is confusing
from sort's perspective) can easily result in a graph that isn't
actually transitive and antisymmetric, i.e.:

A > B > C, but
C > D > A for some operator ">"

With this in mind, the only sensible thing for .sort to do when it
encounters an exception is to fall back to its "backup" comparator (id,
for example), and resort /the entire list/ using that comparator. The
results returned will then be valid by sort's comparison, but a subset
of that list containing only "good" objects (like integers) may not (and
probably won't be) sorted itself using the object's comparison.

The difficulty arises because while we may consider a single type to be
fully ordered, it may also have some comparisons with other types;
sorting the list by type, then value seems like it would work (and
that's what the library does), but since we can sometimes compare across
types, this may not give a valid comparison. Consider:

[apple, wheat, cow]
apple < cow (defined, totally ordered)
Mammal > Grain > Fruit (accident of type id numbers)
apple < wheat throws TypeError, also wheat < cow

Since we encounter a TypeError when sorting this list, we first sort by
type:
[cow, wheat, apple]
and would then sort-by-comparison within types, if we had more than one.

But since apple < cow, the sublist of this list [cow, apple] is not
itself sorted. This is paradoxical under a single comparison, because a
sublist of any sorted list is itself supposed to be sorted.

The ordering that .sort used on the full list is well-defined and total,
but it's also useless.

But will all libraries sort the shelves in the same way. As far as I
understand a book gets somekind of code in a library and books are
sorted according to these codes. Now everybody sorts these codes the
same way, but the attibution of the code may differ from library to
library.


Irrelevant; if you swap the order of any two (nonidentical) books in a
single library, then the library is unsorted. The library has imposed a
total order on books in its collection.

The library example is irrelevant, anyway; two books can always be
compared by the keys (author last, author first/middle, [other authors],
title, editor, year published, publisher, number of pages, colour of the
brightest line in the cover's emmission spectrum at 2000k in spectral
order [within visible spectrum]), and for the most part this is a useful
keyspace. Libraries just choose to impose a category/numbering at the
head of the key.
Oct 25 '05 #22

P: n/a
Op 2005-10-25, Christopher Subich schreef <cs****************@spam.subich.block.com>:
Antoon Pardon wrote:
It *is* a definition of an ordering.

For something to be an ordering it has to be anti symmetric and transitive.

The subset relationships on sets conform to these conditions so it is a (partial)
ordering. Check your mathematic books, Why you would think this is abuse is beyond me
Which is exactly why a < b on sets returns True xor False, but cmp(a,b)
throws an exception.


I don't see the conection.

The documentation states that cmp(a,b) will return a negative value if
a < b. So why should it throw an exception?
a <COMPARE> b is a local comparison, asking only for the relationship
between two elements. In some bases, like the complex numbers, some
comparisons are ill-defined.; in others, like sets, they're well-defined
but don't give a total ordering.

cmp(a,b) asks for their relative rankings in some total ordering.
The documentation doesn't state that. I also didn't find anything in
the documentation on how the programmer should code in order to
enforce this.

So someone who just use the rich comparisons to implement his
partial order will screw up this total order that cmp is somehow
providing.

And cmp doesn't provide a total ordering anyway. Sets are clearly
uncomparable using cmp, so cmp doesn't provide a total order.

Maybe the original idea was that cmp provided some total ordering,
maybe the general idea is that cmp should provide a total ordering,
but that is not documented, nor is there any documentation in
helping the programmer in this.

And even if we leave sets aside it still isn't true.
from decimal import Decimal
Zero = Decimal(0)
cmp( ( ) , Zero) -1 cmp(Zero, 1) -1 cmp(1, ( ) )

-1

For a
space that does not have a total ordering, cmp(a,b) is meaningless at
best and dangerous at worst.
The current specs and implemantation are.

I see nothing wrong with a function that would provide four kinds of
results when given two elements. The same results as cmp gives now
when it applies and None or other appropiate value or a raised
exception when not.

Given how little this functionality differs from the current cmp,
I don't see why it couldn't replace the current cmp.
It /should/ throw an exception when the
results of cmp aren't well-defined, consistent, antisymmetric, and
transitive.


That an order is not total in no way implies any of the above.
The order on sets is well-defined, consistent, antisymmetric
and transitive.

Python may not be able to handle it in a well-defined, consistent
manner, but that is IMO a problem of python not of partial orders.

--
Antoon Pardon
Oct 25 '05 #23

P: n/a
On Tue, 25 Oct 2005 10:09:29 -0400, Christopher Subich wrote:
By analogy, one can ask, "is the cat inside the box?" and get the answer
"No", but this does not imply that therefore the box must be inside the
cat.

Bad analogy, this doesn't define a mathematical ordering, the subset
relationship does.


Yes, it does. Consider "in" as a mathematical operator:

For the set (box, cat-in-box)

box in box: False
box in cat-in-box: False
cat-in-box in box: True
cat-in-box in cat-in-box: False

For the set (box, smart-cat) # cat's too smart to get in the box

box in box: False
box in smart-cat: False
smart-cat in box: False
smart-cat in smart-cat: False

In both these cases, the "in" operator is irreflexive, asymmetric, and
transitive (extend to mouse-in-cat if you really care about transitive),
so "in" is a partial order sans equality. A useless one, but a partial
order nonetheless.


What do you mean "in" is a useless ordering? It makes a huge difference
whether "nuclear bomb in New York" is true or not.

In fact, I'm quite surprised that Antoon should object to "in" as "this
doesn't define a mathematical ordering, the subset relationship does" when
"subset" is just "in" for sets: set S is a subset of set T if for all
elements x in S, x is also in T. Informally, subset S is in set T.

Can somebody remind me, what is the problem Antoon is trying to solve here?
--
Steven.

Oct 25 '05 #24

P: n/a
Antoon Pardon wrote:
Op 2005-10-25, Christopher Subich schreef <cs****************@spam.subich.block.com>:
Which is exactly why a < b on sets returns True xor False, but cmp(a,b)
throws an exception.

I don't see the conection.

The documentation states that cmp(a,b) will return a negative value if
a < b. So why should it throw an exception?


Because it's useful that cmp(a1, a2) should either (return a value) or
(throw an exception) for any element a1, a2 within type(a1) cross
type(a2). If cmp sometimes is okay and sometimes throws an exception,
then it leads to weird borkage in things like trees.

With that in mind, not all sets are comparable. {a} and {b} have no
comparison relationship, as you've pointed out, aside from not-equal.
I'll get to why "not-equal" is a bad idea below.
cmp(a,b) asks for their relative rankings in some total ordering.

The documentation doesn't state that. I also didn't find anything in
the documentation on how the programmer should code in order to
enforce this.


Most programmers are simply programmers; they haven't had the benefit of
a couple years' pure-math education, so the distinction between "partial
order" and "total order" is esoteric at best.

With that in mind, compare should conform, as best as possible, to
"intuitive" behavior of comparison. Since comparisons are mostly done
on numbers, an extension of comparisons should behave "as much like
numbers" as possible.

So someone who just use the rich comparisons to implement his
partial order will screw up this total order that cmp is somehow
providing.

And cmp doesn't provide a total ordering anyway. Sets are clearly
uncomparable using cmp, so cmp doesn't provide a total order.
Cmp isn't supposed to "provide" a total order, it's supposed to reflect
relative standing in one if one already exists for P1 x P2. If one
doesn't exist, I'd argue that it's the best course of action to throw an
exception.

After all, rich comparisons were put in precisely to allow support of
limited comparisons when a total order (or indeed full comparison) isn't
appropriate.

Maybe the original idea was that cmp provided some total ordering,
maybe the general idea is that cmp should provide a total ordering,
but that is not documented, nor is there any documentation in
helping the programmer in this.
I doubt that anyone was thinking about it in such depth. My bet is that
the thought process goes this way:

Og compare numbers. Numbers good, compare good. Grunt grunt.
<some aeons pass>
Language designers: Wouldn't it be nice if we could allow user-defined
objects, such as numbers with units, to compare properly with each
other? This would let us say (1 m) < (.5 mile) pretty easily, eh?

Guido: Let's let classes override a __cmp__ function for comparisons.

In programming language theory, comparisons were firstly about numbers,
and their leading-order behaviour has always stayed about numbers.
Comparing entities which are best represented in an... interesting
formal mathematical way (i.e. partial orders, objects for which some
comparisons are Just Plain Weird) works only as a side-effect of
number-like behavior.

The lesson to take home from this: the less a custom class behaves like
a number, the less intutitively meaningful (or even valid) comparisons
will be on it.

And even if we leave sets aside it still isn't true.

from decimal import Decimal
Zero = Decimal(0)
cmp( ( ) , Zero)
-1
cmp(Zero, 1)
-1
cmp(1, ( ) )
-1
I'd argue that the wart here is that cmp doesn't throw an exception, not
that it returns inconsistent results. This is a classic case of
incomparable objects, and saying that 1 < an empty tuple is bordering on
meaningless.

For a
space that does not have a total ordering, cmp(a,b) is meaningless at
best and dangerous at worst.

The current specs and implemantation are.

I see nothing wrong with a function that would provide four kinds of
results when given two elements. The same results as cmp gives now
when it applies and None or other appropiate value or a raised
exception when not.

Given how little this functionality differs from the current cmp,
I don't see why it couldn't replace the current cmp.


My biggest complaint here is about returning None or IncomparableValue;
if that happens, then all code that relies on cmp returning a numeric
result will have to be rewritten.

Comparing incomparables is an exceptional case, and hence it should
raise an exception.

As for saying that cmp should return some times and raise an exception
other times, I just find it squicky. Admittedly, this is entirely up to
the class designer, and your proposed guideline wouldn't change cmp's
behavior for clases that /are/ totally ordered.

Then again, sets excepted, your guideline doesn't seem very applicable
in standard library code.
It /should/ throw an exception when the
results of cmp aren't well-defined, consistent, antisymmetric, and
transitive.

That an order is not total in no way implies any of the above.
The order on sets is well-defined, consistent, antisymmetric
and transitive.


A partial order is by definition consistent, antisymmetric, and
transitive. A total order is furthermore universal, such that if a !=
b, aRb xor bRa.

This matches the normal use of cmp, and comparing two incomparable
objects is an exceptional case -- hence the exception. Although again,
you're free to provide a cmp for your classes that operates only on a
subset of type x type.
Oct 25 '05 #25

P: n/a
Steven D'Aprano wrote:
On Tue, 25 Oct 2005 10:09:29 -0400, Christopher Subich wrote:

By analogy, one can ask, "is the cat inside the box?" and get the answer
"No", but this does not imply that therefore the box must be inside the
cat.
Bad analogy, this doesn't define a mathematical ordering, the subset
relationship does.


Yes, it does. Consider "in" as a mathematical operator:

For the set (box, cat-in-box)

box in box: False
box in cat-in-box: False
cat-in-box in box: True
cat-in-box in cat-in-box: False

For the set (box, smart-cat) # cat's too smart to get in the box

box in box: False
box in smart-cat: False
smart-cat in box: False
smart-cat in smart-cat: False

In both these cases, the "in" operator is irreflexive, asymmetric, and
transitive (extend to mouse-in-cat if you really care about transitive),
so "in" is a partial order sans equality. A useless one, but a partial
order nonetheless.

What do you mean "in" is a useless ordering? It makes a huge difference
whether "nuclear bomb in New York" is true or not.

In fact, I'm quite surprised that Antoon should object to "in" as "this
doesn't define a mathematical ordering, the subset relationship does" when
"subset" is just "in" for sets: set S is a subset of set T if for all
elements x in S, x is also in T. Informally, subset S is in set T.

Can somebody remind me, what is the problem Antoon is trying to solve here?

Being Belgian, I suspect.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Oct 25 '05 #26

P: n/a
Op 2005-10-25, Christopher Subich schreef <cs****************@spam.subich.block.com>:
Antoon Pardon wrote:
Op 2005-10-25, Christopher Subich schreef <cs****************@spam.subich.block.com>:
Which is exactly why a < b on sets returns True xor False, but cmp(a,b)
throws an exception.

I don't see the conection.

The documentation states that cmp(a,b) will return a negative value if
a < b. So why should it throw an exception?


Because it's useful that cmp(a1, a2) should either (return a value) or
(throw an exception) for any element a1, a2 within type(a1) cross
type(a2). If cmp sometimes is okay and sometimes throws an exception,
then it leads to weird borkage in things like trees.


Oh no, not "we have to protect the programmer" argument again.

This kind of behaviour will manifest itself soon enough in testing.
Almost eacht time some addition is suggested that could help
with finding bugs, it gets rejected that because such bugs will be
found soon enough with proper testing.

And in order to protect one specific case, you limit programmers
the use in all other cases where it could be usefull.
With that in mind, not all sets are comparable. {a} and {b} have no
comparison relationship, as you've pointed out, aside from not-equal.
I'll get to why "not-equal" is a bad idea below.
cmp(a,b) asks for their relative rankings in some total ordering.

The documentation doesn't state that. I also didn't find anything in
the documentation on how the programmer should code in order to
enforce this.


Most programmers are simply programmers; they haven't had the benefit of
a couple years' pure-math education, so the distinction between "partial
order" and "total order" is esoteric at best.


So. these people are not familiar with higher order functions either.
Do you think we should make python a language only for the not too
educated.
With that in mind, compare should conform, as best as possible, to
"intuitive" behavior of comparison. Since comparisons are mostly done
on numbers, an extension of comparisons should behave "as much like
numbers" as possible.
IMO cmp(set([0]), set([0,1])) giving a result of -1, is much more
number like behaviour than throwing an exception. With numbers
we have that a < b implies cmp(a,b) < 0. So the most number like
behaviour for sets would also be that a < b would imply cmp(a,b) < 0.
Maybe the original idea was that cmp provided some total ordering,
maybe the general idea is that cmp should provide a total ordering,
but that is not documented, nor is there any documentation in
helping the programmer in this.


I doubt that anyone was thinking about it in such depth. My bet is that
the thought process goes this way:


Which seems to be my biggest frustration with python. Things are
often enough not thought out in depth.
>from decimal import Decimal
>Zero = Decimal(0)
>cmp( ( ) , Zero)


-1
>cmp(Zero, 1)


-1
>cmp(1, ( ) )


-1


I'd argue that the wart here is that cmp doesn't throw an exception, not
that it returns inconsistent results. This is a classic case of
incomparable objects, and saying that 1 < an empty tuple is bordering on
meaningless.


I wont argue with you here, but it is what python gives as now.
Changing this behaviour is not going to happen.
For a
space that does not have a total ordering, cmp(a,b) is meaningless at
best and dangerous at worst.

The current specs and implemantation are.

I see nothing wrong with a function that would provide four kinds of
results when given two elements. The same results as cmp gives now
when it applies and None or other appropiate value or a raised
exception when not.

Given how little this functionality differs from the current cmp,
I don't see why it couldn't replace the current cmp.


My biggest complaint here is about returning None or IncomparableValue;
if that happens, then all code that relies on cmp returning a numeric
result will have to be rewritten.


I don't know. There are two possibilities.

1) The user is working with a total order. In that case the None
or IncomparableValues won't be returned anyway so nothing about
his code has to change.

2) The user is working with a partial order. In that case cmp
doesn't provide consistent results so the use of cmp in this
case was a bug anyway.
Comparing incomparables is an exceptional case, and hence it should
raise an exception.

As for saying that cmp should return some times and raise an exception
other times, I just find it squicky.
But cmp already behaves this way. The only difference would be that
the decision would be made based on the values of the objects instead
of only their class.
Admittedly, this is entirely up to
the class designer, and your proposed guideline wouldn't change cmp's
behavior for clases that /are/ totally ordered.

Then again, sets excepted, your guideline doesn't seem very applicable
in standard library code.


Well AFAIAC this isn't specific about library code.

--
Antoon Pardon
Oct 26 '05 #27

P: n/a
Op 2005-10-25, Christopher Subich schreef <cs****************@spam.subich.block.com>:
Antoon Pardon wrote:
I also think there is the problem that people aren't used to partial
ordering. There is an ordering over sets, it is just not a total
ordering. But that some pairs are uncomparable (meaning that neither
one is smaller or greater) doesn't imply that comparing them is
ill defined.
It depends on your definition of "comparison." Admittedly, <, =, !=,
and > can be defined for a partial ordering, but classical mathematics,
as understood by most people (even programmers), assumes that unless a
== b, a > b or a < b.

The comparisons, as defined this way, are done on a totally ordered set,
yes. But since most comparisons ever done -are- done on a totally
ordered set (namely the integers or reals), it's entirely fair to say
that "a programmer's expectation" is that comparisons should work more
or less like a totally ordered list.
With that caevat in mind, comparison on a partially ordered domain /is/
ill-defined; it can give inconsistent results from the "a < b or a > b
or a == b" rule.


Against people expectations is not the same as inconsistent.

Well it is a wrong assumption is general. There is nothing impure about
partial ordering.


Impure? Probably not. Useless from many perspective when "comparisons"
are needed, to the point where it's probably safer to throw an exception
than define results.
That is IMO irrelevant. The subset relationship is an ordering and as
such has all characteristics of other orderings like "less than",
except that the subset relationship is partial and the less than
relationship is total. How it is called "subset" vs "less than" is
IMO irrelevant. It is about mathematical characteristics.


Still accident. < wouldn't be used for sets if we had a subset symbol
on the standard keyboard, APL fans notwithstanding.


That is only because people usualy work with numbers and sets at the
same time and using different symbols helps making the distinction.

But using <= for subset is mathematical completely correct. If you
want to go purely formal, numbers are just a particular family of
sets where the less than or equal relation of numbers coincides with
the subset relation of sets in that family.

People use the multiplication symbol for multiplying matrices although
matrix multiplication is something different from number multiplication.

If the multiplications of matrices is enough like the multiplication of
numbers to use the same symbol, I see nothing wrong in using the same
symbol for the (strict) subset relation as is used for the less than
(or equal) relation on numbers.
Simce programmers
familiar with Python understand that < on sets isn't a "real" comparison
(i.e. provide a total order), they don't expect unique, consistent
results from something like sort (or a tree, for that matter).
By analogy, one can ask, "is the cat inside the box?" and get the answer
"No", but this does not imply that therefore the box must be inside the
cat.

Bad analogy, this doesn't define a mathematical ordering, the subset
relationship does.


Yes, it does. Consider "in" as a mathematical operator:


My appologies, I though you were going in the "I can in my coat,
my coat can in the box" kind of direction.
For the set (box, cat-in-box)

box in box: False
box in cat-in-box: False
cat-in-box in box: True
cat-in-box in cat-in-box: False

For the set (box, smart-cat) # cat's too smart to get in the box

box in box: False
box in smart-cat: False
smart-cat in box: False
smart-cat in smart-cat: False

In both these cases, the "in" operator is irreflexive, asymmetric, and
transitive (extend to mouse-in-cat if you really care about transitive),
so "in" is a partial order sans equality. A useless one, but a partial
order nonetheless.
I don't know if it is useless. This kind of relationship seems very
common in political geography.

This village is in this county, this county is in this state, this state
is in this nation.

Also magizines that test certain equipment to give advice on the best
buy, usualy work with methods that result in partial orderings.
Notice that L1 and L2 contain the same elements, just in different orders.
Sorting those two lists should give the same order, correct?

No. Since we don't have a total ordering, sorting doesn't has to give
the same result. For as far as sorting is defined on this kind of
object, the only thing I would expect is that after a sort the
following condition holds.

for all i,j if i < j then not L[i] > L[j]


Highly formal, aren't you?


I like the precision it provides.
Again, common programming allows for "not a > b"
== "a <= b", so your condition becomes the more familiar:

for all i,j in len(list), if i < j then L[i] <= L[j]
So, should we limit our programming to the common cases
or should a language also provide help for the uncommon
case.
This condition is also assumed implicitly by many other assumed "rules"
of sorted lists, namely their uniqueness:

"If L is a list of unique keys, then sort(L) is a unique list of keys in
sorted order."

Yes, this assumes that L has a total order. Big whoop -- this is a
matter of programming practiciality rather than mathematical purity.


I don't find it so practical if the language limits the programmer
to the common cases. I can understand that implementation issues
impose limits on what can pratically realised and often enough that
is the more common practice. So in this case it is about programming
practiciality. But you seem to think it is programming practiciality
if you limit yourself to the common case from the beginning, even
if implementing for a more general case wouldn't be much harder
then implementing only the common case.

--
Antoon Pardon
Oct 26 '05 #28

P: n/a
Op 2005-10-25, Steven D'Aprano schreef <st***@REMOVETHIScyber.com.au>:
On Tue, 25 Oct 2005 10:09:29 -0400, Christopher Subich wrote:
By analogy, one can ask, "is the cat inside the box?" and get the answer
"No", but this does not imply that therefore the box must be inside the
cat.
Bad analogy, this doesn't define a mathematical ordering, the subset
relationship does.
Yes, it does. Consider "in" as a mathematical operator:

For the set (box, cat-in-box)

box in box: False
box in cat-in-box: False
cat-in-box in box: True
cat-in-box in cat-in-box: False

For the set (box, smart-cat) # cat's too smart to get in the box

box in box: False
box in smart-cat: False
smart-cat in box: False
smart-cat in smart-cat: False

In both these cases, the "in" operator is irreflexive, asymmetric, and
transitive (extend to mouse-in-cat if you really care about transitive),
so "in" is a partial order sans equality. A useless one, but a partial
order nonetheless.


What do you mean "in" is a useless ordering? It makes a huge difference
whether "nuclear bomb in New York" is true or not.

In fact, I'm quite surprised that Antoon should object to "in" as "this
doesn't define a mathematical ordering, the subset relationship does" when
"subset" is just "in" for sets: set S is a subset of set T if for all
elements x in S, x is also in T. Informally, subset S is in set T.


I was misunderstaning where Christopher was heading to.
Can somebody remind me, what is the problem Antoon is trying to solve here?


Well there are two issues. One about correct behaviour and one about
practicallity.

The first problem is cmp. This is what the documentation states:

cmp(x, y)
Compare the two objects x and y and return an integer according to
the outcome. The return value is negative if x < y, zero if x == y
and strictly positive if x > y.

The problem is, that if someone implements a partial ordered class,
with the rich comparisons, cmp doesn't behave correct. It can't
behave correct because it will always give an number as a result
and such a number implies one of three relations between two
elements, but with partial ordered classes, it is possible none
of these relations exist between those two elements. So IMO cmp
should be changed to reflect this. This can be done either by cmp
returning None or UnequalValues. or by cmp raising UnequalValues
in such a case.

The second part is one of practicallity. Once you have accepted cmp,
should reflect this possibility, it makes sense that the __cmp__
method should get the same possibilities, so that where it is more
pratical to do so, the programmer can implement his partial order
through one __cmp__ method instead of having to write six.
In my opinion such a change would break no existing code. This
is because there are two possibilities.

1) The user is using cmp on objects that form a total order.
In such circumstances the behaviour of cmp doesn't change.

2) The user is using cmp on objects that don't form a total
order. In such circumstances cmp doesn't give consistent
results, so the code is already broken.

--
Antoon Pardon
Oct 26 '05 #29

P: n/a


Antoon Pardon wrote:
Op 2005-10-25, Steven D'Aprano schreef <st***@REMOVETHIScyber.com.au>:
Can somebody remind me, what is the problem Antoon is trying to solve here?

Well there are two issues. One about correct behaviour and one about
practicallity.

The first problem is cmp. This is what the documentation states:

cmp(x, y)
Compare the two objects x and y and return an integer according to
the outcome. The return value is negative if x < y, zero if x == y
and strictly positive if x > y.

The problem is, that if someone implements a partial ordered class,
with the rich comparisons, cmp doesn't behave correct. It can't
behave correct because it will always give an number as a result
and such a number implies one of three relations between two
elements, but with partial ordered classes, it is possible none
of these relations exist between those two elements. So IMO cmp
should be changed to reflect this. This can be done either by cmp
returning None or UnequalValues. or by cmp raising UnequalValues
in such a case.


Instead of changing cmp, why not just have several common versions to
choose from instead of trying to make one tool do it all?
The second part is one of practicallity. Once you have accepted cmp,
should reflect this possibility, it makes sense that the __cmp__
method should get the same possibilities, so that where it is more
pratical to do so, the programmer can implement his partial order
through one __cmp__ method instead of having to write six.
In my opinion such a change would break no existing code. This
is because there are two possibilities.

1) The user is using cmp on objects that form a total order.
In such circumstances the behaviour of cmp doesn't change.

2) The user is using cmp on objects that don't form a total
order. In such circumstances cmp doesn't give consistent
results, so the code is already broken.

Adding complexity to cmp may not break code, but it could probably slow
down sorting in general. So I would think what ever improvements or
alternatives needs to be careful not to slow down existing sorting cases.

Cheers,
Ron

Oct 26 '05 #30

P: n/a
Antoon Pardon wrote:
Op 2005-10-25, Christopher Subich schreef <cs****************@spam.subich.block.com>:

My biggest complaint here is about returning None or IncomparableValue;
if that happens, then all code that relies on cmp returning a numeric
result will have to be rewritten.

I don't know. There are two possibilities.

1) The user is working with a total order. In that case the None
or IncomparableValues won't be returned anyway so nothing about
his code has to change.

2) The user is working with a partial order. In that case cmp
doesn't provide consistent results so the use of cmp in this
case was a bug anyway.


Case 3) The user is working with an unknown class, using duck typing,
and expects a total order. If cmp suddenly returned Incomparable or
None, the code would break in Unexpected Ways, with Undefined Behavior.

This is a classic "exception versus error code" argument; in this case,
returning None would be the error flag. It's almost always better to
just throw the exception, because then this allows error-checking at a
more natural level.
As for saying that cmp should return some times and raise an exception
other times, I just find it squicky.

But cmp already behaves this way. The only difference would be that
the decision would be made based on the values of the objects instead
of only their class.

Admittedly, this is entirely up to
the class designer, and your proposed guideline wouldn't change cmp's
behavior for clases that /are/ totally ordered.

Then again, sets excepted, your guideline doesn't seem very applicable
in standard library code.

Well AFAIAC this isn't specific about library code.


A change that cmp return a 4th possible "value" (None or Incomparable)
is a fundamental language change. Because this would break large
amounts of existing code, it's a bad idea.

A change that cmp throw an exception iff the two objects, rather than
the two classes, were incomparable (allowing comparisons of( 1+0j and
2+0j) and ({a} and {a,b}) but not (1+1j and 2+0j) and ({a} and {b})) is
a stylistic guideline, since it's already possible to write your own
classes this way. The only place this change would matter is in the
standard library code, and in just a few places at that.
Oct 26 '05 #31

P: n/a
Antoon Pardon wrote:
Christopher Subich schreef :
Antoon Pardon wrote:
>>>from decimal import Decimal
>>>Zero = Decimal(0)
>>>cmp( ( ) , Zero)
-1
>>>cmp(Zero, 1)
-1
>>>cmp(1, ( ) )
-1


I'd argue that the wart here is that cmp doesn't throw an exception, not
that it returns inconsistent results. This is a classic case of
incomparable objects, and saying that 1 < an empty tuple is bordering on
meaningless.


I wont argue with you here, but it is what python gives as now.
Changing this behaviour is not going to happen.


FWIW, Guido has said a few times that in Python 3.0 we should "Raise an
exception when making comparisons (other than equality and inequality)
between two incongruent types."[1] But yes, the behavior won't change
in the Python 2.X series due to backwards compatibility concerns.

STeVe

[1] http://wiki.python.org/moin/Python3%2e0
Oct 26 '05 #32

P: n/a
Op 2005-10-26, Ron Adam schreef <rr*@ronadam.com>:


Antoon Pardon wrote:
Op 2005-10-25, Steven D'Aprano schreef <st***@REMOVETHIScyber.com.au>:
Can somebody remind me, what is the problem Antoon is trying to solve here?

Well there are two issues. One about correct behaviour and one about
practicallity.

The first problem is cmp. This is what the documentation states:

cmp(x, y)
Compare the two objects x and y and return an integer according to
the outcome. The return value is negative if x < y, zero if x == y
and strictly positive if x > y.

The problem is, that if someone implements a partial ordered class,
with the rich comparisons, cmp doesn't behave correct. It can't
behave correct because it will always give an number as a result
and such a number implies one of three relations between two
elements, but with partial ordered classes, it is possible none
of these relations exist between those two elements. So IMO cmp
should be changed to reflect this. This can be done either by cmp
returning None or UnequalValues. or by cmp raising UnequalValues
in such a case.


Instead of changing cmp, why not just have several common versions to
choose from instead of trying to make one tool do it all?


That would be an option. Lets call such an alternative comp.
Would that mean also have a __comp__ method for customization.
The second part is one of practicallity. Once you have accepted cmp,
should reflect this possibility, it makes sense that the __cmp__
method should get the same possibilities, so that where it is more
pratical to do so, the programmer can implement his partial order
through one __cmp__ method instead of having to write six.
In my opinion such a change would break no existing code. This
is because there are two possibilities.

1) The user is using cmp on objects that form a total order.
In such circumstances the behaviour of cmp doesn't change.

2) The user is using cmp on objects that don't form a total
order. In such circumstances cmp doesn't give consistent
results, so the code is already broken.


Adding complexity to cmp may not break code, but it could probably slow
down sorting in general.


The evidence suggests that cmp is not used in sorting. If you have a
list of sets, sort will happily try to sort it, while calling cmp
with a set as an argument throws an exception.

I have a feeling that adding comp (and its brother __comp__) would
have a detrimental effect on sorting, because using '<' would then
mean one extra method to look for that could be used in determining
if a < b or not.
So I would think what ever improvements or
alternatives needs to be careful not to slow down existing sorting cases.


I have done some tests and have come to the conclusion that other
factors will have a greater influence on sorting times. I have written
the following program to estimate effects:

from random import shuffle

from time import time

class UnequalValues(Exception):
pass

class cla:
def __init__(self, i):
self.value = int(i)

def __cmp__(self, other):
return self.value - other.value

class clb:
def __init__(self, i):
self.value = int(i)

def __lt__(self, other):
return self.value < other.value

class clc:
def __init__(self, i):
self.value = int(i)

def __comp__(self, other):
return self.value - other.value

def __lt__(self, other):
try:
return self.__comp__(other) < 0
except UnequalValues:
return False

def test(lng, rep):

for cl in cla, clb, clc:
total = 0.0
for _ in xrange(rep):
lst = [cl(i) for i in xrange(lng)]
shuffle(lst)
start = time()
lst.sort()
stop = time()
total += stop - start
print "%s: %d repeats, %d long, %9.6f secs" % (cl, rep, lng, total)

test(1000,1000)
This is the result I get:

__main__.cla: 1000 repeats, 1000 long, 31.986618 secs
__main__.clb: 1000 repeats, 1000 long, 8.963896 secs
__main__.clc: 1000 repeats, 1000 long, 16.893321 secs
I find it very odd that clc sorts about twice as fast as cla.

That means that every class that has a __cmp__ method can be
speeded up with sorting by writing a similar __lt__ method
as in clc. I do wonder what is causing this.

--
Antoon Pardon
Oct 27 '05 #33

P: n/a
Op 2005-10-26, Christopher Subich schreef <cs****************@spam.subich.block.com>:
Antoon Pardon wrote:
Op 2005-10-25, Christopher Subich schreef <cs****************@spam.subich.block.com>:

My biggest complaint here is about returning None or IncomparableValue;
if that happens, then all code that relies on cmp returning a numeric
result will have to be rewritten.

I don't know. There are two possibilities.

1) The user is working with a total order. In that case the None
or IncomparableValues won't be returned anyway so nothing about
his code has to change.

2) The user is working with a partial order. In that case cmp
doesn't provide consistent results so the use of cmp in this
case was a bug anyway.


Case 3) The user is working with an unknown class, using duck typing,
and expects a total order. If cmp suddenly returned Incomparable or
None, the code would break in Unexpected Ways, with Undefined Behavior.


This is case 2. Under the current circumstances working with cmp with
a partial order will give inconsistent behaviour. So your code is
already broken for relying on cmp to give consistent results in
circumstances it can't.
This is a classic "exception versus error code" argument; in this case,
returning None would be the error flag. It's almost always better to
just throw the exception, because then this allows error-checking at a
more natural level.


If you prefer a raised exception, I could live with that.
As for saying that cmp should return some times and raise an exception
other times, I just find it squicky.

But cmp already behaves this way. The only difference would be that
the decision would be made based on the values of the objects instead
of only their class.

Admittedly, this is entirely up to
the class designer, and your proposed guideline wouldn't change cmp's
behavior for clases that /are/ totally ordered.

Then again, sets excepted, your guideline doesn't seem very applicable
in standard library code.

Well AFAIAC this isn't specific about library code.


A change that cmp return a 4th possible "value" (None or Incomparable)
is a fundamental language change. Because this would break large
amounts of existing code, it's a bad idea.


I have always included the option that cmp would raise an exception
instead of returning a fourth value.

--
Antoon Pardon
Oct 27 '05 #34

P: n/a
On 27 Oct 2005 08:12:15 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote:
[...]

The evidence suggests that cmp is not used in sorting. If you have a
list of sets, sort will happily try to sort it, while calling cmp
with a set as an argument throws an exception.

A data point re evidence:
class C: ... def __getattr__(self, attr): print attr; raise AttributeError
...
sorted((C(),C())) __lt__
__gt__
__gt__
__lt__
__coerce__
__coerce__
__cmp__
__cmp__
[__repr__
<__main__.C instance at 0x02EF388C>, __repr__
<__main__.C instance at 0x02EF38CC>]

I think it will be slightly different if you define those methods
in a new-style class -- oh, heck, why not do it:
class D(object): ... def __lt__(*ignore): print '__lt__'; return NotImplemented
... def __gt__(*ignore): print '__gt__'; return NotImplemented
... def __coerce__(*ignore): print '__coerce__'; return NotImplemented
... def __cmp__(*ignore): print '__cmp__'; return NotImplemented
... sorted((D(),D()))

__lt__
__gt__
__cmp__
__cmp__

(I haven't followed the thread much, so please excuse if irrelevant ;-)

Regards,
Bengt Richter
Oct 27 '05 #35

P: n/a
Op 2005-10-26, Ron Adam schreef <rr*@ronadam.com>:

Adding complexity to cmp may not break code, but it could probably slow
down sorting in general. So I would think what ever improvements or
alternatives needs to be careful not to slow down existing sorting cases.


As a result of Bengt's post, I rewrote my test program, this is the
program now.

from random import shuffle

from time import time

class UnequalValues(Exception):
pass

__metaclass__ = type

class cla:
def __init__(self, i):
self.value = int(i)

def __cmp__(self, other):
return self.value - other.value

class clb:
def __init__(self, i):
self.value = int(i)

def __lt__(self, other):
return self.value < other.value

class clc:
def __init__(self, i):
self.value = int(i)

def __gt__(self, other):
return self.value > other.value

class cld:
def __init__(self, i):
self.value = int(i)

def __comp__(self, other):
return self.value - other.value

def __lt__(self, other):
return self.__comp__(other) < 0

class cle:
def __init__(self, i):
self.value = int(i)

def __comp__(self, other):
return self.value - other.value

def __lt__(self, other):
try:
return self.__comp__(other) < 0
except UnequalValues:
return False

def test(lng, rep):

for cl in cla, clb, clc, cld, cle:
total = 0.0
for _ in xrange(rep):
lst = [cl(i) for i in xrange(lng)]
shuffle(lst)
start = time()
lst.sort()
stop = time()
total += stop - start
for i in xrange(1,rep):
assert lst[i - 1] < lst[i]
print "%s: %d repeats, %d long, %9.6f secs" % (cl, rep, lng, total)

test(1000,1000)

---

These are the results.

<class '__main__.cla'>: 1000 repeats, 1000 long, 10.061425 secs
<class '__main__.clb'>: 1000 repeats, 1000 long, 9.544035 secs
<class '__main__.clc'>: 1000 repeats, 1000 long, 10.450864 secs
<class '__main__.cld'>: 1000 repeats, 1000 long, 15.566061 secs
<class '__main__.cle'>: 1000 repeats, 1000 long, 15.776443 secs

Results on a longer sequence were:

<class '__main__.cla'>: 1000 repeats, 10000 long, 146.722443 secs
<class '__main__.clb'>: 1000 repeats, 10000 long, 139.480863 secs
<class '__main__.clc'>: 1000 repeats, 10000 long, 152.623424 secs
<class '__main__.cld'>: 1000 repeats, 10000 long, 224.630926 secs
<class '__main__.cle'>: 1000 repeats, 10000 long, 228.663825 secs

The most interesting result of this test is the difference between
cld and cle. IMO this difference is an indication of the cost
that my idea would have on sorting, should it be implemented.
That would be around 2%. I think that is bearable. Especially
since this were very simple routines. The moment the comparison
calculation become heavier, the relative contribution of the
try: except will diminuish.

If you are concerned about sorting times, I think you should
be more concerned about Guido's idea of doing away with __cmp__.
Sure __lt__ is faster. But in a number of cases writing __cmp__
is of the same complexity as writing __lt__. So if you then
need a __lt__, __le__, __eq__, __ne__, __gt__ and __ge__ it
would be a lot easier to write a __cmp__ and have all rich
comparisons methods call this instead of duplicating the code
about six times. So you would be more or less forced to write
your class as class cld or cle. This would have a bigger
impact on sorting times than my suggestion.

--
Antoon Pardon
Oct 28 '05 #36

P: n/a
Op 2005-10-28, Antoon Pardon schreef <ap*****@forel.vub.ac.be>:
Op 2005-10-26, Ron Adam schreef <rr*@ronadam.com>:


These are the results.

<class '__main__.cla'>: 1000 repeats, 1000 long, 10.061425 secs
<class '__main__.clb'>: 1000 repeats, 1000 long, 9.544035 secs
<class '__main__.clc'>: 1000 repeats, 1000 long, 10.450864 secs
<class '__main__.cld'>: 1000 repeats, 1000 long, 15.566061 secs
<class '__main__.cle'>: 1000 repeats, 1000 long, 15.776443 secs

Results on a longer sequence were:

<class '__main__.cla'>: 1000 repeats, 10000 long, 146.722443 secs
<class '__main__.clb'>: 1000 repeats, 10000 long, 139.480863 secs
<class '__main__.clc'>: 1000 repeats, 10000 long, 152.623424 secs
<class '__main__.cld'>: 1000 repeats, 10000 long, 224.630926 secs
<class '__main__.cle'>: 1000 repeats, 10000 long, 228.663825 secs

The most interesting result of this test is the difference between
cld and cle. IMO this difference is an indication of the cost
that my idea would have on sorting, should it be implemented.
That would be around 2%. I think that is bearable. Especially
since this were very simple routines. The moment the comparison
calculation become heavier, the relative contribution of the
try: except will diminuish.

If you are concerned about sorting times, I think you should
be more concerned about Guido's idea of doing away with __cmp__.
Sure __lt__ is faster. But in a number of cases writing __cmp__
is of the same complexity as writing __lt__. So if you then
need a __lt__, __le__, __eq__, __ne__, __gt__ and __ge__ it
would be a lot easier to write a __cmp__ and have all rich
comparisons methods call this instead of duplicating the code
about six times. So you would be more or less forced to write
your class as class cld or cle. This would have a bigger
impact on sorting times than my suggestion.


And as an afterthought: Adding __slots__ increased the
sorting speed between 7.5 and 9.0%

--
Antoon Pardon
Oct 28 '05 #37

P: n/a


Antoon Pardon wrote:
Op 2005-10-26, Ron Adam schreef <rr*@ronadam.com>:
Adding complexity to cmp may not break code, but it could probably slow
down sorting in general. So I would think what ever improvements or
alternatives needs to be careful not to slow down existing sorting cases.
As a result of Bengt's post, I rewrote my test program, this is the
program now.


....
These are the results.

<class '__main__.cla'>: 1000 repeats, 1000 long, 10.061425 secs
<class '__main__.clb'>: 1000 repeats, 1000 long, 9.544035 secs
<class '__main__.clc'>: 1000 repeats, 1000 long, 10.450864 secs
<class '__main__.cld'>: 1000 repeats, 1000 long, 15.566061 secs
<class '__main__.cle'>: 1000 repeats, 1000 long, 15.776443 secs

Results on a longer sequence were:

<class '__main__.cla'>: 1000 repeats, 10000 long, 146.722443 secs
<class '__main__.clb'>: 1000 repeats, 10000 long, 139.480863 secs
<class '__main__.clc'>: 1000 repeats, 10000 long, 152.623424 secs
<class '__main__.cld'>: 1000 repeats, 10000 long, 224.630926 secs
<class '__main__.cle'>: 1000 repeats, 10000 long, 228.663825 secs

The most interesting result of this test is the difference between
cld and cle. IMO this difference is an indication of the cost
that my idea would have on sorting, should it be implemented.
That would be around 2%. I think that is bearable. Especially
since this were very simple routines. The moment the comparison
calculation become heavier, the relative contribution of the
try: except will diminuish.

class cle:
def __init__(self, i):
self.value = int(i)

def __comp__(self, other):
return self.value - other.value

def __lt__(self, other):
try:
return self.__comp__(other) < 0
except UnequalValues:
return False
This would only work with numeric types. I believe that cmp is closer to
the following.

return ( self.value < other.value and -1
or self.value > self.value and 1
or 0 )

I don't think it effects the speed a great deal however.

If you are concerned about sorting times, I think you should
be more concerned about Guido's idea of doing away with __cmp__.
Sure __lt__ is faster. But in a number of cases writing __cmp__
is of the same complexity as writing __lt__. So if you then
need a __lt__, __le__, __eq__, __ne__, __gt__ and __ge__ it
would be a lot easier to write a __cmp__ and have all rich
comparisons methods call this instead of duplicating the code
about six times. So you would be more or less forced to write
your class as class cld or cle. This would have a bigger
impact on sorting times than my suggestion.


I haven't heard he was removing __cmp__, but I would think the sort or
sorted functions would just use the available comparisons methods or
equivalent C code for base types. So I expect it would only matter if
you need a custom or modified sort.

Although It is a thought that these cases could be improved by making
the sort value available to the underlying C sort function. Something
on the order of:

__sortvalue__ == self.value.

Cheers,
Ron

Oct 28 '05 #38

P: n/a
Antoon Pardon wrote:
If you are concerned about sorting times, I think you should
be more concerned about Guido's idea of doing away with __cmp__.
Sure __lt__ is faster. But in a number of cases writing __cmp__
is of the same complexity as writing __lt__. So if you then
need a __lt__, __le__, __eq__, __ne__, __gt__ and __ge__ it
would be a lot easier to write a __cmp__ and have all rich
comparisons methods call this instead of duplicating the code
about six times. So you would be more or less forced to write
your class as class cld or cle. This would have a bigger
impact on sorting times than my suggestion.


Honestly, I don't really mind the idea of __cmp__ going away; for
classes that behave Normally with respect to a single __cmp__ value,
it's easily possible to write a CompareMixin that defines __lt__,
__gt__, etc. for suitable __cmp__ values.

Much like DictMixin is part of the standard library.
Oct 28 '05 #39

P: n/a
Op 2005-10-28, Ron Adam schreef <rr*@ronadam.com>:

I haven't heard he was removing __cmp__,
I read somewhere he was considering it.
but I would think the sort or
sorted functions would just use the available comparisons methods or
equivalent C code for base types. So I expect it would only matter if
you need a custom or modified sort.

Although It is a thought that these cases could be improved by making
the sort value available to the underlying C sort function. Something
on the order of:

__sortvalue__ == self.value.


I doubt that will be possible in general. You can't even calculate such
a __sortvalue__ for lists.

--
Antoon Pardon
Oct 31 '05 #40

This discussion thread is closed

Replies have been disabled for this discussion.