
boolean operations on sets

Hi, I have been playing with set operations lately and came across a
kind of surprising result given that it is not mentioned in the
standard Python tutorial:

With Python sets, intersections and unions are supposed to be done
like this:
In [7]:set('casa') & set('porca')
Out[7]:set(['a', 'c'])

In [8]:set('casa') | set('porca')
Out[8]:set(['a', 'c', 'o', 'p', 's', 'r'])

and they work correctly. Now what is confusing is that if you do:

In [5]:set('casa') and set('porca')
Out[5]:set(['a', 'p', 'c', 'r', 'o'])

In [6]:set('casa') or set('porca')
Out[6]:set(['a', 'c', 's'])

The results are not what you would expect from an AND or OR
operation, from the mathematical point of view! Apparently the "and"
operation is returning the second set, and the "or" operation is
returning the first.

If the Python developers wanted these operations to reflect the
traditional (Python) truth value for data structures: False for empty
data structures and True otherwise, why not simply return True or
False?

So my question is: why has this been implemented this way? I can
see this confusing many newbies...

Aug 6 '07 #1
Flavio wrote:
So my question is: why has this been implemented this way? I can
see this confusing many newbies...
It has been implemented this way to conform with the definitions of
"and" and "or", which were never intended to apply to set
operations. These operators have always returned one of their
operands, and they continue to do so with set operands.
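A short Python 3 sketch contrasting the two kinds of operators, using the two sets from the original post. `&` and `|` compute new sets, while `and`/`or` only test truthiness and hand back one of the operands unchanged. If a plain True/False is what you want, wrapping the expression in `bool()` gives it to you.

```python
a, b = set('casa'), set('porca')

# & and | are the set operators: intersection and union.
print(a & b == {'a', 'c'})                      # True
print(a | b == {'a', 'c', 'o', 'p', 's', 'r'})  # True

# `and` / `or` return one of the operands itself, not a new set.
print((a and b) is b)   # True: a is non-empty (true), so `and` returns b
print((a or b) is a)    # True: a is true, so `or` stops and returns a

# An empty set is false, so the choice of operand flips.
empty = set()
print((empty and b) is empty)  # True
print((empty or b) is b)       # True

# For a plain boolean, convert explicitly.
print(bool(a and b))    # True
```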

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden

Aug 6 '07 #2
On Monday 06 August 2007, Flavio wrote:
So my question is: why has this been implemented this way? I can
see this confusing many newbies...
I did not implement this, so I cannot say, but it does have useful
side-effects, for example:

x = A or B

is equivalent to the following (except that A is evaluated only once):

if A:
    x = A
else:
    x = B

Also, in Python versions without the (y if x else z) syntax (it was added in
Python 2.5), you can use (x and y or z) with nearly the same result*. This
implementation of and/or might well be faster, too ;-)
*: this doesn't work the same if y is a false value; (x and [y] or [z])[0] is
less readable, but works for all y
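A small sketch of the footnote above. The function names `old_style`, `wrapped` and `conditional` are made up for illustration. The old `x and y or z` idiom silently returns z when y is false. Wrapping y and z in one-element lists avoids this because a non-empty list is always true. The conditional expression has no such corner case.

```python
def old_style(x, y, z):
    # Pre-2.5 idiom: if y is false, `x and y` is false too, so z wins.
    return x and y or z

def wrapped(x, y, z):
    # [y] is a non-empty list, hence always true, so this works for any y.
    return (x and [y] or [z])[0]

def conditional(x, y, z):
    # Conditional expression (Python 2.5 and later).
    return y if x else z

print(repr(old_style(True, 0, 'fallback')))    # 'fallback' (wrong: expected 0)
print(repr(wrapped(True, 0, 'fallback')))      # 0
print(repr(conditional(True, 0, 'fallback')))  # 0
```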

--
Regards, Thomas Jollans

Aug 6 '07 #3
Flavio wrote:
So my question is: why has this been implemented this way? I can
see this confusing many newbies...
It has nothing to do with sets: it stems from the fact that certain values
in Python are considered false, and all others true. And these semantics
were introduced at a time when there was no explicit True/False, so the
operators were defined in exactly the way you observed.

Consider this:

"foo" or "bar" -> "foo"

So - nothing to do with sets.
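To make this concrete, here is a small Python 3 sketch. It shows which common built-in values count as false and what `or`/`and` return for plain strings. The list covers only the usual built-in cases. User-defined classes can define their own truth value.

```python
# Common built-in values that Python treats as false.
falsy = [None, False, 0, 0.0, '', [], (), {}, set()]
print(all(not v for v in falsy))   # True

# `or` returns the first true operand, or the last operand if none is true.
print("foo" or "bar")   # foo
print("" or "bar")      # bar

# `and` returns the first false operand, or the last operand if all are true.
print("foo" and "bar")        # bar
print(repr("" and "bar"))     # ''
```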

Diez
Aug 6 '07 #4
On Mon, 06 Aug 2007 14:13:51 +0000, Flavio wrote:
The results are not what you would expect from an AND or OR operation,
from the mathematical point of view! Apparently the "and" operation is
returning the second set, and the "or" operation is returning the
first.
That may be because `and` and `or` are not mathematical in Python (at
least not in the way you think). All the symbolic operators, e.g. `|` and
`&`, are overloadable, so a type can give them any meaning it wants.

The `and` and `or` operators, though, are built into the language, and
there is no way to make them behave differently from the default. Removing
this behaviour, or making them overloadable as well, has been discussed,
but as far as I remember it hasn't got very far.
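A sketch of the distinction, assuming Python 3. `Flag` is a hypothetical class made up for this example. `&` and `|` dispatch to `__and__`/`__or__`, which can return anything. `and`/`or` can only be influenced indirectly through the truth test (`__bool__` in Python 3, `__nonzero__` in Python 2), and they still return one of the operands as-is.

```python
class Flag:
    """Hypothetical example type (not from the thread)."""

    def __init__(self, value):
        self.value = value

    # & and | are ordinary overloadable operators: they build a new Flag here.
    def __and__(self, other):
        return Flag(self.value and other.value)

    def __or__(self, other):
        return Flag(self.value or other.value)

    # `and`/`or` only consult the truth value; they cannot be overloaded.
    def __bool__(self):
        return bool(self.value)

    def __repr__(self):
        return f"Flag({self.value!r})"

t, f = Flag(True), Flag(False)
print(t & f)            # Flag(False): a new object built by __and__
print((t & f) is f)     # False
print((t and f) is f)   # True: t is true, so `and` returns f itself
print((f and t) is f)   # True: f is false, so `and` stops at f
```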
If the Python developers wanted these operations to reflect the traditional
(Python) truth value for data structures: False for empty data
structures and True otherwise, why not simply return True or False?
Because in most cases returning True or False simply has no advantage.
But returning the actual operands has been quite useful, e.g. for
emulating the conditional expression ("ternary operator") ``THEN if COND
else DEFAULT`` with ``COND and THEN or DEFAULT`` (which has a bad corner
case, though: it returns DEFAULT whenever THEN is false).
So my question is: why has this been implemented this way? I can see
this confusing many newbies...
Hmm, you could be right there. But then, they shouldn't be biased by
preconceptions about boolean behaviour anyway.
Aug 6 '07 #5
In article <5h*************@mid.uni-berlin.de>,
"Diez B. Roggisch" <de***@nospam.web.de> wrote:
Flavio wrote:
Hi, I have been playing with set operations lately and came across a
kind of surprising result given that it is not mentioned in the
standard Python tutorial:

With Python sets, intersections and unions are supposed to be done
like this:
[...]

The results are not what you would expect from an AND or OR
operation, from the mathematical point of view! Apparently the "and"
operation is returning the second set, and the "or" operation is
returning the first.
[...]

It has nothing to do with sets: it stems from the fact that certain values
in Python are considered false, and all others true. And these semantics
were introduced at a time when there was no explicit True/False, so the
operators were defined in exactly the way you observed.

Consider this:

"foo" or "bar" -> "foo"

So - nothing to do with sets.
In addition to what Diez wrote above, it is worth noting that the
practice of returning the value of the determining expression turns out
to be convenient for the programmer in some cases. Consider the
following example:

x = some_function(a, b, c) or another_function(d, e)

This is a rather nice shorthand notation for the following behaviour:

t = some_function(a, b, c)
if t:
    x = t
else:
    x = another_function(d, e)

In other words, the short-circuit behaviour of the logical operators
gives you a compact notation for evaluating certain types of conditional
expressions and capturing their values. If the "or" operator converted
the result to True or False, you could not use it this way.

Similarly,

x = some_function(a, b, c) and another_function(d, e)

... behaves as if you had written:

x = some_function(a, b, c)
if x:
    x = another_function(d, e)

Again, as above, if the results were forcibly converted to Boolean
values, you could not use the shorthand.
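A runnable sketch of both shorthands. `some_function` and `another_function` here are hypothetical stand-ins that record each call in a list. This makes the short-circuiting visible: the right-hand call is skipped whenever the left-hand result already decides the outcome.

```python
calls = []

def some_function(a, b, c):
    # Hypothetical stand-in: records the call and returns the sum (0 is false).
    calls.append('some_function')
    return a + b + c

def another_function(d, e):
    # Hypothetical stand-in: records the call and returns the product.
    calls.append('another_function')
    return d * e

# `or`: the second call is skipped when the first result is true.
x = some_function(1, 2, 3) or another_function(4, 5)
print(x, calls)   # 6 ['some_function']

calls.clear()
x = some_function(0, 0, 0) or another_function(4, 5)
print(x, calls)   # 20 ['some_function', 'another_function']

# `and`: the second call is skipped when the first result is false.
calls.clear()
x = some_function(0, 0, 0) and another_function(4, 5)
print(x, calls)   # 0 ['some_function']
```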

Now that Python provides an expression variety of "if", this is perhaps
not as useful as it once was; however, it still has a role. Suppose,
for example, that a call to some_function() is very time-consuming; you
would not want to write:

x = some_function(a, b, c) \
    if some_function(a, b, c) else another_function(d, e)

... because then some_function would get evaluated twice. Python does
not permit assignment within an expression, so you can't get rid of the
second call without changing the syntax.
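This was accurate when it was written. Python 3.8 later added assignment expressions (PEP 572, the `:=` operator), which make a single-evaluation version of the conditional expression possible. Below is a sketch with hypothetical stand-ins for the two functions and a counter to show how often the expensive call runs. The second example shows where this can be preferable to plain `or`: the test does not have to be truthiness, so a legitimate result of 0 is kept.

```python
calls = 0

def some_function(a, b, c):
    # Hypothetical "expensive" call; counts how often it runs.
    global calls
    calls += 1
    return a + b + c

def another_function(d, e):
    # Hypothetical fallback.
    return d * e

# Requires Python 3.8+: the condition runs first and binds t, which is then reused.
x = t if (t := some_function(1, 2, 3)) else another_function(4, 5)
print(x, calls)  # 6 1

# Unlike `or`, only None triggers the fallback here, so a result of 0 is kept.
y = t if (t := some_function(0, 0, 0)) is not None else another_function(4, 5)
print(y, calls)  # 0 2
```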

Also, it is common behaviour in many programming languages for logical
connectives to both short-circuit and yield their values, so I'd argue
that most programmers are probably accustomed to it. The && and ||
operators of C and its descendants also behave in this manner, as do the
AND and OR of Lisp or Scheme. It is possible that beginners may find it
a little bit confusing at first, but I believe such confusion is minor
and easily remedied.

Cheers,
-M

--
Michael J. Fromberger | Lecturer, Dept. of Computer Science
http://www.dartmouth.edu/~sting/ | Dartmouth College, Hanover, NH, USA
Aug 6 '07 #6
Michael J. Fromberger <Mi******************@Clothing.Dartmouth.EDU>
wrote:
...
Also, it is common behaviour in many programming languages for logical
connectives to both short-circuit and yield their values, so I'd argue
that most programmers are probably accustomed to it. The && and ||
operators of C and its descendants also behave in this manner, as do the
Untrue, alas...:

brain:~ alex$ cat a.c
#include <stdio.h>

int main()
{
    printf("%d\n", 23 && 45);
    return 0;
}
brain:~ alex$ gcc a.c
brain:~ alex$ ./a.out
1

In C, && and || _do_ "short circuit", BUT they always return 0 or 1,
*NOT* "yield their values" (interpreted as "return the false or true
value of either operand", as in Python).
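For comparison, here is the same expression in Python. The `int(bool(...))` wrapper is only there to mimic the 0/1 result that C's `&&` and `||` produce.

```python
print(23 and 45)             # 45: Python returns the deciding operand
print(int(bool(23 and 45)))  # 1: what C's && gives
print(0 or 45)               # 45
print(int(bool(0 or 45)))    # 1: what C's || gives
```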
Alex

Aug 7 '07 #7
Flavio wrote:
The results are not what you would expect from an AND or OR
operation, from the mathematical point of view! Apparently the "and"
operation is returning the second set, and the "or" operation is
returning the first.
The semantics of the 'and' and 'or' operators in Python are well defined
and work the same for all types, AFAIK.
If the Python developers wanted these operations to reflect the
traditional (Python) truth value for data structures: False for empty
data structures and True otherwise, why not simply return True or
False?

So my question is: why has this been implemented this way?
Because Python lived for a long time without a 'bool' type, treating None,
numeric zero, the empty string and empty containers as false (i.e.
'nothing') and anything else as true (i.e. 'something').
I can
see this confusing many newbies...
Yes, and this was one of the arguments against introducing the bool
type. Changing this behaviour would have broken a lot of existing code,
and indeed, not changing it makes things confusing.

OTOH (and while I agree that there may be cases of needless complexity
in Python), stripping a language of anything that might confuse a
newbie doesn't make for a great language.
Aug 7 '07 #8