By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
446,258 Members | 1,258 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 446,258 IT Pros & Developers. It's quick & easy.

Why can't I xor strings?

P: n/a
I wrote a function to compare whether two strings are "similar" because
I'm using python to make a small text adventure engine and I want to it
to be responsive to slight mispellings, like "inevtory" and the like. To
save time the first thing my function does is check if I'm comparing an
empty vs. a non-empty string, because they are to be never considered
similar. Right now I have to write out the check like this:

if str1 and str2:
if not str1 or not str2:
return 0

Because python won't let me do str1 ^ str2. Why does this only work for
numbers? Just treat empty strings as 0 and nonempty as 1.

Regardless of whether this is the best implementation for detecting if
two strings are similar, I don't see why xor for strings shouldn't be
supported. Am I missing something? Inparticular, I think it'd be cool to
have "xor" as opposed to "^". The carrot would return the resulting
value, while "xor" would act like and/or do and return the one that was
true (if any).
Jul 18 '05 #1
Share this Question
Share on Google+
50 Replies


P: n/a
dataangel <k0*****@kzoo.edu> writes:
To save time the first thing my function does is check if
I'm comparing an empty vs. a non-empty string, because they are to be
never considered similar. Right now I have to write out the check like this:

if str1 and str2:
if not str1 or not str2:
return 0


How about:

def nonempty(s):
return (len(s) > 0)

and then

if nonempty(str1) != nonempty(str2):
return 0

Of course there's other ways to write the same thing. Python 2.3 has
a "bool" function which when called on a string, returns true iff the
string is nonempty.
Jul 18 '05 #2

P: n/a
Hi DataAngel,

Welcome to the Python community!

From reading your post, it sounds like you are wanting to use XOR
encryption because there are other ways to quickly and easily compare
strings to see if they are equal.

To compare a string, use the following:

name = "Steven"
name2 = "Steven"
print name == name2

The result will be "True." However, if my initial guess is correct that
you are wanting to use XOR for strings, it probably because you are
wanting a quick way of encrypting data so that basic users won't be able
to view the information. If this is what you are wanting, I might have
something for you...

Let me know,

Byron
---
dataangel wrote:
I wrote a function to compare whether two strings are "similar" because
I'm using python to make a small text adventure engine and I want to it
to be responsive to slight mispellings, like "inevtory" and the like. To
save time the first thing my function does is check if I'm comparing an
empty vs. a non-empty string, because they are to be never considered
similar. Right now I have to write out the check like this:

if str1 and str2:
if not str1 or not str2:
return 0

Because python won't let me do str1 ^ str2. Why does this only work for
numbers? Just treat empty strings as 0 and nonempty as 1.

Regardless of whether this is the best implementation for detecting if
two strings are similar, I don't see why xor for strings shouldn't be
supported. Am I missing something? Inparticular, I think it'd be cool to
have "xor" as opposed to "^". The carrot would return the resulting
value, while "xor" would act like and/or do and return the one that was
true (if any).

Jul 18 '05 #3

P: n/a
Hi DataAngel,

Welcome to the Python community!

From reading your post, it sounds like you are wanting to use XOR for
"encryption." The reason why I say this is because there are many other
ways of comparing the content of two strings to see if they are exactly
the same. An example of how to do such a thing is listed below:

# How to compare two strings to see if they are exact matches.
name = "Steven"
name2 = "Steven"
print name == name2
The result will be "True."

However, the only reason that one might want to use XOR for a string is
if he or she wanted to use XOR based encryption in order to keep data
semi-private.

Let me know,

Byron
---

dataangel wrote:
I wrote a function to compare whether two strings are "similar" because
I'm using python to make a small text adventure engine and I want to it
to be responsive to slight mispellings, like "inevtory" and the like. To
save time the first thing my function does is check if I'm comparing an
empty vs. a non-empty string, because they are to be never considered
similar. Right now I have to write out the check like this:

if str1 and str2:
if not str1 or not str2:
return 0

Because python won't let me do str1 ^ str2. Why does this only work for
numbers? Just treat empty strings as 0 and nonempty as 1.

Regardless of whether this is the best implementation for detecting if
two strings are similar, I don't see why xor for strings shouldn't be
supported. Am I missing something? Inparticular, I think it'd be cool to
have "xor" as opposed to "^". The carrot would return the resulting
value, while "xor" would act like and/or do and return the one that was
true (if any).

Jul 18 '05 #4

P: n/a
On 2004-10-08, Byron <De*********@netscape.net> wrote:
Hi DataAngel,

Welcome to the Python community!

From reading your post, it sounds like you are wanting to use XOR
encryption
No, he wants to do an exclusive or of the boolean values of the
strings.
The result will be "True." However, if my initial guess is correct that
you are wanting to use XOR for strings, it probably because you are
wanting a quick way of encrypting data so that basic users won't be able
to view the information.


No, he explained exactly what he was trying to do, and it had
nothing to do with encryption. He wants to know if exactly one
(1) of the strings is the empty string.

--
Grant Edwards grante Yow! I hope something GOOD
at came in the mail today so
visi.com I have a REASON to live!!
Jul 18 '05 #5

P: n/a
Grant Edwards wrote:
No, he explained exactly what he was trying to do, and it had
nothing to do with encryption. He wants to know if exactly one
(1) of the strings is the empty string.


Hi Grant,

Opps, that's what happens when on skims through a post a light speed...
lol :-)

Byron
---
Jul 18 '05 #6

P: n/a
Grant Edwards wrote:
No, he explained exactly what he was trying to do, and it had
nothing to do with encryption. He wants to know if exactly one
(1) of the strings is the empty string.

Hi Grant,

Opps, that's what happens when one skims through a post a light speed...
lol :-)

Byron
---
Jul 18 '05 #7

P: n/a
Grant Edwards wrote:
No, he explained exactly what he was trying to do, and it had
nothing to do with encryption. He wants to know if exactly one
(1) of the strings is the empty string.

Hi Grant,

Opps, that's what happens when one skims through a post at light
speed... lol :-)

Byron
---
Jul 18 '05 #8

P: n/a
Grant Edwards:
No, he explained exactly what he was trying to do, and it had
nothing to do with encryption. He wants to know if exactly one
(1) of the strings is the empty string.


BTW, another way to get that is

if bool(str1) + bool(str2) == 1:
print "one and only one of them was empty"
Andrew
da***@dalkescientific.com
Jul 18 '05 #9

P: n/a
Grant Edwards <grante <at> visi.com> writes:
No, he wants to do an exclusive or of the boolean values of the
strings.


I'm guessing the OP has already guessed this solution from the variety already
provided, but the direct translation of this statement would be:

bool(x) ^ bool(y)

for example:
bool("") ^ bool("a") True bool("") ^ bool("") False bool("a") ^ bool("a") False bool("a") ^ bool("")

True
Steve


Jul 18 '05 #10

P: n/a
On 2004-10-09, Steven Bethard <st************@gmail.com> wrote:
Grant Edwards <grante <at> visi.com> writes:
No, he wants to do an exclusive or of the boolean values of the
strings.


I'm guessing the OP has already guessed this solution from the
variety already provided, but the direct translation of this
statement would be:

bool(x) ^ bool(y)


:)

It dawned on me later that perhaps the jump from what I wrote
to the code you wrote wasn't as obvious as I thought.

--
Grant Edwards grante Yow!
at TAILFINS!!... click...
visi.com
Jul 18 '05 #11

P: n/a
On Fri, 08 Oct 2004 16:19:22 -0400, dataangel wrote:
Regardless of whether this is the best implementation for detecting if
two strings are similar, I don't see why xor for strings shouldn't be
supported. Am I missing something?


The basic problem is that there is no obvious "xor on string" operation
*in general*, even if you stipulate "bitwise". In particular, what does it
mean to bitwise xor two different length strings?

"In general" is the key, here. You actually don't have strings,
conceptually, you have "tokens" or "commands" or something that happen to
be strings. I recommend creating a class to match this concept by
subclassing str... or even just write it directly without subclassing
string. You can then make __xor__(self, other) for that class do whatever
makes sense with your concept.

http://docs.python.org/ref/numeric-types.html

I can almost guarantee that you will later find other things to put into
that class. It may very well clean up a lot of code.
Jul 18 '05 #12

P: n/a
On 2004-10-09, Jeremy Bowers <je**@jerf.org> wrote:
Regardless of whether this is the best implementation for
detecting if two strings are similar, I don't see why xor for
strings shouldn't be supported. Am I missing something?


The basic problem is that there is no obvious "xor on string" operation


Sure there is. Strings have a boolean value, and the xor
operation on boolean values is well-defined.

--
Grant Edwards grante Yow! Are you mentally here
at at Pizza Hut??
visi.com
Jul 18 '05 #13

P: n/a
Grant Edwards <gr****@visi.com> writes:
On 2004-10-09, Jeremy Bowers <je**@jerf.org> wrote:
Regardless of whether this is the best implementation for
detecting if two strings are similar, I don't see why xor for
strings shouldn't be supported. Am I missing something?


The basic problem is that there is no obvious "xor on string" operation


Sure there is. Strings have a boolean value, and the xor
operation on boolean values is well-defined.


That's an operation, but I'm not sure that's the obvious one. For my
part, if I saw "string1 ^ string2" I'd probably expect a byte by byte
xor with the result being a new string.

It doesn't feel natural to me to have my strings suddenly interpreted
as a new data type based on the operation at hand. Logical operators
work that way but not numerics (it would be in the same vein as string
+ number interpreting the string as a number - that way lies Perl :-))

-- David
Jul 18 '05 #14

P: n/a
On 2004-10-09, David Bolen <db**@fitlinxx.com> wrote:
The basic problem is that there is no obvious "xor on string"
operation
Sure there is. Strings have a boolean value, and the xor
operation on boolean values is well-defined.


That's an operation, but I'm not sure that's the obvious one.
For my part, if I saw "string1 ^ string2" I'd probably expect
a byte by byte xor with the result being a new string.


Only because Python lacks a logical xor operator, so you're
used to thinking of ^ as a bitwise operator. What if you saw

string1 xor string2?

Wouldn't you expect it to be equivalent to

(string1 and (not string2)) or ((not string1) and string2)
It doesn't feel natural to me to have my strings suddenly
interpreted as a new data type based on the operation at hand.
Logical operators work that way but not numerics
I don't know what you mean by that. Nobody seems to have a
problem with "and" "or" and "not" operators using the truth
values of strings. What is there about "xor" that precludes it
from behaving similarly?
(it would be in the same vein as string + number interpreting
the string as a number - that way lies Perl :-))


I don't see that at all.

--
Grant Edwards grante Yow! Let's go to CHURCH!
at
visi.com
Jul 18 '05 #15

P: n/a
David Bolen wrote:
Grant Edwards <gr****@visi.com> writes:
On 2004-10-09, Jeremy Bowers <je**@jerf.org> wrote:

The basic problem is that there is no obvious "xor on string" operation


Sure there is. Strings have a boolean value, and the xor
operation on boolean values is well-defined.


That's an operation, but I'm not sure that's the obvious one. For my
part, if I saw "string1 ^ string2" I'd probably expect a byte by byte
xor with the result being a new string.


.... but you'd get a traceback. ;) As pointed out earlier in this
thread, what works is "bool(string1) ^ bool(string2)", which
certainly doesn't violate the law of least astonishment.
Why would anything else be needed?
Jul 18 '05 #16

P: n/a
>>>>> "Grant" == Grant Edwards <gr****@visi.com> writes:

Grant> I don't know what you mean by that. Nobody seems to have a
Grant> problem with "and" "or" and "not" operators using the truth
Grant> values of strings. What is there about "xor" that
Grant> precludes it from behaving similarly?

It's just that logical xor is an extremely rare beast. I would
probably prefer to see the operation expanded to the more typical
and-or-nots in real code.

--
Ville Vainio http://tinyurl.com/2prnb
Jul 18 '05 #17

P: n/a
Ville Vainio <vi***@spammers.com> writes:
It's just that logical xor is an extremely rare beast. I would
probably prefer to see the operation expanded to the more typical
and-or-nots in real code.


I think != is appropriate in this situation.
if bool(x) != bool(y): ...
Jul 18 '05 #18

P: n/a
Ville Vainio wrote:
Grant> I don't know what you mean by that. Nobody seems to have a
Grant> problem with "and" "or" and "not" operators using the truth
Grant> values of strings. What is there about "xor" that
Grant> precludes it from behaving similarly?

It's just that logical xor is an extremely rare beast. I would
probably prefer to see the operation expanded to the more typical
and-or-nots in real code.


"imp" and "eqv", anyone?

</F>

Jul 18 '05 #19

P: n/a
Grant Edwards wrote:
What if you saw

string1 xor string2?

Wouldn't you expect it to be equivalent to

(string1 and (not string2)) or ((not string1) and string2)


I would expect it to give a syntax error.

If not, and 'xor' did become a new boolean operator
in Python I would expect it to act more like

xor_f(x, y)

where 'xor_f' the function is defined as

def xor_f(x, y):
x = bool(x)
y = bool(y)
return (x and not y) or (not x and y)
Why the distinction? In your code you call bool on
an object at least once and perhaps twice. The
truth of an object should only be checked once. You
also have asymmetric return values. Consider

s1 s2 s1 xor s2
"A" "B" False
"A" "" True
"" "B" "B"
"" "" False

Esthetics suggest that either "A"/"" return "A" or that
""/"B" return True. Mine does the latter. Yours does
neither. Probably the Pythonic way, assuming 'xor'
can be considered Pythonic, is to return the object
which gave the True value, like this

def xor_f(x, y):
bx = bool(x)
by = bool(y)
if bx:
if not by:
return bx
return False
else:
if by:
return by
return False

In any case, 'xor' the binary operator is rarely
needed and as you've shown can be interpreted in
a couple different ways. Each alone weighs against
it. Both together make it almost certainly a bad
idea for Python.
Andrew
da***@dalkescientific.com
Jul 18 '05 #20

P: n/a
Paul Rubin wrote:
I think != is appropriate in this situation.
if bool(x) != bool(y): ...


while I prefer my already mentioned

if bool(x) + bool(y) == 1:
print "One and only one given"

That's because the pure logic ones, both != and
xor, confuse me while arithmetic doesn't.

It's also extensible to, say, "all three
parameters are either empty strings or non-empty
strings".

if 0 < bool(x) + bool(y) + bool(z) < 3:
print "that's not allowed"

BTW, I've used an idiom like this with some
success

def blah(filename = None, infile = None, url = None):
if (bool(filename is None) + bool(infile is None) +
bool(url is None)) != 2:
raise TypeError("Must specify one and only one of "
"'filename', 'infile', or 'url'")
if filename is not None:
infile = open(filename)
elif url is not None:
infile = urllib.urlopen(url)
...
Try that with an xor.

Andrew
da***@dalkescientific.com
Jul 18 '05 #21

P: n/a
On 2004-10-10, Ville Vainio <vi***@spammers.com> wrote:
>> "Grant" == Grant Edwards <gr****@visi.com> writes:

Grant> I don't know what you mean by that. Nobody seems to have a
Grant> problem with "and" "or" and "not" operators using the truth
Grant> values of strings. What is there about "xor" that
Grant> precludes it from behaving similarly?

It's just that logical xor is an extremely rare beast.


Probably so, but that doesn't support the arguement that
there's something wrong with a logical xor argument coercing
it's operands to boolean values unless one also argues that
the logical "and", "or" and "not" operators should also not
coerce their operands to booleans.
I would probably prefer to see the operation expanded to the
more typical and-or-nots in real code.


If logical operators shouldn't coerce operands, then what you
should see is:

if (bool(string1) and (not bool(string2)) or ((not bool(string1)) and bool(string2))

--
Grant Edwards grante Yow! A shapely CATHOLIC
at SCHOOLGIRL is FIDGETING
visi.com inside my costume...
Jul 18 '05 #22

P: n/a
On 2004-10-10, Andrew Dalke <ad****@mindspring.com> wrote:
Why the distinction? In your code you call bool on
an object at least once and perhaps twice. The
truth of an object should only be checked once. You
also have asymmetric return values. Consider

s1 s2 s1 xor s2
"A" "B" False
"A" "" True
"" "B" "B"
"" "" False
Hey, _that_ particular bug isn't my fault. Somebody else
decided that "and" and "or" don't return boolean values like
they should. ;)
Esthetics suggest that either "A"/"" return "A" or that
""/"B" return True. Mine does the latter. Yours does
neither. Probably the Pythonic way, assuming 'xor'
can be considered Pythonic, is to return the object
which gave the True value, like this

def xor_f(x, y):
bx = bool(x)
by = bool(y)
if bx:
if not by:
return bx
return False
else:
if by:
return by
return False


That's ugly. But then again, "A" or "B" returning "A" is ugly.

While I agree with your points, they're immaterial to the
argument I was making. The poster to which I responded was
arguing that "xor" didn't make sense because having it coerce
it's arguments to booleans was "wrong".

--
Grant Edwards grante Yow! Zippy's brain cells
at are straining to bridge
visi.com synapses...
Jul 18 '05 #23

P: n/a
On Sun, 10 Oct 2004 15:30:32 +0000, Grant Edwards wrote:
While I agree with your points, they're immaterial to the
argument I was making. The poster to which I responded was
arguing that "xor" didn't make sense because having it coerce
it's arguments to booleans was "wrong".


I didn't say "wrong", I said "non-obvious".

And, with respect, the fact that you have to argue your point is evidence
in my favor. Of course "proof" would require a comprehensive survey, and I
think we can all agree this point isn't worth such effort :-)

But I do think that a bitwise operator should silently transform to a
logical operator for strings is not obvious.

What is 388488839405842L ^ "hello"?

Python says that's an "unsupported operand type(s) for ^: 'long' and
'str'". Why is it wrong?

This would also work if Python were more like C++ and we could define

xor(string, string)
xor(int, int)

and be done with it, but in Python, there should be an obvious meaning for
int ^ string, and there isn't.

It is also true that I recommended the OP consider subclassing string to
make ^ do what he wants. But it seems to be reasonably well expected that
while user classes can do what they like with operators (as long as they
are willing to pay the sometimes-subtle prices, especially the ones
involved with poorly defining __cmp__), that the core language should not
do such things.
Jul 18 '05 #24

P: n/a
On 2004-10-10, Jeremy Bowers <je**@jerf.org> wrote:
But I do think that a bitwise operator should silently
transform to a logical operator for strings is not obvious.

What is 388488839405842L ^ "hello"?


No, I wouldn't want that either. What I was talking about was
the possibility of an "xor" logical operator that would behave
in a manner similar to the other logical operators in that it
coerces it's parameters to boolean values.

--
Grant Edwards grante Yow! I am covered with
at pure vegetable oil and I am
visi.com writing a best seller!
Jul 18 '05 #25

P: n/a
On Sun, 10 Oct 2004 19:25:16 GMT, Jeremy Bowers <je**@jerf.org> wrote:
On Sun, 10 Oct 2004 15:30:32 +0000, Grant Edwards wrote:
While I agree with your points, they're immaterial to the
argument I was making. The poster to which I responded was
arguing that "xor" didn't make sense because having it coerce
it's arguments to booleans was "wrong".
I didn't say "wrong", I said "non-obvious".

And, with respect, the fact that you have to argue your point is evidence
in my favor. Of course "proof" would require a comprehensive survey, and I
think we can all agree this point isn't worth such effort :-)

But I do think that a bitwise operator should silently transform to a
logical operator for strings is not obvious.

What is 388488839405842L ^ "hello"?

Python says that's an "unsupported operand type(s) for ^: 'long' and
'str'". Why is it wrong?

I agree it is not wrong. The user is reminded that s/he should be explicit
and create a type that behaves as desired. However (;-) ...

What's right about accepting 3^7 ? Why should xor be defined for integers?
IMO we have implicit subtyping of integers as vectors or column matrices
of bools and operations element by element and implicit re-presentation
of the result as integer.

This dual interpretation of course reflects CPU hardware functionality, and
I would argue that familiarity (not to mention the expediency of conciseness)
has bred acceptance of operator notation without explicit cruft such as
int(boolvec(3)^boolvec(7)) instead of 3^7. The trouble is that a boolvec
should have a length, and boolvec(3) really hides implicit determination of length
(by dropping sign bits of unsigned value or extending signed integer sign bits infinitely),
and the ^ operation between boolvecs of different length hides implicit normalization to
the length of the longer (or infinity with compressed representation)). Again,
CPU hardware legacy comes into play, with the hidden imposition of length=32, sometimes
64 now, and we are forced (happily IMO) into defining what we mean in terms of
physical-representation-independent abstractions.

So the question becomes

What is boolvec(388488839405842L) ^ boolvec("hello")?

and boolvec("hello") is the more ambiguous one. If we wrote

boolvec("hello", as_bytes=True)

I think most would have an idea of what it meant -- until they
started to think about endianness ;-) I.e., what is the value of the following?

(boolvec("hello", asbytes=True) & boolvec(0xffff)).as_string() #note '&' for simpler example

should it be

"he\x00\x00\x00"

or do you see it as big-endian

"\x00\x00\x00lo"

and should "sign" bits be dropped, so the result would be "he" or "lo" ?
Or -- should boolvec("hello") default to boolvec(bool("hello"), length=1) ?

This would also work if Python were more like C++ and we could define

xor(string, string)
xor(int, int)

and be done with it, but in Python, there should be an obvious meaning for
int ^ string, and there isn't.
Notice that no one complains about intgr^intgr
not being defined as int(bool(intgr)^bool(intgr)) ?;-)

It is also true that I recommended the OP consider subclassing string to
make ^ do what he wants. But it seems to be reasonably well expected that
while user classes can do what they like with operators (as long as they
are willing to pay the sometimes-subtle prices, especially the ones
involved with poorly defining __cmp__), that the core language should not
do such things.

Except where there is an accepted legacy of such things being done already ;-)

BTW, what is the rationale behind this:
['',(),[],0, 0.0, 0L].count(0) 3 ['',(),[],0, 0.0, 0L].count(()) 1 ['',(),[],0, 0.0, 0L].count([]) 1 ['',(),[],0, 0.0, 0L].count('') 1 ['',(),[],0, 0.0, 0L].count(0.0) 3 ['',(),[],0, 0.0, 0L].count(0L)

3

BTW2, just had the thought that ',' could be generalized as an operator. E.g.,

obj, x

would mean

type(obj).__dict__['__comma__'](obj, x)

unless __comma__ was undefined. And you could have __rcomma__ for x, obj.
Returning the same type object would support multiple commas as in obj, x, y

Of course, you would get interesting effects in such contexts as print obj,x,y
and foo(obj,x,y) or even (obj,x,y) vs (obj,x,y,) vs ((obj,x,y),) ;-)

Probably sequence-building should have priority and be overridden by parenthesized
expression. So print obj,x would be effectively be print str(obj),x and
print (obj,x,y) would be print str(type(obj).__dict__['__comma__'](obj,x)).
Similarly for arg list formation.

Regards,
Bengt Richter
Jul 18 '05 #26

P: n/a
Grant Edwards <gr****@visi.com> writes:
Only because Python lacks a logical xor operator, so you're
used to thinking of ^ as a bitwise operator. What if you saw

string1 xor string2?

Wouldn't you expect it to be equivalent to

(string1 and (not string2)) or ((not string1) and string2)


Yes, no problem. I was definitely working with the bitwise operator
which is what I thought the OP was originally desiring.
It doesn't feel natural to me to have my strings suddenly
interpreted as a new data type based on the operation at hand.
Logical operators work that way but not numerics


I don't know what you mean by that. Nobody seems to have a
problem with "and" "or" and "not" operators using the truth
values of strings. What is there about "xor" that precludes it
from behaving similarly?


I think we're just talking about different items. I was referring to
the bitwise (^) xor operator (numeric), but not to a logical xor
operator. I'd have no problem with a separate logical operator that
behaved like the other logical operators.

The fact that saying "xor" can imply either isn't helping :-)

-- David
Jul 18 '05 #27

P: n/a
Grant Edwards <gr****@visi.com> writes:
While I agree with your points, they're immaterial to the
argument I was making. The poster to which I responded was
arguing that "xor" didn't make sense because having it coerce
it's arguments to booleans was "wrong".


Well, actually I said:

"Logical operators work that way but not numerics ..."

so I'm not arguing against a logical "xor" working that way, but not
the existing xor (^) operation in Python, which is bitwise.

-- David
Jul 18 '05 #28

P: n/a
On Sun, 10 Oct 2004 22:03:08 +0000, Bengt Richter wrote:
What's right about accepting 3^7 ? Why should xor be defined for integers?
IMO we have implicit subtyping of integers as vectors or column matrices
of bools and operations element by element and implicit re-presentation
of the result as integer.
I'm intrigued but torn by your arguments.

For positive numbers, a number really is its vector of bits. In the
mathematical sense of "equal" (which I usually express in English as "two
equal things are fully substitutable with each other in all relevant
contexts"), where all base numbers are written in base 10:

10 base 10 = 11 base 9 = 22 base 4 = 1010 base 2

The fact that it happens to really be a bit vector in the computer in this
case actually doesn't count for anything, because just as that bit vector
*is*, in every conceivable mathematical way, a base 10 number, it is also
a base 3 number. I can and have defined ternary math before, and
experimental computers have been built on ternary bases and ternary logic,
and 44 in a binary computer is 44 in a ternary computer.

Only our choice of negative numbers, practically speaking, makes a
difference between the number-as-bitstring and number-qua-number, and
something as high level as Python can conceptually use an extra "negative"
bit so even that distinction goes away. In fact, playing with xor'ing
longs makes me wonder if Python doesn't *already* do that.
What is boolvec(388488839405842L) ^ boolvec("hello")?
I'd rather see this as the fairly meaningless "388488839405842L base 2",
meaningless because the base of a number does not affect its value, and
bitwise-xor, as the name implies, operates on the number base two.

Since I reject the need to cast ^ in terms of boolvec, I don't feel
compelled to try to define "boolvec" for strings. Cast it into a number
explicitly, refusing the temptation to guess.
Except where there is an accepted legacy of such things being done already ;-)

BTW, what is the rationale behind this:
>>> ['',(),[],0, 0.0, 0L].count(0) 3 >>> ['',(),[],0, 0.0, 0L].count(()) 1 >>> ['',(),[],0, 0.0, 0L].count([]) 1 >>> ['',(),[],0, 0.0, 0L].count('') 1 >>> ['',(),[],0, 0.0, 0L].count(0.0) 3 >>> ['',(),[],0, 0.0, 0L].count(0L)

3


Again, considered abstractly as numbers, 0 is zero no matter how you slice
it. Abstractly, even floats have a binary representation, just as they
have a decimal representation, and we could even xor them. Realistically,
it is much less useful and there is the infinite-length problem of floats
to deal with.

Theoretically, no float should ever equal an int or a long, since an Int
or Long conceptually identifies exactly one point and a float should be
seen as representing a range of numbers depending on the precision that
can be made arbitrarily small but never truly identifies one point. This
is another one of those "practicality beats purity" things.
Jul 18 '05 #29

P: n/a
On Sun, 10 Oct 2004 22:24:57 GMT, Jeremy Bowers <je**@jerf.org> wrote:
On Sun, 10 Oct 2004 22:03:08 +0000, Bengt Richter wrote:
What's right about accepting 3^7 ? Why should xor be defined for integers?
IMO we have implicit subtyping of integers as vectors or column matrices
of bools and operations element by element and implicit re-presentation
of the result as integer.
I'm intrigued but torn by your arguments.

For positive numbers, a number really is its vector of bits. In the

^^^^^[1] [2]^^ ^^^[3] ^^^^[4]
[1] Abstract, non-negative integers?
[2] "is" as in "is identical with"? I doubt it.
[3] In what sense does an abstract number posess (its) a vector of bits?
I take it 'its' is in the sense of a 1:1 mapping to the corresponding
(unique) vector of "bits".
[4] A two's complement representation of an (abstract) integer, has "bits"
as _numerical_ values in the sum series with corresponding numerical
coefficients that are powers of two. But "bits" in the context of
bitwise ^ or & or | do not represent abstract _numerical_ values or 0 or 1,
they represent logical values False or True (and 0<->False, 1<->True is
just a convention that we're used to.

mathematical sense of "equal" (which I usually express in English as "two
equal things are fully substitutable with each other in all relevant
contexts"), where all base numbers are written in base 10:

10 base 10 = 11 base 9 = 22 base 4 = 1010 base 2

The fact that it happens to really be a bit vector in the computer in this [1]^^ [2]^^^^^^^^^ ^^^[3]
[1] The abstract integer value?
[2] have 1:1 mapping to?
[3] bit is short for binary _digit_, not binary logic-value -- i.e., it's numeric,
and ties in with your base 2 representation.case actually doesn't count for anything, because just as that bit vector
*is*, in every conceivable mathematical way, a base 10 number, it is also 1:1 correspondence is not identity. IOW, there's no such thing as "a base 10 number"
There is a set of abstract integers, and each can be put into 1:1 correpondence with
a set of symbols and rules. A "base 10 number" is composed with a set of digits [0-9]
and a rule for ordered composition and a rule for an interpretation to map back to
the abstract value. We can discuss each part of this process, but the bottom line is
that when we say that represented values are equal, we are saying that representations
map back to the same unique abstract value, not that representations per se are identical.
a base 3 number. I can and have defined ternary math before, and
experimental computers have been built on ternary bases and ternary logic,
and 44 in a binary computer is 44 in a ternary computer. Sure, but again we are talking unique abstract integers vs various representations
and the rules for composing them and interpreting them. BTW, did you ever hear
of bi-quinary? That was a 4-bit representation of a decimal digit where the msb
had a "coefficient" of 5 and the 3 lsb's counted in binary modulo 5 ;-)

Only our choice of negative numbers, practically speaking, makes a
difference between the number-as-bitstring and number-qua-number, and
something as high level as Python can conceptually use an extra "negative"
bit so even that distinction goes away. In fact, playing with xor'ing
longs makes me wonder if Python doesn't *already* do that. Yes, it plays the as-if-extended-infinitely-to-the-left game with sign bits,
but the hex representation of negative longs is not much good for seeing what
the bits are in a negative number >;-(
What is boolvec(388488839405842L) ^ boolvec("hello")?
I'd rather see this as the fairly meaningless "388488839405842L base 2",
meaningless because the base of a number does not affect its value, and
bitwise-xor, as the name implies, operates on the number base two.

Well, to be nit-picky, I still think it does not really operate on a number
at all. It converts a number to an ordered set of bools, by way of an intermediate
two's complement _representation_ whose _numeric_ bit values are mapped to bool values.
Then, after the operating with the ordered sets of bools, the resulting set is mapped
back to a two's complement integer representation again, which can be interpreted as
an abstract integer value again ;-)

Since I reject the need to cast ^ in terms of boolvec, I don't feel
compelled to try to define "boolvec" for strings. Cast it into a number
explicitly, refusing the temptation to guess.
Except where there is an accepted legacy of such things being done already ;-)

BTW, what is the rationale behind this:
>>> ['',(),[],0, 0.0, 0L].count(0) 3
>>> ['',(),[],0, 0.0, 0L].count(())

1
>>> ['',(),[],0, 0.0, 0L].count([])

1
>>> ['',(),[],0, 0.0, 0L].count('')

1
>>> ['',(),[],0, 0.0, 0L].count(0.0)

3
>>> ['',(),[],0, 0.0, 0L].count(0L)

3

I guess this shows what's happening
class joker(object): ... def __cmp__(self, other): return 0
... ['',(),[],0, 0.0, 0L, joker()].count('') 2 ['',(),[],0, 0.0, 0L, joker()].count(()) 2 ['',(),[],0, 0.0, 0L, joker()].count(0) 4 ['',(),[],0, 0.0, 0L, joker()].count(0.0) 4 ['',(),[],0, 0.0, 0L, joker()].count(0L) 4 ['',(),[],0, 0.0, 0L, joker()].count(joker()) 7
Again, considered abstractly as numbers, 0 is zero no matter how you slice
it. Abstractly, even floats have a binary representation, just as they I think of "float" as indicating representation technique, as opposed to "real"
which I think of as a point in the continuous abstract -+infinity interval
of real numbers.
have a decimal representation, and we could even xor them. Realistically,
it is much less useful and there is the infinite-length problem of floats
to deal with. Infinite-length problem...must...not...go...there... ;-)
Theoretically, no float should ever equal an int or a long, since an Int
or Long conceptually identifies exactly one point and a float should be
seen as representing a range of numbers depending on the precision that IMO a float should be seen as what it is (a _representation_), which may be
interpreted variously, just like integers ;-)

IOW, a typical float is a 64-bit IEEE-754 double, so there are 2**64
possible states of the representation machinery. Not all of
those states are legal floating point number representations, but each one
that _is_ legal corresponds to an _exact_ abstract real value. It _can_ be
the exact value you wanted to represent, or not, depending on your programming
problem. Inexactness in not a property of floating point representations, it is
a property of their _relation_ to the exact abstract values the programmer is trying
to represent. 0.0 represents the exact same abstract value as integer 0
(perhaps by way of mapping integers conceived as separate to a subset of reals).
can be made arbitrarily small but never truly identifies one point. This I disagree. 0.0 truly identifies one point. So does

3.141592653589793115997963468544185161590576171875

which is the _exact_ value of math.pi in decimal representation.
But this is _not_ the exact value of the abstract mathematical pi.

But that is not a problem with floating point numbers' not representing
exact values. It's that the set of available exact values is finite, and
you have to choose one to approximate (including right on sometimes) an abstract value.
(Or you might choose two to represent _exact_ bounds on a value, for interval math).
is another one of those "practicality beats purity" things.

I think the fact that repr(math.pi) is not printed as above is
a case of "practicality beats purity" ;-)
import math
repr(math.pi) '3.1415926535897931' float(repr(math.pi)) 3.1415926535897931 float(repr(math.pi)) == float(3.141592653589793115997963468544185161590576 171875)

True

IMO floats have their own kind of purity, besides being practical ;-)

Regards,
Bengt Richter
Jul 18 '05 #30

P: n/a
On Mon, 11 Oct 2004 02:19:03 +0000, Bengt Richter wrote:
On Sun, 10 Oct 2004 22:24:57 GMT, Jeremy Bowers <je**@jerf.org> wrote:
For positive numbers, a number really is its vector of bits. In the ^^^^^[1] [2]^^ ^^^[3] ^^^^[4]
[1] Abstract, non-negative integers?


Integers is more convenient here, but any one dimensional number will do.
It so happens that computers do not encode "reals" in the naive way, to
allow for more variation in magnitude.
[2] "is" as in "is identical with"? I doubt it.
With respect, this isn't something to doubt or not doubt. There is one,
and only one, way to represent any positive number in base two, since
encoding sign is not an issue. Assuming an extra bit to show sign, there
is one and only one way to represent any negative number, too. (Zero gets
to be the exception since then you can have positive and negative zero,
which are theoretically identical, but on some machines that implement
them have practical differences.)

Endianness is just some wackiness in the ordering, but does not really
change that. 60000 + 50000 = 110000 no matter what endianess you have.
[3] In what sense does an abstract number posess (its) a vector of bits?
It does not. It *is* a series of ones and zeros, in base two. It *is* a
series of ones, zeros, and twos, in base three. It *is* a series of [0-9]
in base ten. It is all of these things at once, and an infinite number of
other things besides.

1 + 1 fully *is* 2. There is no difference mathematically. If there is,
your math is broken.
[4] A two's complement representation of an (abstract) integer, has
"bits"
as _numerical_ values in the sum series with corresponding numerical
coefficients that are powers of two. But "bits" in the context of
bitwise ^ or & or | do not represent abstract _numerical_ values or
0 or 1, they represent logical values False or True (and 0<->False,
1<->True is just a convention that we're used to.
No. They represent both of those things at once. 1 *is* True, and 0 *is*
false, in the domain of bitwise operators.

There is no "either/or" here, only "both/and".
IOW, there's no such thing as "a base 10 number"... A "base 10 number" is composed with a set of
digits [0-9] and a rule for ordered composition and a rule for an
interpretation to map back to the abstract value.
Well, I thank you for supplying the missing definition, I suppose.
We can discuss each
part of this process, but the bottom line is that when we say that
represented values are equal, we are saying that representations map
back to the same unique abstract value, not that representations per se
are identical.
Yep, that is part of my point, see below.
What is boolvec(388488839405842L) ^ boolvec("hello")?


I'd rather see this as the fairly meaningless "388488839405842L base 2",
meaningless because the base of a number does not affect its value, and
bitwise-xor, as the name implies, operates on the number base two.

Well, to be nit-picky, I still think it does not really operate on a
number at all. It converts a number to an ordered set of bools, by way
of an intermediate two's complement _representation_ whose _numeric_ bit
values are mapped to bool values. Then, after the operating with the
ordered sets of bools, the resulting set is mapped back to a two's
complement integer representation again, which can be interpreted as an
abstract integer value again ;-)


In the domain of binary logic, I do not concede a difference between "1"
and "True", on the grounds that there is no way in the domain of binary
logic to distinguish the two.
IOW, a typical float is a 64-bit IEEE-754 double, so there are 2**64
possible states of the representation machinery. Not all of those states
are legal floating point number representations, but each one that _is_
legal corresponds to an _exact_ abstract real value. It _can_ be the
exact value you wanted to represent, or not, depending on your
programming problem. Inexactness in not a property of floating point
representations, it is a property of their _relation_ to the exact
abstract values the programmer is trying to represent. 0.0 represents
the exact same abstract value as integer 0 (perhaps by way of mapping
integers conceived as separate to a subset of reals).


This is a matter of definition. My problem with your definition mostly
lies in the fact that it is not as practically useful as mine. Someone
operating on the "floats are almost, but not quite, exact values" is
*much* less well equipped to handle the way floats bite you than someone
who operates with "floats represent a value and have an error range around
that value", because then they should understand they need to compute not
just with the values, but track the buildup of that error range as well.
It's a point-of-view thing.
can be made arbitrarily small but never truly identifies one point. This

I disagree. 0.0 truly identifies one point. So does

3.141592653589793115997963468544185161590576171875

which is the _exact_ value of math.pi in decimal representation. But
this is _not_ the exact value of the abstract mathematical pi.


Python 2.3.4 (#1, Sep 28 2004, 20:15:25)
[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
3.141592653589793115997963468544185161590576171875 8327502398752309 == 3.141592653589793115997963468544185161590576171875 00000000000

True

Look at the last digits. (Python is just a convenience here, this is
fundamental to all float representation schemes, as opposed to things like
Decimal.)

This is why I say they represent a range. In set theory, we'd call this an
equivalence class for equality. There is *no* way to distinguish between
3.141592653589793115997963468544185161590576171875 8327502398752309 and
3.141592653589793115997963468544185161590576171875 00000000000 with floats
(of the size Python uses, add more digits for some other bit count), not
even in theory. They are the same in every way. That's because the closest
float represents a range that covers them both. You can shrink that range
arbitrarily small but you can not get to a unique point, such that only
one true real will compare equal to that number and all others do not, not
even in theory.
Jul 18 '05 #31

P: n/a
On 2004-10-11, Jeremy Bowers <je**@jerf.org> wrote:
With respect, this isn't something to doubt or not doubt.
There is one, and only one, way to represent any positive
number in base two, since encoding sign is not an issue.
Assuming an extra bit to show sign, there is one and only one
way to represent any negative number, too.
That's news to me. I've used three different base-2
representations for negative numbers in the past week, and I
can think of at least one other one I've used in the past.
(Zero gets to be the exception since then you can have
positive and negative zero,


That depends on which base-2 representation you've chosen. In
two's compiliment and excess-N representations, there is only
one zero value. In signed-magnitude there may be two.

--
Grant Edwards grante Yow! I've been WRITING
at to SOPHIA LOREN every 45
visi.com MINUTES since JANUARY 1ST!!
Jul 18 '05 #32

P: n/a
Andrew Dalke <ad****@mindspring.com> wrote:
Grant Edwards:
No, he explained exactly what he was trying to do, and it had
nothing to do with encryption. He wants to know if exactly one
(1) of the strings is the empty string.


BTW, another way to get that is

if bool(str1) + bool(str2) == 1:
print "one and only one of them was empty"


If this is getting into a many-ways-to-skin-the-cat context, I want to
make sure I get my entry in:

[str1, str2].count('') == 1

Other possibilities (I think -- untested, and it's late here):

len([x for x in (str1, str2) if x]) == 1

'' == min(str1, str2) != max(str1, str2)

len(str1+str2) == max(len(str1),len(str2)) != 0

sum(map(bool, (str1, str2))) == 1

not str1 ^ not str2

(str1 or str2) and not min(str1, str2)

not str1 != not str2

str1 != str2 and (str1+str2) == (str2+str1)

I guess I'd better stop here because I can't actually _prove_ the last
one of these actually implies one of the strings is empty but I can't
find counterexamples either...!-)
Alex
Jul 18 '05 #33

P: n/a
Jeremy Bowers <je**@jerf.org> wrote:
On Fri, 08 Oct 2004 16:19:22 -0400, dataangel wrote:
Regardless of whether this is the best implementation for detecting if
two strings are similar, I don't see why xor for strings shouldn't be
supported. Am I missing something?


The basic problem is that there is no obvious "xor on string" operation
*in general*, even if you stipulate "bitwise". In particular, what does it
mean to bitwise xor two different length strings?


What I, personally, would expect here, would be to leave the trailing
part of the longer string "untouched". I.e., just like:

def bitwisexor(s1, s2):
result = [ chr(ord(a) ^ ord(b)) for a, b in zip(s1, s2) ]
# at most one of the following 2 stmts actually does anything:
result.extend(s1[len(result):])
result.extend(s2[len(result):])

return ''.join(result)

I realize this is the same as imagining the shorter string to be
conceptually padded with as many '\x0' characters as needed -- that just
seems the only "intuitively natural" way to extend "sequence xor'ing" to
sequences of different length...
Alex
Jul 18 '05 #34

P: n/a
Grant Edwards <gr****@visi.com> wrote:
...
used to thinking of ^ as a bitwise operator. What if you saw

string1 xor string2?

Wouldn't you expect it to be equivalent to

(string1 and (not string2)) or ((not string1) and string2)


Personally, I would not expect 'xor' to be non-commutative, and yet the
expression YOU apparently would expect it to be equivalent to obviousy
is...:

'foo' xor ''

would be True, while

'' xor 'foo'

would be 'foo'. At least, that's how the "equivalent to" expression you
give evaluates, I believe. The problem is that "not x" is a bool for
any x, while "x and y" isn't.

If I had to design the least-astonishing definition of xor in Python, I
would probably have to settle for a mix of the type-producing behavior
of and/or (always return one of the two operands, a wonderful thing) vs
not (always return a bool, and it's hard to imagine it doing otherwise).
"a xor b" would return one of the two operands if exactly one of them
evaluates to true in a boolean context, but False if both or neither do
(sigh).
E.g.,

def xor(a, b):
if (a and b) or (not a and not b): return False
else: return a or b

Unfortunately, it seems to me that his unholy admixture of return types
(hard to avoid...), as well as the inevitable lack of short-circuiting,
makes such a hypothetical 'xor' _WAY_ less useful than the existing
'and'/'or' operators are:-(
Alex
Jul 18 '05 #35

P: n/a
On Mon, 11 Oct 2004 04:08:49 GMT, Jeremy Bowers <je**@jerf.org> wrote:
On Mon, 11 Oct 2004 02:19:03 +0000, Bengt Richter wrote:
On Sun, 10 Oct 2004 22:24:57 GMT, Jeremy Bowers <je**@jerf.org> wrote:
For positive numbers, a number really is its vector of bits. In the ^^^^^[1] [2]^^ ^^^[3] ^^^^[4]
[1] Abstract, non-negative integers?


Integers is more convenient here, but any one dimensional number will do.
It so happens that computers do not encode "reals" in the naive way, to
allow for more variation in magnitude.

True. But my point (just about everywhere in the post) was to draw attention
to the distinction between abstract entities (e.g. numbers) and concrete representations.
[2] "is" as in "is identical with"? I doubt it.
With respect, this isn't something to doubt or not doubt. There is one,
and only one, way to represent any positive number in base two, since

I think we are focusing on different things. When you say "a number really is
its vector of bits" I think wait: a number is a number and a vector of bits
is a vector of bits, but you can't say one _is_ the other. You can say they
correspond in some way. When you say of a number "its vector of bits" I presume
that "its" refers to "its" partner in the number<->bit vector relationship,
not in the sense that an aggregate has its components. To me, a (one-dimensional)
number is a primitive entity, and "has" no parts, though it may correspond to
a representation that does have parts and order etc.

I am hard put to think of anything that is _identical_ with any of the
possible representations that it may be put in correspondence with.
encoding sign is not an issue. Assuming an extra bit to show sign, there
is one and only one way to represent any negative number, too. (Zero gets
to be the exception since then you can have positive and negative zero,
which are theoretically identical, but on some machines that implement
them have practical differences.)

Endianness is just some wackiness in the ordering, but does not really
change that. 60000 + 50000 = 110000 no matter what endianess you have. Wackiness must be accounted for ;-)
[3] In what sense does an abstract number posess (its) a vector of bits?
It does not. It *is* a series of ones and zeros, in base two. It *is* a
series of ones, zeros, and twos, in base three. It *is* a series of [0-9]
in base ten. It is all of these things at once, and an infinite number of
other things besides.

Depends on what the meaning of "is" is ;-) Your "is" seems to mean
"can be represented by" in my terms. That's not "is" in my book ;-)

1 + 1 fully *is* 2. There is no difference mathematically. If there is,
your math is broken. You are apparently ignoring representation and just thinking in terms of
what is indicated. So yes, '1 + 1' represents something and '2' represents
something, and for some getabstractvalue, getabstractvalue('1 + 1') _is_
getabstractvalue('2'). But whatever your representation, there is that
function that maps to the abstract mathematical value.
[4] A two's complement representation of an (abstract) integer, has
"bits"
as _numerical_ values in the sum series with corresponding numerical
coefficients that are powers of two. But "bits" in the context of
bitwise ^ or & or | do not represent abstract _numerical_ values or
0 or 1, they represent logical values False or True (and 0<->False,
1<->True is just a convention that we're used to.
No. They represent both of those things at once. 1 *is* True, and 0 *is*
false, in the domain of bitwise operators.

ISTM we have different concepts of "is" ;-)
There is no "either/or" here, only "both/and".
IOW, there's no such thing as "a base 10 number"... A "base 10 number" is composed with a set of
digits [0-9] and a rule for ordered composition and a rule for an
interpretation to map back to the abstract value.
Well, I thank you for supplying the missing definition, I suppose.

Well, I made up an ad hoc definition to try to express my thinking on it ;-)
We can discuss each
part of this process, but the bottom line is that when we say that
represented values are equal, we are saying that representations map
back to the same unique abstract value, not that representations per se
are identical.


Yep, that is part of my point, see below.
What is boolvec(388488839405842L) ^ boolvec("hello")?

I'd rather see this as the fairly meaningless "388488839405842L base 2",
meaningless because the base of a number does not affect its value, and
bitwise-xor, as the name implies, operates on the number base two.

Well, to be nit-picky, I still think it does not really operate on a
number at all. It converts a number to an ordered set of bools, by way
of an intermediate two's complement _representation_ whose _numeric_ bit
values are mapped to bool values. Then, after the operating with the
ordered sets of bools, the resulting set is mapped back to a two's
complement integer representation again, which can be interpreted as an
abstract integer value again ;-)


In the domain of binary logic, I do not concede a difference between "1"
and "True", on the grounds that there is no way in the domain of binary
logic to distinguish the two.

It's a matter of interpretation. I view 1 and True as different, sort of like
type(1), type(bool(1))
(<type 'int'>, <type 'bool'>)
IOW, a typical float is a 64-bit IEEE-754 double, so there are 2**64
possible states of the representation machinery. Not all of those states
are legal floating point number representations, but each one that _is_
legal corresponds to an _exact_ abstract real value. It _can_ be the
exact value you wanted to represent, or not, depending on your
programming problem. Inexactness in not a property of floating point
representations, it is a property of their _relation_ to the exact
abstract values the programmer is trying to represent. 0.0 represents
the exact same abstract value as integer 0 (perhaps by way of mapping
integers conceived as separate to a subset of reals).
This is a matter of definition. My problem with your definition mostly
lies in the fact that it is not as practically useful as mine. Someone
operating on the "floats are almost, but not quite, exact values" is

I didn't say that. I said they do represent _exact_ values, and that you
can use the available exact values as you like, including intepreting them
as tokens representing an interval on the real axis containing all the values
that will round to the exact value in question.
*much* less well equipped to handle the way floats bite you than someone
who operates with "floats represent a value and have an error range around
that value", because then they should understand they need to compute not
just with the values, but track the buildup of that error range as well.
It's a point-of-view thing.
can be made arbitrarily small but never truly identifies one point. This I disagree. 0.0 truly identifies one point. So does

3.141592653589793115997963468544185161590576171875

which is the _exact_ value of math.pi in decimal representation. But
this is _not_ the exact value of the abstract mathematical pi.


Python 2.3.4 (#1, Sep 28 2004, 20:15:25)
[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
3.141592653589793115997963468544185161590576171875 8327502398752309 == 3.141592653589793115997963468544185161590576171875 00000000000

True

Two different numbers were rounded to the same exact number ;-)
Look at the last digits. (Python is just a convenience here, this is
fundamental to all float representation schemes, as opposed to things like
Decimal.)

This is why I say they represent a range. In set theory, we'd call this an I agree they can be used to _represent_ a range. The range (actually a half-open
continuous interval I think) will be defined in terms of the _exact_ value
and the neighboring _exact_ values available in the set of values with concrete
representations in the particular hardware.
equivalence class for equality. There is *no* way to distinguish between
3.14159265358979311599796346854418516159057617187 58327502398752309 and
3.14159265358979311599796346854418516159057617187 500000000000 with floats
(of the size Python uses, add more digits for some other bit count), not
even in theory. They are the same in every way. That's because the closest
float represents a range that covers them both. You can shrink that range Now tell me _exactly_ what that "range" is. I know it's a messy algorithm,
but the intevals are _exactly_ defined, so where will the _exact_ mathematical
values of the limits come from? ;-) (we'll ignore fpu rounding modes ;-)arbitrarily small but you can not get to a unique point, such that only
one true real will compare equal to that number and all others do not, not
even in theory.

Well, yes, we can only create finite sets of representations ;-)
But you can "get to a unique point" by switching interpretation. I.e.,
if you take the floating point number's exact value as such, instead of
using that value and neighbors as a means to define an interval, then
you do have a unique point. E.g., all the exact values of a 53-bit integer
can be represented _exactly_ by a typical 64-bit double. The floating point
representation is no less exact than the corresponding int or long.

Regards,
Bengt Richter
Jul 18 '05 #36

P: n/a
al*****@yahoo.com (Alex Martelli) writes:
Grant Edwards:
> No, he explained exactly what he was trying to do, and it had
> nothing to do with encryption. He wants to know if exactly one
> (1) of the strings is the empty string. [...]
not str1 ^ not str2 [...] not str1 != not str2
These give syntax errors, actually. I wouldn't have expected that
either, though :)

str1 != str2 and (str1+str2) == (str2+str1)


Counter example: str1 = "x"; str2 = "xx"
str1 != str2 and (str1+str2) == (str2+str1)

True
Bernhard
--
Intevation GmbH http://intevation.de/
Skencil http://sketch.sourceforge.net/
Thuban http://thuban.intevation.org/
Jul 18 '05 #37

P: n/a
Bernhard Herzog <bh@intevation.de> wrote:
...
not str1 ^ not str2
not str1 != not str2


These give syntax errors, actually. I wouldn't have expected that
either, though :)


Yep, you need parentheses, as in (not str1) &c.
str1 != str2 and (str1+str2) == (str2+str1)


Counter example:
str1 = "x"; str2 = "xx"


Yep, excellent counterexample, thanks. So let's try instead:

(str1+str2==str2!=str1) or (str2+str1==str1!=str2)

I _do_ really want something based on concatenation equaling one of the
strings iff the other's empty...
Alex
Jul 18 '05 #38

P: n/a
Grant Edwards <gr****@visi.com> wrote:
...
Probably so, but that doesn't support the arguement that
there's something wrong with a logical xor argument coercing
it's operands to boolean values unless one also argues that
the logical "and", "or" and "not" operators should also not
coerce their operands to booleans.


One of the greatest sources of usefulness for 'and' and 'or' is exactly
that these operators _don't_ coerce their operands -- they always return
one of the objects that are given as their operands, never any
'coercion' of it. This lets one code, e.g.,

k, v = (onedict or another).popitem()

instead of

if onedict:
k, v = onedict.popitem()
else:
k, v = another.popitem()

Operator 'xor' couldn't possibly ensure this useful behavior, alas.
Alex
Jul 18 '05 #39

P: n/a
On 2004-10-11, Alex Martelli <al*****@yahoo.com> wrote:
Grant Edwards <gr****@visi.com> wrote:
...
Probably so, but that doesn't support the arguement that
there's something wrong with a logical xor argument coercing
it's operands to boolean values unless one also argues that
the logical "and", "or" and "not" operators should also not
coerce their operands to booleans.


One of the greatest sources of usefulness for 'and' and 'or' is exactly
that these operators _don't_ coerce their operands


Surely they must coerce the operands for the "test" part if not
for the "return" part.

--
Grant Edwards grante Yow! Being a BALD HERO
at is almost as FESTIVE as a
visi.com TATTOOED KNOCKWURST.
Jul 18 '05 #40

P: n/a
Grant Edwards <gr****@visi.com> wrote:
On 2004-10-11, Alex Martelli <al*****@yahoo.com> wrote:
Grant Edwards <gr****@visi.com> wrote:
...
Probably so, but that doesn't support the arguement that
there's something wrong with a logical xor argument coercing
it's operands to boolean values unless one also argues that
the logical "and", "or" and "not" operators should also not
coerce their operands to booleans.


One of the greatest sources of usefulness for 'and' and 'or' is exactly
that these operators _don't_ coerce their operands


Surely they must coerce the operands for the "test" part if not
for the "return" part.


We may have different conceptions about the meaning of the word
"coerce", perhaps...? For example, Python may be doing len(x) when I
use x in a boolean context, but I sure don't see computing len(x) as
"coercing" anything. To me, Coercion, in Python, is what's documented
at <http://docs.python.org/ref/coercion-rules.html> and is quite a
different concept from Truth Value Testing, which is documented at
<http://docs.python.org/lib/truth.html>.
Alex
Jul 18 '05 #41

P: n/a
On 2004-10-11, Alex Martelli <al*****@yahoo.com> wrote:
Surely they must coerce the operands for the "test" part if not
for the "return" part.
We may have different conceptions about the meaning of the word
"coerce", perhaps...? For example, Python may be doing len(x) when I
use x in a boolean context,


I guess I was assuming Python was doing the equivalent of
bool(x) when x was used as an operand of a logical operator.
That's coercion to me.
but I sure don't see computing len(x) as "coercing" anything.
To me, Coercion, in Python, is what's documented at
<http://docs.python.org/ref/coercion-rules.html> and is quite
a different concept from Truth Value Testing, which is
documented at <http://docs.python.org/lib/truth.html>.


I guess I'm misusing the term coercion (at least in a Python
context). To _me_ operands of logical operators are being
coerced to boolean.

--
Grant Edwards grante Yow! I'm gliding over a
at NUCLEAR WASTE DUMP near
visi.com ATLANTA, Georgia!!
Jul 18 '05 #42

P: n/a
Grant Edwards <gr****@visi.com> wrote:
On 2004-10-11, Alex Martelli <al*****@yahoo.com> wrote:
Surely they must coerce the operands for the "test" part if not
for the "return" part.


We may have different conceptions about the meaning of the word
"coerce", perhaps...? For example, Python may be doing len(x) when I
use x in a boolean context,


I guess I was assuming Python was doing the equivalent of
bool(x) when x was used as an operand of a logical operator.
That's coercion to me.


Well, Python is doing just the same thing it was doing before 'bool' was
introduced (in 2.2.1, I think) -- Truth Value Testing. I believe a very
similar transition happened, for example, in the C++ language: any
number or pointer was and still is acceptable in a "condition" context
(e.g., for C, 'if(x) ...'). That used to be quite different from
coercions. Then a 'bool' type was introduced -- the language still kept
doing exactly the same thing for 'if(x)' and the like, but even though
nothing was changed, it ``feels'' taxonomically different (in C++ the
case is stronger, because a new 'operator bool' was also introduced; and
with it new and important pitfalls, cfr
<http://www.artima.com/cppsource/safebool.html> ).

but I sure don't see computing len(x) as "coercing" anything.
To me, Coercion, in Python, is what's documented at
<http://docs.python.org/ref/coercion-rules.html> and is quite
a different concept from Truth Value Testing, which is
documented at <http://docs.python.org/lib/truth.html>.


I guess I'm misusing the term coercion (at least in a Python
context). To _me_ operands of logical operators are being
coerced to boolean.


Didactically, I still prefer to say they're "being truth-value tested",
because the concept of a hypothetical coercion which happens during the
testing then "un-happens" to determine the result is just impossible to
get across -- even if I thought it somehow more correct or preferable,
which I don't. Note, again, that in C++ the case to consider
truth-value testing as coercion is stronger, because && and ||
(shortcircuiting logical operators) _DO_ undertake to always return a
bool, when applied to C++'s built-in types. I consider this a
suboptimal decision, prompting people to code 'x?x:y' or 'x?y:x' where
they might have coded, more simply, x||y or x&&y respectively, were it
not for the coercion which the latter forms imply.

BTW, if you consider a form such as 'x?x:y', which is a better match for
the semantics of Python's 'x or y' than C++'s || operator, you will see
why speaking of operandS (plural) as being 'coerced' is surely
misleading. Under _no_ conceivable circumstances is the RHS operand
subject to any operation whatsoever by 'x or y'. Whatever "coercion"
is, it definitely cannot be happening here on y. Similarly for the
analogous 'x and y' -- never is any operation at all done on y.

Truth Value Testing (which you want to call coercion) happens on the
first (LHS) operand, but not on the second (RHS) one, ever...
Alex
Jul 18 '05 #43

P: n/a
On Mon, 11 Oct 2004 04:24:28 +0000, Grant Edwards wrote:
On 2004-10-11, Jeremy Bowers <je**@jerf.org> wrote:
With respect, this isn't something to doubt or not doubt.
There is one, and only one, way to represent any positive
number in base two, since encoding sign is not an issue.
Assuming an extra bit to show sign, there is one and only one
way to represent any negative number, too.


That's news to me. I've used three different base-2
representations for negative numbers in the past week, and I
can think of at least one other one I've used in the past.


I am aware of only one encoding that uses a single bit to represent sign,
as I stipulated, and discarding endianness issues I'm having a hard time
imagining what reasonable alternatives there are.
(Zero gets to be the exception since then you can have
positive and negative zero,


That depends on which base-2 representation you've chosen. In
two's compiliment and excess-N representations, there is only
one zero value. In signed-magnitude there may be two.


I explicitly only discussed signed-magnitude: "Assuming an extra bit to
show sign".

Normally I'd exhort you to read more carefully but Bengt pointed out some
places I wasn't rigorous with my terms. :-) So I earned this.
Jul 18 '05 #44

P: n/a
On 2004-10-11, Jeremy Bowers <je**@jerf.org> wrote:
With respect, this isn't something to doubt or not doubt.
There is one, and only one, way to represent any positive
number in base two, since encoding sign is not an issue.
Assuming an extra bit to show sign, there is one and only one
way to represent any negative number, too.


That's news to me. I've used three different base-2
representations for negative numbers in the past week, and I
can think of at least one other one I've used in the past.


I am aware of only one encoding that uses a single bit to
represent sign, as I stipulated, and discarding endianness
issues I'm having a hard time imagining what reasonable
alternatives there are.


* Two's compliment.
* One's compliment.
* Signed-magnitude with a "1" sign bit being positive.
* Signed-magnitude with a "1" sign bit being negative.
* Excess-N notation.

Four of the five are in use in the software I'm working on
today.
(Zero gets to be the exception since then you can have
positive and negative zero,


That depends on which base-2 representation you've chosen. In
two's compiliment and excess-N representations, there is only
one zero value. In signed-magnitude there may be two.


I explicitly only discussed signed-magnitude: "Assuming an
extra bit to show sign".


I don't know what you mean. Two's compliment, one's
compliment, signed-magnitude, and excess-N _all_ use a single
bit for sign.

--
Grant Edwards grante Yow! Kids, don't gross me
at off... "Adventures with
visi.com MENTAL HYGIENE" can be
carried too FAR!
Jul 18 '05 #45

P: n/a
On Mon, 11 Oct 2004 19:35:55 +0000, Grant Edwards wrote:
I don't know what you mean. Two's compliment, one's
compliment, signed-magnitude, and excess-N _all_ use a single
bit for sign.


I meant a single bit exclusively for sign, such that

1 00001001

and

0 00001001

are the same absolute value. The complements don't work that way. I
guess that does leave a question for whether 1 is positive or negative,
but for the purposes of my point, that doesn't matter much.

Jul 18 '05 #46

P: n/a
On 2004-10-11, Jeremy Bowers <je**@jerf.org> wrote:
On Mon, 11 Oct 2004 19:35:55 +0000, Grant Edwards wrote:
I don't know what you mean. Two's compliment, one's
compliment, signed-magnitude, and excess-N _all_ use a single
bit for sign.


I meant a single bit exclusively for sign, such that

1 00001001

and

0 00001001

are the same absolute value. The complements don't work that
way. I guess that does leave a question for whether 1 is
positive or negative, but for the purposes of my point, that
doesn't matter much.


OK, if you meant signed-magnitude, you should have said that. ;)

AFAICT, the only people that use signed-magnitude for _integer_
quantities [IEEE FP representations are signed-magnitude] are
the analog guys who design A/D converters [and somebody needs
to slap them and tell them that out in the real world we want
2's compliment].

I haven't _ever_ seen a signed-magnitude integer ALU, but I'm
sure some existed back before I started doing this sort of
stuff 25 years ago.

--
Grant Edwards grante Yow! My TOYOTA is built
at like a... BAGEL with CREAM
visi.com CHEESE!!
Jul 18 '05 #47

P: n/a
On Mon, 11 Oct 2004 22:25:26 +0000, Grant Edwards wrote:
OK, if you meant signed-magnitude, you should have said that. ;)

AFAICT, the only people that use signed-magnitude for _integer_
quantities [IEEE FP representations are signed-magnitude] are


I never said they existed :-)
Jul 18 '05 #48

P: n/a
Grant Edwards wrote:
One of the greatest sources of usefulness for 'and' and 'or' is exactly
that these operators _don't_ coerce their operands

Surely they must coerce the operands for the "test" part if not
for the "return" part.


Is calling a method cosidered coercing? The "test" part must call the
object's __nonzero__ method but no Python boolean object is ever created.

Cheers,
Brian

Jul 18 '05 #49

P: n/a
I suggest you have a look at the following:
http://aspn.activestate.com/ASPN/Coo.../Recipe/252237

I believe it does what you want (and has little to do with the
follow-up discussion so far!)

Andre Roberge

dataangel <k0*****@kzoo.edu> wrote in message news:<ma**************************************@pyt hon.org>...
I wrote a function to compare whether two strings are "similar" because
I'm using python to make a small text adventure engine and I want to it
to be responsive to slight mispellings, like "inevtory" and the like. To
save time the first thing my function does is check if I'm comparing an
empty vs. a non-empty string, because they are to be never considered
similar. Right now I have to write out the check like this:

if str1 and str2:
if not str1 or not str2:
return 0

Because python won't let me do str1 ^ str2. Why does this only work for
numbers? Just treat empty strings as 0 and nonempty as 1.

Regardless of whether this is the best implementation for detecting if
two strings are similar, I don't see why xor for strings shouldn't be
supported. Am I missing something? Inparticular, I think it'd be cool to
have "xor" as opposed to "^". The carrot would return the resulting
value, while "xor" would act like and/or do and return the one that was
true (if any).

Jul 18 '05 #50

50 Replies

This discussion thread is closed

Replies have been disabled for this discussion.