Bytes IT Community

python string comparison oddity


Hi everybody,

I was wondering if anyone can explain this. My understanding is that 'is'
checks if the object is the same. However, in that case, why this
inconsistency for short strings? I would expect a 'False' for all three
comparisons. This is reproducible across two different machines, so it is
not just a local quirk. I'm running Debian etch with Python 2.4.4 (the
default).
Thanks, Faheem.

In [1]: a = '--'

In [2]: a is '--'
Out[2]: False

In [4]: a = '-'

In [5]: a is '-'
Out[5]: True

In [6]: a = 'foo'

In [7]: a is 'foo'
Out[7]: True
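On current versions of Python, the interpreter itself warns about the comparison in this session. The following is a minimal sketch, assuming Python 3.8 or later, where identity checks against string or number literals produce a `SyntaxWarning` at compile time. The helper name `compile_warnings` is hypothetical:

```python
import sys
import warnings


def compile_warnings(source: str) -> list:
    """Compile `source` as an expression and return the warning categories raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        compile(source, "<demo>", "eval")  # compile only; `a` never has to exist
    return [w.category for w in caught]


print(sys.version_info[:2])
# On Python 3.8+ the identity check against a literal is flagged at compile time.
print(compile_warnings("a is '--'"))
# An equality check is the intended way to compare string values; no warning.
print(compile_warnings("a == '--'"))
```

Because the check happens at compile time, the name `a` never needs to be defined. Replacing `is` with `==` makes the warning go away.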
Jun 27 '08 #1
6 Replies

Lie
On Jun 19, 2:26 am, Faheem Mitha <fah...@email.unc.edu> wrote:
> My understanding is that 'is' checks if the object is the same. However,
> in that case, why this inconsistency for short strings?
Yes, this happens because of small-object caching. When small
integers or short strings are created, they may refer to the same
objects behind the scenes. Don't rely on this behavior.
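In practice, this means comparing string values with `==` and, if shared objects are genuinely wanted, requesting them explicitly. The following is a minimal sketch, assuming Python 3, where the function is `sys.intern()`; in Python 2.x it is the builtin `intern()`. The helper `same_text` is a hypothetical name used only for illustration:

```python
import sys


def same_text(a: str, b: str) -> bool:
    """Compare string values: the comparison almost all code should use."""
    return a == b


# Build '--' at run time so the compiler cannot share it with the literal below.
built = "".join(["-", "-"])
literal = "--"

print(same_text(built, literal))  # True: equal values, whatever the caching
print(built is literal)           # implementation detail: do not rely on it

# If a program really needs one shared object per value (for example, to save
# memory on many repeated keys), sys.intern() returns a canonical object.
a = sys.intern(built)
b = sys.intern(literal)
print(a is b)  # True
```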
Jun 27 '08 #2

On Wed, 18 Jun 2008 12:57:44 -0700 (PDT), Lie <Li******@gmail.com> wrote:
> Yes, this happens because of small objects caching.
Yes, but why are '-' and 'foo' cached, and not '--'? Do you know what
the basis of the choice is?
Faheem.
Jun 27 '08 #3

Faheem Mitha wrote:
> Yes, but why is '-' and 'foo' cached, and not '--'? Do you know what
> the basis of the choice is?
Shortish Python identifiers and operators, I think. Plus a handful like '\x00'.
The source would know for sure, but alas, I am lazy.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Jun 27 '08 #4

Lie
On Jun 19, 5:13 am, Faheem Mitha <fah...@email.unc.edu> wrote:
> Yes, but why is '-' and 'foo' cached, and not '--'? Do you know what
> the basis of the choice is?
Yes, but we've already been warned not to rely on it, since the basis
for what gets cached and what doesn't may be arbitrary. Personally, I
wouldn't delve deeply into it; it isn't reliable behavior.
Jun 27 '08 #5

Faheem Mitha <fa****@email.unc.edu> wrote:
>> In [1]: a = '--'
>>
>> In [2]: a is '--'
>> Out[2]: False
>
> Yes, but why is '-' and 'foo' cached, and not '--'? Do you know what
> the basis of the choice is?
Also note that the behaviour you saw above changes if you put code into a
script rather than running it interactively (the string '--' will be re-
used within a single compilation unit). So even if you understand all of
the choices made in your particular release of Python (and they do vary
between releases) it would be very unwise to rely on this behaviour.
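The script-versus-prompt difference can be observed without writing a file by compiling source strings with the built-in `compile()` and `exec()`. The following is a minimal sketch, assuming CPython 3; the helper `same_object` is hypothetical. Only the single-unit result is checked, and only on CPython. The two-unit result is exactly the kind of detail that may vary between builds and releases:

```python
import platform


def same_object(source_units: list) -> bool:
    """Run each source string as a separate compilation unit, then test `a is b`."""
    namespace = {}
    for i, src in enumerate(source_units):
        code = compile(src, f"<unit-{i}>", "exec")  # one code object per unit
        exec(code, namespace)
    return namespace["a"] is namespace["b"]


# Script style: both assignments are compiled together, so in CPython they
# normally load the same '--' constant from a single code object.
script_style = same_object(["a = '--'\nb = '--'\n"])

# Prompt style: each assignment is compiled on its own, as when lines are
# typed one at a time into an interactive session.
prompt_style = same_object(["a = '--'\n", "b = '--'\n"])

print(platform.python_implementation(), script_style, prompt_style)
```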

--
Duncan Booth http://kupuguy.blogspot.com
Jun 27 '08 #6

Faheem Mitha <fa****@email.unc.edu> writes:
> Yes, but why is '-' and 'foo' cached, and not '--'? Do you know what
> the basis of the choice is?
Caches such as the intern dictionary/set and the one-character cache are
specific to the implementation (and also to its version, etc.). In this
case '-' is a 1-character string, all of which are cached. Python also
interns strings that appear in Python source as literals that can be
interpreted as identifiers. It also reuses string literals within a
single expression. None of this should be relied on, but it's
interesting to get insight into the implementation by examining the
different cases:
>>> '--' is '--'
True     # string repeated within an expression is simply reused
>>> a = '--'
>>> b = '--'
>>> a is b
False    # not cached
>>> a = '-'
>>> b = '-'
>>> a is b
True     # all 1-character strings are cached
>>> a = 'flobozz'
>>> b = 'flobozz'
>>> a is b
True     # flobozz is a valid identifier, so it's cached
>>> a = 'flo-bozz'
>>> b = 'flo-bozz'
>>> a is b
False
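The two rules described above (a one-character cache plus interning of identifier-like literals) can be condensed into a rough predicate for reasoning about the examples. This is a hypothetical approximation, not CPython's actual code. The helper name, the ASCII-only character set, and the below-256 limit for one-character strings in Python 3 are all assumptions, and it should never be used to decide program behavior:

```python
import string

# Assumed approximation of "identifier-like": ASCII letters, digits, underscore.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def likely_shared_literal(s: str) -> bool:
    """Guess whether separately compiled copies of the literal `s` may be one object.

    Rule 1 (assumed): one-character strings come from a small cache
    (in Python 3, only for code points below 256).
    Rule 2 (assumed): literals made only of identifier characters are interned.
    """
    if len(s) == 1:
        return ord(s) < 256
    return all(ch in NAME_CHARS for ch in s)


for lit in ["--", "-", "foo", "flobozz", "flo-bozz"]:
    print(repr(lit), likely_shared_literal(lit))
```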
Jun 27 '08 #7
