python string comparison oddity


Hi everybody,

I was wondering if anyone can explain this. My understanding is that 'is'
checks if the object is the same. However, in that case, why this
inconsistency for short strings? I would expect a 'False' for all three
comparisons. This is reproducible across two different machines, so it is
not just a local quirk. I'm running Debian etch with Python 2.4.4 (the
default).
Thanks, Faheem.

In [1]: a = '--'

In [2]: a is '--'
Out[2]: False

In [4]: a = '-'

In [5]: a is '-'
Out[5]: True

In [6]: a = 'foo'

In [7]: a is 'foo'
Out[7]: True
Jun 27 '08 #1
Lie
On Jun 19, 2:26 am, Faheem Mitha <fah...@email.unc.edu> wrote:
> I was wondering if anyone can explain this. My understanding is that 'is'
> checks if the object is the same. However, in that case, why this
> inconsistency for short strings?
Yes, this happens because of small-object caching. When small
integers or short strings are created, there is a possibility that they
refer to the same object behind the scenes. Don't rely on this
behavior.
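
To make the practical advice concrete, here is a minimal sketch (not from the thread, and assuming a current Python 3 interpreter rather than the 2.4.4 discussed here). It builds two equal strings at run time: `==` is the reliable value comparison, while the result of `is` is an implementation detail. `build` is a hypothetical helper used only for illustration; `sys.intern` is the standard-library function that returns one canonical object per string value.

```python
import sys


def build(parts):
    """Hypothetical helper: join pieces at run time so the compiler cannot
    fold the result into a shared constant."""
    return "".join(parts)


a = build(["fo", "o-bar"])
b = build(["fo", "o-bar"])

# Value comparison: always True for equal strings.
print(a == b)

# Identity comparison: an implementation detail, so no result is promised.
print(a is b)

# sys.intern maps equal strings to one canonical object, so identity
# holds for the interned copies.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)
```

In short: use `==` to compare string values, and keep `is` for genuine identity checks such as `x is None`.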
Jun 27 '08 #2
On Wed, 18 Jun 2008 12:57:44 -0700 (PDT), Lie <Li******@gmail.com> wrote:
> Yes, this happens because of small-object caching. When small
> integers or short strings are created, there is a possibility that they
> refer to the same object behind the scenes. Don't rely on this
> behavior.

Yes, but why are '-' and 'foo' cached, and not '--'? Do you know what
the basis of the choice is?
Faheem.
Jun 27 '08 #3
Faheem Mitha wrote:
> Yes, but why are '-' and 'foo' cached, and not '--'? Do you know what
> the basis of the choice is?

Shortish Python identifiers and operators, I think. Plus a handful like '\x00'.
The source would know for sure, but alas, I am lazy.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Jun 27 '08 #4
Lie
On Jun 19, 5:13 am, Faheem Mitha <fah...@email.unc.edu> wrote:
> Yes, but why are '-' and 'foo' cached, and not '--'? Do you know what
> the basis of the choice is?
> Faheem.

Yes, but we've already been warned not to rely on it, since the basis
for what is cached and what is not may be arbitrary. Personally, I
wouldn't delve deeply into these details; they aren't reliable behavior.
Jun 27 '08 #5
Faheem Mitha <fa****@email.unc.edu> wrote:
> Yes, but why are '-' and 'foo' cached, and not '--'? Do you know what
> the basis of the choice is?

Also note that the behaviour you saw above changes if you put the code
into a script rather than running it interactively (the string '--' will
be reused within a single compilation unit). So even if you understand all
of the choices made in your particular release of Python (and they do vary
between releases), it would be very unwise to rely on this behaviour.
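
A rough way to see the script-versus-interactive difference from inside one program is sketched below. This is an illustration under assumptions, not part of the original post: it uses the built-in `compile` to treat the two assignments once as a single unit and once as separately compiled statements. The names `SCRIPT`, `script_ns`, and `prompt_ns` are made up for the example, and no output is shown because the results depend on the interpreter and its version.

```python
SCRIPT = "a = '--'\nb = '--'\n"

# One compilation unit, as when the lines live together in a script file.
script_ns = {}
exec(compile(SCRIPT, "<script>", "exec"), script_ns)
print("same unit:     ", script_ns["a"] is script_ns["b"])

# Each statement compiled on its own, roughly like typing them one at a
# time at an interactive prompt.
prompt_ns = {}
for line in SCRIPT.splitlines():
    exec(compile(line, "<prompt>", "single"), prompt_ns)
print("separate units:", prompt_ns["a"] is prompt_ns["b"])
```

Whatever the two lines print on a given release, the values always compare equal with `==`, which is the only property worth depending on.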

--
Duncan Booth http://kupuguy.blogspot.com
Jun 27 '08 #6
Faheem Mitha <fa****@email.unc.edu> writes:
> Yes, but why are '-' and 'foo' cached, and not '--'? Do you know what
> the basis of the choice is?

Caches such as the intern dictionary/set and the one-character cache are
specific to the implementation (and also to its version, etc.). In this
case '-' is a 1-character string, all of which are cached. Python also
interns strings that show up in Python source as literals that can be
interpreted as identifiers. It also reuses string literals within a
single expression. None of this should be relied on, but it's
interesting to get insight into the implementation by examining the
different cases:

>>> '--' is '--'
True    # string repeated within an expression is simply reused
>>> a = '--'
>>> b = '--'
>>> a is b
False   # not cached
>>> a = '-'
>>> b = '-'
>>> a is b
True    # all 1-character strings are cached
>>> a = 'flobozz'
>>> b = 'flobozz'
>>> a is b
True    # flobozz is a valid identifier, so it's cached
>>> a = 'flo-bozz'
>>> b = 'flo-bozz'
>>> a is b
False   # not a valid identifier, so not cached
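
The categories described above can be summarised as a small predicate. This is only a sketch of the description in this post, assuming Python 3: `likely_cached` is a hypothetical helper, it is not CPython's actual rule, and it says nothing about what a particular release really does.

```python
def likely_cached(literal):
    """Hypothetical heuristic mirroring the post: 1-character strings and
    identifier-like literals are the ones described as cached/interned."""
    if len(literal) == 1:
        return True                  # 1-character strings
    return literal.isidentifier()    # literals that look like identifiers


for s in ["-", "--", "flobozz", "flo-bozz"]:
    print(repr(s), likely_cached(s))
# '-' True
# '--' False
# 'flobozz' True
# 'flo-bozz' False
```

The predicate only restates which cases the session above distinguishes; code that needs to compare strings should still use `==`.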
Jun 27 '08 #7
