cPickle.dumps differs from Pickle.dumps; looks like a bug.

Hello list,

I've found the following strange behavior of cPickle. Do you think
it's a bug, or is it by design?

Best regards,
Victor.

from pickle import dumps
from cPickle import dumps as cdumps

print dumps('1001799')==dumps(str(1001799))
print cdumps('1001799')==cdumps(str(1001799))

outputs

True
False
vicbook:~ victor$ python
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
vicbook:~ victor$ uname -a
Darwin vicbook 8.9.1 Darwin Kernel Version 8.9.1: Thu Feb 22 20:55:00
PST 2007; root:xnu-792.18.15~1/RELEASE_I386 i386 i386

May 16 '07 #1
8 Replies


On May 16, 1:13 pm, Victor Kryukov <victor.kryu...@gmail.com> wrote:
> I've found the following strange behavior of cPickle. Do you think
> it's a bug, or is it by design?
>
> print dumps('1001799')==dumps(str(1001799))
> print cdumps('1001799')==cdumps(str(1001799))
>
> outputs
>
> True
> False

If you unpickle them, though, will the results be the same? I suspect
they will be. That is what matters most (unless you plan to compare
objects' identity based on their pickled version).

Remember that by default pickle and cPickle produce the longer ASCII
representation (protocol 0); for a binary representation, use a higher
pickle protocol: 2 instead of the default 0.
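
A minimal sketch of the point about protocols, written for Python 3 (where cPickle no longer exists as a separate module), so it is an illustration rather than a reproduction of the Python 2.5 session above. The variable names are made up for the example. It only checks properties that hold for this particular value: the protocol 0 stream is plain ASCII, the protocol 2 stream starts with the binary protocol header, and both unpickle to an equal string.

```python
import pickle

value = '1001799'

# Protocol 0 is the text-based format; protocol 2 is a binary format.
text_pickle = pickle.dumps(value, protocol=0)
binary_pickle = pickle.dumps(value, protocol=2)

# For this value the protocol 0 stream holds only ASCII bytes,
# so decoding it as ASCII succeeds.
text_form = text_pickle.decode('ascii')

# Protocol 2 pickles begin with the PROTO opcode (0x80) and the version byte.
starts_with_proto = binary_pickle[:2] == b'\x80\x02'

# Whatever the byte streams look like, both unpickle to an equal object.
round_trip_ok = pickle.loads(text_pickle) == pickle.loads(binary_pickle) == value

print(repr(text_form))
print(starts_with_proto, round_trip_ok)
```

Comparing unpickled objects, as in the last check, avoids relying on the exact bytes a particular pickler happens to emit.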

Hope that helps,
-Nick Vatamaniuc

May 16 '07 #2

> If you unpickle though will the results be the same? I suspect they
> will be. That should matter most of all (unless you plan to compare
> objects' identity based on their pickled version.)

The OP was not comparing identity but equality. So it looks like a
real bug. I think the following should be True for any function f:

if a == b: f(a) == f(b)

or not?

Daniel
May 16 '07 #3

On May 16, 1:13 pm, Victor Kryukov <victor.kryu...@gmail.com> wrote:
> I've found the following strange behavior of cPickle. Do you think
> it's a bug, or is it by design?

I might have found the culprit: see http://svn.python.org/projects/pytho...ules/cPickle.c
The function static int put2(...) contains the following code block:

---------cPickle.c-----------
int p;
....
if ((p = PyDict_Size(self->memo)) < 0) goto finally;
/* Make sure memo keys are positive! */
/* XXX Why?
* XXX And does "positive" really mean non-negative?
* XXX pickle.py starts with PUT index 0, not 1. This makes for
* XXX gratuitous differences between the pickling modules.
*/
p++;
-------------------------------

p++ will cause the difference. Judging by the comment, the developers
are not quite sure why it's there, or whether memo keys may start at 0
or have to start at 1.

Here is corresponding section for the Python version (pickle.py) taken
from Python 2.5
---------pickle.py----------
def memoize(self, obj):
    """Store an object in the memo."""

    # The Pickler memo is a dictionary mapping object ids to 2-tuples
    # that contain the Unpickler memo key and the object being memoized.
    # The memo key is written to the pickle and will become
    # the key in the Unpickler's memo.  The object is stored in the
    # Pickler memo so that transient objects are kept alive during
    # pickling.

    # The use of the Unpickler memo length as the memo key is just a
    # convention.  The only requirement is that the memo values be unique.
    # But there appears no advantage to any other scheme, and this
    # scheme allows the Unpickler memo to be implemented as a plain (but
    # growable) array, indexed by memo key.
    if self.fast:
        return
    assert id(obj) not in self.memo
    memo_len = len(self.memo)
    self.write(self.put(memo_len))
    self.memo[id(obj)] = memo_len, obj

# Return a PUT (BINPUT, LONG_BINPUT) opcode string, with argument i.
def put(self, i, pack=struct.pack):
    if self.bin:
        if i < 256:
            return BINPUT + chr(i)
        else:
            return LONG_BINPUT + pack("<i", i)

    return PUT + repr(i) + '\n'
------------------------------------------

In memoize, memo_len corresponds to the 'int p' of the C version. In
pickle.py the value starts at 0 and is written as-is, while in the C
version it also starts at 0 but is then incremented by p++ before it is
written.
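
One way to look at the memo keys directly is to disassemble a pickle. The following is a sketch for Python 3 using the standard pickletools module; it does not reproduce the Python 2.5 pickle/cPickle pair discussed here, and which memo opcodes appear (and with which keys) depends on the interpreter version and protocol, so no particular key value is assumed.

```python
import pickle
import pickletools

data = pickle.dumps('1001799', protocol=0)

# genops yields (opcode, argument, position) triples for each opcode in the stream.
ops = [(op.name, arg) for op, arg, pos in pickletools.genops(data)]
for name, arg in ops:
    print(name, arg)

# Opcodes that store into the memo carry the memo key as their argument.
memo_keys = [arg for name, arg in ops if name in ('PUT', 'BINPUT', 'LONG_BINPUT')]
print('memo keys written:', memo_keys)
```

Running the same disassembly on two byte streams that compare unequal shows which opcodes differ between them.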

Any developers that know more about this?

-Nick Vatamaniuc

May 16 '07 #4

In <ma***************************************@python. org>, Daniel Nogradi
wrote:
> The OP was not comparing identity but equality. So it looks like a
> real bug, I think the following should be True for any function f:
>
> if a == b: f(a) == f(b)
>
> or not?

In [74]: def f(x):
   ....:     return x / 2
   ....:

In [75]: a = 5

In [76]: b = 5.0

In [77]: a == b
Out[77]: True

In [78]: f(a) == f(b)
Out[78]: False

And `f()` doesn't even use something like `random()` or `time()` here. ;-)
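
The session above relies on Python 2 integer division (5 / 2 is 2 there). In Python 3, / is true division, so that particular f would return equal results. A sketch of the same point that also works on Python 3, using a hypothetical helper named describe:

```python
def describe(x):
    """Return the textual form of x, which depends on its type as well as its value."""
    return repr(x)

a = 5
b = 5.0

inputs_equal = (a == b)                       # 5 == 5.0 compares equal
outputs_equal = (describe(a) == describe(b))  # '5' versus '5.0'

print(inputs_equal, outputs_equal)
```

The function is deterministic, yet equal inputs give unequal outputs because equality between an int and a float ignores the type.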

Ciao,
Marc 'BlackJack' Rintsch
May 16 '07 #5

On 5/16/07, Daniel Nogradi <no*****@gmail.com> wrote:
> > If you unpickle though will the results be the same? I suspect they
> > will be. That should matter most of all (unless you plan to compare
> > objects' identity based on their pickled version.)
>
> The OP was not comparing identity but equality. So it looks like a
> real bug, I think the following should be True for any function f:
>
> if a == b: f(a) == f(b)
>
> or not?

Obviously not, in the general case. random.random() is the most
obvious example, but there are any number of functions which don't
return the same value for equal inputs. Take file() or open(): since
you get a new file object with new state, it obviously will not be
equal even if it's the same file path.

For certain inputs, cPickle doesn't emit the memo information that is
used to support recursive and shared data structures. I'm not sure how
it tells the difference; perhaps it has something to do with
refcounts. In any case, it's an optimization of the pickle output, not
a bug.
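
To illustrate that memo information can be left out without changing what is unpickled, here is a sketch for Python 3 using pickletools.optimize, which removes PUT opcodes whose memo entries are never read back. It is not the mechanism cPickle uses internally; it only shows that two byte streams can differ in their memo opcodes and still load to equal objects.

```python
import pickle
import pickletools

raw = pickle.dumps('1001799', protocol=0)

# optimize() returns an equivalent pickle with unused PUT opcodes stripped.
trimmed = pickletools.optimize(raw)

# The byte streams may differ, but both must unpickle to the same value.
same_value = pickle.loads(raw) == pickle.loads(trimmed) == '1001799'

print(len(raw), len(trimmed), same_value)
```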
May 16 '07 #6

> Obviously not, in the general case. random.random(x) is the most
> obvious example, but there's any number functions which don't return
> the same value for equal inputs. Take file() or open() - since you get
> a new file object with new state, it obviously will not be equal even
> if it's the same file path.

Right, sorry about that, posted too quickly :)
I was thinking for a while about a deterministic

> For certain inputs, cPickle doesn't print the memo information that is
> used to support recursive and shared data structures. I'm not sure how
> it tells the difference, perhaps it has something to do with
> refcounts. In any case, it's an optimization of the pickle output, not
> a bug.

Caching?
>>> from cPickle import dumps
>>> dumps('0') == dumps(str(0))
True
>>> dumps('1') == dumps(str(1))
True
>>> dumps('2') == dumps(str(2))
True
.........
.........
>>> dumps('9') == dumps(str(9))
True
>>> dumps('10') == dumps(str(10))
False
>>> dumps('11') == dumps(str(11))
False

Daniel
May 16 '07 #7

Daniel Nogradi wrote:
> Caching?
> >>> from cPickle import dumps
> >>> dumps('0') == dumps(str(0))
> True
> >>> dumps('1') == dumps(str(1))
> True
> >>> dumps('2') == dumps(str(2))
> True
> ........
> ........
> >>> dumps('9') == dumps(str(9))
> True
> >>> dumps('10') == dumps(str(10))
> False
> >>> dumps('11') == dumps(str(11))
> False

All strings of length 0 (there is 1) and 1 (there are 256) are interned.

- Josiah
May 17 '07 #8

On Thu, 17 May 2007 02:09:02 -0300, Josiah Carlson
<jo************@sbcglobal.net> wrote:
> All strings of length 0 (there is 1) and 1 (there are 256) are interned.

I thought it was the case too, but not always:

py> a = "a"
py> b = "A".lower()
py> a==b
True
py> a is b
False
py> a is intern(a)
True
py> b is intern(b)
False
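
For readers on Python 3, where the intern() builtin moved to sys.intern, a rough equivalent is sketched below. Whether two equal strings are already the same object is a CPython implementation detail that varies between versions, so the sketch only relies on the documented behaviour that sys.intern returns one canonical object for equal strings.

```python
import sys

a = "1001799"
b = str(1001799)   # built at run time rather than taken from a literal

values_equal = (a == b)

# sys.intern returns the canonical copy, so equal strings map to one object.
same_after_intern = sys.intern(a) is sys.intern(b)

print(values_equal, same_after_intern)
print(a is b)      # implementation detail: do not rely on either answer
```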

--
Gabriel Genellina

May 17 '07 #9