cPickle.dumps differs from pickle.dumps; looks like a bug.

Hello list,

I've found the following strange behavior of cPickle. Do you think
it's a bug, or is it by design?

Best regards,
Victor.

from pickle import dumps
from cPickle import dumps as cdumps

print dumps('1001799')==dumps(str(1001799))
print cdumps('1001799')==cdumps(str(1001799))

outputs

True
False
vicbook:~ victor$ python
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
vicbook:~ victor$ uname -a
Darwin vicbook 8.9.1 Darwin Kernel Version 8.9.1: Thu Feb 22 20:55:00
PST 2007; root:xnu-792.18.15~1/RELEASE_I386 i386 i386

May 16 '07 #1
If you unpickle, though, will the results be the same? I suspect they
will be. That should matter most of all (unless you plan to compare
objects' identity based on their pickled version).

Remember that by default pickle and cPickle produce the longer
ASCII representation (protocol 0); for a binary representation, use a
higher pickle protocol, such as 2.
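
A minimal sketch of that check, written for Python 3 as an assumption (there is no separate cPickle module there; `pickle` uses its C implementation automatically when it is available). The variable names are only illustrative. It compares the pickled bytes, which are an implementation detail, and the unpickled values, which are what should agree:

```python
import pickle

literal = '1001799'
computed = str(1001799)

for protocol in (0, 2):
    # Serialize both strings with the same protocol.
    from_literal = pickle.dumps(literal, protocol=protocol)
    from_computed = pickle.dumps(computed, protocol=protocol)

    # The byte strings may or may not match, depending on the pickler;
    # the values recovered from them are what has to agree.
    print(protocol, from_literal == from_computed)
    assert pickle.loads(from_literal) == pickle.loads(from_computed) == literal
```

If the round trip gives equal values, differing bytes should only matter to code that compares or hashes the pickles themselves.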

Hope that helps,
-Nick Vatamaniuc

May 16 '07 #2
The OP was not comparing identity but equality. So it looks like a
real bug. I think the following should be True for any function f:

if a == b: f(a) == f(b)

or not?

Daniel
May 16 '07 #3
I might have found the culprit: see http://svn.python.org/projects/pytho...ules/cPickle.c
The function static int put2(...) contains the following code block:

---------cPickle.c-----------
    int p;
    ....
    if ((p = PyDict_Size(self->memo)) < 0) goto finally;
    /* Make sure memo keys are positive! */
    /* XXX Why?
     * XXX And does "positive" really mean non-negative?
     * XXX pickle.py starts with PUT index 0, not 1. This makes for
     * XXX gratuitous differences between the pickling modules.
     */
    p++;
-------------------------------

The p++ would cause the difference. It seems the developers are not
quite sure why it's there, or whether memo keys can start at 0 or have
to start at 1.

Here is the corresponding section of the Python version (pickle.py),
taken from Python 2.5:
---------pickle.py----------
    def memoize(self, obj):
        """Store an object in the memo."""
        # The Pickler memo is a dictionary mapping object ids to 2-tuples
        # that contain the Unpickler memo key and the object being memoized.
        # The memo key is written to the pickle and will become
        # the key in the Unpickler's memo. The object is stored in the
        # Pickler memo so that transient objects are kept alive during
        # pickling.

        # The use of the Unpickler memo length as the memo key is just a
        # convention. The only requirement is that the memo values be unique.
        # But there appears no advantage to any other scheme, and this
        # scheme allows the Unpickler memo to be implemented as a plain (but
        # growable) array, indexed by memo key.
        if self.fast:
            return
        assert id(obj) not in self.memo
        memo_len = len(self.memo)
        self.write(self.put(memo_len))
        self.memo[id(obj)] = memo_len, obj

    # Return a PUT (BINPUT, LONG_BINPUT) opcode string, with argument i.
    def put(self, i, pack=struct.pack):
        if self.bin:
            if i < 256:
                return BINPUT + chr(i)
            else:
                return LONG_BINPUT + pack("<i", i)
        return PUT + repr(i) + '\n'
------------------------------------------

In memoize, memo_len plays the role of 'int p' in the C version. In the
Python version the first memo key is 0 and is used as-is, while in the
C version the size is initially 0 as well but is then incremented by
p++.
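
One way to look at the memo keys a pickler actually writes is to walk the opcode stream with the standard pickletools module. This is only a sketch under the assumption of Python 3 (so it inspects today's `pickle`, not the 2.5 cPickle discussed here), and `memo_opcodes` is a hypothetical helper name, not part of any library:

```python
import pickle
import pickletools

def memo_opcodes(data):
    """Return (opcode name, argument) pairs for memo writes and reads."""
    memo_names = {'PUT', 'BINPUT', 'LONG_BINPUT', 'MEMOIZE',
                  'GET', 'BINGET', 'LONG_BINGET'}
    return [(op.name, arg) for op, arg, pos in pickletools.genops(data)
            if op.name in memo_names]

text = str(1001799)
shared = [text, text]   # the same string object referenced twice

# Protocol 0 is the ASCII protocol, which uses the PUT and GET opcodes.
ops = memo_opcodes(pickle.dumps(shared, protocol=0))
print(ops)
```

The PUT arguments printed are the memo keys; which number the first key gets is exactly the convention the two code excerpts above disagree on.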

Any developers that know more about this?

-Nick Vatamaniuc

May 16 '07 #4
In <ma***************************************@python.org>, Daniel Nogradi
wrote:
> The OP was not comparing identity but equality. So it looks like a
> real bug, I think the following should be True for any function f:
>
> if a == b: f(a) == f(b)
>
> or not?

In [74]: def f(x):
   ....:     return x / 2
   ....:

In [75]: a = 5

In [76]: b = 5.0

In [77]: a == b
Out[77]: True

In [78]: f(a) == f(b)
Out[78]: False

And `f()` doesn't even use something like `random()` or `time()` here. ;-)
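
The session above relies on Python 2 integer division; under Python 3, `5 / 2` and `5.0 / 2` both give 2.5. A sketch of the same point that holds in Python 3 as well (an assumption about the reader's interpreter) uses pickling itself as the function f, since equal values of different types are pickled differently:

```python
import pickle

a = 5
b = 5.0

print(a == b)                                # the int and the float compare equal
print(pickle.dumps(a) == pickle.dumps(b))    # but their pickles record different types

# Unpickling gives back the original types, which is why the bytes must differ.
print(type(pickle.loads(pickle.dumps(a))), type(pickle.loads(pickle.dumps(b))))
```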

Ciao,
Marc 'BlackJack' Rintsch
May 16 '07 #5
On 5/16/07, Daniel Nogradi <no*****@gmail.com> wrote:
> The OP was not comparing identity but equality. So it looks like a
> real bug, I think the following should be True for any function f:
>
> if a == b: f(a) == f(b)
>
> or not?

Obviously not, in the general case. A function built on random.random()
is the most obvious example, but there are any number of functions which
don't return the same value for equal inputs. Take file() or open() -
since you get a new file object with new state, it obviously will not be
equal even if it's the same file path.

For certain inputs, cPickle doesn't emit the memo information that is
used to support recursive and shared data structures. I'm not sure how
it tells the difference; perhaps it has something to do with
refcounts. In any case, it's an optimization of the pickle output, not
a bug.
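
As a rough illustration of what that memo information is for, here is a sketch assuming Python 3's `pickle` (the variable names are made up for the example). Shared and self-referencing objects come back with the same structure after a round trip:

```python
import pickle

inner = ['shared']
outer = [inner, inner]   # one list object referenced twice
loop = []
loop.append(loop)        # a list that contains itself

restored_outer = pickle.loads(pickle.dumps(outer))
restored_loop = pickle.loads(pickle.dumps(loop))

# The two slots still point at a single object, not at two equal copies.
print(restored_outer[0] is restored_outer[1])
# The restored list still contains itself.
print(restored_loop[0] is restored_loop)
```

An object that is referenced only once never needs to be looked up again, so leaving out its memo entry changes the bytes but not the unpickled result.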
May 16 '07 #6
> Obviously not, in the general case. A function built on random.random()
> is the most obvious example, but there are any number of functions which
> don't return the same value for equal inputs. Take file() or open() -
> since you get a new file object with new state, it obviously will not be
> equal even if it's the same file path.

Right, sorry about that, posted too quickly :)
I was thinking for a while about a deterministic

> For certain inputs, cPickle doesn't emit the memo information that is
> used to support recursive and shared data structures. I'm not sure how
> it tells the difference; perhaps it has something to do with
> refcounts. In any case, it's an optimization of the pickle output, not
> a bug.

Caching?

>>> from cPickle import dumps
>>> dumps('0') == dumps(str(0))
True
>>> dumps('1') == dumps(str(1))
True
>>> dumps('2') == dumps(str(2))
True
.........
.........
>>> dumps('9') == dumps(str(9))
True
>>> dumps('10') == dumps(str(10))
False
>>> dumps('11') == dumps(str(11))
False

Daniel
May 16 '07 #7
Daniel Nogradi wrote:
> Caching?
>
> >>> from cPickle import dumps
> >>> dumps('0') == dumps(str(0))
> True
> >>> dumps('1') == dumps(str(1))
> True
> >>> dumps('2') == dumps(str(2))
> True
> ........
> ........
> >>> dumps('9') == dumps(str(9))
> True
> >>> dumps('10') == dumps(str(10))
> False
> >>> dumps('11') == dumps(str(11))
> False

All strings of length 0 (there is 1) and 1 (there are 256) are interned.

- Josiah
May 17 '07 #8
On Thu, 17 May 2007 02:09:02 -0300, Josiah Carlson
<jo************@sbcglobal.net> wrote:
> All strings of length 0 (there is 1) and 1 (there are 256) are interned.

I thought it was the case too, but not always:

py> a = "a"
py> b = "A".lower()
py> a==b
True
py> a is b
False
py> a is intern(a)
True
py> b is intern(b)
False
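
For anyone repeating this on Python 3 (an assumption; the session above is Python 2), the builtin intern() has moved to sys.intern(). A small sketch, with illustrative variable names, that avoids depending on which strings the interpreter happens to share:

```python
import sys

a = 'a'
b = 'A'.lower()
long_a = '1001799'
long_b = str(1001799)

# Value equality always holds here.
print(a == b, long_a == long_b)

# Whether `a is b` or `long_a is long_b` holds depends on the interpreter
# and version, so it is not asserted. Interning equal strings explicitly
# always yields one shared object.
print(sys.intern(long_a) is sys.intern(long_b))
```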

--
Gabriel Genellina

May 17 '07 #9
