The Art of Pickling: Binary vs Ascii difficulties

Bix
As this is my very first post, I'd like to give thanks to all who
support this with their help. Hopefully, this question hasn't been
answered (too many times) before...

If anyone could explain this behavior, I'd greatly appreciate it.

I'm leaving the example at the bottom. There is a variable, fmt,
within the test0 function which can be changed from -1
(pickle.HIGHEST_PROTOCOL) to 0 (ascii). The behavior between the two
pickle formats is not consistent. I'm hoping for an explanation and
a possible solution; I'd like to store my data in binary.

Thanks in advance!
# example.py
import pickle

class node(object):
    def __init__(self, *args, **kwds):
        self.args = args
        self.kwds = kwds
        self.reset()

    def reset(self):
        self.name = None
        self.node = 'node'
        self.attributes = {}
        self.children = []
        self.update(*self.args, **self.kwds)

    def update(self, *args, **kwds):
        for k, v in kwds.items():
            if k in self.__dict__.keys():
                self.__dict__[k] = v

def test0(x, fmt=-1):
    fn = 'out.bin'
    pickle.Pickler(open(fn,'w'),fmt).dump(x)
    obj = pickle.Unpickler(open(fn,'r')).load()
    return obj

def test1():
    x = node()
    return test0(x)

def test2():
    x = node()
    y = node()
    x.children.append(y)
    return test0(x)

def test3():
    w = node()
    x = node()
    y = node()
    z = node()
    w.children.append(x)
    x.children.append(y)
    y.children.append(z)
    return test0(w)

def test4():
    w = node()
    x = node()
    y = node()
    z = node()
    w.children.append(x)
    x.children.append(y)
    y.children.append(z)
    return test0(w,0)

def makeAttempt(call, name):
    try:
        call()
        print '%s passed' % name
    except:
        print '%s failed' % name

if __name__ == "__main__":
    makeAttempt(test1,'test1') # should run
    makeAttempt(test2,'test2') # should run
    makeAttempt(test3,'test3') # should fail
    makeAttempt(test4,'test4') # should run
Jul 18 '05 #1
> I'm leaving the example at the bottom. There is a variable, fmt,
> within the test0 function which can be changed from -1
> (pickle.HIGHEST_PROTOCOL) to 0 (ascii). The behavior between the two
> pickle formats is not consistent. I'm hoping for an explanation and
> a possible solution; I'd like to store my data in binary.

If you want to store data in binary, and are running on Windows, you
must make sure to open all files with the binary flag, 'b'.

> pickle.Pickler(open(fn,'w'),fmt).dump(x)
> obj = pickle.Unpickler(open(fn,'r')).load()

The above should be open(fn, 'wb')... and open(fn, 'rb')... respectively.

Changing those two made all of them pass for me, and I would expect no
less.
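
A minimal sketch of that fix in present-day Python 3 (an assumption; the thread itself uses Python 2), opening the file in binary mode for both writing and reading. The `round_trip` and `make_node` helpers and the temporary file are illustrative names, and the dict-based nodes are only a stand-in for the poster's `node` class.

```python
import os
import pickle
import tempfile


def round_trip(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle obj to a temporary file and load it back, in binary mode."""
    fd, path = tempfile.mkstemp(suffix='.bin')
    os.close(fd)
    try:
        with open(path, 'wb') as dest:      # 'wb': bytes are written as-is
            pickle.dump(obj, dest, protocol)
        with open(path, 'rb') as source:    # 'rb': bytes are read back as-is
            return pickle.load(source)
    finally:
        os.remove(path)


def make_node(name):
    """Hypothetical dict-based stand-in for the poster's node class."""
    return {'name': name, 'node': 'node', 'attributes': {}, 'children': []}


# The same four-deep chain that test3 builds in the original post.
w, x, y, z = (make_node(n) for n in 'wxyz')
w['children'].append(x)
x['children'].append(y)
y['children'].append(z)

binary_copy = round_trip(w)      # highest (binary) protocol
ascii_copy = round_trip(w, 0)    # protocol 0
```

With binary mode on both ends, the protocol number should no longer matter for whether the round trip succeeds.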

Oh, and so that you get into the habit early; tabs are frowned upon as
indentation in Python. The standard is 4 spaces, no tabs.

- Josiah

Jul 18 '05 #2
Bix wrote:
> I'm leaving the example at the bottom. There is a variable, fmt,
> within the test0 function which can be changed from -1
> (pickle.HIGHEST_PROTOCOL) to 0 (ascii). The behavior between the two
> pickle formats is not consistent. I'm hoping for an explanation and
> a possible solution; I'd like to store my data in binary.

What's the inconsistency?

Ahh, I see the comments down at the end of your file. I
assume you think they should all pass?

They all pass for me.

I'll guess you're on MS Windows. You need to open the file
in binary mode instead of text mode, which is the default.
Try changing

    pickle.Pickler(open(fn,'w'),fmt).dump(x)
    obj = pickle.Unpickler(open(fn,'r')).load()

to

    pickle.Pickler(open(fn,'wb'),fmt).dump(x)
    obj = pickle.Unpickler(open(fn,'rb')).load()

This isn't clear in the documentation, as Skip complained
about last year in the thread starting at
http://mail.python.org/pipermail/pyt...ry/033362.html

Though to be precise, this isn't actually a pickle
issue.

Andrew
da***@dalkescientific.com
Jul 18 '05 #3
Bix wrote:
> As this is my very first post, I'd like to give thanks to all who
> support this with their help. Hopefully, this question hasn't been
> answered (too many times) before...
> If anyone could explain this behavior, I'd greatly appreciate it.

You clearly spent some effort on this, but you could have boiled this
down to a smaller, more direct question.

The short answer is, "when reading and/or writing binary data,
the files must be opened in binary." Pickles in "ascii" are not
in a binary format, but the others are.
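
As a rough illustration of that distinction, the sketch below (Python 3, which is an assumption; the thread uses Python 2.4) pickles a small dictionary with protocol 0 and with the highest protocol and looks at the resulting bytes. The sample data is made up for the example.

```python
import pickle

data = {'name': None, 'node': 'node', 'children': []}

text_pickle = pickle.dumps(data, 0)                          # protocol 0
binary_pickle = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)  # binary protocol

# Protocol 0 output for this data consists of plain ASCII bytes ...
is_ascii = all(byte < 128 for byte in text_pickle)

# ... while protocol 2 and later start with the PROTO opcode, byte 0x80.
first_byte = binary_pickle[0]

print(is_ascii, hex(first_byte))
```

Any byte value can appear in the binary form, which is why it needs a file opened in binary mode.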

The longer answer includes:
You should handle files a bit more carefully. Don't presume they
automatically get closed.
I'd change:
    fn = 'out.bin'
    pickle.Pickler(open(fn,'w'),fmt).dump(w)
    obj = pickle.Unpickler(open(fn,'r')).load()

to:
    fn = 'out.bin'
    dest = open(fn, 'w')
    try:
        pickle.Pickler(dest, fmt).dump(w)
    finally:
        dest.close()
    source = open(fn, 'r')
    try:
        return pickle.Unpickler(source).load()
    finally:
        source.close()

Then the problem (the mode in which you open the file) shows up to a
practiced eye.
    dest = open(fn, 'w') ... source = open(fn, 'r')
should either be:
    dest = open(fn, 'wb') ... source = open(fn, 'rb')
which works "OK" for ascii, but is not in machine-native text format.
or:
    if fmt:
        readmode, writemode = 'rb', 'wb'
    else:
        readmode, writemode = 'r', 'w'
    ...
    dest = open(fn, writemode) ... source = open(fn, readmode)
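
For readers on Python 3 (an assumption; the advice above was written for Python 2.4), choosing the mode per protocol no longer applies: every pickle protocol produces bytes, so a text-mode file is rejected outright. A small sketch, using a throwaway temporary file:

```python
import os
import pickle
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    try:
        with open(path, 'w') as dest:       # text mode, as in the original post
            pickle.dump([1, 2, 3], dest, 0)
        outcome = 'no error'
    except TypeError:
        # Python 3 pickles are bytes, and a text-mode file only accepts str.
        outcome = 'TypeError'
finally:
    os.remove(path)

print(outcome)
```

So on Python 3 the mistake surfaces immediately at dump time instead of as a confusing failure when loading.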

By the way, the reason that the binary format sometimes works (which is,
I suspect, what is troubling you), is that not all bytes are necessarily
written out as-is in text mode. On Windows and MS-DOS systems,
a byte with value 10 is written as a pair of bytes, 13 followed by 10.
On Apple systems, another translation happens. On unix (and hence
linux) there is no distinction between data written as text and the
C representation of '\n' for line breaks. This means nobody on linux
who ran your example saw a problem, I suspect.
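
The translation described above can be reproduced on any platform with Python 3's `open()` (an assumption; the thread predates Python 3), by asking explicitly for Windows-style line endings through the `newline` argument and then reading the raw bytes back:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # newline='\r\n' asks for the Windows-style translation on any platform.
    with open(path, 'w', newline='\r\n') as dest:
        dest.write('a\nb\n')
    with open(path, 'rb') as source:        # read the raw bytes back
        raw = source.read()
finally:
    os.remove(path)

print(list(raw))   # -> [97, 13, 10, 98, 13, 10]
```

Each byte 10 written in text mode has become the pair 13, 10 on disk.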

This C convention is a violation of the ASCII code as it was then
defined, in order to save a byte per line (treating '\n' as end-of-line,
not line-feed). An ASCII-conforming printer when fed 'a\nb\nc\r\n.\r\n'
should print:
a
 b
  c
.
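
To make the printer behaviour concrete, here is a small simulation (a sketch, not a model of any real device) in which line feed only moves down a line and carriage return only moves back to column 0. The `render` function is a hypothetical helper written for this illustration.

```python
def render(text):
    """Render text the way a printer that treats LF and CR separately would.

    LF moves down one line and keeps the column; CR returns to column 0.
    """
    lines = ['']
    row, col = 0, 0
    for ch in text:
        if ch == '\n':
            row += 1
            if row == len(lines):
                lines.append('')
        elif ch == '\r':
            col = 0
        else:
            # Pad the current line up to the print position, then place ch.
            line = lines[row].ljust(col)
            lines[row] = line[:col] + ch + line[col + 1:]
            col += 1
    return lines


for line in render('a\nb\nc\r\n.\r\n'):
    print(line)
```

Each bare '\n' leaves the print position where it was, so 'b' and 'c' step to the right, and only the '\r\n' pair brings the final '.' back to the left margin.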

My idea of the right question would be, roughly:

Why does test(0) succeed (pickle format 0 = ascii),
but test(-1) fail (pickle format -1 = pickle.HIGHEST_PROTOCOL)?
I am using python 2.4 on Windows2000

import pickle

class node(object):
    def __init__(self, *args, **kwds):
        self.args = args
        self.kwds = kwds
        self.reset()

    def reset(self):
        self.name = None
        self.node = 'node'
        self.attributes = {}
        self.children = []
        self.update(*self.args, **self.kwds)

    def update(self, *args, **kwds):
        for k, v in kwds.items():
            if k in self.__dict__.keys():
                self.__dict__[k] = v

def test(fmt=-1):
    w = node()
    x = node()
    y = node()
    z = node()
    w.children.append(x)
    x.children.append(y)
    y.children.append(z)
    fn = 'out.bin'
    pickle.Pickler(open(fn,'w'),fmt).dump(w)
    obj = pickle.Unpickler(open(fn,'r')).load()
    return obj

The error message is:
Traceback (most recent call last):
  File "<pyshell#24>", line 1, in -toplevel-
    test()
  File "<pyshell#22>", line 11, in test
    obj = pickle.Unpickler(open(fn,'r')).load()
  File "C:\Python24\lib\pickle.py", line 872, in load
    dispatch[key](self)
  File "C:\Python24\lib\pickle.py", line 1189, in load_binput
    i = ord(self.read(1))
TypeError: ord() expected a character, but string of length 0 found

-Scott David Daniels
Sc***********@Acm.Org
Jul 18 '05 #4
Scott David Daniels wrote:
> This C convention is a violation of the ASCII code as it was then
> defined, in order to save a byte per line (treating '\n' as end-of-line,
> not line-feed). An ASCII-conforming printer when fed 'a\nb\nc\r\n.\r\n'
> should print:
> a
>  b
>   c
> .


Standards wonk that I am, I was curious about this. I've
never read the ASCII spec before. In my somewhat cursory
search I couldn't find something authoritative on-line that
claimed to be "the" ASCII spec. I did find RFC 20 "ASCII
format for network interchange" dated October 16, 1969,
so before the C convention was defined. Here's one copy
http://www.faqs.org/rfcs/rfc20.html

It says:

    LF (Line Feed): A format effector which controls the movement of
    the printing position to the next printing line. (Applicable also to
    display devices.) Where appropriate, this character may have the
    meaning "New Line" (NL), a format effector which controls the
    movement of the printing point to the first printing position on the
    next printing line. Use of this convention requires agreement
    between sender and recipient of data.

So it seems that it's not a violation, just a convention.

It happens that MS Windows and Unix (and old Macs) have
different conventions.

Andrew
da***@dalkescientific.com
Jul 18 '05 #5
>>>>> "Andrew" == Andrew Dalke <ad****@mindspring.com> writes:

Andrew> Standards wonk that I am, I was curious about this. I've

Well, if you are a standards wonk and emacs user, you might have fun
with this little bit of python and emacs code. If you place rfc.py in
your PATH

#!/usr/bin/env python
# Print an RFC indicated by a command line arg to stdout
# > rfc.py 822

import urllib, sys

try: n = int(sys.argv[1])
except:
    print 'Example usage: %s 822' % sys.argv[0]
    sys.exit(1)

print urllib.urlopen('http://www.ietf.org/rfc/rfc%d.txt' % n).read()

and add this function to your .emacs

;;** RFC
(defun rfc (num)
  "Insert RFC indicated by num into buffer *RFC<num>*"
  (interactive "sRFC: ")
  (shell-command
   (concat "rfc.py " num)
   (concat "*RFC" num "*")))

You can get rfc's in your emacs buffer by doing

M-x rfc ENTER 20 ENTER

And now back to our regularly scheduled work day.
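
For anyone trying this today, a rough Python 3 equivalent of rfc.py might look like the sketch below. It assumes Python 3's `urllib.request` in place of the Python 2 `urllib.urlopen` call, reuses the URL pattern from the script above (which may have moved since 2004), and the function names `rfc_url`, `fetch_rfc` and `main` are made up for the illustration.

```python
import sys
import urllib.request


def rfc_url(number):
    """Build the URL used in the script above for a given RFC number."""
    return 'http://www.ietf.org/rfc/rfc%d.txt' % number


def fetch_rfc(number):
    """Download the RFC text (needs network access)."""
    with urllib.request.urlopen(rfc_url(number)) as response:
        return response.read().decode('utf-8', 'replace')


def main(argv):
    """Print the RFC named by argv[1]; return a process exit status."""
    try:
        number = int(argv[1])
    except (IndexError, ValueError):
        print('Example usage: %s 822' % argv[0])
        return 1
    sys.stdout.write(fetch_rfc(number))
    return 0
```

Saved as a script, it would be started with something like `sys.exit(main(sys.argv))` at the bottom of the file.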

JDH
Jul 18 '05 #6
John Hunter wrote:
> Well, if you are a standards wonk and emacs user, you might have fun
> with this little bit of python and emacs code. If you place rfc.py in
> your PATH


Huh. Never really figured out how to customize Lisp.

Another solution is to use a browser like Konqueror which
lets users define new "protocols" so that "rfc:20"
expands to the given URL.

Very handy for Qt programming because I can have
"qt:textedit" expand to the documentation for that
module.

Most of the specs I read, btw, aren't RFCs.

Andrew
da***@dalkescientific.com

Jul 18 '05 #7
On Thu, 14 Oct 2004 16:38:25 -0500, John Hunter <jd******@ace.bsd.uchicago.edu> wrote:
> >>>>> "Andrew" == Andrew Dalke <ad****@mindspring.com> writes:
>
>     Andrew> Standards wonk that I am, I was curious about this. I've
>
> Well, if you are a standards wonk and emacs user, you might have fun
> with this little bit of python and emacs code. If you place rfc.py in
> your PATH
>
> #!/usr/bin/env python
> # Print an RFC indicated by a command line arg to stdout
> # > rfc.py 822
>
> import urllib, sys
>
> try: n = int(sys.argv[1])
> except:
>     print 'Example usage: %s 822' % sys.argv[0]
>     sys.exit(1)
>
> print urllib.urlopen('http://www.ietf.org/rfc/rfc%d.txt' % n).read()
>
> and add this function to your .emacs
>
> ;;** RFC
> (defun rfc (num)
>   "Insert RFC indicated by num into buffer *RFC<num>*"
>   (interactive "sRFC: ")
>   (shell-command
>    (concat "rfc.py " num)
>    (concat "*RFC" num "*")))
>
> You can get rfc's in your emacs buffer by doing
>
> M-x rfc ENTER 20 ENTER
>
> And now back to our regularly scheduled work day.

Thanks. For win32 users with gvim I've modified it a little ...

---< vrfc.py >---------------------------------------
# vrfc.py
# to use in gvim on win32, put this file somewhere
# and put a vrfc.cmd file in one of your %PATH% directories
# (e.g. c:\util here) running python with a full path to this
# script (vrfc.py), e.g.,
# +--< vrfc.cmd >------------+
# |@python c:\util\vrfc.py %1|
# +--------------------------+
# (This cmd file is necessary on NT4 and some other windows platforms
# in order for the output to be pipe-able back into gvim (or anywhere else)).
# Then you can insert an rfc into your current gvim editing using
# :r!vrfc n
# where n is the rfc number
# Form feeds are converted to underline separators 78 chars wide
# and \r's if any are stripped for normalized output, just in case.
#
import urllib, sys
try: n = int(sys.argv[1])
except:
    print 'Example usage: python %s 822' % sys.argv[0]
    sys.exit(1)
s = urllib.urlopen('http://www.ietf.org/rfc/rfc%d.txt' % n).read()
s = s.replace('\r', '')
s = s.replace('\x0c','_'*78+'\n')
sys.stdout.write(s)
sys.stdout.close()
--------------------------------------------------------
;-)
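
The text clean-up step in vrfc.py can be looked at on its own. The sketch below (Python 3 syntax, an assumption) pulls the two replacements into a hypothetical `normalize_rfc_text` function and applies it to a made-up sample string rather than a downloaded RFC.

```python
def normalize_rfc_text(text, width=78):
    """Strip carriage returns and turn form feeds into underline separators."""
    text = text.replace('\r', '')
    return text.replace('\x0c', '_' * width + '\n')


sample = 'page one\r\n\x0cpage two\r\n'
cleaned = normalize_rfc_text(sample)
print(cleaned)
```

Keeping the normalization separate from the download makes it easy to check without network access.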

Regards,
Bengt Richter
Jul 18 '05 #8
