[Boris Borcic]
Assuming that the items of my_stream share no content (they are
dumps of db cursor fetches), is there a simple way to do the
equivalent of
def pickles(my_stream) :
    from cPickle import load,dumps
    while 1 :
        yield dumps(load(my_stream))
without the overhead associated with unpickling objects
just to pickle them again ?
cPickle (but not pickle.py) Unpickler objects have a barely documented
noload() method. This "acts like" load(), except that it doesn't import
modules or construct objects of user-defined classes. The return
value of noload() is undocumented and usually useless. ZODB uses it a
lot ;-)
Anyway, that can go much faster than load(), and works even if the
classes and modules referenced by pickles aren't available in the
unpickling environment. It doesn't return the individual pickle
strings, but they're easy to get at by paying attention to the file
position between noload() calls. For example,
import cPickle as pickle
import os
# Build a pickle file with 4 pickles.
PICKLEFILE = "temp.pck"
class C:
    pass
f = open(PICKLEFILE, "wb")
p = pickle.Pickler(f, 1)
p.dump(2)
p.dump([3, 4])
p.dump(C())
p.dump("all done")
f.close()
# Now use noload() to extract the 4 pickle
# strings in that file.
f = open(PICKLEFILE, "rb")
limit = os.path.getsize(PICKLEFILE)
u = pickle.Unpickler(f)
pickles = []
pos = 0
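while pos < limit:
    # Skip over one pickle without constructing its objects.
    u.noload()
    # f.tell() now marks the end of that pickle.
    thispos = f.tell()
    # Go back and read the raw bytes of the pickle just skipped.
    f.seek(pos)
    pickles.append(f.read(thispos - pos))
    pos = thispos
f.close()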
from pprint import pprint
pprint(pickles)
That prints a list containing the 4 pickle strings:
['K\x02.',
 ']q\x01(K\x03K\x04e.',
 '(c__main__\nC\nq\x02o}q\x03b.',
 'U\x08all doneq\x04.']
You could do much the same by calling pickletools.dis() and ignoring
its output, but that's likely to be slower.
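
For anyone who wants to try the pickletools route, here is a minimal sketch. It rests on some assumptions: it uses pickletools.genops(), the opcode iterator that dis() is built on, rather than dis() itself, so there is no output to throw away; the stream must be seekable and opened in binary mode; and split_pickles is a made-up name, not a library function. It is pure Python, so expect it to be slower than noload().

```python
import io
import pickle
import pickletools


def split_pickles(stream):
    """Yield the raw bytes of each pickle in a seekable binary stream.

    Hypothetical helper, not part of any library. No pickled objects
    are constructed and no modules are imported along the way.
    """
    while True:
        start = stream.tell()
        if not stream.read(1):      # nothing left: end of stream
            return
        stream.seek(start)
        # genops() decodes opcodes one at a time and stops right after
        # the STOP opcode, leaving the stream at the end of the pickle.
        for _opcode, _arg, _pos in pickletools.genops(stream):
            pass
        end = stream.tell()
        stream.seek(start)
        yield stream.read(end - start)


# Demo on an in-memory stream of four protocol-1 pickles.
objs = [2, [3, 4], {"k": "v"}, "all done"]
stream = io.BytesIO(b"".join(pickle.dumps(obj, 1) for obj in objs))
for raw in split_pickles(stream):
    print(repr(raw))
```

Like the noload() loop, this only tracks where each pickle starts and ends, so it should work whether or not the classes named in the pickles can be imported.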