
# Pickling dictionaries containing dictionaries: failing, recursion-style!

I'm having great fun playing with Markov chains. I am making a
dictionary of all the words in a given string, getting a count of how
many appearances word1 makes in the string, getting a list of all the
word2s that follow each appearance of word1 and a count of how many
times word2 appears in the string as well. (I know I should probably
be only counting how many times word2 actually follows word1, but as I
said, I'm having great fun playing ...)

The printed output of the dictionary looks like so:

{'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down':
1}], 'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a':
2}], 'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and':
1, 'words': 1}], 'write': [1, {'this': 1}], 'to': [3, {'write': 1,
'put': 1, 'think': 1}], 'words': [1, {'no': 1}], 'others': [1,
{'think': 1}], 'put': [1, {'them': 1}], 'sin': [2, {'to': 2}]}

Here's the actual function.

def assembleVocab(self):
    self.wordDB = {}
    for word in self.words:
        try:
            if not word in self.wordDB.keys():
                wordsWeights = {}
                afterwords = [self.words[i + 1] for i, e in
                              enumerate(self.words) if e == word]
                for aw in afterwords:
                    if not aw in wordsWeights.keys():
                        wordsWeights[aw] = afterwords.count(aw)
                self.wordDB[word] = [self.words.count(word), wordsWeights]
        except:
            pass
    out = open("mchain.pkl",'wb')
    pickle.dump(self.wordDB, out, -1)
    out.close()
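
For comparison, here is a minimal sketch of the same bookkeeping done in one pass. It is not the poster's code: the function name `build_word_db` and the sample sentence are made up for illustration, and unlike the function above it keeps an entry for the final word (which has no follower) instead of silently skipping it.

```python
from collections import Counter, defaultdict

def build_word_db(words):
    """Map each word to [occurrence count, {following word: follow count}]."""
    totals = Counter(words)
    followers = defaultdict(Counter)
    # Pair every word with its successor; the last word has none.
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    # Convert to plain dicts and lists so the shape matches the printed output.
    return {w: [totals[w], dict(followers[w])] for w in totals}

db = build_word_db("it is a sin to think and to write".split())
print(db["to"])     # [2, {'think': 1, 'write': 1}]
print(db["write"])  # [1, {}]
```

The result contains only dicts, lists, strings and ints, so it pickles the same way as the original structure.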

My problem is, I can't seem to get it to unpickle. When I attempt to
load the saved data, I get:

AttributeError: 'tuple' object has no attribute 'readline'

with pickle, and

Looking at the pickle pages on docs.python.org, I see that I am indeed
supposed to be able to pickle ``tuples, lists, sets, and dictionaries
containing only picklable objects''.

I'm sure I'm missing something obvious. Clues?
Dec 1 '07 #1
lysdexia <do**********@gmail.com> writes:
> self.wordDB[word] = [self.words.count(word), wordsWeights]

What is self.words.count? Could it be an iterator? I don't think you
can pickle those.
Dec 1 '07 #2
Are you opening the file in binary mode ("rb") before doing pickle.load on it?

On 01 Dec 2007 14:13:33 -0800, Paul Rubin
<"http://phr.cx"@nospam.invalid> wrote:
> lysdexia <do**********@gmail.com> writes:
> > self.wordDB[word] = [self.words.count(word), wordsWeights]
>
> what is self.words.count? Could it be an iterator? I don't think you
> can pickle those.

--
-David
Dec 1 '07 #3
On Dec 2, 9:13 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> lysdexia <doug.shaw...@gmail.com> writes:
> > self.wordDB[word] = [self.words.count(word), wordsWeights]
>
> what is self.words.count? Could it be an iterator? I don't think you
> can pickle those.

Whaaaat??
self.words is obviously an iterable (can you see "for word in
self.words" in his code?), probably just a list.
self.words.count looks like a standard sequence method to me.
self.words.count(word) will return an int -- can you see all those
"[1,", "[2," etc in his printed dict output?
Dec 1 '07 #4
John Machin <sj******@lexicon.net> writes:
> self.words is obviously an iterable (can you see "for word in
> self.words" in his code?), probably just a list.

It could be a file, in which case its iterator method would read lines
from the file and cause that error message. But I think the answer is
that the pickle itself needs to be opened in binary mode, as someone
else posted.
Dec 1 '07 #5
On Dec 2, 8:59 am, lysdexia <doug.shaw...@gmail.com> wrote:
> My problem is, I can't seem to get it to unpickle. When I attempt to
> load the saved data, I get:
>
> AttributeError: 'tuple' object has no attribute 'readline'
>
> with pickle, and

The code that created the dictionary is interesting, but not very
relevant. Please consider posting the code that is actually giving the
error!

> Looking at the pickle pages on docs.python.org, I see that I am indeed
> supposed to be able to pickle ``tuples, lists, sets, and dictionaries
> containing only picklable objects''.
>
> I'm sure I'm missing something obvious. Clues?

The docs for pickle.load(file) say """
Read a string from the open file object file and interpret it as a
pickle data stream, reconstructing and returning the original object
hierarchy. This is equivalent to Unpickler(file).load().

file must have two methods, a read() method that takes an integer
argument, and a readline() method that requires no arguments. Both
methods should return a string. Thus file can be a file object opened
for reading, a StringIO object, or any other custom object that meets
this interface.
"""

The error message(s) [plural??] that you are getting suggest(s) that
the argument that you supplied was *not* an open file object nor
anything else with both a read and readline method. Open the file in
binary mode ('rb') and pass the result to pickle.load.
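
A minimal sketch of that diagnosis, written so it also runs on current Python. The failing code was never posted, so the tuple passed to `pickle.load` below is only a guess based on the error message, and the scratch path is hypothetical.

```python
import os
import pickle
import tempfile

data = {'to': [3, {'write': 1, 'put': 1, 'think': 1}]}

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "mchain.pkl")
with open(path, "wb") as out:
    pickle.dump(data, out, -1)

# Guess at the mistake: handing pickle.load something that is not a
# file-like object. The exception type depends on the Python version
# (AttributeError on old versions, TypeError on newer ones).
try:
    pickle.load((path, "rb"))
    failed = False
except (AttributeError, TypeError):
    failed = True

# Passing the open file object itself works.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(failed, restored == data)
```

The point is only that `pickle.load` wants the object returned by `open`, not the arguments you would give to `open`.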
Dec 1 '07 #6
On Dec 2, 9:49 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> John Machin <sjmac...@lexicon.net> writes:
> > self.words is obviously an iterable (can you see "for word in
> > self.words" in his code?), probably just a list.
>
> It could be a file, in which case its iterator method would read lines
> from the file and cause that error message.

Impossible:
(1) in "for word in words:" each word would end in "\n" and he'd have
to strip those and there's no evidence of that.
(2) Look at the line """afterwords = [self.words[i + 1] for i, e in
enumerate(self.words) if e == word]"""
and tell me how that works if self.words is a file!
(3) "self.words.count(word)" -- AttributeError: 'file' object has no
attribute 'count'
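
Those three points can be checked with a short sketch. An `io.StringIO` object stands in for an open text file here, which is an assumption made only to keep the example self-contained; the sample words are made up.

```python
import io

words = "to think and to write".split()
fake_file = io.StringIO("to think\nand to write\n")  # stand-in for an open text file

# A list supports indexing and count(), which the posted function relies on.
list_ok = (words[0 + 1] == "think") and (words.count("to") == 2)

# Iterating a file yields whole lines with the newline still attached.
lines = list(fake_file)

# A file-like object has no count() method ...
file_has_count = hasattr(fake_file, "count")

# ... and cannot be indexed the way self.words[i + 1] requires.
try:
    fake_file[1]
    file_indexable = True
except TypeError:
    file_indexable = False

print(list_ok, lines, file_has_count, file_indexable)
```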

> But I think the answer is
> that the pickle itself needs to be opened in binary mode, as someone
> else posted.

The answer is (1) he needs to supply a file of any kind for a start
[read the error messages that he got!!]
(2) despite the silence of the docs, it is necessary to have opened
the file in binary mode on systems where it makes a difference
(notably Windows)

[If the OP is still reading this thread, here's an example of how to
show a problem, with minimal code that reproduces the problem, and all
the output including the stack trace]

C:\junk>type dpkl.py
import pickle

d = {'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down':
1}], 'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a':
2}], 'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and':
1, 'words': 1}], 'write': [1, {'this': 1}], 'to': [3, {'write': 1,
'put': 1, 'think': 1}], 'words': [1, {'no': 1}], 'others': [1,
{'think': 1}], 'put': [1, {'them': 1}], 'sin': [2, {'to': 2}]}

s = pickle.dumps(d, -1)
dnews = pickle.loads(s)
print "string", dnews == d

out = open("mchain.pkl",'wb')
pickle.dump(d, out, -1)
out.close()

f = open("mchain.pkl", "rb")
dnewb = pickle.load(f)
f.close()
print "load binary", dnewb == d

f = open("mchain.pkl", "r")
dnewa = pickle.load(f)
f.close()
print "load text", dnewa == d

C:\junk>python dpkl.py
string True
Traceback (most recent call last):
  File "dpkl.py", line 24, in <module>
  File "c:\python25\lib\pickle.py", line 1370, in load
  File "c:\python25\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "c:\python25\lib\pickle.py", line 1169, in load_binput
TypeError: ord() expected a character, but string of length 0 found

Changing the first line to
import cPickle as pickle
gives this:

C:\junk>python dpkl.py
string True
Traceback (most recent call last):
  File "dpkl.py", line 24, in <module>
EOFError

Each of the two different errors indicates that reading was terminated
prematurely by the presence of the good ol' ^Z aka CPMEOF in the file:

>>> s.find(chr(26))
179
>>> len(s)
363
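
The same effect can be sketched in a few lines that run on any platform. This is an illustration under assumptions, not the session above: it pickles the integer 26 so that the Ctrl-Z byte is guaranteed to appear, and it simulates the text-mode reader by cutting the data at that byte instead of actually reopening a file on Windows.

```python
import pickle

# Protocol 2 stores a small int as the opcode 'K' plus one raw byte,
# so pickling 26 puts byte 0x1A (Ctrl-Z) into the stream.
payload = pickle.dumps(26, protocol=2)
ctrl_z_at = payload.find(b"\x1a")

# Simulated text-mode read: everything from Ctrl-Z onward is lost.
truncated = payload[:ctrl_z_at]

# The exact exception type varies by Python version and implementation.
try:
    pickle.loads(truncated)
    truncation_detected = False
except (EOFError, IndexError, pickle.UnpicklingError):
    truncation_detected = True

print(ctrl_z_at, truncation_detected, pickle.loads(payload))
```

Reading the untouched bytes back, as a file opened with "rb" would deliver them, restores the value without complaint.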

HTH,
John
Dec 2 '07 #7
