By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,096 Members | 1,568 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,096 IT Pros & Developers. It's quick & easy.

Pickling dictionaries containing dictionaries: failing,recursion-style!

P: n/a
I'm having great fun playing with Markov chains. I am making a
dictionary of all the words in a given string, getting a count of how
many appearances word1 makes in the string, getting a list of all the
word2s that follow each appearance of word1 and a count of how many
times word2 appears in the string as well. (I know I should probably
be only counting how many times word2 actually follows word1, but as I
said, I'm having great fun playing ...)
printed output of the dictionary looks like so:

{'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down':
1}], 'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a':
2}], 'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and':
1, 'words': 1}], 'write': [1, {'this': 1}], 'to': [3, {'write': 1,
'put': 1, 'think': 1}], 'words': [1, {'no': 1}], 'others': [1,
{'think': 1}], 'put': [1, {'them': 1}], 'sin': [2, {'to': 2}]}

Here's the actual function.

def assembleVocab(self):
self.wordDB = {}
for word in self.words:
try:
if not word in self.wordDB.keys():
wordsWeights = {}
afterwords = [self.words[i + 1] for i, e in
enumerate(self.words) if e == word]
for aw in afterwords:
if not aw in wordsWeights.keys():
wordsWeights[aw] = afterwords.count(aw)
self.wordDB[word] = [self.words.count(word), wordsWeights]
except:
pass
out = open("mchain.pkl",'wb')
pickle.dump(self.wordDB, out, -1)
out.close()

My problem is, I can't seem to get it to unpickle. When I attempt to
load the
saved data, I get:

AttributeError: 'tuple' object has no attribute 'readline'

with pickle, and

TypeError: argument must have 'read' and 'readline' attributes

Looking at the pickle pages on docs.python.org, I see that I am
indeed
supposed to be able to pickle ``tuples, lists, sets, and dictionaries
containing only picklable objects''.

I'm sure I'm missing something obvious. Clues?
Dec 1 '07 #1
Share this Question
Share on Google+
6 Replies


P: n/a
lysdexia <do**********@gmail.comwrites:
self.wordDB[word] = [self.words.count(word), wordsWeights]
what is self.words.count? Could it be an iterator? I don't think you
can pickle those.
Dec 1 '07 #2

P: n/a
Are you opening the file in binary mode ("rb") before doing pickle.load on it?

On 01 Dec 2007 14:13:33 -0800, Paul Rubin
<"http://phr.cx"@nospam.invalidwrote:
lysdexia <do**********@gmail.comwrites:
self.wordDB[word] = [self.words.count(word), wordsWeights]

what is self.words.count? Could it be an iterator? I don't think you
can pickle those.

--
http://mail.python.org/mailman/listinfo/python-list


--
-David
Dec 1 '07 #3

P: n/a
On Dec 2, 9:13 am, Paul Rubin <http://phr...@NOSPAM.invalidwrote:
lysdexia <doug.shaw...@gmail.comwrites:
self.wordDB[word] = [self.words.count(word), wordsWeights]

what is self.words.count? Could it be an iterator? I don't think you
can pickle those.
Whaaaat??
self.words is obviously an iterable (can you see "for word in
self.words" in his code?), probably just a list.
self.words.count looks like a standard sequence method to me.
self.words.count(word) will return an int -- can you see all those
"[1,", "[2," etc in his printed dict output?
Dec 1 '07 #4

P: n/a
John Machin <sj******@lexicon.netwrites:
self.words is obviously an iterable (can you see "for word in
self.words" in his code?), probably just a list.
It could be a file, in which case its iterator method would read lines
from the file and cause that error message. But I think the answer is
that the pickle itself needs to be opened in binary mode, as someone
else posted.
Dec 1 '07 #5

P: n/a
On Dec 2, 8:59 am, lysdexia <doug.shaw...@gmail.comwrote:
I'm having great fun playing with Markov chains. I am making a
dictionary of all the words in a given string, getting a count of how
many appearances word1 makes in the string, getting a list of all the
word2s that follow each appearance of word1 and a count of how many
times word2 appears in the string as well. (I know I should probably
be only counting how many times word2 actually follows word1, but as I
said, I'm having great fun playing ...)

printed output of the dictionary looks like so:

{'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down':
1}], 'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a':
2}], 'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and':
1, 'words': 1}], 'write': [1, {'this': 1}], 'to': [3, {'write': 1,
'put': 1, 'think': 1}], 'words': [1, {'no': 1}], 'others': [1,
{'think': 1}], 'put': [1, {'them': 1}], 'sin': [2, {'to': 2}]}

Here's the actual function.

def assembleVocab(self):
self.wordDB = {}
for word in self.words:
try:
if not word in self.wordDB.keys():
wordsWeights = {}
afterwords = [self.words[i + 1] for i, e in
enumerate(self.words) if e == word]
for aw in afterwords:
if not aw in wordsWeights.keys():
wordsWeights[aw] = afterwords.count(aw)
self.wordDB[word] = [self.words.count(word), wordsWeights]
except:
pass
out = open("mchain.pkl",'wb')
pickle.dump(self.wordDB, out, -1)
out.close()

My problem is, I can't seem to get it to unpickle. When I attempt to
load the
saved data, I get:

AttributeError: 'tuple' object has no attribute 'readline'

with pickle, and

TypeError: argument must have 'read' and 'readline' attributes
The code that created the dictionary is interesting, but not very
relevant. Please consider posting the code that is actually giving the
error!
>
Looking at the pickle pages on docs.python.org, I see that I am
indeed
supposed to be able to pickle ``tuples, lists, sets, and dictionaries
containing only picklable objects''.

I'm sure I'm missing something obvious. Clues?
The docs for pickle.load(file) say """
Read a string from the open file object file and interpret it as a
pickle data stream, reconstructing and returning the original object
hierarchy. This is equivalent to Unpickler(file).load().

file must have two methods, a read() method that takes an integer
argument, and a readline() method that requires no arguments. Both
methods should return a string. Thus file can be a file object opened
for reading, a StringIO object, or any other custom object that meets
this interface.
"""

The error message(s) [plural??] that you are getting suggest(s) that
the argument that you supplied was *not* an open file object nor
anything else with both a read and readline method. Open the file in
binary mode ('rb') and pass the result to pickle.load.
Dec 1 '07 #6

P: n/a
On Dec 2, 9:49 am, Paul Rubin <http://phr...@NOSPAM.invalidwrote:
John Machin <sjmac...@lexicon.netwrites:
self.words is obviously an iterable (can you see "for word in
self.words" in his code?), probably just a list.

It could be a file, in which case its iterator method would read lines
from the file and cause that error message.
Impossible:
(1) in "for word in words:" each word would end in "\n" and he'd have
to strip those and there's no evidence of that.
(2) Look at the line """afterwords = [self.words[i + 1] for i, e in
enumerate(self.words) if e == word]"""
and tell me how that works if self.words is a file!
(3) "self.words.count(word)" -- AttributeError: 'file' object has no
attribute 'count'

But I think the answer is
that the pickle itself needs to be opened in binary mode, as someone
else posted.
The answer is (1) he needs to supply a file of any kind for a start
[read the error messages that he got!!]
(2) despite the silence of the docs, it is necessary to have opened
the file in binary mode on systems where it makes a difference
(notably Windows)

[If the OP is still reading this thread, here's an example of how to
show a problem, with minimal code that reproduces the problem, and all
the output including the stack trace]

C:\junk>type dpkl.py
import pickle

d = {'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1,
{'down':
1}], 'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a':
2}], 'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and':
1, 'words': 1}], 'write': [1, {'this': 1}], 'to': [3, {'write': 1,
'put': 1, 'think': 1}], 'words': [1, {'no': 1}], 'others': [1,
{'think': 1}], 'put': [1, {'them': 1}], 'sin': [2, {'to': 2}]}

s = pickle.dumps(d, -1)
dnews = pickle.loads(s)
print "string", dnews == d

out = open("mchain.pkl",'wb')
pickle.dump(d, out, -1)
out.close()

f = open("mchain.pkl", "rb")
dnewb = pickle.load(f)
f.close()
print "load binary", dnewb == d

f = open("mchain.pkl", "r")
dnewa = pickle.load(f)
f.close()
print "load text", dnewa == d

C:\junk>python dpkl.py
string True
load binary True
Traceback (most recent call last):
File "dpkl.py", line 24, in <module>
dnewa = pickle.load(f)
File "c:\python25\lib\pickle.py", line 1370, in load
return Unpickler(file).load()
File "c:\python25\lib\pickle.py", line 858, in load
dispatch[key](self)
File "c:\python25\lib\pickle.py", line 1169, in load_binput
i = ord(self.read(1))
TypeError: ord() expected a character, but string of length 0 found

Changing the first line to
import cPickle as pickle
gives this:

C:\junk>python dpkl.py
string True
load binary True
Traceback (most recent call last):
File "dpkl.py", line 24, in <module>
dnewa = pickle.load(f)
EOFError

Each of the two different errors indicate that reading was terminated
prematurely by the presence of the good ol' ^Z aka CPMEOF in the file:
>>s = open('mchain.pkl', 'rb').read()
s.find(chr(26))
179
>>len(s)
363

HTH,
John
Dec 2 '07 #7

This discussion thread is closed

Replies have been disabled for this discussion.