Pickling limitation with instances defining __cmp__/__hash__?

I've come across a limitation in unpickling certain types of complex
data structures which involve instances that override __hash__, and was
wondering if it was known (basic searches didn't seem to come up with
anything similar) and if there is a workaround for it short of
restructuring the data structures in question.

The fundamental issue rests with defining classes which override __cmp__
and __hash__ in order to be used as keys in dictionaries (and elements
of sets). __cmp__ and __hash__ are defined to manipulate a single
attribute of the class, which never changes for the lifetime of an
object. In a simplified form:

class C:

    def __init__(self, x):
        self.x = x

    def __cmp__(self, other):
        return cmp(self.x, other.x)

    def __hash__(self):
        return hash(self.x)

Even if C contains other members which are manipulated, making it
technically mutable, since the one attribute (in this example, x) which
is used for __cmp__ and __hash__ is never changed after the creation of
the object, it is legal to use as a dictionary key. (Formally, the
attribute in question is a name which is guaranteed to be unique.)
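
To illustrate the point about key attributes, here is a minimal sketch in Python 3 syntax, where `__eq__` stands in for `__cmp__` (which Python 3 no longer consults). The `note` attribute is a hypothetical extra member added purely for illustration; it is not part of the original class.

```python
class C:
    def __init__(self, x):
        self.x = x        # identity attribute; never changed after creation
        self.note = ''    # hypothetical extra member that is free to change

    def __eq__(self, other):
        return isinstance(other, C) and self.x == other.x

    def __hash__(self):
        return hash(self.x)


c = C(1)
table = {c: 'one'}
c.note = 'changed later'   # mutate an attribute that plays no part in hashing
found = table[C(1)]        # lookup depends only on x, so it still succeeds
```

Because hashing and equality look only at `x`, changing `note` leaves the dictionary entry reachable.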

The difficulty arises when the data structures built up in a C instance
contain a circular reference to that same instance as a dictionary key.
In my particular case the situation is rather involved, but the simplest
example which reproduces the problem (using C) would be:

c = C(1)
c.m = {c: '1'}

So far this is fine and behaves as expected. Pickling the object c
results in no problems. Unpickling it, however, results in an error:

data = pickle.dumps(c)
d = pickle.loads(data) # line 25

Traceback (most recent call last):
  File "/home/max/tmp/hash.py", line 25, in ?
    d = pickle.loads(data)
  File "/usr/local/lib/python2.4/pickle.py", line 1394, in loads
    return Unpickler(file).load()
  File "/usr/local/lib/python2.4/pickle.py", line 872, in load
  File "/usr/local/lib/python2.4/pickle.py", line 1218, in load_setitem
    dict[key] = value
  File "/home/max/tmp/hash.py", line 15, in __hash__
    return hash(self.x)
AttributeError: C instance has no attribute 'x'

By poking around, one can see that the error is occurring because the
unpickler algorithm is trying to use the instance as a key in a
dictionary before the instance has been completely initialized (in fact,
the __dict__ of this object is the empty dictionary!).
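
The state described above can be imitated without pickle at all. The following is only a sketch in Python 3 syntax (with `__eq__` in place of `__cmp__`), and it assumes that creating an instance with `C.__new__(C)` is a fair stand-in for the half-built object the unpickler holds at that moment: it exists, but no attributes have been restored yet.

```python
class C:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, C) and self.x == other.x

    def __hash__(self):
        return hash(self.x)


blank = C.__new__(C)                    # instance exists, __init__ never ran
state_is_empty = (blank.__dict__ == {})

try:
    {blank: '1'}                        # hashing the key calls __hash__ ...
    error = None
except AttributeError as exc:           # ... which fails because x is not set yet
    error = exc
```

Here `state_is_empty` is true and `error` holds the `AttributeError`, which matches the failure in `load_setitem` in the traceback.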

The error happens regardless of whether pickle or cPickle is used (so I
used pickle to give a more meaningful traceback above), and regardless
of whether the protocol is 0 or HIGHEST_PROTOCOL.
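
A quick way to check the protocol claim is to loop over every protocol number. This is a sketch in Python 3 syntax (so `__eq__` replaces `__cmp__`, and there is no separate cPickle); the `results` dictionary is just a hypothetical holder for what each protocol produced.

```python
import pickle


class C:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, C) and self.x == other.x

    def __hash__(self):
        return hash(self.x)


c = C(1)
c.m = {c: '1'}                               # instance is a key in its own dict

results = {}                                 # protocol -> exception or None
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(c, protocol)         # pickling itself succeeds
    try:
        pickle.loads(data)
        results[protocol] = None
    except AttributeError as exc:            # unpickling hashes c too early
        results[protocol] = exc
```

If the behaviour is as described, every entry in `results` ends up holding an `AttributeError`.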

Is this issue known? I don't see any mention of this kind of
circularity in section 3.14.4 of the Python Library Reference. Second,
is there any reasonably straightforward workaround to this limitation,
short of reworking things so that these self-referenced objects aren't
used as dictionary keys?

Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
You'll learn / Life is worth it / Watch the tables turn
-- TLC
Jul 19 '05 #1
Erik Max Francis wrote:
I've come across a limitation in unpickling certain types of complex
data structures which involve instances that override __hash__, and was
wondering if it was known (basic searches didn't seem to come up with
anything similar) and if there is a workaround for it short of
restructuring the data structures in question.

Replying to my own (old) post here, I finally got back to this and found
the best solution was to define surrogate set and dictionary classes
that internally used the IDs as keys, eliminating the circular
dependency. Examples of SafeSet and SafeDict serving this purpose
follow, though note that I only defined the methods that I used, rather
than the full and complete interfaces for sets and dictionaries (though
it should serve as an example for those who need to do more):

import copy

class SafeSet(_ReprMixin):  # _ReprMixin: helper mixin not shown in this post

    # Default function mapping a thing to its unique ID. It must be a
    # staticmethod so that self.ider(thing) does not pass self as well.
    @staticmethod
    def ider(thing):
        return thing.id

    def __init__(self, ider=None):
        if ider is not None:
            self.ider = ider
        self._map = {} # id -> thing

    def __len__(self):
        return len(self._map)

    def __contains__(self, thing):
        return self.ider(thing) in self._map

    def add(self, thing):
        key = self.ider(thing)
        if key in self._map:
            # The same ID must always refer to the same object.
            assert self._map[key] is thing
        self._map[key] = thing

    def remove(self, thing):
        del self._map[self.ider(thing)]

    def pop(self):
        # Remove and return an arbitrary element.
        iterator = self._map.iterkeys()
        key = iterator.next()
        return self._map.pop(key)

    def clear(self):
        self._map.clear()

    def copy(self):
        return copy.copy(self)

    def update(self, sequence):
        for thing in sequence:
            self.add(thing)

    def difference(self, other):
        # Compare by ID, then rebuild a SafeSet from the surviving things.
        thisSet = set(self._map.iterkeys())
        otherSet = set(other._map.iterkeys())
        newSet = thisSet.difference(otherSet)
        safeSet = SafeSet()
        for key in newSet:
            safeSet.add(self._map[key])
        return safeSet

    def __iter__(self):
        return self._map.itervalues()

    def __str__(self):
        return 'set(' + str(self._map.keys()) + ')'

class SafeDict(_ReprMixin):  # _ReprMixin: helper mixin not shown in this post

    # Default function mapping a key object to its unique ID. It must be a
    # staticmethod so that self.ider(thing) does not pass self as well.
    @staticmethod
    def ider(thing):
        return thing.id

    def __init__(self, ider=None):
        if ider is not None:
            self.ider = ider
        self._keys = {} # id -> key
        self._values = {} # id -> value

    def __len__(self):
        return len(self._keys)

    def __contains__(self, thing):
        return self.ider(thing) in self._keys

    def __getitem__(self, thing):
        return self._values[self.ider(thing)]

    def __setitem__(self, thing, value):
        key = self.ider(thing)
        self._keys[key] = thing
        self._values[key] = value

    def __delitem__(self, thing):
        # __delitem__ receives only the key, not a value.
        key = self.ider(thing)
        del self._keys[key]
        del self._values[key]

    def keys(self):
        return self._keys.values()

    def iterkeys(self):
        return self._keys.itervalues()

    def values(self):
        return self._values.values()

    def itervalues(self):
        return self._values.itervalues()

    def items(self):
        return [(self._keys[x], self._values[x]) for x in self._keys]

    def iteritems(self):
        return ((self._keys[x], self._values[x]) for x in self._keys)

    def clear(self):
        self._keys.clear()
        self._values.clear()

    def copy(self):
        return copy.copy(self)

    def update(self, mapping):
        for key, value in mapping.iteritems():
            self[key] = value

    def has_key(self, thing):
        return self._keys.has_key(self.ider(thing))

    def get(self, thing, default=None):
        return self._values.get(self.ider(thing), default)

    def setdefault(self, thing, default):
        key = self.ider(thing)
        if key in self._keys:
            return self._values[key]
        self._keys[key] = thing
        self._values[key] = default
        return default

    def __iter__(self):
        return self._keys.itervalues()

    def __str__(self):
        return str(self._values)
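
As a rough check that keying by ID removes the circular dependency, here is a trimmed-down sketch in Python 3 syntax. `IdKeyedDict` and `Node` are hypothetical names invented for this example; `IdKeyedDict` keeps only the two internal maps of the ID-keyed dictionary idea and a few methods, and is not the full class.

```python
import pickle


class IdKeyedDict:
    """Trimmed-down stand-in: entries are stored under thing.id."""

    def __init__(self):
        self._keys = {}      # id -> key object
        self._values = {}    # id -> value

    def __setitem__(self, thing, value):
        self._keys[thing.id] = thing
        self._values[thing.id] = value

    def __getitem__(self, thing):
        return self._values[thing.id]

    def __contains__(self, thing):
        return thing.id in self._keys


class Node:
    """Hypothetical key class, compared and hashed by its unique id."""

    def __init__(self, id):
        self.id = id

    def __eq__(self, other):
        return isinstance(other, Node) and self.id == other.id

    def __hash__(self):
        return hash(self.id)


node = Node(1)
node.m = IdKeyedDict()
node.m[node] = '1'           # still circular, but the real dict key is the int 1

clone = pickle.loads(pickle.dumps(node))   # round trip no longer hashes a blank Node
```

The underlying dictionaries only ever hash plain IDs, so the unpickler never needs `__hash__` on a half-restored instance; the instance appears only as a dictionary value.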

Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
The only completely consistent people are the dead.
-- Aldous Huxley
Aug 9 '05 #2
