473,664 Members | 3,035 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Better dict of dicts

I have a dictionary of dictionaries where the keys are typically very
long tuples and repeated in each inner dictionary. The dictionary
representation is nice because it handles sparseness well...and it is
nice to be able to look up values based on a string rather than a
number. However, since my keys are quite long, I worry that I am
wasting a lot of memory. I'm looking for better data structures.

Here is an example:
>>a = {"string_1": {"string_2": 1,
.... "string_3": 5,
.... "string_4": 10},
.... "string_2": {"string_2": 12,
.... "string_6": 2,
.... "string_1": 4}}

So as my strings get longer and longer, it seems that the dictionary of
dictionary representation is less and less efficient.

My thought was to subclass the dictionary structure....

keys = {"string_1": 1,
"string_2": 2,
"string_3": 3,
"string_4": 4,
"string_6": 5}

Then the underlying dictionary of dictionaries would look like:

a = {1:{2:1,3:5,4:1 0},2:{2:12,5:2, 1:4}}

Somehow I need to intercept every possible call though....such that

a["string_1"]["string_2"] actually calls a[1][2]

and

a.keys() returns ["string_1", "string_2", "string_3". ...]
rather than [1,2,3,4,5]

etc.

Ideally, I would like the option to have different key hashes for the
rows and columns as well.

Any ideas?
Apr 19 '07 #1
12 1809
On Apr 19, 5:24 pm, Bill Jackson <jack...@hotmai l.comwrote:
I have a dictionary of dictionaries where the keys are typically very
long tuples and repeated in each inner dictionary. The dictionary
representation is nice because it handles sparseness well...and it is
nice to be able to look up values based on a string rather than a
number. However, since my keys are quite long, I worry that I am
wasting a lot of memory.
I wouldn't worry about it. Try doing hash('string_2' ) in the
interpreter -- the output thereof is what's really being used as the
key. It doesn't use up any more memory than the integer 2.

Apr 19 '07 #2
Adam Atlas wrote:
On Apr 19, 5:24 pm, Bill Jackson <jack...@hotmai l.comwrote:
>I have a dictionary of dictionaries where the keys are typically very
long tuples and repeated in each inner dictionary. The dictionary
representati on is nice because it handles sparseness well...and it is
nice to be able to look up values based on a string rather than a
number. However, since my keys are quite long, I worry that I am
wasting a lot of memory.

I wouldn't worry about it. Try doing hash('string_2' ) in the
interpreter -- the output thereof is what's really being used as the
key. It doesn't use up any more memory than the integer 2.
Are you sure about that? Most dictionaries need to store the actual key,
in case of a collision, so when you lookup a key they can tell which
you're really looking for.
Apr 19 '07 #3
Bill Jackson schrieb:
I have a dictionary of dictionaries where the keys are typically very
long tuples and repeated in each inner dictionary.
What I don't understand here: you say the keys are tuples, yet later,
you show that the keys are strings. Which one is it?
The dictionary
representation is nice because it handles sparseness well...and it is
nice to be able to look up values based on a string rather than a
number. However, since my keys are quite long, I worry that I am
wasting a lot of memory. I'm looking for better data structures.

Here is an example:
>>>a = {"string_1": {"string_2": 1,
... "string_3": 5,
... "string_4": 10},
... "string_2": {"string_2": 12,
... "string_6": 2,
... "string_1": 4}}

So as my strings get longer and longer, it seems that the dictionary of
dictionary representation is less and less efficient.
You seem to assume that every occurrence of "string_2" has separate
memory. That may or may not be the case, depending on where the strings
come from. For example, it may be that the keys are the very same strings:

s1 = "string_1"
s2 = "string_2"

a[s1][s2] = 1
a[s2][s1] = 4

Then, the memory for the string would exist only once.
My thought was to subclass the dictionary structure....

keys = {"string_1": 1,
"string_2": 2,
"string_3": 3,
"string_4": 4,
"string_6": 5}
Instead of doing that, I would use a procedure called "interning" .
You may want to use the builtin intern function, or your can
come up with your own interning:

interns = {}
def local_intern(s) :
return interns.setdefa ult(s, s)

Then, instead of

a[k1][k2] = value

do

a[local_intern(k1 )][local_intern(k2 )] = value

then all strings occur only once as keys, as long as the interns
dictionary isn't cleared.
Any ideas?
HTH,
Martin
Apr 19 '07 #4
John Bauman wrote:
Adam Atlas wrote:
>On Apr 19, 5:24 pm, Bill Jackson <jack...@hotmai l.comwrote:
>>I have a dictionary of dictionaries where the keys are typically very
long tuples and repeated in each inner dictionary. The dictionary
representatio n is nice because it handles sparseness well...and it is
nice to be able to look up values based on a string rather than a
number. However, since my keys are quite long, I worry that I am
wasting a lot of memory.

I wouldn't worry about it. Try doing hash('string_2' ) in the
interpreter -- the output thereof is what's really being used as the
key. It doesn't use up any more memory than the integer 2.
Are you sure about that? Most dictionaries need to store the actual key,
in case of a collision, so when you lookup a key they can tell which
you're really looking for.
Maybe this will help:

http://wiki.python.org/moin/DictionaryKeys

-Larry
Apr 19 '07 #5
On Thursday 19 April 2007, Bill Jackson wrote:
I have a dictionary of dictionaries where the keys are typically very
long tuples and repeated in each inner dictionary. The dictionary
representation is nice because it handles sparseness well...and it is
nice to be able to look up values based on a string rather than a
number. However, since my keys are quite long, I worry that I am
wasting a lot of memory. I'm looking for better data structures.
I think you may want to look into that rarely used function "intern"
(under "on-essential Built-in Functions").

Basically, Python keeps a cache of certain strings are are frequently used so
comparisons and dictionary lookups only require a pointer comparison. You
could then subclass dict (though using "DictMixin" could be better) like:

class IDict(DictMixin ):
def __setitem__(sel f, key, value):
key=intern(key)
self.__dict[key]=value

That's totally untested and incomplete, but you hopefully get the idea.

Python (or at least CPython) seems to auto intern some strings occasionally
(you could look at the source if you care about the exact rules).
Example:
>>a="1234567890 "
b="1234567890 "
a is b
True

So you don't have all that much to worry about.

Apr 19 '07 #6
Martin v. Löwis wrote the following on 04/19/2007 02:43 PM:
Bill Jackson schrieb:
>I have a dictionary of dictionaries where the keys are typically very
long tuples and repeated in each inner dictionary.

What I don't understand here: you say the keys are tuples, yet later,
you show that the keys are strings. Which one is it?
Sorry, I was just lazy. The keys will always be tuples...tuple of
strings, tuples of numbers, tuples of objects....simp ly tuples.
Instead of doing that, I would use a procedure called "interning" .
You may want to use the builtin intern function, or your can
come up with your own interning:

interns = {}
def local_intern(s) :
return interns.setdefa ult(s, s)

Then, instead of

a[k1][k2] = value

do

a[local_intern(k1 )][local_intern(k2 )] = value

then all strings occur only once as keys, as long as the interns
dictionary isn't cleared.
So, my structure is something like this:

a = {tuple_1: {tuple_2:value_ 1, tuple_3:value_2 },
tuple_4: {tuple_1:value_ 3, tuple_3:value_4 }
tuple_5: {tuple_2:value_ 5, tuple_3:value_6 , tuple_4:value_7 }}

Since the tuples from the inner dictionaries and the outer dictionaries
are frequently the same, I would benefit from using a single intern
function. Then, the tuples will always be "pointing" to the values
stored in the intern dictionary.

Now suppose there is little overlap between the keys for the outer
dictionary and the inner dictionaries... but still much overlap between
the various inner dictionaries. Then, there is no point in using an
intern function for the outer dictionary, but still a benefit for the
inner dictionary. Thus, something like the following would be appropriate:

a[k1][local_intern(k2 )] = value

Have I understood this properly?
Apr 19 '07 #7
On Thu, 19 Apr 2007 17:40:27 -0400, John Bauman wrote:
Adam Atlas wrote:
>On Apr 19, 5:24 pm, Bill Jackson <jack...@hotmai l.comwrote:
>>I have a dictionary of dictionaries where the keys are typically very
long tuples and repeated in each inner dictionary. The dictionary
representatio n is nice because it handles sparseness well...and it is
nice to be able to look up values based on a string rather than a
number. However, since my keys are quite long, I worry that I am
wasting a lot of memory.

I wouldn't worry about it. Try doing hash('string_2' ) in the
interpreter -- the output thereof is what's really being used as the
key. It doesn't use up any more memory than the integer 2.
Are you sure about that? Most dictionaries need to store the actual key,
in case of a collision, so when you lookup a key they can tell which
you're really looking for.

The key is already stored, so long as the string 'string_2' exists. And it
will continue to exist so long as the dictionary includes it as a key. An
extra copy isn't made (unless you make an extra copy yourself).

In other words, the dictionary stores a reference to the string, not a
copy of it:
>>s = "this is a long string"
d = {s: 1}
id(s)
-1209073840
>>id(d.keys()[0])
-1209073840
--
Steven.

Apr 19 '07 #8
On Thursday 19 April 2007, Bill Jackson wrote:
Martin v. Löwis wrote the following on 04/19/2007 02:43 PM:
Bill Jackson schrieb:
I have a dictionary of dictionaries where the keys are typically very
long tuples and repeated in each inner dictionary.
What I don't understand here: you say the keys are tuples, yet later,
you show that the keys are strings. Which one is it?

Sorry, I was just lazy. The keys will always be tuples...tuple of
strings, tuples of numbers, tuples of objects....simp ly tuples.
That'll change things a bit because intern only works with strings.

Of course, It's not so big a deal, but you will have to put together a class
to implement interning.
I wrote one for fun:

class ITuple(tuple):
_interns={}
def __new__(cls, tup):
if tup not in cls._interns:
itup=tuple.__ne w__(cls, tup)
cls._interns[tup]=itup
return cls._interns[tup]
def __init__(self, *args):
#Prevent multiple calls to __init__
if hasattr(self, "_inited"): return
tuple.__init__( self, *args)
self._inited=Tr ue
def __eq__(self, o):
#If the other is an ITuple, self==o iff self is o
if isinstance(o, ITuple):
return self is o
return tuple.__eq__(se lf, o)

>>t1=(1,2,3,4 ); t2=(1,2,3,4)
ti1=ITuple(t1 ); ti2=ITuple(t2)
print t1==t2, t1 is t2
True False
>>print ti1==ti2, ti1 is ti2
True True

That seems to work. Something to note is that the call overhead of the __eq__
function is large enough that unless you have a slow comparing tuple,
comparisons will be faster without it. Comparisons are fast if they are done
internally; so between builtin objects or identical (" is ") objects.

For an example, suppose you have:

class TTT(object):
def __eq__(self, o): return True
a,b=TTT(),TTT()

Then the follow comparisons are fast:
(1,2,3)==(1,2,3 )
(1,2,3,a)==(1,2 ,3,a)
(0,0,0,a)==(1,2 ,3,b)

The following are slow:
(1,2,3,a)==(1,2 ,3,b)

Note that the only slow case is the one where a.__eq__(b) is called. However,
a.__eq__(b) is assumed True is "a is b" is True. So chances are you'll want
to comment out the __eq__ function.
Apr 20 '07 #9
On Apr 20, 1:38 am, DillonCo <dillo...@comca st.netwrote:
On Thursday 19 April 2007, Bill Jackson wrote:
Martin v. Löwis wrote the following on 04/19/2007 02:43 PM:
Bill Jackson schrieb:
>I have a dictionary of dictionaries where the keys are typically very
>long tuples and repeated in each inner dictionary.
What I don't understand here: you say the keys are tuples, yet later,
you show that the keys are strings. Which one is it?
Sorry, I was just lazy. The keys will always be tuples...tuple of
strings, tuples of numbers, tuples of objects....simp ly tuples.

That'll change things a bit because intern only works with strings.

Of course, It's not so big a deal, but you will have to put together a class
to implement interning.
I wrote one for fun:

class ITuple(tuple):
_interns={}
def __new__(cls, tup):
if tup not in cls._interns:
itup=tuple.__ne w__(cls, tup)
cls._interns[tup]=itup
return cls._interns[tup]
def __init__(self, *args):
#Prevent multiple calls to __init__
if hasattr(self, "_inited"): return
tuple.__init__( self, *args)
self._inited=Tr ue
def __eq__(self, o):
#If the other is an ITuple, self==o iff self is o
if isinstance(o, ITuple):
return self is o
return tuple.__eq__(se lf, o)
>t1=(1,2,3,4) ; t2=(1,2,3,4)
ti1=ITuple(t1) ; ti2=ITuple(t2)
print t1==t2, t1 is t2
True False
>print ti1==ti2, ti1 is ti2

True True

That seems to work. Something to note is that the call overhead of the __eq__
function is large enough that unless you have a slow comparing tuple,
comparisons will be faster without it. Comparisons are fast if they are done
internally; so between builtin objects or identical (" is ") objects.

For an example, suppose you have:

class TTT(object):
def __eq__(self, o): return True
a,b=TTT(),TTT()

Then the follow comparisons are fast:
(1,2,3)==(1,2,3 )
(1,2,3,a)==(1,2 ,3,a)
(0,0,0,a)==(1,2 ,3,b)

The following are slow:
(1,2,3,a)==(1,2 ,3,b)

Note that the only slow case is the one where a.__eq__(b) is called. However,
a.__eq__(b) is assumed True is "a is b" is True. So chances are you'll want
to comment out the __eq__ function.
Hi DillonCo,
Martins earlier local_intern function would work for tuples as well as
strings.

- Paddy.

Apr 20 '07 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

17
4277
by: Pierre Fortin | last post by:
Hi, Is the following a reasonable generic approach for converting python returned tuples/lists into dicts...? I'm not advocating library functions also return dicts (I'd probably spend more time looking up the real names... :) I'm just looking to make my code more readable and self-documenting... --------
8
2143
by: bearophileHUGS | last post by:
I'm frequently using Py2.4 sets, I find them quite useful, and I like them, even if they seem a little slower than dicts. Sets also need the same memory of dicts (can they be made to use less memory, not storing values? Maybe this requires too much code rewriting). I presume such sets are like this because they are kind of dicts. If this is true, then converting a dict to a set (that means converting just the keys; this is often useful for...
6
3448
by: David Rasmussen | last post by:
If I have a collection of dicts like: john = {'id': 1, 'name': "John Cleese", 'year': 1939} graham = {'id': 2, 'name': "Graham Chapman", 'year': 1941} I could store all of them in a list. But for easy lookup, I might store all these in a dict instead, like people = {'1': john, '2': graham}
5
1296
by: Alexander Kozlovsky | last post by:
Hello all! I have small silly syntax suggestion (SSSS) In many cases, keys in dictionary-like objects are strings, and moreover - valid Python identifiers. Something like: foo.x = y How about small syntactic sugar:
15
2444
by: George Sakkis | last post by:
Although I consider dict(**kwds) as one of the few unfortunate design choices in python since it prevents the future addition of useful keyword arguments (e.g a default value or an orderby function), I've been finding myself lately using it sometimes instead of dict literals, for no particular reason. Is there any coding style consensus on when should dict literals be preferred over dict(**kwds) and vice versa ? George
3
2918
by: aking | last post by:
Dear Python people, im a newbie to python and here...so hello! Im trying to iterate through values in a dictionary so i can find the closest value and then extract the key for that value....what ive done so far: def pcloop(dictionary, exvalue): z = dictionary.itervalues() y = z - exvalue
13
2343
by: vd12005 | last post by:
Hello, i would like to sort(ed) and reverse(d) the result of many huge dictionaries (a single dictionary will contain ~ 150000 entries). Keys are words, values are count (integer). i'm wondering if i can have a 10s of these in memory, or if i should proceed one after the other. but moreover i'm interested in saving theses as values, keys sorted and
4
3012
by: bearophileHUGS | last post by:
I have started doing practice creating C extensions for CPython, so here are two ideas I have had, possibly useless. If you keep adding elements to a CPython dict/set, it periodically rebuilds itself. So maybe dict.reserve(n) and a set.reserve(n) methods may help, reserving enough (empty) memory for about n *distinct* keys the programmer wants to add to the dict/set in a short future. I have seen that the the C API of the dicts doesn't...
7
4826
by: Benjamin | last post by:
How would I go about "flattening" a dict with many nested dicts within? The dicts might look like this: {"mays" : {"eggs" : "spam"}, "jam" : {"soda" : {"love" : "dump"}}, "lamba" : 23 } I'd like it to put "/" inbetween the dicts to make it a one dimensional dict and look like this: {"mays/eggs" : "spam", "jam/soda/love" : "dump",
0
8437
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8348
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
8778
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
8549
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8636
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
7375
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5660
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4185
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
2
2003
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.