Bytes IT Community

Re: PEP 372 -- Adding an ordered directory to collections

dbpoko...:
> Why keep the normal dict operations at the same speed? There is a
> substantial cost this entails.

I presume we can now create a list of possible odict usages, because I
think that despite everyone using it for different purposes, we may
find some main groups of its usage. I use odicts in situations where
a dict is nearly the right data structure, so keeping all operations
close to the time complexity of dicts has a purpose.

> but the storage requirements are reduced to 2n from 4n.

In Python 2.5 a dict(int:None) needs about 36.2 bytes/element. I am
suggesting to add 2 pointers, to create a linked list, so it probably
becomes (on 32-bit systems) about 44.2 bytes/pair.

Note that computer science is full of strange data structures, so
maybe a skip list can be used here, to increase some operation
timings, and reduce other ones... :-)

Bye,
bearophile
Jun 27 '08 #1
4 Replies


On Jun 18, 3:15 pm, bearophileH...@lycos.com wrote:
> In Python 2.5 a dict(int:None) needs about 36.2 bytes/element. I am
> suggesting to add 2 pointers, to create a linked list, so it probably
> becomes (on 32 bit systems) about 44.2 bytes/pair.
PyDictEntry is

typedef struct {
    Py_ssize_t me_hash;
    PyObject *me_key;
    PyObject *me_value;
} PyDictEntry;

Which should be 12 bytes on a 32-bit machine. I thought the space for
growth factor for dicts was about 12% but it is really 100%. In any
case, a pair of lists will take up less space than a dict and a list.
Or the storage could be an array of PyDictEntrys (to cache the hash
values of the keys), an approach that is in some sense halfway between
the others.
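On a modern CPython the container-only costs of these layouts can be compared with sys.getsizeof (a rough sketch; getsizeof reports only the container object itself, not the keys and values it references, and the exact numbers vary by Python version and platform):

```python
import sys

N = 10000
keys = list(range(N))
vals = [None] * N

# a hash table carries per-entry slots plus growth slack
d = dict.fromkeys(keys)

dict_size = sys.getsizeof(d)
pair_of_lists = sys.getsizeof(keys) + sys.getsizeof(vals)

print("dict:          %.1f bytes/pair" % (dict_size / N))
print("pair of lists: %.1f bytes/pair" % (pair_of_lists / N))
```

On CPython the pair of lists comes out smaller per pair than the dict, which is the trade-off being weighed here: less space, but no O(1) lookup without the accompanying dict.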

There is one advantage of this last approach - I think the amount of
hacking on dictobject.c that would have to take place is minimal. In
fact it almost seems like you could get the desired result by setting
mp->ma_lookup to a new function (and keep most of the rest of the
methods as they are). This seems too easy though, so there might be a
catch.

David
Jun 27 '08 #2

dbpoko...:
> Which should be 12 bytes on a 32-bit machine. I thought the space for
> growth factor for dicts was about 12% but it is really 100%.

(Please ignore the trailing ".2" in my numbers from my last post; such
precision is silly.)
My memory value comes from experiments, I have created a little
program like this:

from memory import memory

def main(N):
    m1 = memory()
    print m1

    d = {}
    for i in xrange(N):
        d[i] = None

    m2 = memory()
    print m2
    print float((m2 - m1) * 1024) / N

main(20000000)

Where memory is a small module of mine that calls a little-known
program that tells how much memory is used by the current Python
process. The results for that run with N=20000000 are (the first two
numbers are kilobytes, the third is bytes/pair):

1876
633932
32.3612672

This means storing 20_000_000 pairs requires about 647_000_000
bytes (Python 2.5.2, on Windows).
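The same whole-process style of measurement can be approximated today with the stdlib tracemalloc module instead of an external tool (a sketch of the idea; unlike sys.getsizeof, this also counts the integer key objects themselves, so it should land closer to the per-pair figures quoted in this thread):

```python
import tracemalloc

def bytes_per_pair(N):
    """Measure allocated bytes per dict entry, including the int keys."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()

    d = {}
    for i in range(N):
        d[i] = None

    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / N

print("%.1f bytes/pair" % bytes_per_pair(100000))
```

The exact figure depends on the Python version and platform; the point is only that most of the cost per pair is the key object plus the dict entry, which is what the proposed two extra pointers would be added on top of.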

Bye,
bearophile
Jun 27 '08 #3

be************@lycos.com wrote:
> My memory value comes from experiments, I have created a little
> program like this: [...]
>
> 1876
> 633932
> 32.3612672
>
> It means to store 20_000_000 pairs it requires about 647_000_000
> bytes, Python 2.5.2, on Win.

What do you get if you change the output to exclude the integers from
the memory calculation so you are only looking at the dictionary
elements themselves? e.g.

def main(N):
    keys = range(N)
    m1 = memory()
    print m1

    d = {}
    for i in keys:
        d[i] = None

    m2 = memory()
    print m2
    print float((m2 - m1) * 1024) / N

main(20000000)
--
Duncan Booth http://kupuguy.blogspot.com
Jun 27 '08 #4

Duncan Booth:
> What do you get if you change the output to exclude the integers from
> the memory calculation so you are only looking at the dictionary
> elements themselves? e.g.
The results:

318512 (kbytes)
712124 (kbytes)
20.1529344 (bytes)

Bye,
bearophile
Jun 27 '08 #5

This discussion thread is closed.