473,320 Members | 1,694 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

a question on python dict

Python dict is a hash table, isn't it?

I know that hashtable has the concept of "bucket size" and "min bucket
count" stuff,
and they should be configurable so I can set them to the proper value
when I know how many items I'm going to handle.

If these two values can't be set, the hashtable will give them default
values. When there are more and more items being added to the
hashtable, it increase its buckets and copy the old items into the new
one and re-calculate the hash value of each item.

I think this will waste some time doing the increasing-copying thing.
If I know I'm going to handle about 20000 items, I can set the size of
hashtable to 30000.

So, can I do this in python? I can't figure it out so I come here for
help.

Thanks!

Dec 21 '06 #1
10 1851
[co*******@gmail.com]
Python dict is a hash table, isn't it?
Yup.
I know that hashtable has the concept of "bucket size" and "min bucket
count" stuff,
Some implementations of hash tables do. Python's does not. Python's
uses what's called "open addressing" instead.
and they should be configurable so I can set them to the proper value
when I know how many items I'm going to handle.

If these two values can't be set, the hashtable will give them default
values. When there are more and more items being added to the
hashtable, it increase its buckets and copy the old items into the new
one and re-calculate the hash value of each item.
That's several distinct claims, each of which is true of some hash
table implementations but not of others. Python's implementation has
no "buckets", so all that's simply irrelevant. Python's
implementation (not all do) saves the hash value of each item, so when
it does need to grow (or shrink) the space allocated to the table, it
does not need to recalculate the hash value of any item.

You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
I think this will waste some time doing the increasing-copying thing.
It does take time to resize. "Waste" is a value judgment ;-)
If I know I'm going to handle about 20000 items, I can set the size of
hashtable to 30000.

So, can I do this in python?
You can't. Unless you build up to 20000 items over and over (many
thousands of times), it's unlikely you'd be able to measure the time
difference even if you could.
Dec 21 '06 #2
Thank you very much for your explanation!

I made a mistake that I said the hash value should be recalculated each
time the dict resize, what I wanted to say was that the position of
each item should be recalculated.

Maybe I should take a look at the source code of python dict some day.
I'll some here for help then.

Thanks again!
Tim Peters wrote:
[co*******@gmail.com]
Python dict is a hash table, isn't it?

Yup.
I know that hashtable has the concept of "bucket size" and "min bucket
count" stuff,

Some implementations of hash tables do. Python's does not. Python's
uses what's called "open addressing" instead.
and they should be configurable so I can set them to the proper value
when I know how many items I'm going to handle.

If these two values can't be set, the hashtable will give them default
values. When there are more and more items being added to the
hashtable, it increase its buckets and copy the old items into the new
one and re-calculate the hash value of each item.

That's several distinct claims, each of which is true of some hash
table implementations but not of others. Python's implementation has
no "buckets", so all that's simply irrelevant. Python's
implementation (not all do) saves the hash value of each item, so when
it does need to grow (or shrink) the space allocated to the table, it
does not need to recalculate the hash value of any item.

You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
I think this will waste some time doing the increasing-copying thing.

It does take time to resize. "Waste" is a value judgment ;-)
If I know I'm going to handle about 20000 items, I can set the size of
hashtable to 30000.

So, can I do this in python?

You can't. Unless you build up to 20000 items over and over (many
thousands of times), it's unlikely you'd be able to measure the time
difference even if you could.
Dec 21 '06 #3
co*******@gmail.com wrote:
Thank you very much for your explanation!

I made a mistake that I said the hash value should be recalculated
each time the dict resize, what I wanted to say was that the position
of each item should be recalculated.

Maybe I should take a look at the source code of python dict some day.
I'll some here for help then.
In your original message you said:
I think this will waste some time doing the increasing-copying thing.
If I know I'm going to handle about 20000 items, I can set the size of
hashtable to 30000.
Have you actually tried timing how much time is wasted doing this? 20000
items really isn't very many for a dictionary to handle.

I ran a quick test to see how long it takes to create a dictionary with
20000 items. Remember, as has been explained to you, the items are not
copied, nor are the hash values recalculated when the dictionary is
resized, so if your dictionary takes much longer to create than the test
below gives you (on your machine that is rather than mine), then the
problem lies elsewhere. e.g. a slow hash function, or the time taken to
create whatever you are storing in the dictionary.

Just creating the dictionary with no other Python overhead:

timeit.py -s "import random; r = [ random.random() for i in range(20000)]"
"d = dict.fromkeys(r)"
100 loops, best of 3: 11.3 msec per loop

Creating the dictionary using a Python for loop:

timeit.py -s "import random; r = [ random.random() for i in range(20000)]"
"d={}" "for k in r: d[k] = None"
100 loops, best of 3: 11.7 msec per loop

Assigning to the elements in a pre-populated (and therefore the right
size) dictionary:

timeit.py -s "import random; r = [ random.random() for i in range(20000)];
d=dict.fromkeys(r)" "for k in r: d[k] = None"
100 loops, best of 3: 10.1 msec per loop

How many times do you create this dictionary that shaving 1.5 milliseconds
off 11 millseconds matters to you?

FWIW, for a 2,000,000 element dictionary the times are 2.5s and 1.48s
respectively so the situation does get progressively worse as dictionaries
get very large.
Dec 21 '06 #4
In message <ma***************************************@python. org>, Tim
Peters wrote:
You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
Actually, no. It also consists of updating reference counts as well.
Dec 31 '06 #5
[Tim Peters]
>You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
[Lawrence D'Oliveiro]
Actually, no. It also consists of updating reference counts as well.
Not in context: dict resizing is refcount-neutral, and the CPython
implementation of dict resizing exploits that (quite directly in 2.5;
indirectly before 2.5). It's not just that refcounts end up the same
/across/ dict resizing, it's that they're not changed internally either for
the duration of the resizing operation.
Dec 31 '06 #6
In message <Xn***********************@216.196.97.136>, Tim Peters wrote:
[Tim Peters]
>>You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).

[Lawrence D'Oliveiro]
>Actually, no. It also consists of updating reference counts as well.

Not in context: dict resizing is refcount-neutral...
The question wasn't about resizing, it was about "copying a dict key or
value".
Dec 31 '06 #7
[Tim Peters]
>>>You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
[Lawrence D'Oliveiro]
>>Actually, no. It also consists of updating reference counts as well.
[Tim Peters]
>Not in context: dict resizing is refcount-neutral...
[Lawrence D'Oliveiro]
The question wasn't about resizing, it was about "copying a dict key or
value".
Then you misunderstood the OP. He was asking about dict resizing:

...

When there are more and more items being added to the
hashtable, it increase its buckets and copy the old items into the
new one and re-calculate the hash value of each item.

I think this will waste some time doing the increasing-copying
thing.

Taking my response out of context to begin with doesn't really change that
I answered the question he asked ;-)
Dec 31 '06 #8
In message <Xn***********************@216.196.97.136>, Tim Peters wrote:
[Tim Peters]
>>>>You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).

[Lawrence D'Oliveiro]
>>>Actually, no. It also consists of updating reference counts as well.

[Tim Peters]
>>Not in context: dict resizing is refcount-neutral...

[Lawrence D'Oliveiro]
>The question wasn't about resizing, it was about "copying a dict key or
value".

Then you misunderstood the OP. He was asking about dict resizing:
Which is not what you were talking about, in the part of your posting that I
was responding to (above).
Jan 1 '07 #9
Tim Peters wrote:
Taking my response out of context to begin with doesn't really change that
I answered the question he asked ;-)
welcome to comp.lang.python.

</F>

Jan 3 '07 #10
[Tim Peters]
>...
Taking my response out of context to begin with doesn't really change
that I answered the question he asked ;-)
[Fredrik Lundh]
welcome to comp.lang.python.

</F>
Thanks for the welcome! It's tough to be a newbie here ;-)
Jan 3 '07 #11

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
by: Dave Benjamin | last post by:
Hey good people, I've been doing a lot of simultaneous Jython and CPython programming lately, and just wanted to say, with no intended ill will toward any of the individuals who have been...
7
by: svilen | last post by:
hello again. i'm now into using python instead of another language(s) for describing structures of data, including names, structure, type-checks, conversions, value-validations, metadata etc....
105
by: Peter Hickman | last post by:
Well after all this discussion it would appear that a 'Python like' language has appeared => Prothon. http://www.prothon.org/index.html Very alpha, sort of like Python (if you consider the...
3
by: fdsl ysnh | last post by:
--- python-list-request@python.orgдµÀ: > Send Python-list mailing list submissions to > python-list@python.org > > To subscribe or unsubscribe via the World Wide Web, > visit >...
13
by: Steven Bethard | last post by:
Jean-Paul Calderone <exarkun@divmod.comwrote: Interesting. Could you give a few illustrations of this? (I didn't run into the same problem at all, so I'm curious.) Steve
18
by: Marko.Cain.23 | last post by:
Hi, I create a dictionary like this myDict = {} and I add entry like this: myDict = 1 but how can I empty the whole dictionary? Thank you.
0
by: rkmr.em | last post by:
the memory usage of a python app keeps growing in a x86 64 linux continuously, whereas in 32 bit linux this is not the case. Python version in both 32 bit and 64 bit linux - 2.6.24.4-64.fc8 Python...
20
by: Mr.SpOOn | last post by:
Hi, I need a structure to represent a set of integers. I also need to perform on this set some basic set operations, such as adding or removing elements, joining with other sets and checking for...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.