473,545 Members | 524 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

a question on python dict

Python dict is a hash table, isn't it?

I know that hashtable has the concept of "bucket size" and "min bucket
count" stuff,
and they should be configurable so I can set them to the proper value
when I know how many items I'm going to handle.

If these two values can't be set, the hashtable will give them default
values. When there are more and more items being added to the
hashtable, it increase its buckets and copy the old items into the new
one and re-calculate the hash value of each item.

I think this will waste some time doing the increasing-copying thing.
If I know I'm going to handle about 20000 items, I can set the size of
hashtable to 30000.

So, can I do this in python? I can't figure it out so I come here for
help.

Thanks!

Dec 21 '06 #1
10 1863
[co*******@gmail .com]
Python dict is a hash table, isn't it?
Yup.
I know that hashtable has the concept of "bucket size" and "min bucket
count" stuff,
Some implementations of hash tables do. Python's does not. Python's
uses what's called "open addressing" instead.
and they should be configurable so I can set them to the proper value
when I know how many items I'm going to handle.

If these two values can't be set, the hashtable will give them default
values. When there are more and more items being added to the
hashtable, it increase its buckets and copy the old items into the new
one and re-calculate the hash value of each item.
That's several distinct claims, each of which is true of some hash
table implementations but not of others. Python's implementation has
no "buckets", so all that's simply irrelevant. Python's
implementation (not all do) saves the hash value of each item, so when
it does need to grow (or shrink) the space allocated to the table, it
does not need to recalculate the hash value of any item.

You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
I think this will waste some time doing the increasing-copying thing.
It does take time to resize. "Waste" is a value judgment ;-)
If I know I'm going to handle about 20000 items, I can set the size of
hashtable to 30000.

So, can I do this in python?
You can't. Unless you build up to 20000 items over and over (many
thousands of times), it's unlikely you'd be able to measure the time
difference even if you could.
Dec 21 '06 #2
Thank you very much for your explanation!

I made a mistake that I said the hash value should be recalculated each
time the dict resize, what I wanted to say was that the position of
each item should be recalculated.

Maybe I should take a look at the source code of python dict some day.
I'll some here for help then.

Thanks again!
Tim Peters wrote:
[co*******@gmail .com]
Python dict is a hash table, isn't it?

Yup.
I know that hashtable has the concept of "bucket size" and "min bucket
count" stuff,

Some implementations of hash tables do. Python's does not. Python's
uses what's called "open addressing" instead.
and they should be configurable so I can set them to the proper value
when I know how many items I'm going to handle.

If these two values can't be set, the hashtable will give them default
values. When there are more and more items being added to the
hashtable, it increase its buckets and copy the old items into the new
one and re-calculate the hash value of each item.

That's several distinct claims, each of which is true of some hash
table implementations but not of others. Python's implementation has
no "buckets", so all that's simply irrelevant. Python's
implementation (not all do) saves the hash value of each item, so when
it does need to grow (or shrink) the space allocated to the table, it
does not need to recalculate the hash value of any item.

You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
I think this will waste some time doing the increasing-copying thing.

It does take time to resize. "Waste" is a value judgment ;-)
If I know I'm going to handle about 20000 items, I can set the size of
hashtable to 30000.

So, can I do this in python?

You can't. Unless you build up to 20000 items over and over (many
thousands of times), it's unlikely you'd be able to measure the time
difference even if you could.
Dec 21 '06 #3
co*******@gmail .com wrote:
Thank you very much for your explanation!

I made a mistake that I said the hash value should be recalculated
each time the dict resize, what I wanted to say was that the position
of each item should be recalculated.

Maybe I should take a look at the source code of python dict some day.
I'll some here for help then.
In your original message you said:
I think this will waste some time doing the increasing-copying thing.
If I know I'm going to handle about 20000 items, I can set the size of
hashtable to 30000.
Have you actually tried timing how much time is wasted doing this? 20000
items really isn't very many for a dictionary to handle.

I ran a quick test to see how long it takes to create a dictionary with
20000 items. Remember, as has been explained to you, the items are not
copied, nor are the hash values recalculated when the dictionary is
resized, so if your dictionary takes much longer to create than the test
below gives you (on your machine that is rather than mine), then the
problem lies elsewhere. e.g. a slow hash function, or the time taken to
create whatever you are storing in the dictionary.

Just creating the dictionary with no other Python overhead:

timeit.py -s "import random; r = [ random.random() for i in range(20000)]"
"d = dict.fromkeys(r )"
100 loops, best of 3: 11.3 msec per loop

Creating the dictionary using a Python for loop:

timeit.py -s "import random; r = [ random.random() for i in range(20000)]"
"d={}" "for k in r: d[k] = None"
100 loops, best of 3: 11.7 msec per loop

Assigning to the elements in a pre-populated (and therefore the right
size) dictionary:

timeit.py -s "import random; r = [ random.random() for i in range(20000)];
d=dict.fromkeys (r)" "for k in r: d[k] = None"
100 loops, best of 3: 10.1 msec per loop

How many times do you create this dictionary that shaving 1.5 milliseconds
off 11 millseconds matters to you?

FWIW, for a 2,000,000 element dictionary the times are 2.5s and 1.48s
respectively so the situation does get progressively worse as dictionaries
get very large.
Dec 21 '06 #4
In message <ma************ *************** ************@py thon.org>, Tim
Peters wrote:
You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
Actually, no. It also consists of updating reference counts as well.
Dec 31 '06 #5
[Tim Peters]
>You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
[Lawrence D'Oliveiro]
Actually, no. It also consists of updating reference counts as well.
Not in context: dict resizing is refcount-neutral, and the CPython
implementation of dict resizing exploits that (quite directly in 2.5;
indirectly before 2.5). It's not just that refcounts end up the same
/across/ dict resizing, it's that they're not changed internally either for
the duration of the resizing operation.
Dec 31 '06 #6
In message <Xn************ ***********@216 .196.97.136>, Tim Peters wrote:
[Tim Peters]
>>You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).

[Lawrence D'Oliveiro]
>Actually, no. It also consists of updating reference counts as well.

Not in context: dict resizing is refcount-neutral...
The question wasn't about resizing, it was about "copying a dict key or
value".
Dec 31 '06 #7
[Tim Peters]
>>>You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
[Lawrence D'Oliveiro]
>>Actually, no. It also consists of updating reference counts as well.
[Tim Peters]
>Not in context: dict resizing is refcount-neutral...
[Lawrence D'Oliveiro]
The question wasn't about resizing, it was about "copying a dict key or
value".
Then you misunderstood the OP. He was asking about dict resizing:

...

When there are more and more items being added to the
hashtable, it increase its buckets and copy the old items into the
new one and re-calculate the hash value of each item.

I think this will waste some time doing the increasing-copying
thing.

Taking my response out of context to begin with doesn't really change that
I answered the question he asked ;-)
Dec 31 '06 #8
In message <Xn************ ***********@216 .196.97.136>, Tim Peters wrote:
[Tim Peters]
>>>>You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).

[Lawrence D'Oliveiro]
>>>Actually, no. It also consists of updating reference counts as well.

[Tim Peters]
>>Not in context: dict resizing is refcount-neutral...

[Lawrence D'Oliveiro]
>The question wasn't about resizing, it was about "copying a dict key or
value".

Then you misunderstood the OP. He was asking about dict resizing:
Which is not what you were talking about, in the part of your posting that I
was responding to (above).
Jan 1 '07 #9
Tim Peters wrote:
Taking my response out of context to begin with doesn't really change that
I answered the question he asked ;-)
welcome to comp.lang.pytho n.

</F>

Jan 3 '07 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
2102
by: Dave Benjamin | last post by:
Hey good people, I've been doing a lot of simultaneous Jython and CPython programming lately, and just wanted to say, with no intended ill will toward any of the individuals who have been generous enough to make the two languages possible, that, well, they're kinda different. I guess it was inevitable, but with Jython stuck at Python 2.1,...
7
3641
by: svilen | last post by:
hello again. i'm now into using python instead of another language(s) for describing structures of data, including names, structure, type-checks, conversions, value-validations, metadata etc. And i have things to offer, and to request. And a lot of ideas, but who needs them.... here's an example (from type_struct.py):
105
5084
by: Peter Hickman | last post by:
Well after all this discussion it would appear that a 'Python like' language has appeared => Prothon. http://www.prothon.org/index.html Very alpha, sort of like Python (if you consider the indenting is what makes Python unique) and sort of Ruby in its use of prefixes to define scoping etc (although there is no reference to this trait being...
3
2816
by: fdsl ysnh | last post by:
--- python-list-request@python.orgĐ´µŔ: > Send Python-list mailing list submissions to > python-list@python.org > > To subscribe or unsubscribe via the World Wide Web, > visit > http://mail.python.org/mailman/listinfo/python-list > or, via email, send a message with subject or body > 'help' to
13
2143
by: Steven Bethard | last post by:
Jean-Paul Calderone <exarkun@divmod.comwrote: Interesting. Could you give a few illustrations of this? (I didn't run into the same problem at all, so I'm curious.) Steve
18
2949
by: Marko.Cain.23 | last post by:
Hi, I create a dictionary like this myDict = {} and I add entry like this: myDict = 1 but how can I empty the whole dictionary? Thank you.
0
1171
by: rkmr.em | last post by:
the memory usage of a python app keeps growing in a x86 64 linux continuously, whereas in 32 bit linux this is not the case. Python version in both 32 bit and 64 bit linux - 2.6.24.4-64.fc8 Python 2.5.1 (r251:54863, Oct 30 2007, 13:45:26) i isolated the memory leak problem to a function that uses datetime module extensively. i use...
20
2423
by: Mr.SpOOn | last post by:
Hi, I need a structure to represent a set of integers. I also need to perform on this set some basic set operations, such as adding or removing elements, joining with other sets and checking for the presence of specific elements. I think that using Python sets would be the best choice, but I also need integers to be ordered inside the set...
0
7390
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language...
0
7743
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
0
5959
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...
0
4940
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3441
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
0
3437
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
1867
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
1
1010
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
692
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.