473,748 Members | 9,416 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

a question on python dict

Python dict is a hash table, isn't it?

I know that hashtable has the concept of "bucket size" and "min bucket
count" stuff,
and they should be configurable so I can set them to the proper value
when I know how many items I'm going to handle.

If these two values can't be set, the hashtable will give them default
values. When there are more and more items being added to the
hashtable, it increase its buckets and copy the old items into the new
one and re-calculate the hash value of each item.

I think this will waste some time doing the increasing-copying thing.
If I know I'm going to handle about 20000 items, I can set the size of
hashtable to 30000.

So, can I do this in python? I can't figure it out so I come here for
help.

Thanks!

Dec 21 '06 #1
10 1878
[co*******@gmail .com]
Python dict is a hash table, isn't it?
Yup.
I know that hashtable has the concept of "bucket size" and "min bucket
count" stuff,
Some implementations of hash tables do. Python's does not. Python's
uses what's called "open addressing" instead.
and they should be configurable so I can set them to the proper value
when I know how many items I'm going to handle.

If these two values can't be set, the hashtable will give them default
values. When there are more and more items being added to the
hashtable, it increase its buckets and copy the old items into the new
one and re-calculate the hash value of each item.
That's several distinct claims, each of which is true of some hash
table implementations but not of others. Python's implementation has
no "buckets", so all that's simply irrelevant. Python's
implementation (not all do) saves the hash value of each item, so when
it does need to grow (or shrink) the space allocated to the table, it
does not need to recalculate the hash value of any item.

You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
I think this will waste some time doing the increasing-copying thing.
It does take time to resize. "Waste" is a value judgment ;-)
If I know I'm going to handle about 20000 items, I can set the size of
hashtable to 30000.

So, can I do this in python?
You can't. Unless you build up to 20000 items over and over (many
thousands of times), it's unlikely you'd be able to measure the time
difference even if you could.
Dec 21 '06 #2
Thank you very much for your explanation!

I made a mistake that I said the hash value should be recalculated each
time the dict resize, what I wanted to say was that the position of
each item should be recalculated.

Maybe I should take a look at the source code of python dict some day.
I'll some here for help then.

Thanks again!
Tim Peters wrote:
[co*******@gmail .com]
Python dict is a hash table, isn't it?

Yup.
I know that hashtable has the concept of "bucket size" and "min bucket
count" stuff,

Some implementations of hash tables do. Python's does not. Python's
uses what's called "open addressing" instead.
and they should be configurable so I can set them to the proper value
when I know how many items I'm going to handle.

If these two values can't be set, the hashtable will give them default
values. When there are more and more items being added to the
hashtable, it increase its buckets and copy the old items into the new
one and re-calculate the hash value of each item.

That's several distinct claims, each of which is true of some hash
table implementations but not of others. Python's implementation has
no "buckets", so all that's simply irrelevant. Python's
implementation (not all do) saves the hash value of each item, so when
it does need to grow (or shrink) the space allocated to the table, it
does not need to recalculate the hash value of any item.

You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
I think this will waste some time doing the increasing-copying thing.

It does take time to resize. "Waste" is a value judgment ;-)
If I know I'm going to handle about 20000 items, I can set the size of
hashtable to 30000.

So, can I do this in python?

You can't. Unless you build up to 20000 items over and over (many
thousands of times), it's unlikely you'd be able to measure the time
difference even if you could.
Dec 21 '06 #3
co*******@gmail .com wrote:
Thank you very much for your explanation!

I made a mistake that I said the hash value should be recalculated
each time the dict resize, what I wanted to say was that the position
of each item should be recalculated.

Maybe I should take a look at the source code of python dict some day.
I'll some here for help then.
In your original message you said:
I think this will waste some time doing the increasing-copying thing.
If I know I'm going to handle about 20000 items, I can set the size of
hashtable to 30000.
Have you actually tried timing how much time is wasted doing this? 20000
items really isn't very many for a dictionary to handle.

I ran a quick test to see how long it takes to create a dictionary with
20000 items. Remember, as has been explained to you, the items are not
copied, nor are the hash values recalculated when the dictionary is
resized, so if your dictionary takes much longer to create than the test
below gives you (on your machine that is rather than mine), then the
problem lies elsewhere. e.g. a slow hash function, or the time taken to
create whatever you are storing in the dictionary.

Just creating the dictionary with no other Python overhead:

timeit.py -s "import random; r = [ random.random() for i in range(20000)]"
"d = dict.fromkeys(r )"
100 loops, best of 3: 11.3 msec per loop

Creating the dictionary using a Python for loop:

timeit.py -s "import random; r = [ random.random() for i in range(20000)]"
"d={}" "for k in r: d[k] = None"
100 loops, best of 3: 11.7 msec per loop

Assigning to the elements in a pre-populated (and therefore the right
size) dictionary:

timeit.py -s "import random; r = [ random.random() for i in range(20000)];
d=dict.fromkeys (r)" "for k in r: d[k] = None"
100 loops, best of 3: 10.1 msec per loop

How many times do you create this dictionary that shaving 1.5 milliseconds
off 11 millseconds matters to you?

FWIW, for a 2,000,000 element dictionary the times are 2.5s and 1.48s
respectively so the situation does get progressively worse as dictionaries
get very large.
Dec 21 '06 #4
In message <ma************ *************** ************@py thon.org>, Tim
Peters wrote:
You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
Actually, no. It also consists of updating reference counts as well.
Dec 31 '06 #5
[Tim Peters]
>You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
[Lawrence D'Oliveiro]
Actually, no. It also consists of updating reference counts as well.
Not in context: dict resizing is refcount-neutral, and the CPython
implementation of dict resizing exploits that (quite directly in 2.5;
indirectly before 2.5). It's not just that refcounts end up the same
/across/ dict resizing, it's that they're not changed internally either for
the duration of the resizing operation.
Dec 31 '06 #6
In message <Xn************ ***********@216 .196.97.136>, Tim Peters wrote:
[Tim Peters]
>>You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).

[Lawrence D'Oliveiro]
>Actually, no. It also consists of updating reference counts as well.

Not in context: dict resizing is refcount-neutral...
The question wasn't about resizing, it was about "copying a dict key or
value".
Dec 31 '06 #7
[Tim Peters]
>>>You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).
[Lawrence D'Oliveiro]
>>Actually, no. It also consists of updating reference counts as well.
[Tim Peters]
>Not in context: dict resizing is refcount-neutral...
[Lawrence D'Oliveiro]
The question wasn't about resizing, it was about "copying a dict key or
value".
Then you misunderstood the OP. He was asking about dict resizing:

...

When there are more and more items being added to the
hashtable, it increase its buckets and copy the old items into the
new one and re-calculate the hash value of each item.

I think this will waste some time doing the increasing-copying
thing.

Taking my response out of context to begin with doesn't really change that
I answered the question he asked ;-)
Dec 31 '06 #8
In message <Xn************ ***********@216 .196.97.136>, Tim Peters wrote:
[Tim Peters]
>>>>You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).

[Lawrence D'Oliveiro]
>>>Actually, no. It also consists of updating reference counts as well.

[Tim Peters]
>>Not in context: dict resizing is refcount-neutral...

[Lawrence D'Oliveiro]
>The question wasn't about resizing, it was about "copying a dict key or
value".

Then you misunderstood the OP. He was asking about dict resizing:
Which is not what you were talking about, in the part of your posting that I
was responding to (above).
Jan 1 '07 #9
Tim Peters wrote:
Taking my response out of context to begin with doesn't really change that
I answered the question he asked ;-)
welcome to comp.lang.pytho n.

</F>

Jan 3 '07 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
2108
by: Dave Benjamin | last post by:
Hey good people, I've been doing a lot of simultaneous Jython and CPython programming lately, and just wanted to say, with no intended ill will toward any of the individuals who have been generous enough to make the two languages possible, that, well, they're kinda different. I guess it was inevitable, but with Jython stuck at Python 2.1, it's not really the same language as CPython is today. You still have to type "from __future__...
7
3665
by: svilen | last post by:
hello again. i'm now into using python instead of another language(s) for describing structures of data, including names, structure, type-checks, conversions, value-validations, metadata etc. And i have things to offer, and to request. And a lot of ideas, but who needs them.... here's an example (from type_struct.py):
105
5180
by: Peter Hickman | last post by:
Well after all this discussion it would appear that a 'Python like' language has appeared => Prothon. http://www.prothon.org/index.html Very alpha, sort of like Python (if you consider the indenting is what makes Python unique) and sort of Ruby in its use of prefixes to define scoping etc (although there is no reference to this trait being borrowed from Ruby). It also quotes Self as being an influence.
3
2834
by: fdsl ysnh | last post by:
--- python-list-request@python.orgдµÀ: > Send Python-list mailing list submissions to > python-list@python.org > > To subscribe or unsubscribe via the World Wide Web, > visit > http://mail.python.org/mailman/listinfo/python-list > or, via email, send a message with subject or body > 'help' to
13
2159
by: Steven Bethard | last post by:
Jean-Paul Calderone <exarkun@divmod.comwrote: Interesting. Could you give a few illustrations of this? (I didn't run into the same problem at all, so I'm curious.) Steve
18
2971
by: Marko.Cain.23 | last post by:
Hi, I create a dictionary like this myDict = {} and I add entry like this: myDict = 1 but how can I empty the whole dictionary? Thank you.
0
1178
by: rkmr.em | last post by:
the memory usage of a python app keeps growing in a x86 64 linux continuously, whereas in 32 bit linux this is not the case. Python version in both 32 bit and 64 bit linux - 2.6.24.4-64.fc8 Python 2.5.1 (r251:54863, Oct 30 2007, 13:45:26) i isolated the memory leak problem to a function that uses datetime module extensively. i use datetime.datetime.strptime, datetime.timedelta, datetime.datetime.now methods... i tried to get some info...
20
2460
by: Mr.SpOOn | last post by:
Hi, I need a structure to represent a set of integers. I also need to perform on this set some basic set operations, such as adding or removing elements, joining with other sets and checking for the presence of specific elements. I think that using Python sets would be the best choice, but I also need integers to be ordered inside the set and I've just found out that, instead, Python sets are unordered collections.
0
8984
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9363
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9312
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9238
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
6793
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
4593
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4864
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3300
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
2206
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.