473,725 Members | 2,271 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Adding tuples to a dictionary

Hi Pythonistas!

I've got a question about storing tuples in a dictionary. First, a
small test case which creates a list of dictionaries:

import time

list_of_dicts = []
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = key
list_of_dicts.a ppend(my_dict)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk

It creates dictionaries and stores them in a list, printing out
execution times. The size of each dictionary is constant, so is the
execution time for each iteration.

0 0.1
1 0.1
2 0.1
3 0.08
4 0.09

....and so on.

Then, just one line is changed:
my_dict[key] = key
into:
my_dict[key] = (key, key)

Full code:

list_of_dicts = []
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = (key, key)
list_of_dicts.a ppend(my_dict)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk

The difference is that instead of single values, tuples are added to
the dictionary instead. When the program is run again:

0 0.27
1 0.37
2 0.49
3 0.6
....
16 2.32
17 2.45
18 2.54
19 2.68

The execution time is rising linearly with every new dictionary
created.

Next experiment: dictionaries are not stored in a list, they are just
left out when an iteration has finished. It's done by removing two
lines:

list_of_dicts = []

and

list_of_dicts.a ppend(my_dict)

Full code:

keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = (key, key)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk

The time is constant again:

0 0.28
1 0.28
2 0.28
3 0.26
4 0.26

I see no reason for this kind of performance problem, really. It
happens when both things are true: dictionaries are kept in a list (or
more generally, in memory) and they store tuples.

As this goes beyond my understanding of Python internals, I would like
to kindly ask, if anyone has an idea about how to create this data
structure (list of dictionaries of tuples, assuming that size of all
dictionaries is the same), in constant time?

Regards,
Maciej

May 31 '07 #1
4 9805
Maciej Blizi?ski wrote:
Hi Pythonistas!

I've got a question about storing tuples in a dictionary. First, a
small test case which creates a list of dictionaries:

import time

list_of_dicts = []
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = key
list_of_dicts.a ppend(my_dict)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk

It creates dictionaries and stores them in a list, printing out
execution times. The size of each dictionary is constant, so is the
execution time for each iteration.

0 0.1
1 0.1
2 0.1
3 0.08
4 0.09

...and so on.

Then, just one line is changed:
my_dict[key] = key
into:
my_dict[key] = (key, key)

Full code:

list_of_dicts = []
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = (key, key)
list_of_dicts.a ppend(my_dict)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk

The difference is that instead of single values, tuples are added to
the dictionary instead. When the program is run again:

0 0.27
1 0.37
2 0.49
3 0.6
...
16 2.32
17 2.45
18 2.54
19 2.68

The execution time is rising linearly with every new dictionary
created.

Next experiment: dictionaries are not stored in a list, they are just
left out when an iteration has finished. It's done by removing two
lines:

list_of_dicts = []

and

list_of_dicts.a ppend(my_dict)

Full code:

keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = (key, key)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk

The time is constant again:

0 0.28
1 0.28
2 0.28
3 0.26
4 0.26

I see no reason for this kind of performance problem, really. It
happens when both things are true: dictionaries are kept in a list (or
more generally, in memory) and they store tuples.

As this goes beyond my understanding of Python internals, I would like
to kindly ask, if anyone has an idea about how to create this data
structure (list of dictionaries of tuples, assuming that size of all
dictionaries is the same), in constant time?
Disable garbage collection:

import gc
gc.disable()

It uses inappropriate heuristics in this case.

Peter
May 31 '07 #2
On May 31, 8:30 pm, Maciej Bliziński <maciej.blizin. ..@gmail.com>
wrote:
Hi Pythonistas!

I've got a question about storing tuples in a dictionary. First, a
small test case which creates a list of dictionaries:

import time

list_of_dicts = []
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = key
list_of_dicts.a ppend(my_dict)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk

It creates dictionaries and stores them in a list, printing out
execution times. The size of each dictionary is constant, so is the
execution time for each iteration.

0 0.1
1 0.1
2 0.1
3 0.08
4 0.09

...and so on.

Then, just one line is changed:
my_dict[key] = key
into:
my_dict[key] = (key, key)

Full code:

list_of_dicts = []
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = (key, key)
list_of_dicts.a ppend(my_dict)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk

The difference is that instead of single values, tuples are added to
the dictionary instead. When the program is run again:

0 0.27
1 0.37
2 0.49
3 0.6
...
16 2.32
17 2.45
18 2.54
19 2.68

The execution time is rising linearly with every new dictionary
created.

Next experiment: dictionaries are not stored in a list, they are just
left out when an iteration has finished. It's done by removing two
lines:

list_of_dicts = []

and

list_of_dicts.a ppend(my_dict)

Full code:

keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = (key, key)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk

The time is constant again:

0 0.28
1 0.28
2 0.28
3 0.26
4 0.26

I see no reason for this kind of performance problem, really. It
happens when both things are true: dictionaries are kept in a list (or
more generally, in memory) and they store tuples.

As this goes beyond my understanding of Python internals, I would like
to kindly ask, if anyone has an idea about how to create this data
structure (list of dictionaries of tuples, assuming that size of all
dictionaries is the same), in constant time?

Regards,
Maciej
Let me comment on what happens in you're code:
The place where you create new objects is
keys = [str(x) for x in range(200000)] # here you create 200000
strings which will be reused ( by reference )
and
my_dict[key] = (key, key) # here you create a new tuple with 2
elements
( both are key, so you're taking a
reference of existing key object twice )
The tricky part is where you wrote:
for key in keys:
my_dict[key] = (key, key)
list_of_dicts.a ppend(my_dict) # note that
list_of_dicts.a ppend is in the loop! check upstairs!
This means that my_dict reference will be stored 200000 times, and it
won't be released.
statement
my_dict = {}
will always create new my_dict ( 20 times means 20 new dictionaries )
and start over.
Since python caches free dictionaries ( after delete - they're used
everywhere ),
reuse won't happen, and memory will have to be allocated again.
Lists are internally like arrays, when there is not enough space for
next element, pointer array is doubled, so
there is no huge penalty in the append function. Lists are also reused
from some internal cache.
Dictionaries also have a some growth function. When there is no space
for next key, internal hash map doubles.
The reason why you have a growing time comes from the fact that memory
allocation takes place instead of object
being used by reference. Check the memory usage, and you'll see that
test time is pretty much proportional to overall memory usage.

Regards, Bosko

May 31 '07 #3
bv****@teletrad er.com <bv****@teletra der.comwrote:
Let me comment on what happens in you're code:
The place where you create new objects is
keys = [str(x) for x in range(200000)] # here you create 200000
strings which will be reused ( by reference )
and
my_dict[key] = (key, key) # here you create a new tuple with 2
elements
( both are key, so you're taking a
reference of existing key object twice )
The tricky part is where you wrote:
for key in keys:
my_dict[key] = (key, key)
list_of_dicts.a ppend(my_dict) # note that
list_of_dicts.a ppend is in the loop! check upstairs!
This means that my_dict reference will be stored 200000 times, and it
won't be released.
statement
my_dict = {}
will always create new my_dict ( 20 times means 20 new dictionaries )
and start over.
Since python caches free dictionaries ( after delete - they're used
everywhere ),
reuse won't happen, and memory will have to be allocated again.
[snip]
The reason why you have a growing time comes from the fact that memory
allocation takes place instead of object
being used by reference. Check the memory usage, and you'll see that
test time is pretty much proportional to overall memory usage.
That agrees with my analysis also - it is all about memory usage due
to the duplicated (key, key) tuples. In fact it puts my machine into
swap at the larger iteration numbers!

Here is storing them in a cache which runs nice and fast!

import time

list_of_dicts = []
keys = [str(x) for x in range(200000)]
tuple_cache = {}
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
value = tuple_cache.set default((key, key), (key, key))
my_dict[key] = value
list_of_dicts.a ppend(my_dict)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk

0 1.0
1 0.39
2 0.41
3 0.39
4 0.39
5 0.4
6 0.41
7 0.39
8 0.4
9 0.4
10 0.39
11 0.4
12 0.4
13 0.39
14 0.4
15 0.4
16 0.39
17 0.4
18 0.39
19 0.41

Note the first iteration is slower as it builds the tuple cache

--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
Jun 1 '07 #4
Thank you for all the answers! My problem is solved even better than I
expected!

@Peter: Yes, the garbage collector was causing the slowdown. Switching
it off sped the program up; each iteration was taking the same amount
of time. I ran collection manually every 10 iterations to control
memory usage.

@Bosko: Right, this example code had a bug, appending to the
dictionary was meant to be one nesting level up.

@Nick: Using a tuple cache is a great advice! My real program's tuples
are even better to cache, as there are roughly about 10 different
tuples that are used as values. Using a tuple cache reduced the memory
consumption by about 3-4 times (I'm telling that by looking at the
gkrellm display).

Thanks!
Maciej

Jun 11 '07 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

42
2750
by: Jeff Wagner | last post by:
I've spent most of the day playing around with lists and tuples to get a really good grasp on what you can do with them. I am still left with a question and that is, when should you choose a list or a tuple? I understand that a tuple is immutable and a list is mutable but there has to be more to it than just that. Everything I tried with a list worked the same with a tuple. So, what's the difference and why choose one over the other? Jeff
3
2115
by: Thorsten Kampe | last post by:
I found out that I am rarely using tuples and almost always lists because of the more flexible usability of lists (methods, etc.) To my knowledge, the only fundamental difference between tuples and lists is that tuples are immutable, so if this is correct, than list are a superset of tuples, meaning lists can do everything tuples can do and more. Is there any advantage for using tuples? Are they "faster"? Consume less memory? When is...
8
2270
by: nidhog | last post by:
Hello guys, I made a script that extracts strings from a binary file. It works. My next problem is sorting those strings. Output is like: ---- snip ---- 200501221530
66
3504
by: Mike Meyer | last post by:
It seems that the distinction between tuples and lists has slowly been fading away. What we call "tuple unpacking" works fine with lists on either side of the assignment, and iterators on the values side. IIRC, "apply" used to require that the second argument be a tuple; it now accepts sequences, and has been depreciated in favor of *args, which accepts not only sequences but iterators. Is there any place in the language that still...
122
5504
by: C.L. | last post by:
I was looking for a function or method that would return the index to the first matching element in a list. Coming from a C++ STL background, I thought it might be called "find". My first stop was the Sequence Types page of the Library Reference (http://docs.python.org/lib/typesseq.html); it wasn't there. A search of the Library Reference's index seemed to confirm that the function did not exist. A little later I realized it might be called...
15
1504
by: Scott | last post by:
I'm going to start grouping all my questions in one post as this is my second today, and sorta makes me feel dumb to keep having to bother you all with trivial questions. I'll just seperate my questions with: ------------------------------------------------------------------------------------------- Now onto the issue: List's and Tuple's I don't see the distinction between the two. I mean, I understand that a list is mutable and a tuple...
27
1674
by: seberino | last post by:
Please help me think of an example where immutable tuples are essential. It seems that everywhere a tuple is used one could just as easily use a list instead. chris
4
15707
by: mrstephengross | last post by:
Let's say I've got a list of tuples, like so: ( ('a', '1'), ('b', '2'), ('c', '3') And I want to turn it into a dictionary in which the first value of each tuple is a key and the second value is a value, like so: { 'a' -'1', 'b' -'2', 'c' -'3' } Is there a way to do this with a single line of code? Currently, I'm
9
175
by: =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= | last post by:
``odict.byindex(index)`` For this API, I think it's important to make some performance guarantees. It seems fairly difficult to make byindex O(1), and simultaneously also make insertion/deletion better than O(n). IOW, the PEP should somehow specify which operations are efficient, and which ones aren't. Regards,
0
8752
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9257
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9179
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9116
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
6702
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6011
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4519
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4784
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
2637
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.