
# Adding tuples to a dictionary

Hi Pythonistas!

I've got a question about storing tuples in a dictionary. First, a small test case which creates a list of dictionaries:

```python
import time

list_of_dicts = []
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
    my_dict = {}
    for key in keys:
        my_dict[key] = key
    list_of_dicts.append(my_dict)
    new_clk = time.clock()
    print i, new_clk - prev_clk
    prev_clk = new_clk
```

It creates dictionaries and stores them in a list, printing out the execution times. The size of each dictionary is constant, and so is the execution time for each iteration:

    0 0.1
    1 0.1
    2 0.1
    3 0.08
    4 0.09

...and so on. Then just one line is changed, from:

    my_dict[key] = key

into:

    my_dict[key] = (key, key)

Full code:

```python
import time

list_of_dicts = []
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
    my_dict = {}
    for key in keys:
        my_dict[key] = (key, key)
    list_of_dicts.append(my_dict)
    new_clk = time.clock()
    print i, new_clk - prev_clk
    prev_clk = new_clk
```

The difference is that tuples, instead of single values, are added to the dictionary. When the program is run again:

    0 0.27
    1 0.37
    2 0.49
    3 0.6
    ...
    16 2.32
    17 2.45
    18 2.54
    19 2.68

The execution time rises linearly with every new dictionary created.

Next experiment: the dictionaries are not stored in a list; they are simply dropped when an iteration has finished. This is done by removing two lines, `list_of_dicts = []` and `list_of_dicts.append(my_dict)`. Full code:

```python
import time

keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
    my_dict = {}
    for key in keys:
        my_dict[key] = (key, key)
    new_clk = time.clock()
    print i, new_clk - prev_clk
    prev_clk = new_clk
```

The time is constant again:

    0 0.28
    1 0.28
    2 0.28
    3 0.26
    4 0.26

I see no reason for this kind of performance problem, really. It happens only when both things are true: the dictionaries are kept in a list (or, more generally, in memory) and they store tuples.

As this goes beyond my understanding of Python internals, I would like to kindly ask if anyone has an idea how to create this data structure (a list of dictionaries of tuples, assuming that the size of all dictionaries is the same) in constant time?

Regards,
Maciej

May 31 '07 #1
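The benchmark above can be reproduced on a current interpreter; a scaled-down Python 3 rewrite follows (an assumption on my part: the original is Python 2, and `time.clock()` was removed in Python 3.8, so `time.perf_counter()` stands in for it, with smaller sizes to keep the run short):

```python
import time

# Scaled-down Python 3 rewrite of the benchmark above.
N_KEYS = 20000   # smaller than the original 200000 to keep the run short
N_DICTS = 10     # the original builds 20 dictionaries

keys = [str(x) for x in range(N_KEYS)]
list_of_dicts = []
prev_clk = time.perf_counter()
for i in range(N_DICTS):
    my_dict = {}
    for key in keys:
        my_dict[key] = (key, key)  # tuple values: the slow variant
    list_of_dicts.append(my_dict)  # keep every dictionary alive
    new_clk = time.perf_counter()
    print(i, round(new_clk - prev_clk, 3))
    prev_clk = new_clk
```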
4 Replies

Maciej Bliziński wrote:

> [...] I would like to kindly ask if anyone has an idea how to create this data structure (a list of dictionaries of tuples, assuming that the size of all dictionaries is the same) in constant time?

Disable garbage collection:

```python
import gc
gc.disable()
```

It uses inappropriate heuristics in this case.

Peter

May 31 '07 #2
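Peter's suggestion can be tried directly. A minimal sketch (Python 3, with `time.perf_counter()` standing in for the Python 2 `time.clock()`, and sizes scaled down; the `build` helper is mine, not from the thread) that times the same workload with the collector on and then off:

```python
import gc
import time

def build(n_dicts=5, n_keys=20000):
    """Build the problematic structure: a list of dicts holding tuples."""
    keys = [str(x) for x in range(n_keys)]
    return [{key: (key, key) for key in keys} for _ in range(n_dicts)]

# Time the build with the collector enabled...
t0 = time.perf_counter()
with_gc = build()
t_enabled = time.perf_counter() - t0

# ...and again with it disabled.
gc.disable()
try:
    t0 = time.perf_counter()
    without_gc = build()
    t_disabled = time.perf_counter() - t0
finally:
    gc.enable()  # always restore the collector when done

print("gc enabled: %.3fs  gc disabled: %.3fs" % (t_enabled, t_disabled))
```

Since the structure contains no reference cycles, nothing is lost by keeping the collector off during the build; plain reference counting still frees everything.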

On May 31, 8:30 pm, Maciej Bliziński wrote:

> [...] I would like to kindly ask if anyone has an idea how to create this data structure (a list of dictionaries of tuples, assuming that the size of all dictionaries is the same) in constant time?

Let me comment on what happens in your code. The places where you create new objects are:

```python
keys = [str(x) for x in range(200000)]
```

which creates 200000 strings that will be reused (by reference), and:

```python
my_dict[key] = (key, key)
```

which creates a new two-element tuple (both elements are `key`, so a reference to the existing key object is taken twice).

The tricky part is where you wrote:

```python
for key in keys:
    my_dict[key] = (key, key)
list_of_dicts.append(my_dict)  # note that the append is inside the outer loop!
```

This means that each `my_dict`, with its 200000 entries, is kept alive by the list and won't be released. The statement `my_dict = {}` always creates a new dictionary (20 iterations means 20 new dictionaries) and starts over. Python caches freed dictionaries (after they are deleted, they are reused everywhere), but since nothing here is ever deleted, no reuse happens and memory has to be allocated again each time.

Lists are internally like arrays: when there is not enough space for the next element, the pointer array is doubled, so there is no huge penalty in the append function. Lists are also reused from an internal cache. Dictionaries have a similar growth function: when there is no space for the next key, the internal hash table doubles.

The reason you see growing times is that memory allocation takes place instead of objects being reused by reference. Check the memory usage, and you'll see that the test time is pretty much proportional to overall memory usage.

Regards,
Bosko

May 31 '07 #3
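Bosko's observation that the time tracks overall memory usage fits how CPython's cyclic collector behaves: every live dict and freshly created tuple is a tracked container, so each collection pass has more objects to traverse as the list grows. A gentler alternative to `gc.disable()` is raising the collection thresholds; a sketch using the standard `gc.get_threshold` / `gc.set_threshold` calls (sizes scaled down, and the specific threshold value is an arbitrary choice for illustration):

```python
import gc

old_thresholds = gc.get_threshold()   # CPython's default is (700, 10, 10)
# Raise the generation-0 trigger so collection passes happen far less often.
gc.set_threshold(1000000, 10, 10)
try:
    keys = [str(x) for x in range(10000)]
    list_of_dicts = [{k: (k, k) for k in keys} for _ in range(5)]
    # The dicts (and the tuples, until the collector inspects and
    # untracks them) are tracked container objects, so the collector's
    # per-pass workload grows with the amount of live data.
    assert gc.is_tracked(list_of_dicts[0])
finally:
    gc.set_threshold(*old_thresholds)  # restore the previous settings
```

This keeps cycle collection available (unlike disabling it outright) while making the expensive passes rare during a large allocation burst.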