473,320 Members | 1,902 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

dict.get and str.xsplit

When I use dictionaries I find the dict.get() method very handy, and
I'd like to use it often, but I think it's quite slow. This is a small
benchmark:

from random import randrange
from timeit import default_timer as clock

def main(N, M):
keys = list(set(randrange(2*N) for n in xrange(N)))
n2 = len(keys) / 2
keys1, keys2 = keys[:n2], keys[n2:]
adict = dict.fromkeys(keys1)

t = clock()
for _ in xrange(M):
for k in keys1:
r = adict.get(k, None)
for k in keys2:
r = adict.get(k, None)
print round(clock() - t, 2), "s"

t = clock()
for _ in xrange(M):
for k in keys1:
if k in adict:
r = adict[k]
else:
r = None
for k in keys2:
if k in adict:
r = adict[k]
else:
r = None
print round(clock() - t, 2), "s"

main(40000, 30)
On my old PC the output is:
2.36 s
1.59 s
(Your timings may be 4-5 times smaller, so you can tweak N and M to
have significant results).
And note that the if/then version is faster even if it calls the hash
function two times, while get() is supposed to do it once (in this
case the hashing function is very simple and fast. If the keys are
long strings the results of a similar benchmark may be different,
because the computing of the hashing function may become the slowest
part of that code).

This is a real difference, that has real impact on the programs I
write, so I often use the if/else approach, despite the dict.get()
method being semantically fitter and shorter.
So can the dict.get() method be speed up? And if not, why?

------------------

Another method I'd like to see, is the str.xsplit() I was talking
about time ago too. It's like str.split() but it's lazy. It avoids to
build a potentially huge list. In real D programs I have seen such
string function may speed up heavy-string-processing code (the kind of
code you may want to use Python/Perl/Ruby for) up to 2-3 times. Seeing
how such kind of processing is common in Python I think this isn't an
optimization to ignore.
(I am now getting better in writing C code, so in few months I may
even become able to submit a xsplit patch myself. I think it's not too
much complex to write, it can borrow lot of code from the str.split
method).

Bye, and thank you for your kind attention,
bearophile
Feb 26 '08 #1
6 1589
On Tue, 26 Feb 2008 06:02:12 -0800, bearophileHUGS wrote:
This is a real difference, that has real impact on the programs I
write, so I often use the if/else approach, despite the dict.get()
method being semantically fitter and shorter.
So can the dict.get() method be speed up? And if not, why?
I guess it's the method lookup that's the slow part. Factor it out of the
loop and measure again::

adict_get = adict.get
for _ in xrange(M):
for k in keys1:
r = adict_get(k, None)
for k in keys2:
r = adict_get(k, None)

Ciao,
Marc 'BlackJack' Rintsch
Feb 26 '08 #2
Marc 'BlackJack' Rintsch wrote:
On Tue, 26 Feb 2008 06:02:12 -0800, bearophileHUGS wrote:
>This is a real difference, that has real impact on the programs I
write, so I often use the if/else approach, despite the dict.get()
method being semantically fitter and shorter.
So can the dict.get() method be speed up? And if not, why?

I guess it's the method lookup that's the slow part. Factor it out of the
loop and measure again::

adict_get = adict.get
for _ in xrange(M):
for k in keys1:
r = adict_get(k, None)
for k in keys2:
r = adict_get(k, None)
And think about using the timeit module, since it's been included among
the batteries for your convenience, and it removes some common pitfalls
with cross-platform timings.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Feb 26 '08 #3
Marc 'BlackJack' Rintsch:
I guess it's the method lookup that's the slow part. Factor it out of the
loop and measure again::
I did know of that optimization, but sometimes I "forget" about it...
The new timings:

Output timings, best of 3, unloaded CPU:
2.32 s with adict.get
1.56 s with "in" + if/then/else
1.78 s with adict_get

It's slower still, despite doing the lookup two times half of the
times (half keys are present in the test set). The "in" is an
operator, it's faster than a method call, but I don't understand the
other details. Now the difference between 1.78 and 1.56 is small
enough, so probably now it's not much noticeable in practice in real
programs, so my original argument is mostly dead now :-) In the code
points where speed matters instead of a single line with a get() I can
use two lines to create a local adict_get.

Bye and thank you,
bearophile
Feb 26 '08 #4
bearophileH...@lycos.com napisa³(a):
It's slower still, despite doing the lookup two times half of the
times (half keys are present in the test set). The "in" is an
operator, it's faster than a method call, but I don't understand the
other details. Now the difference between 1.78 and 1.56 is small
enough, so probably now it's not much noticeable in practice in real
programs, so my original argument is mostly dead now :-) In the code
points where speed matters instead of a single line with a get() I can
use two lines to create a local adict_get.
AFAIK, method/function call is generally slow in Python (similarly to
the dot attribute lookup). As for the original prooblem, why not use
defaultdict? I think it's the most idiomatic way to achieve what we
want. And also the fastest one, according to my quick-and-dirty tests:

from collections import defaultdict

def main(N, M):

# <snip>

addict = defaultdict(lambda: None, adict)
t = clock()
for _ in xrange(M):
for k in keys1:
r = addict[k]
for k in keys2:
r = addict[k]
print round(clock() - t, 2), "s"

adict.get: 3.12 s
adict_get: 2.24 s
in: 1.62 s
defaultdict: 1.05 s
Feb 26 '08 #5
marek.ro...@wp.pl:
As for the original prooblem, why not use
defaultdict? I think it's the most idiomatic way to achieve what we
want. And also the fastest one, according to my quick-and-dirty tests:
It adds the new keys, I can't accept that:
>>from collections import defaultdict as dd
adict = {1:2, 3:4}
addict = dd(lambda: None, adict)
r = addict[10]
addict
defaultdict(<function <lambdaat 0x008E6FB0>, {1: 2, 10: None, 3: 4})

Bye,
bearophile
Feb 26 '08 #6
bearophileH...@lycos.com napisa³(a):
marek.ro...@wp.pl:
As for the original prooblem, why not use
defaultdict? I think it's the most idiomatic way to achieve what we
want. And also the fastest one, according to my quick-and-dirty tests:

It adds the new keys, I can't accept that:
Right, my fault not noticing that. I experimented further with
__missing__ method and with "ask forgiveness" idiom (try... except
KeyError) and they were both noticeably slower. So checking with "in"
may be the most efficient way we have.

Regards,
Marek
Feb 26 '08 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
by: Robin Cull | last post by:
Imagine I have a dict looking something like this: myDict = {"key 1": , "key 2": , "key 3": , "key 4": } That is, a set of keys which have a variable length list of associated values after...
2
by: GrelEns | last post by:
hello, i would like if this behaviour can be obtained from python : trap an attributeError from inside a subclassing dict class... (here is a silly examples to explain my question) class...
3
by: Bengt Richter | last post by:
Has anyone found a way besides not deriving from dict? Shouldn't there be a way? TIA (need this for what I hope is an improvement on the Larosa/Foord OrderedDict ;-) I guess I can just document...
11
by: sandravandale | last post by:
I can think of several messy ways of making a dict that sets a flag if it's been altered, but I have a hunch that experienced python programmers would probably have an easier (well maybe more...
15
by: Cruella DeVille | last post by:
I'm trying to implement a bookmark-url program, which accepts user input and puts the strings in a dictionary. Somehow I'm not able to iterate myDictionary of type Dict{} When I write print...
15
by: George Sakkis | last post by:
Although I consider dict(**kwds) as one of the few unfortunate design choices in python since it prevents the future addition of useful keyword arguments (e.g a default value or an orderby...
12
by: jeremito | last post by:
Please excuse me if this is obvious to others, but I can't figure it out. I am subclassing dict, but want to prevent direct changing of some key/value pairs. For this I thought I should override...
1
by: | last post by:
Hello all, I have a question which might be simple or need some work around. I want to do something like this. My class/instance has a dict as a property. I want the instance to catch the...
10
by: Gerard flanagan | last post by:
Simon Mullis wrote: data = from itertools import groupby datadict = \ dict((k, len(list(g))) for k,g in groupby(data, lambda s: s)) print datadict
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
0
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.