470,614 Members | 1,447 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 470,614 developers. It's quick & easy.

Testing for membership speed

I just ran this stuff for my own knowledge. Though it might be
useful to some other people to know and maybe spark a discussion.

I needed a fast way to test for membership, so naturally the
choices were the builtin containers: lists, dictionaries, and
tuples. The following is the test code and results:

import timeit

lst_i=timeit.Timer('random.randrange(10000) in l','import
random; l=range(10000)')
dct_i=timeit.Timer('l.has_key(random.randrange(100 00))','import
random; l=dict([(i,None) for i in xrange(10000)])')
tup_i=timeit.Timer('random.randrange(10000) in l','import
random; l=tuple(range(10000))')

lst_str=timeit.Timer('md5.md5(str(random.randrange (10000))).hexdigest()
in l','import random,md5; l=[md5.md5(str(i)).hexdigest() for i
in xrange(10000)]')
dct_str=timeit.Timer('l.has_key(md5.md5(str(random .randrange(10000))).hexdigest())','import
random,md5; l=dict([(md5.md5(str(i)).hexdigest(),None) for i
in xrange(10000)])')
tup_str=timeit.Timer('md5.md5(str(random.randrange (10000))).hexdigest()
in l','import random,md5; l=tuple([md5.md5(str(i)).hexdigest()
for i in xrange(10000)])')

print 'Integer lookup'
r=lst_i.repeat(100,100); print 'list:',min(r),max(r);
r=dct_i.repeat(100,100); print 'dict:',min(r),max(r);
r=tup_i.repeat(100,100); print 'tupl:',min(r),max(r);
print 'String lookup'
r=lst_str.repeat(100,100); print 'list:',min(r),max(r);
r=dct_str.repeat(100,100); print 'dict:',min(r),max(r);
r=tup_str.repeat(100,100); print 'tupl:',min(r),max(r);

[[[Ran on IRIX64 6.5, 24 processors, 12G Memory, 4G Swap, this
code only uses one processor at %100 the full length of the run]]]
Python 2.2.3 (#1, Nov 25 2003, 18:58:21) [C] on irix646-n32
Type "help", "copyright", "credits" or "license" for more
information.
## working on region in file /usr/tmp/python-119959673PMu.py...
Integer lookup
list: 0.126830816269 0.160212993622
dict: 0.00362300872803 0.00385618209839
tupl: 0.119297981262 0.161748170853
String lookup
list: 0.142526865005 0.188524961472
dict: 0.00711393356323 0.00760197639465
tupl: 0.143892049789 0.186873912811


The results are conclusive, go for dictionaries. But this
surprised me a little, anyone have insight as to why?

I was also surprised that tuples and lists scored exactly the
same. I was hoping that immutable tuples might earn it some
speed over lists.

I will eventually need this for testing strings. So the
doubling of speed for strings over integers for dictionaries
is a little alarming. Lists and tuples only saw a modest increase.

Thank you in advance for any clever tricks you suggest.

-Brian

Jul 18 '05 #1
1 1940
The dictionaries are faster because they are indexed.
Lists/tuples must be searched serially from the
beginning.

Dictionary indexes are hashed and the
hashing algorithm has more to do on a string
than on an integer (see first article link
below for explanation).

You could check into pre-hashing your strings and using
these as integer keys. It will take longer to build
the dictionary, but searching should be faster.

Here are some articles that might interest you:

http://effbot.org/zone/python-hash.htm

http://mail.python.org/pipermail/pyt...ne/036556.html

http://www.egenix.com/files/python/mxTextTools.html

Searching list or tuple serially from the beginning
should take approximately the same time. Nothing
about the mutability can help the search speed.

HTH,
Larry Bates
Syscon, Inc.
<br****@temple.edu> wrote in message
news:ma*************************************@pytho n.org...
I just ran this stuff for my own knowledge. Though it might be
useful to some other people to know and maybe spark a discussion.

I needed a fast way to test for membership, so naturally the
choices were the builtin containers: lists, dictionaries, and
tuples. The following is the test code and results:

import timeit

lst_i=timeit.Timer('random.randrange(10000) in l','import
random; l=range(10000)')
dct_i=timeit.Timer('l.has_key(random.randrange(100 00))','import
random; l=dict([(i,None) for i in xrange(10000)])')
tup_i=timeit.Timer('random.randrange(10000) in l','import
random; l=tuple(range(10000))')

lst_str=timeit.Timer('md5.md5(str(random.randrange (10000))).hexdigest()
in l','import random,md5; l=[md5.md5(str(i)).hexdigest() for i
in xrange(10000)]')
dct_str=timeit.Timer('l.has_key(md5.md5(str(random .randrange(10000))).hexdig
est())','import random,md5; l=dict([(md5.md5(str(i)).hexdigest(),None) for i
in xrange(10000)])')
tup_str=timeit.Timer('md5.md5(str(random.randrange (10000))).hexdigest()
in l','import random,md5; l=tuple([md5.md5(str(i)).hexdigest()
for i in xrange(10000)])')

print 'Integer lookup'
r=lst_i.repeat(100,100); print 'list:',min(r),max(r);
r=dct_i.repeat(100,100); print 'dict:',min(r),max(r);
r=tup_i.repeat(100,100); print 'tupl:',min(r),max(r);
print 'String lookup'
r=lst_str.repeat(100,100); print 'list:',min(r),max(r);
r=dct_str.repeat(100,100); print 'dict:',min(r),max(r);
r=tup_str.repeat(100,100); print 'tupl:',min(r),max(r);

[[[Ran on IRIX64 6.5, 24 processors, 12G Memory, 4G Swap, this
code only uses one processor at %100 the full length of the run]]]
Python 2.2.3 (#1, Nov 25 2003, 18:58:21) [C] on irix646-n32
Type "help", "copyright", "credits" or "license" for more
information.
## working on region in file /usr/tmp/python-119959673PMu.py...
Integer lookup
list: 0.126830816269 0.160212993622
dict: 0.00362300872803 0.00385618209839
tupl: 0.119297981262 0.161748170853
String lookup
list: 0.142526865005 0.188524961472
dict: 0.00711393356323 0.00760197639465
tupl: 0.143892049789 0.186873912811


The results are conclusive, go for dictionaries. But this
surprised me a little, anyone have insight as to why?

I was also surprised that tuples and lists scored exactly the
same. I was hoping that immutable tuples might earn it some
speed over lists.

I will eventually need this for testing strings. So the
doubling of speed for strings over integers for dictionaries
is a little alarming. Lists and tuples only saw a modest increase.

Thank you in advance for any clever tricks you suggest.

-Brian

Jul 18 '05 #2

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

5 posts views Thread by Brian | last post: by
9 posts views Thread by Paul Keegstra | last post: by
reply views Thread by Cooper Blake | last post: by
1 post views Thread by =?Utf-8?B?Q2FtaWxvIE9yb3pjbw==?= | last post: by
reply views Thread by mike222 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.