By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,715 Members | 768 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,715 IT Pros & Developers. It's quick & easy.

item access time: sets v. lists

P: n/a
Is it expected for access to set elements to be much
slower than access to list elements? Explanation?
Thanks,
Alan Isaac
>>t1=timeit.Timer("for i in set(xrange(10000)):pass","")
t2=timeit.Timer("for i in list(xrange(10000)):pass","")
t1.timeit(1000)
9.806250235714316
>>t2.timeit(1000)
3.9823075279120701
Oct 4 '06 #1
Share this Question
Share on Google+
12 Replies


P: n/a
On Wed, 04 Oct 2006 16:02:56 GMT
"David Isaac" <ai*****@verizon.netwrote:
Is it expected for access to set elements to be much
slower than access to list elements? Explanation?
Thanks,
Alan Isaac
>t1=timeit.Timer("for i in set(xrange(10000)):pass","")
t2=timeit.Timer("for i in list(xrange(10000)):pass","")
[...]
You're measuring the time for creating the xrange and the set/list too.
Create them before you call Timer() and repeat your timing.
Dennis Benzinger
Oct 4 '06 #2

P: n/a
"David Isaac" <ai*****@verizon.netwrote in message
news:QWQUg.15396$3T2.2982@trnddc06...
Is it expected for access to set elements to be much
slower than access to list elements? Explanation?
Thanks,
Alan Isaac
>>>t1=timeit.Timer("for i in set(xrange(10000)):pass","")
t2=timeit.Timer("for i in list(xrange(10000)):pass","")
t1.timeit(1000)
9.806250235714316
>>>t2.timeit(1000)
3.9823075279120701

A couple of points:
1. Your use of timeit is a bit flawed. You are not only timing the access
into the set or list, but also the construction of the set/list, *every
time*. Use the second string arg (the one you are passing as "") for
one-time initialization, so that you measure only access, which is what you
say your are interested in.
2. Ooops, it turns out you're not really *accessing* the set/list, you are
*iterating* over it. Given that sets are implemented internally using a
tree structure, I would expect that iterating over a list could be faster,
although only marginally so.
3. Here are some stats for a little more correct timeits:

Construct list/set
set -1.89524618352
list -0.299499796762

Iterate over list/set
set -0.646887523414
list -0.552001162159

Random access to first item in list/set
set -0.000189409547861
list -0.000160076210804

Random access to item in list/set when item exists
set -0.000241650824337
list -0.0245168031132

Random access to item in list/set when item does not exist
set -0.000187733357172
list -0.522086186932
The code:
import timeit

print "\nConstruct list/set"
t1=timeit.Timer("z = set(xrange(10000))","")
t2=timeit.Timer("z = list(xrange(10000))","")
print "set ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)

print "\nIterate over list/set"
t1=timeit.Timer("for i in z: pass","z = set(xrange(10000))")
t2=timeit.Timer("for i in z: pass", "z = list(xrange(10000))")
print "set ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)

print "\nRandom access to first item in list/set"
t1=timeit.Timer("0 in z","z = set(xrange(10000))")
t2=timeit.Timer("0 in z", "z = list(xrange(10000))")
print "set ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)

print "\nRandom access to item in list/set when item exists"
t1=timeit.Timer("500 in z","z = set(xrange(10000))")
t2=timeit.Timer("500 in z", "z = list(xrange(10000))")
print "set ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)

print "\nRandom access to item in list/set when item does not exist"
t1=timeit.Timer("10500 in z","z = set(xrange(10000))")
t2=timeit.Timer("10500 in z", "z = list(xrange(10000))")
print "set ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)
-- Paul
Oct 4 '06 #3

P: n/a
Paul M. wrote:
Random access to item in list/set when item exists
set -0.000241650824337
list -0.0245168031132

Random access to item in list/set when item does not exist
set -0.000187733357172
list -0.522086186932

OK, that's a much better set of answers
including to questions I did not
even know I wanted to ask until
I saw your post. But, how to
explain the above??

Thanks,
Alan Isaac
Oct 4 '06 #4

P: n/a
On 2006-10-04, David Isaac <ai*****@verizon.netwrote:
Paul M. wrote:
>Random access to item in list/set when item exists
set -0.000241650824337
list -0.0245168031132

Random access to item in list/set when item does not exist
set -0.000187733357172
list -0.522086186932


OK, that's a much better set of answers
including to questions I did not
even know I wanted to ask until
I saw your post. But, how to
explain the above??
Look at the code again. It's not testing what it says it's
testing.

print "\nRandom access to first item in list/set"
t1=timeit.Timer("0 in z","z = set(xrange(10000))")
t2=timeit.Timer("0 in z", "z = list(xrange(10000))")
print "set ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)

print "\nRandom access to item in list/set when item exists"
t1=timeit.Timer("500 in z","z = set(xrange(10000))")
t2=timeit.Timer("500 in z", "z = list(xrange(10000))")
print "set ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)

--
Neil Cerutti
Oct 4 '06 #5

P: n/a

"Neil Cerutti" <ho*****@yahoo.comwrote in message
news:sl*******************@FIAD06.norwich.edu...
Look at the code again. It's not testing what it says it's
testing.
It isnt?

The only quibble I can see is that there really is no "first" element in a
set. I picked the "0 in set" and "0 in list" to pick the fastest case for
list, which does a linear search through the list elements.

Where did I go wrong on the test descriptions?

-- Paul
Oct 4 '06 #6

P: n/a
"David Isaac" <ai*****@verizon.netwrote in message
news:4cSUg.7864$753.4683@trnddc05...
Paul M. wrote:
>Random access to item in list/set when item exists
set -0.000241650824337
list -0.0245168031132

Random access to item in list/set when item does not exist
set -0.000187733357172
list -0.522086186932


OK, that's a much better set of answers
including to questions I did not
even know I wanted to ask until
I saw your post. But, how to
explain the above??

Thanks,
Alan Isaac

Searching for an item in a list is a linear search, and all 10000 items must
be checked before determining that a non-existent item is not in the list,
and sure enough, our list takes about 20 times as long to find that 10500 is
not in the range 10000 as it does to find 500 in the range 10000. By
contrast, a set (and also the keys in a dict) use a tree structure to index
more quickly into the list of items, so this access (and determination of
non-existence) will be much faster, especially so for a large list.

-- Paul
Oct 4 '06 #7

P: n/a
"Paul McGuire" <pt***@austin.rr._bogus_.comwrote:
By contrast, a set (and also the keys in a dict) use a tree structure
to index more quickly into the list of items
'dict' and I believe also 'set' use a hash table, not a tree structure.
Oct 4 '06 #8

P: n/a
On 2006-10-04, Paul McGuire <pt***@austin.rr._bogus_.comwrote:
"Neil Cerutti" <ho*****@yahoo.comwrote in message
news:sl*******************@FIAD06.norwich.edu...
>Look at the code again. It's not testing what it says it's
testing.

It isnt?

The only quibble I can see is that there really is no "first"
element in a set. I picked the "0 in set" and "0 in list" to
pick the fastest case for list, which does a linear search
through the list elements.

Where did I go wrong on the test descriptions?
It seems to be timing "testing for membership", not "random
access". Random access is just seq[n]; at least, that's the
assumption I made.

Perhaps I read the test description out of context and got
confused.

--
Neil Cerutti
8 new choir robes are currently needed, due to the addition of
several new members and to the deterioration of some of the
older ones. --Church Bulletin Blooper
Oct 4 '06 #9

P: n/a
David Isaac wrote:
Paul M. wrote:
Random access to item in list/set when item exists
set -0.000241650824337
list -0.0245168031132

Random access to item in list/set when item does not exist
set -0.000187733357172
list -0.522086186932


OK, that's a much better set of answers
including to questions I did not
even know I wanted to ask until
I saw your post. But, how to
explain the above??
The code was flawed, but it illustrates a point. To find an arbitrary
item in a list takes time proportional to the length of the list--on
average, you have to scan n/2 items in a list of length n to find it.
"x in list" really has to check item 0, item 1, item 2, etc until it
finds x or reaches the end of the list.

Obviously in a sorted list like this, finding 0 will be much faster
than finding 9999 since you have to scan almost the whole list to find
the latter. Of course, if you know the list is sorted you could use a
binary search (which will find the item in log(n) comparisons on
average).

So really to find a random item that is in the list is going to take
more like 0.250 seconds on average, give or take a little.

When the item doesn't occur in the list, you wind up having to check
every element to see if it's there. (binary search would work for this
case as well for a sorted list).

A set, on the other hand, uses a hash table so finding an element takes
constant time (it's one hash lookup, independent of the size of the
set)--and determining an item isn't there is likewise constant time.

(if your data is degenerate and you have many things in the set that
all hash to the same value then the lookup could take longer, but
that's pretty unlikely to make a big difference in practice with python
sets and user data).

Oct 4 '06 #10

P: n/a
"Neil Cerutti" <ho*****@yahoo.comwrote in message
news:sl********************@FIAD06.norwich.edu...
On 2006-10-04, Paul McGuire <pt***@austin.rr._bogus_.comwrote:
>"Neil Cerutti" <ho*****@yahoo.comwrote in message
news:sl*******************@FIAD06.norwich.edu.. .

It seems to be timing "testing for membership", not "random
access". Random access is just seq[n]; at least, that's the
assumption I made.

Perhaps I read the test description out of context and got
confused.
Oh, poor wording on my part, sorry.

You got me wondering what seq[n] would give me in the case of a set:
>>z = set(range(100))
z[30]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: unindexable object
>>>
It really doesn't make any sense since sets are unordered. So we couldn't
compare lists and sets in this manner anyway.

I *did* learn that dicts and sets use a hash table, not a tree, which also
helps explain why the initial construction of the 10000-element set takes so
much longer than just simple assembly of the range into a list.

-- Paul
Oct 4 '06 #11

P: n/a
"Duncan Booth" <du**********@invalid.invalidwrote in message
news:Xn*************************@127.0.0.1...
"Paul McGuire" <pt***@austin.rr._bogus_.comwrote:
>By contrast, a set (and also the keys in a dict) use a tree structure
to index more quickly into the list of items

'dict' and I believe also 'set' use a hash table, not a tree structure.
Thanks, I stand corrected. How do they know how big a hash table to use? I
think this is an interesting problem.

-- Paul
Oct 4 '06 #12

P: n/a
"sj*******@yahoo.com" <sj*******@yahoo.comwrote:
A set, on the other hand, uses a hash table so finding an element
takes constant time (it's one hash lookup, independent of the size of
the set)--and determining an item isn't there is likewise constant
time.
One hash calculation, but there could be collisions. Worst case for an n
element dict/set could be that it takes n attempts to find a value,
although unless you try to do it deliberately by ensuring that the keys
have identical hash values this won't happen in practice.

Paul McGuire wrote:
Thanks, I stand corrected. How do they know how big a hash table to
use? I think this is an interesting problem.
If I read the code correctly:

Minimum size is 8. Whenever more than 2/3 of the slots are in use
(including slots where the element has been deleted) the dictionary is
resized to the smallest power of 2 (greater than 8) which is greater than 4
times the number of currently occupied slots (or 2 times the number of
occupied slots when more than 50000 slots are occupied). This can reduce
the size if a lot of elements have been deleted.
Oct 4 '06 #13

This discussion thread is closed

Replies have been disabled for this discussion.