440,715 Members | 768 Online
Need help? Post your question and get tips & solutions from a community of 440,715 IT Pros & Developers. It's quick & easy.

# item access time: sets v. lists

 P: n/a Is it expected for access to set elements to be much slower than access to list elements? Explanation? Thanks, Alan Isaac >>t1=timeit.Timer("for i in set(xrange(10000)):pass","")t2=timeit.Timer("for i in list(xrange(10000)):pass","")t1.timeit(1000) 9.806250235714316 >>t2.timeit(1000) 3.9823075279120701 Oct 4 '06 #1
12 Replies

 P: n/a On Wed, 04 Oct 2006 16:02:56 GMT "David Isaac" t1=timeit.Timer("for i in set(xrange(10000)):pass","")t2=timeit.Timer("for i in list(xrange(10000)):pass","") [...] You're measuring the time for creating the xrange and the set/list too. Create them before you call Timer() and repeat your timing. Dennis Benzinger Oct 4 '06 #2

 P: n/a Paul M. wrote: Random access to item in list/set when item exists set -0.000241650824337 list -0.0245168031132 Random access to item in list/set when item does not exist set -0.000187733357172 list -0.522086186932 OK, that's a much better set of answers including to questions I did not even know I wanted to ask until I saw your post. But, how to explain the above?? Thanks, Alan Isaac Oct 4 '06 #4

 P: n/a "Neil Cerutti"

 P: n/a "David Isaac" Random access to item in list/set when item existsset -0.000241650824337list -0.0245168031132Random access to item in list/set when item does not existset -0.000187733357172list -0.522086186932 OK, that's a much better set of answers including to questions I did not even know I wanted to ask until I saw your post. But, how to explain the above?? Thanks, Alan Isaac Searching for an item in a list is a linear search, and all 10000 items must be checked before determining that a non-existent item is not in the list, and sure enough, our list takes about 20 times as long to find that 10500 is not in the range 10000 as it does to find 500 in the range 10000. By contrast, a set (and also the keys in a dict) use a tree structure to index more quickly into the list of items, so this access (and determination of non-existence) will be much faster, especially so for a large list. -- Paul Oct 4 '06 #7

 P: n/a "Paul McGuire"

 P: n/a On 2006-10-04, Paul McGuire Look at the code again. It's not testing what it says it'stesting. It isnt? The only quibble I can see is that there really is no "first" element in a set. I picked the "0 in set" and "0 in list" to pick the fastest case for list, which does a linear search through the list elements. Where did I go wrong on the test descriptions? It seems to be timing "testing for membership", not "random access". Random access is just seq[n]; at least, that's the assumption I made. Perhaps I read the test description out of context and got confused. -- Neil Cerutti 8 new choir robes are currently needed, due to the addition of several new members and to the deterioration of some of the older ones. --Church Bulletin Blooper Oct 4 '06 #9

 P: n/a David Isaac wrote: Paul M. wrote: Random access to item in list/set when item exists set -0.000241650824337 list -0.0245168031132 Random access to item in list/set when item does not exist set -0.000187733357172 list -0.522086186932 OK, that's a much better set of answers including to questions I did not even know I wanted to ask until I saw your post. But, how to explain the above?? The code was flawed, but it illustrates a point. To find an arbitrary item in a list takes time proportional to the length of the list--on average, you have to scan n/2 items in a list of length n to find it. "x in list" really has to check item 0, item 1, item 2, etc until it finds x or reaches the end of the list. Obviously in a sorted list like this, finding 0 will be much faster than finding 9999 since you have to scan almost the whole list to find the latter. Of course, if you know the list is sorted you could use a binary search (which will find the item in log(n) comparisons on average). So really to find a random item that is in the list is going to take more like 0.250 seconds on average, give or take a little. When the item doesn't occur in the list, you wind up having to check every element to see if it's there. (binary search would work for this case as well for a sorted list). A set, on the other hand, uses a hash table so finding an element takes constant time (it's one hash lookup, independent of the size of the set)--and determining an item isn't there is likewise constant time. (if your data is degenerate and you have many things in the set that all hash to the same value then the lookup could take longer, but that's pretty unlikely to make a big difference in practice with python sets and user data). Oct 4 '06 #10

 P: n/a "Neil Cerutti" "Neil Cerutti" >z = set(range(100))z[30] Traceback (most recent call last): File "", line 1, in ? TypeError: unindexable object >>> It really doesn't make any sense since sets are unordered. So we couldn't compare lists and sets in this manner anyway. I *did* learn that dicts and sets use a hash table, not a tree, which also helps explain why the initial construction of the 10000-element set takes so much longer than just simple assembly of the range into a list. -- Paul Oct 4 '06 #11

 P: n/a "Duncan Booth" By contrast, a set (and also the keys in a dict) use a tree structureto index more quickly into the list of items 'dict' and I believe also 'set' use a hash table, not a tree structure. Thanks, I stand corrected. How do they know how big a hash table to use? I think this is an interesting problem. -- Paul Oct 4 '06 #12

 P: n/a "sj*******@yahoo.com"

### This discussion thread is closed

Replies have been disabled for this discussion.