built in zip function speed

I hope I am not being too ignorant :p but here goes... my boss has written a bit of python code and asked me to speed it up for him... I've reduced the run time from around 20 minutes to 13 (not bad I think ;) to speed it up further I asked him to replace a loop like this:-

    index = 0
    for element in a:
        av = a[index]
        bv = b[index]
        cv = c[index]
        dv = d[index]
        avbv = (av-bv) * (av-bv)
        diff = cv - dv
        e.append(diff - avbv)
        index = index + 1

(where a, b, c and d are 200,000 element float arrays) to use the built in zip function.. it would seem made for this problem!

    for av, bv, cv, dv in zip(a, b, c, d):
        avbv = (av-bv) * (av - bv)
        diff = cv - dv
        e.append(diff - avbv)

however this seems to run much slower than *I* thought it would (and in fact slower than slicing) I guess what I am asking is.. would you expect this?

full code listing (I hope I have made a very obvious error):-

    import array
    import time

    a = array.array("f")
    b = array.array("f")
    c = array.array("f")
    d = array.array("f")
    e = array.array("f")

    for value in xrange(1, 200000, 1):
        a.append(float(value))
        b.append(float(value))
        c.append(float(value))
        d.append(float(value))

    start = time.time()

    index = 0
    for element in a:
        av = a[index]
        bv = b[index]
        cv = c[index]
        dv = d[index]
        avbv = (av-bv) * (av-bv)
        diff = cv - dv
        e.append(diff - avbv)
        index = index + 1

    end0 = time.time()
    print end0-start

    e = array.array("f")
    for av, bv, cv, dv in zip(a, b, c, d):
        avbv = (av-bv) * (av - bv)
        diff = cv - dv
        e.append(diff - avbv)

    end1 = time.time()
    print end1-end0

    e = array.array("f")

    ## just for a laugh my own zip function
    ## the joke is it runs faster than built in zip ??
    def myzip(*args):
        index = 0
        for elem in args[0]:
            zipper = []
            for arg in args:
                zipper.append(arg[index])
            index = index + 1
            yield zipper

    for av, bv, cv, dv in myzip(a, b, c, d):
        avbv = (av-bv) * (av - bv)
        diff = cv - dv
        e.append(diff - avbv)

    end2 = time.time()
    print end2-end1

timings from 4 million element input array

    slice:   8.77999997139
    zip():   36.5759999752
    myzip(): 12.1449999809

Jul 4 '06 #1
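A minimal sketch of a timing harness for loops like the ones in this post, written in Python 3 syntax (the listing itself is Python 2). The names `indexed_loop`, `zip_loop` and `best_time` are made up for illustration, and the array size is deliberately small, so the numbers it prints will not match the figures quoted above.

```python
import time
from array import array


def indexed_loop(a, b, c, d):
    """Index-based loop from the post: (c - d) - (a - b)**2 per element."""
    e = array("f")
    index = 0
    for _ in a:
        av, bv, cv, dv = a[index], b[index], c[index], d[index]
        e.append((cv - dv) - (av - bv) * (av - bv))
        index += 1
    return e


def zip_loop(a, b, c, d):
    """Same computation, iterating over the four arrays in lockstep."""
    e = array("f")
    for av, bv, cv, dv in zip(a, b, c, d):
        e.append((cv - dv) - (av - bv) * (av - bv))
    return e


def best_time(func, *args, repeat=3):
    """Return the fastest wall-clock time (seconds) over `repeat` calls."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    values = array("f", (float(v) for v in range(1, 20000)))
    for func in (indexed_loop, zip_loop):
        print(func.__name__, best_time(func, values, values, values, values))
```

Taking the best of several runs, rather than a single `time.time()` difference, reduces the effect of whatever else the machine happens to be doing.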
12 Replies

itertools.izip is usually faster than zip. You can try that.

Jul 4 '06 #2
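For readers on Python 3: `itertools.izip` is gone there and the built-in `zip` is itself lazy, so the izip behaviour is the default. The sketch below, with illustrative function names only, contrasts lazy iteration with materializing the whole list of tuples first, which is roughly what Python 2's `zip()` did.

```python
from array import array


def lazy_zip_loop(a, b, c, d):
    """Consume tuples one at a time; no intermediate list is built."""
    e = array("f")
    for av, bv, cv, dv in zip(a, b, c, d):  # lazy in Python 3
        e.append((cv - dv) - (av - bv) * (av - bv))
    return e


def eager_zip_loop(a, b, c, d):
    """Build the full list of tuples first, as Python 2's zip() did."""
    rows = list(zip(a, b, c, d))  # len(a) tuples held in memory at once
    e = array("f")
    for av, bv, cv, dv in rows:
        e.append((cv - dv) - (av - bv) * (av - bv))
    return e
```

Both functions return the same values; the difference is only in how much memory is held while the loop runs.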

Rune Strand wrote:

> itertools.izip is usually faster than zip. You can try that.

Thanks very much

timing for itertools.izip

    for av, bv, cv, dv in itertools.izip(a, b, c, d):
        avbv = (av-bv) * (av - bv)
        diff = cv - dv
        e.append(diff - avbv)

on a 4 million element array:

    slice:        8.06299996376
    built in zip: 36.5169999599
    myzip:        12.0320000648
    izip:         5.76499986649

so fastest overall

Jul 4 '06 #3


On Tue, 04 Jul 2006 07:18:29 -0700, ma***********@gmail.com wrote:

> I hope I am not being too ignorant :p but here goes... my boss has written a bit of python code and asked me to speed it up for him... I've reduced the run time from around 20 minutes to 13 (not bad I think ;) to speed it up further I asked him to replace a loop like this:-
>
>     index = 0
>     for element in a:
>         av = a[index]
>         bv = b[index]
>         cv = c[index]
>         dv = d[index]
>         avbv = (av-bv) * (av-bv)
>         diff = cv - dv
>         e.append(diff - avbv)
>         index = index + 1

This is, I think, a good case for an old-fashioned for-with-index loop:

    for i in xrange(len(a)):
        e.append(c[i] - d[i] - (a[i] - b[i])**2)

Python doesn't optimize away lines of code -- you have to do it yourself. Every line of Python code takes a bit of time to execute. My version uses 34 lines disassembled; yours takes 60 lines, almost twice as much code. (See the dis module for further details.)

It's too much to hope that my code will be twice as fast as yours, but it should be a little faster.

> (where a, b, c and d are 200,000 element float arrays) to use the built in zip function.. it would seem made for this problem!
>
>     for av, bv, cv, dv in zip(a, b, c, d):
>         avbv = (av-bv) * (av - bv)
>         diff = cv - dv
>         e.append(diff - avbv)
>
> however this seems to run much slower than *I* thought it would (and in fact slower than slicing) I guess what I am asking is.. would you expect this?

Yes. zip() makes a copy of your data. It's going to take some time to copy 4 * 200,000 floats into one rather large list. That list is an ordinary Python list of objects, not an array of bytes like the array module uses. That means zip has to convert every one of those 800,000 floats into rich Python float objects. This won't matter for small sets of data, but with 800,000 of them, it all adds up.

--
Steven.

Jul 4 '06 #5
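The instruction counts Steven mentions can be checked with the `dis` module. A small sketch in Python 3 syntax, assuming Python 3.4 or later for `dis.get_instructions`; the function names are invented here, and exact counts differ between interpreter versions, so the 34 and 60 from the post should not be expected to reproduce.

```python
import dis


def verbose_loop(a, b, c, d, e):
    """The original loop, with one temporary per intermediate value."""
    index = 0
    for element in a:
        av = a[index]
        bv = b[index]
        cv = c[index]
        dv = d[index]
        avbv = (av - bv) * (av - bv)
        diff = cv - dv
        e.append(diff - avbv)
        index = index + 1


def compact_loop(a, b, c, d, e):
    """The for-with-index version, computing the result in one expression."""
    for i in range(len(a)):
        e.append(c[i] - d[i] - (a[i] - b[i]) ** 2)


def instruction_count(func):
    """Count the bytecode instructions in a function's body."""
    return sum(1 for _ in dis.get_instructions(func))


if __name__ == "__main__":
    for func in (verbose_loop, compact_loop):
        print(func.__name__, instruction_count(func))
```

`dis.dis(compact_loop)` prints the full listing if the individual instructions are of interest. Fewer instructions is only a rough proxy for speed, which is the caveat the post itself makes.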

ma***********@gmail.com wrote:

> ## just for a laugh my own zip function
> ## the joke is it runs faster than built in zip ??

since it doesn't do the same thing, it's not a very good joke.

> def myzip(*args):
>     index = 0
>     for elem in args[0]:
>         zipper = []
>         for arg in args:
>             zipper.append(arg[index])
>         index = index + 1
>         yield zipper

Jul 4 '06 #6
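Fredrik does not spell out the difference. One plausible reading, sketched below in Python 3 syntax, is that `myzip` is a generator yielding lists one at a time, whereas Python 2's built-in `zip()` returned a complete list of tuples truncated to the shortest input. Here `list(zip(...))` stands in for the Python 2 behaviour, and the sample data is made up.

```python
def myzip(*args):
    """The generator from the original post, reproduced for comparison."""
    index = 0
    for elem in args[0]:
        zipper = []
        for arg in args:
            zipper.append(arg[index])
        index = index + 1
        yield zipper


short, longer = [1, 2], [10, 20, 30]

# Built-in zip: tuples, truncated to the shortest input.
builtin_rows = list(zip(longer, short))

# myzip: lists, produced lazily, driven by the length of the first argument.
myzip_rows = list(myzip(short, longer))

# When the first argument is the longest, myzip indexes past the end
# of the shorter inputs instead of stopping.
try:
    list(myzip(longer, short))
    overran = False
except IndexError:
    overran = True
```

Because `myzip` never builds the full list, timing it against Python 2's `zip()` compares a lazy generator with an eager copy, not two equivalent functions.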

> so fastest overall

you may experience speed-ups by using

    from itertools import izip

and just use izip() instead to avoid the module namespace lookup. The same applies for the list.append() methods. If you're appending some million times

    a_list = []
    a_list_append = a_list.append
    a_list_append(value)

will be faster than

    a_list.append(value)

but not much.

Jul 4 '06 #7
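The append suggestion from this post, sketched in Python 3 syntax with made-up function names. Binding `result.append` to a local name skips one attribute lookup per iteration; as the post says, the gain is usually small, so it is worth measuring before relying on it.

```python
from timeit import timeit


def append_via_attribute(values):
    """Look up result.append on every iteration."""
    result = []
    for value in values:
        result.append(value * 2.0)
    return result


def append_via_local(values):
    """Look the bound method up once and reuse it inside the loop."""
    result = []
    result_append = result.append
    for value in values:
        result_append(value * 2.0)
    return result


if __name__ == "__main__":
    data = [float(v) for v in range(100000)]
    for func in (append_via_attribute, append_via_local):
        print(func.__name__, timeit(lambda: func(data), number=10))
```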

Steven D'Aprano wrote:

> On Tue, 04 Jul 2006 07:18:29 -0700, ma***********@gmail.com wrote:
>
>> I hope I am not being too ignorant :p but here goes... my boss has written a bit of python code and asked me to speed it up for him... I've reduced the run time from around 20 minutes to 13 (not bad I think ;) to speed it up further I asked him to replace a loop like this:-
>>
>>     index = 0
>>     for element in a:
>>         av = a[index]
>>         bv = b[index]
>>         cv = c[index]
>>         dv = d[index]
>>         avbv = (av-bv) * (av-bv)
>>         diff = cv - dv
>>         e.append(diff - avbv)
>>         index = index + 1
>
> This is, I think, a good case for an old-fashioned for-with-index loop:
>
>     for i in xrange(len(a)):
>         e.append(c[i] - d[i] - (a[i] - b[i])**2)
>
> Python doesn't optimize away lines of code -- you have to do it yourself. Every line of Python code takes a bit of time to execute. My version uses 34 lines disassembled; yours takes 60 lines, almost twice as much code. (See the dis module for further details.)
>
> It's too much to hope that my code will be twice as fast as yours, but it should be a little faster.

indeed thanks very much :)

my tests on 4 million:-

    slice (original): 7.73399996758
    built in zip:     36.7350001335
    izip:             5.98399996758
    Steven slice:     4.96899986267

so overall fastest so far

>> (where a, b, c and d are 200,000 element float arrays) to use the built in zip function.. it would seem made for this problem!
>>
>>     for av, bv, cv, dv in zip(a, b, c, d):
>>         avbv = (av-bv) * (av - bv)
>>         diff = cv - dv
>>         e.append(diff - avbv)
>>
>> however this seems to run much slower than *I* thought it would (and in fact slower than slicing) I guess what I am asking is.. would you expect this?
>
> Yes. zip() makes a copy of your data. It's going to take some time to copy 4 * 200,000 floats into one rather large list. That list is an ordinary Python list of objects, not an array of bytes like the array module uses. That means zip has to convert every one of those 800,000 floats into rich Python float objects. This won't matter for small sets of data, but with 800,000 of them, it all adds up.

I was beginning to suspect this was the case (I opened windows task manager and noticed the memory usage) thanks for explaining it to me.

> --
> Steven.

Jul 4 '06 #8

Fredrik Lundh wrote:

> ma***********@gmail.com wrote:
>
>> ## just for a laugh my own zip function
>> ## the joke is it runs faster than built in zip ??
>
> since it doesn't do the same thing, it's not a very good joke.
>
>> def myzip(*args):
>>     index = 0
>>     for elem in args[0]:
>>         zipper = []
>>         for arg in args:
>>             zipper.append(arg[index])
>>         index = index + 1
>>         yield zipper

indeed, the joke is on me ;) thanks for pointing it out

Jul 4 '06 #9

ma***********@gmail.com:

Using Python you can do:

    # Data:
    l_a = [1.1, 1.2]
    l_b = [2.1, 2.2]
    l_c = [3.1, 3.2]
    l_d = [5.1, 4.2]

    from itertools import izip
    l_e = [(c-d) - (a-b)*(a-b) for a,b,c,d in izip(l_a, l_b, l_c, l_d)]
    print l_e

With psyco + the standard module array you can probably go quite fast; Psyco recognizes those arrays and speeds them up a lot. But with something like this you can probably go faster:

    from numarray import array
    arr_a = array(l_a)
    arr_b = array(l_b)
    arr_c = array(l_c)
    arr_d = array(l_d)
    arr_e = (arr_c - arr_d) - (arr_a - arr_b)**2
    print arr_e

(Instead of numarray you can use SciPy, Numeric, etc.)

If your data is on disk you can avoid the list=>array conversion and load the data from the numerical library itself; this is probably almost as fast as doing the same thing in C.

Bye,
bearophile

Jul 4 '06 #10

ma***********@gmail.com wrote:

> I hope I am not being too ignorant :p but here goes... my boss has written a bit of python code and asked me to speed it up for him... I've reduced the run time from around 20 minutes to 13 (not bad I think ;) to speed it up further I asked him to replace a loop like this:-
>
>     index = 0
>     for element in a:
>         av = a[index]
>         bv = b[index]
>         cv = c[index]
>         dv = d[index]
>         avbv = (av-bv) * (av-bv)
>         diff = cv - dv
>         e.append(diff - avbv)
>         index = index + 1

For /real/ speed-ups use a numerical library, e. g.

    # untested
    from numarray import array
    a = array(a)
    b = array(b)
    c = array(c)
    d = array(d)
    e = (c-d) - (a-b)*(a-b)

Peter

Jul 4 '06 #11

Peter Otten wrote:

>     from numarray import array
>     a = array(a)
>     b = array(b)
>     c = array(c)
>     d = array(d)
>     e = (c-d) - (a-b)*(a-b)

Oops, bearophile has already posted the same idea with better execution...

Jul 4 '06 #12

be************@lycos.com writes:

[...]

> (Instead of numarray you can use SciPy, Numeric, etc.)
> If your data is on disk you can avoid the list=>array conversion and load the data from the numerical library itself; this is probably almost as fast as doing the same thing in C.

Apparently if you're starting to write numerical code with Python these days you should use numpy, not Numeric or numarray.

(Note that in old postings you'll see 'numpy' used as a synonym for what's now strictly called 'Numeric'. First came Numeric, then the offshoots/rewrites numarray and scipy-core, and now numpy has come along to re-unify the two camps -- hooray!)

John

Jul 4 '06 #13
