455,538 Members | 1,428 Online Need help? Post your question and get tips & solutions from a community of 455,538 IT Pros & Developers. It's quick & easy.

# How to make this speed up

 P: n/a Hi, I have thousands of points in 3D space, and I want to calcuate the distance between each other of them. I do the calculation with the following code. def distance(a,b): sum = 0 for i in range(len(a)): sum += (a[i]-b[i])**2 return sqrt(sum) for i in range(len(plist)): for j in range(i+1,len(plist)): d = distance(plist[i],plist[j]) But this is damn slow. Do you have any suggestion to speed it up? If I write the distance function in C extension, how faster will it be? (For the time being, I don't know how to write C extension yet :( Regards, Yang Jul 18 '05 #1
5 Replies

 P: n/a At some point, "myang" wrote: Hi, I have thousands of points in 3D space, and I want to calcuate the distance between each other of them. I do the calculation with the following code. def distance(a,b): sum = 0 for i in range(len(a)): sum += (a[i]-b[i])**2 return sqrt(sum) for i in range(len(plist)): for j in range(i+1,len(plist)): d = distance(plist[i],plist[j]) But this is damn slow. Do you have any suggestion to speed it up? If I write the distance function in C extension, how faster will it be? (For the time being, I don't know how to write C extension yet :( Use numarray or Numeric, and use arrays for your points. Also, do you need the distance, or will the square of the distance do? (that'll save you a sqrt). import numarray as NA def distance_between_points(plist): # if plist was a list of N points in d dimensions, p is an array of size (N,d) p = NA.asarray(plist) N, d = p.shape # dist2 is a (N,N) array of squared distance dist2 = NA.zeros((N, N), type=p.type()) for i in range(d): dist2 += NA.subtract.outer(p[:,i],p[:,i])**2 return NA.sqrt(dist2) This should be almost as fast as writing your own C extension (and much easier...) -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke |cookedm(at)physics(dot)mcmaster(dot)ca Jul 18 '05 #2

 P: n/a myang wrote: Hi, I have thousands of points in 3D space, and I want to calcuate the distance between each other of them. I do the calculation with the following code. def distance(a,b): sum = 0 for i in range(len(a)): sum += (a[i]-b[i])**2 return sqrt(sum) Two obvious things. First, you say these are all 3D points, so there's not much point being general here. It is faster to get rid of the loop and just calculate the distance directly. Second, the ** operator seems to be a bit slower than multiplication, so when speed matters, you should multiply instead of raise powers if you can. So I would define distance as: def distance(a,b): x = a - b y = a - b z = a - b return sqrt(x*x + y*y + z*z) for i in range(len(plist)): for j in range(i+1,len(plist)): d = distance(plist[i],plist[j]) Function calls are rather expensive in Python. It would be much faster to calculate the distance right here, rather than call a function to do it. (Being that the distance formula is simple, it's probably not necessary to abstract it in a function.) But this is damn slow. Do you have any suggestion to speed it up? Numeric and/or numarray. These are some extension modules designed to do the math stuff a lot faster (numarray will supercede Numeric). Try Google. If I write the distance function in C extension, how faster will it be? (For the time being, I don't know how to write C extension yet :( It could be a lot faster. I would say the best tradeoff is to write whole loop (not just the distance function) in C. -- CARL BANKS http://www.aerojockey.com/software "If you believe in yourself, drink your school, stay on drugs, and don't do milk, you can get work." -- Parody of Mr. T from a Robert Smigel Cartoon Jul 18 '05 #3

 P: n/a In article , David M. Cooke wrote: Jul 18 '05 #4

 P: n/a At some point, cl****@lairds.com (Cameron Laird) wrote: In article , David M. Cooke wrote: . . .Use numarray or Numeric, and use arrays for your points. Also, do youneed the distance, or will the square of the distance do? (that'llsave you a sqrt).import numarray as NAdef distance_between_points(plist): # if plist was a list of N points in d dimensions, p is an array ofsize (N,d) p = NA.asarray(plist) N, d = p.shape # dist2 is a (N,N) array of squared distance dist2 = NA.zeros((N, N), type=p.type()) for i in range(d): dist2 += NA.subtract.outer(p[:,i],p[:,i])**2 return NA.sqrt(dist2)This should be almost as fast as writing your own C extension (andmuch easier...) . . . The "much easier" has me curious. While I'm as insistent as anyone about Python's universality, this is an area where I see it with no advantage over C. The simple-minded C homologue requires no memory management or dereferencing, at the source level, so it's as easy as C gets. Are you simply observing that, when working in Python, any require- ment to introduce the machinery necessary for a second language makes the problem not easy? Or is there another comparison between the two that I haven't noticed? Well, let me qualify: much easier for me, because 1) I have numarray installed, and am not worried about depending upon it 2) I know how to use it efficiently 3) I don't have to look up the sequence API. For a C extension, I presume if you're not using arrays (and lists instead), you'd be mucking around with pulling things out of lists, and creating a new one. I can easily imagine that code (with error checks, etc.) getting to be a few hundred lines. I haven't tried, but Pyrex probably won't be too much faster either, as it still has to index the lists. A C extension written to use numarray (or Numeric) arrays will be shorter and easier to write than the equivalent using lists. I'd first evaluate whether the procedure I wrote above is "good enough", though. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke |cookedm(at)physics(dot)mcmaster(dot)ca Jul 18 '05 #5

 P: n/a Carl Banks wrote: def distance(a,b): x = a - b y = a - b z = a - b return sqrt(x*x + y*y + z*z) This gives a nice speedup (about 2 times). However, using psyco -and using x*x instead of x**2- results in more than 10 times faster code on my machine, without losing the option of having a variable number of dimensions: from random import uniform from math import sqrt from time import time def random_points(limits,dims,n): def u(): return [uniform(*limits) for i in range(dims)] return [u() for i in xrange(n)] def distance(a,b): d = 0 for i in range(len(a)): x = a[i]-b[i] d += x*x return sqrt(d) def compute_distances(plist): result = [] for i in range(len(plist)-1): for j in range(i+1,len(plist)): result.append(distance(plist[i],plist[j])) return result def test(): n = 500 P = random_points([0,1000],3,n) t = time() D1 = compute_distances(P) print time()-t import psyco psyco.full() t = time() D2 = compute_distances(P) print time()-t assert D1==D2 if __name__=='__main__': test() output: 4.77999997139 0.389999866486 Anton Jul 18 '05 #6

### This discussion thread is closed

Replies have been disabled for this discussion. 