How to make this speed up

myang

Hi,

I have thousands of points in 3D space, and I want to calcuate the distance
between each other of them. I do the calculation with the following code.

def distance(a,b):
sum = 0
for i in range(len(a)):
sum += (a[i]-b[i])**2
return sqrt(sum)

for i in range(len(plist)):
for j in range(i+1,len(plist)):
d = distance(plist[i],plist[j])

But this is damn slow. Do you have any suggestion to speed it up? If I write
the distance function in C extension, how faster will it be? (For the time
being, I don't know how to write C extension yet :(

Regards,
Yang

Jul 18 '05 #1

Subscribe Post Reply

1626

David M. Cooke

At some point, "myang" <my***@clarku.edu> wrote:

Hi,

I have thousands of points in 3D space, and I want to calcuate the distance
between each other of them. I do the calculation with the following code.

def distance(a,b):
sum = 0
for i in range(len(a)):
sum += (a[i]-b[i])**2
return sqrt(sum)

for i in range(len(plist)):
for j in range(i+1,len(plist)):
d = distance(plist[i],plist[j])

But this is damn slow. Do you have any suggestion to speed it up? If I write
the distance function in C extension, how faster will it be? (For the time
being, I don't know how to write C extension yet :(

Use numarray or Numeric, and use arrays for your points. Also, do you
need the distance, or will the square of the distance do? (that'll
save you a sqrt).

import numarray as NA

def distance_between_points(plist):
# if plist was a list of N points in d dimensions, p is an array of size (N,d)
p = NA.asarray(plist)
N, d = p.shape
# dist2 is a (N,N) array of squared distance
dist2 = NA.zeros((N, N), type=p.type())
for i in range(d):
dist2 += NA.subtract.outer(p[:,i],p[:,i])**2
return NA.sqrt(dist2)

This should be almost as fast as writing your own C extension (and
much easier...)

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca

Jul 18 '05 #2

Carl Banks

myang wrote:

Hi,

I have thousands of points in 3D space, and I want to calcuate the distance
between each other of them. I do the calculation with the following code.

def distance(a,b):
sum = 0
for i in range(len(a)):
sum += (a[i]-b[i])**2
return sqrt(sum)
Two obvious things. First, you say these are all 3D points, so
there's not much point being general here. It is faster to get rid of
the loop and just calculate the distance directly. Second, the **
operator seems to be a bit slower than multiplication, so when speed
matters, you should multiply instead of raise powers if you can.

So I would define distance as:

def distance(a,b):
x = a[0] - b[0]
y = a[1] - b[1]
z = a[2] - b[2]
return sqrt(x*x + y*y + z*z)

for i in range(len(plist)):
for j in range(i+1,len(plist)):
d = distance(plist[i],plist[j])
Function calls are rather expensive in Python. It would be much
faster to calculate the distance right here, rather than call a
function to do it. (Being that the distance formula is simple, it's
probably not necessary to abstract it in a function.)

But this is damn slow. Do you have any suggestion to speed it up?
Numeric and/or numarray. These are some extension modules designed to
do the math stuff a lot faster (numarray will supercede Numeric).
Try Google.

If I write
the distance function in C extension, how faster will it be? (For the time
being, I don't know how to write C extension yet :(

It could be a lot faster. I would say the best tradeoff is to write
whole loop (not just the distance function) in C.
--
CARL BANKS http://www.aerojockey.com/software
"If you believe in yourself, drink your school, stay on drugs, and
don't do milk, you can get work."
-- Parody of Mr. T from a Robert Smigel Cartoon

Jul 18 '05 #3

Cameron Laird

In article <qn*************@arbutus.physics.mcmaster.ca>,
David M. Cooke <co**********@physics.mcmaster.ca> wrote:

Jul 18 '05 #4

David M. Cooke

At some point, cl****@lairds.com (Cameron Laird) wrote:

In article <qn*************@arbutus.physics.mcmaster.ca>,
David M. Cooke <co**********@physics.mcmaster.ca> wrote:
.
.
.
Use numarray or Numeric, and use arrays for your points. Also, do you
need the distance, or will the square of the distance do? (that'll
save you a sqrt).

import numarray as NA

def distance_between_points(plist):
# if plist was a list of N points in d dimensions, p is an array of
size (N,d)
p = NA.asarray(plist)
N, d = p.shape
# dist2 is a (N,N) array of squared distance
dist2 = NA.zeros((N, N), type=p.type())
for i in range(d):
dist2 += NA.subtract.outer(p[:,i],p[:,i])**2
return NA.sqrt(dist2)

This should be almost as fast as writing your own C extension (and
much easier...)

.
.
.
The "much easier" has me curious. While I'm as insistent
as anyone about Python's universality, this is an area
where I see it with no advantage over C. The simple-minded
C homologue requires no memory management or dereferencing,
at the source level, so it's as easy as C gets. Are you
simply observing that, when working in Python, any require-
ment to introduce the machinery necessary for a second
language makes the problem not easy? Or is there another
comparison between the two that I haven't noticed?

Well, let me qualify: much easier for me, because
1) I have numarray installed, and am not worried about depending upon
it
2) I know how to use it efficiently
3) I don't have to look up the sequence API. For a C extension, I
presume if you're not using arrays (and lists instead), you'd be
mucking around with pulling things out of lists, and creating a new
one. I can easily imagine that code (with error checks, etc.) getting
to be a few hundred lines. I haven't tried, but Pyrex probably won't
be too much faster either, as it still has to index the lists.

A C extension written to use numarray (or Numeric) arrays will be
shorter and easier to write than the equivalent using lists. I'd first
evaluate whether the procedure I wrote above is "good enough", though.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca

Jul 18 '05 #5

Anton Vredegoor

Carl Banks wrote:

def distance(a,b):
x = a[0] - b[0]
y = a[1] - b[1]
z = a[2] - b[2]
return sqrt(x*x + y*y + z*z)

This gives a nice speedup (about 2 times). However, using psyco -and
using x*x instead of x**2- results in more than 10 times faster code
on my machine, without losing the option of having a variable number
of dimensions:

from random import uniform
from math import sqrt
from time import time

def random_points(limits,dims,n):
def u():
return [uniform(*limits)
for i in range(dims)]
return [u() for i in xrange(n)]

def distance(a,b):
d = 0
for i in range(len(a)):
x = a[i]-b[i]
d += x*x
return sqrt(d)

def compute_distances(plist):
result = []
for i in range(len(plist)-1):
for j in range(i+1,len(plist)):
result.append(distance(plist[i],plist[j]))
return result

def test():
n = 500
P = random_points([0,1000],3,n)
t = time()
D1 = compute_distances(P)
print time()-t
import psyco
psyco.full()
t = time()
D2 = compute_distances(P)
print time()-t
assert D1==D2

if __name__=='__main__':
test()

output:

4.77999997139
0.389999866486
Anton

Jul 18 '05 #6

Similar topics

Make all pages .php?

by: Bruce W...1 | last post by:

This is a best practices question. For a website where PHP is used, does it make sense to have both .htm and .php files? In other words, for pages that have no active content there's no point...

PHP

how can I make this script shorter?

by: Lowell Kirsh | last post by:

I have a script which I use to find all duplicates of files within a given directory and all its subdirectories. It seems like it's longer than it needs to be but I can't figure out how to shorten...

Python

Do I make a lot of small Tables or a few big Tables?

by: mark1822 | last post by:

Hello, I am working on designing a Web site using PHP and MySQL and I am currently figuring out how to best build my MySQL database. My Web site is going to use a MySQL database extensively...

MySQL Database

354

c# is a good way to learn c

by: Montrose... | last post by:

After working in c# for a year, the only conclusion I can come to is that I wish I knew c. All I need is Linux, the gnu c compiler and I can do anything. Web services are just open sockets...

.NET Framework

How can I make this code to go faster???

by: Martin Ho | last post by:

This is the code I have so far, to generate 50.000 random numbers. This code will generate the same sequence every time I run it. But the problem being is that it's slow. It take on my p4 1.6gh...

Visual Basic .NET

how can I make this code run faster

by: Scott Emick | last post by:

I am using the following to compute distances between two lat/long coordinates for a store locator - (VB .NET 2003) it seems to take a long time to iterate through like 100-150 locations -...

Visual Basic .NET

my supervisor won't let me make my website in .net

by: stubbsie | last post by:

Hi, I have redesigned our official public government website in .net and it has taken me a few months to redo. I have been the sole designer of the website from its humble beginnning a few years...

.NET Framework

a method to make js have the ability to inherit

by: zzw8206262001 | last post by:

Hi,I find a way to make javescript more like c++ or pyhon There is the sample code: function Father(self) //every contructor may have "self" argument { self=self?self:this; ...

Javascript

Make This Subroutine Faster

by: vunet.us | last post by:

Is there a suggestion I can make this code run faster: if(document.getElementById("1")){ doOne(); } if(document.getElementById("2")){ doTwo(); } .......................

Javascript

make it faster

by: Simply_Red | last post by:

Hi, is there a way to make this function faster??? struct Points { double X; double Y; };

C / C++

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice