443,705 Members | 2,017 Online
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,705 IT Pros & Developers. It's quick & easy.

# nearest neighbor in 2D

 P: n/a I have a list of two tuples containing x and y coord (x0, y0) (x1, y1) ... (xn, yn) Given a new point x,y, I would like to find the point in the list closest to x,y. I have to do this a lot, in an inner loop, and then I add each new point x,y to the list. I know the range of x and y in advance. One solution that comes to mind is to partition to space into quadrants and store the elements by quadrant. When a new element comes in, identify it's quadrant and only search the appropriate quadrant for nearest neighbor. This could be done recursively, a 2D binary search of sorts.... Can anyone point me to some code or module that provides the appropriate data structures and algorithms to handle this task efficiently? The size of the list will likely be in the range of 10-1000 elements. Thanks, John Hunter Jul 18 '05 #1
14 Replies

 P: n/a >>>>> "John" == John Hunter writes: John> Given a new point x,y, I would like to find the point in the list John> closest to x,y. I have to do this a lot, in an inner loop, and John> then I add each new point x,y to the list. I know the range of x John> and y in advance. John> One solution that comes to mind is to partition to space into John> quadrants and store the elements by quadrant. When a new element John> comes in, identify it's quadrant and only search the appropriate John> quadrant for nearest neighbor. This could be done recursively, a John> 2D binary search of sorts.... By recursion your solution would work in O(log n) time. The construction would take O(n log n) time. Unluckily, it can return the wrong point, as the nearest point within the nearest quadrant might not be the nearest point. The problem is a well-studied basic computational geometry problem, although I don't really know any Python code that actually do it. Try to look at the web for "Voronoi diagrams" and "radial triangulation" to understand how to solve it properly in the above mentioned (perhaps randomized) time complexity. Regards, Isaac. Jul 18 '05 #2

 P: n/a John Hunter wrote: I have a list of two tuples containing x and y coord (x0, y0) (x1, y1) ... (xn, yn) Given a new point x,y, I would like to find the point in the list closest to x,y. I have to do this a lot, in an inner loop, and then I add each new point x,y to the list. I know the range of x and y in advance. One solution that comes to mind is to partition to space into quadrants and store the elements by quadrant. When a new element comes in, identify it's quadrant and only search the appropriate quadrant for nearest neighbor. This could be done recursively, a 2D binary search of sorts.... Can anyone point me to some code or module that provides the appropriate data structures and algorithms to handle this task efficiently? The size of the list will likely be in the range of 10-1000 elements. Thanks, John Hunter You could to a for loop, and inside that loop you will have a variable lessest_distance. I dont know much geometric mathematics, but Im pretty sure you can use pytagoras stuff to find the lenght from (Xn,Yn) to (X,Y) using sinus cosinus and such. And when the function is finished, you should return lessest_distance Jul 18 '05 #3

 P: n/a John Hunter wrote: One solution that comes to mind is to partition to space into quadrants and store the elements by quadrant. When a new element comes in, identify it's quadrant and only search the appropriate quadrant for nearest neighbor. This could be done recursively, a 2D binary search of sorts.... What happens when you put a particle in near/at the boundary of a quadrant though? It's possible for the nearest neighbour to be in the nearest neighbour quadrant...although you could search over these as well. However, the number of independent areas implied by use of the word 'quadrant' suggests that this would be the same as iterating over all space.... :-) -- Graham Lee Wadham College Oxford Jul 18 '05 #4

 P: n/a At 2003-11-03T04:12:47Z, John Hunter writes: One solution that comes to mind is to partition to space into quadrants and store the elements by quadrant. When a new element comes in, identify it's quadrant and only search the appropriate quadrant for nearest neighbor. Erm, no. Imagine that your new point is in one corner of a quadrant. The other point in the quadrant is in the opposite corner. There is a point in the adjacent quadrant that is infinitessimaly close to your new point. That's where your algorithm breaks down. -- Kirk Strauser The Day Companies Jul 18 '05 #5

 P: n/a >>>>> "Ron" == Ron Adam writes: Ron> I really don't know how you can make this faster. There Ron> might be a library that has a distance between two points Ron> function that could speed it up. If you only had to compare one point to all the other points, then the brute force approach -- check every point -- will work great. This is O(N) and I don't think you can beat it. The idea is that I will be repeatedly checking and adding points to the list, so it is worthwhile at the outset to set up a data structure which allows a more efficient search. The analogy is a binary search in 1D. If you plan to repeatedly search a (possibly growing) list of numbers to see whether it contains some number or find the nearest neighbor to a number, it is worthwhile at the outset to put them in a data structure that allows a binary search. Setting up the initial data structure costs you some time, but subsequent searches are O(log2(N)). See google for 'binary search' and the python module bisect. So roughly, for a list with 1,000,000 elements, your brute force approach requires a million comparisons per search. If the data is setup for binary search, on average only 13-14 comparisons will be required. Well worth the effort if you need to do a lot of searches, as I do. John Hunter Jul 18 '05 #6

 P: n/a On Tue, 04 Nov 2003 08:25:36 -0800, David Eppstein wrote: In article , Peter Otten <__*******@web.de> wrote: This is of course all premature optimization as the most promising approach is to try hard to reduce the number of candidate points, as David Eppstein seems to have done. But then, he could use complex numbers, too.Well, yes, but then my code wouldn't work very well in dimensions higherthan two... I rearranged my first example to match the output of yours and used a random number seed to get identical results. Moving the square root to the return line of the find shortest distance function increased the speed of my routine about 20%. Then using the p*p form instead of p**2 added anouther 4%. With printing turned there is only a very small difference. Of course printing is the bottle neck. Turning off printing resulted in the following. All are best of 3 runs. 1000 points: Standard loop: 0:00:00.958192 Kdtree: 0:00:00.248096 Quite a difference. I'm not quite sure how kdtree's work. (yet) But they can be worth while when working with large data sets. The standard loop seems to be better for small lists of < 100 points. 100 points: Standard loop: 0:00:00.009966 kdtree 0:00:00.015247 But for really large lists. 10000 points: Standard loop: 0:01:39.246454 kdtree 0:00:03.424873 Hehe.... no comparison. The number of calculations the standard loop does: 100 new points, 4950 distance calculations. 1000 new points, 499500 distance calculations. 10000 new points, 49995000 distance calculations. And I don't know how to figure it for kdtree. But we can estimate it by using the ratio of the speeds. 1000 points: kdtree (3.42/99.25)*49995000 = 1722749.62 est. dist. calculations. There's probably a better way to do that. Python is fun to do this stuff with. Playing around like this with other languages is just too much trouble. _Ron Jul 18 '05 #7

 P: n/a On Tue, Nov 04, 2003 at 01:02:36PM -0600, John Hunter wrote:>> "Ron" == Ron Adam writes: Ron> I really don't know how you can make this faster. There Ron> might be a library that has a distance between two points Ron> function that could speed it up. If you only had to compare one point to all the other points, then the brute force approach -- check every point -- will work great. This is O(N) and I don't think you can beat it. The idea is that I will be repeatedly checking and adding points to the list, so it is worthwhile at the outset to set up a data structure which allows a more efficient search. The analogy is a binary search in 1D. If you plan to repeatedly search a (possibly growing) list of numbers to see whether it contains some number or find the nearest neighbor to a number, it is worthwhile at the outset to put them in a data structure that allows a binary search. Setting up the initial data structure costs you some time, but subsequent searches are O(log2(N)). See google for 'binary search' and the python module bisect. So roughly, for a list with 1,000,000 elements, your brute force approach requires a million comparisons per search. If the data is setup for binary search, on average only 13-14 comparisons will be required. Well worth the effort if you need to do a lot of searches, as I do. John Hunter Breaking into the thread late, I've been busy enough to put down c.l.py for a couple weeks. If you only need to compare 10-1000 points, try this approach below. I wrote it for the ICFP programming contest where I was sorting lots and lots of points lots and lots of times. It sorts a list of points for their manhattan distance from a particular point. I tweaked it until it was just fast enough to do what I wanted. I won't pretend it is optimal for all N, just that it was good enough for _my_ N. The speed trick is that the point we are sorting around is stored in an object that is created only once, and then we make the object's __call__ be the comparison function. Since it uses the list.sort() method we don't have to do anything more clever than write our comparison function in C. I trust the Timbot has written a better list.sort() than I ever will, so let's use it. it is used thusly: import icfp l = [your big list of point tuples like (1, 10)] p = (99, 22) # find nearest point in l to p sort_ob = icfp.manhattan_sort(p) # creates a callable Manhat object that remembers p l.sort(sort_ob) # sorts around p print "closest", l[0] print "farthest", l[-1] The below code is from my CVS tree, so it should probably work. It was written in a rush for a contest (JDH, you may recognize parts that are cut-n-pasted from "probstat") so YMMV. -jack NB, if there is a boost or pyrex guy listening you might want to throw in an easilly derived class that makes this [ab]use of __call__ easy. It was a big performance win for me for very little effort. """ icfp_module.c """ #include "Python.h" #include #include /* * stats module interface */ static PyObject *ErrorObject; PyObject *manhat_new(PyObject *self, PyObject *args); static PyTypeObject Manhat_Type; static PyObject * heur1(PyObject *self, PyObject *args) { PyObject *tup1; PyObject *tup2; long x,y; if (!PyArg_ParseTuple(args, "O!O!", &PyTuple_Type, &tup1, &PyTuple_Type, &tup2)) { return NULL; } x = PyInt_AsLong(PyTuple_GET_ITEM(tup1,0)) - PyInt_AsLong(PyTuple_GET_ITEM(tup2,0)); y = PyInt_AsLong(PyTuple_GET_ITEM(tup1,1)) - PyInt_AsLong(PyTuple_GET_ITEM(tup2,1)); return PyInt_FromLong(labs(x * x) + labs(y * y)); } static PyMethodDef stats_methods[] = { {"manhattan", heur1, METH_VARARGS}, {"manhattan_sort", manhat_new, METH_VARARGS}, {NULL, NULL} /* sentinel */ }; DL_EXPORT(void) initicfp(void) { PyObject *m, *d; PyPQueue_Type.ob_type = &PyType_Type; Manhat_Type.ob_type = &PyType_Type; /* Create the module and add the functions */ m = Py_InitModule("icfp", stats_methods); /* Add some symbolic constants to the module */ d = PyModule_GetDict(m); ErrorObject = PyErr_NewException("icfp.error", NULL, NULL); } """ manhattan.c """ #include "Python.h" #include #define ManhatObject_Check(v) ((v)->ob_type == &Manhat_Type) staticforward PyTypeObject Manhat_Type; typedef struct { PyObject_HEAD long x; long y; } ManhatObject; static void Manhat_dealloc(ManhatObject *self) { PyObject_Del(self); } static PyObject * Manhat_call(ManhatObject *self, PyObject *args) { PyObject *tup1; PyObject *tup2; long a; long b; long x; long y; if (!PyArg_ParseTuple(args, "O!O!", &PyTuple_Type, &tup1, &PyTuple_Type, &tup2)) { return NULL; } x = PyInt_AsLong(PyTuple_GET_ITEM(tup1,0)) - self->x; y = PyInt_AsLong(PyTuple_GET_ITEM(tup1,1)) - self->y; a = labs(x * x) + labs(y * y); x = PyInt_AsLong(PyTuple_GET_ITEM(tup2,0)) - self->x; y = PyInt_AsLong(PyTuple_GET_ITEM(tup2,1)) - self->y; b = labs(x * x) + labs(y * y); if (a == b) return PyInt_FromLong(0); else if (a < b) return PyInt_FromLong(-1); else return PyInt_FromLong(1); } static PyMethodDef Manhat_methods[] = { {NULL, NULL} /* sentinel */ }; statichere PyTypeObject Manhat_Type = { /* The ob_type field must be initialized in the module init function * to be portable to Windows without using C++. */ PyObject_HEAD_INIT(&PyType_Type) 0, /*ob_size*/ "Manhat", /*tp_name*/ sizeof(ManhatObject), /*tp_basicsize*/ 0, /*tp_itemsize*/ /* methods */ (destructor)Manhat_dealloc, /*tp_dealloc*/ 0, /*tp_print*/ 0, /*tp_getattr*/ 0, //(setattrfunc)Permute_setattr, /*tp_setattr*/ 0, /*tp_compare*/ 0, /*tp_repr*/ 0, /*tp_as_number*/ 0, /*tp_as_sequence*/ 0, /*tp_as_mapping*/ 0, /*tp_hash*/ (ternaryfunc)Manhat_call, /*tp_call*/ 0, /*tp_str*/ PyObject_GenericGetAttr, /*tp_getattro*/ 0, /*tp_setattro*/ 0, /*tp_as_buffer*/ Py_TPFLAGS_DEFAULT, /*tp_flags*/ 0, /*tp_doc*/ 0, /*tp_traverse*/ 0, /*tp_clear*/ 0, /*tp_richcompare*/ 0, /*tp_weaklistoffset*/ 0, /*tp_iter*/ 0, /*tp_iternext*/ Manhat_methods, /*tp_methods*/ 0, /*tp_members*/ 0, /*tp_getset*/ 0, /*tp_base*/ 0, /*tp_dict*/ 0, /*tp_descr_get*/ 0, /*tp_descr_set*/ 0, /*tp_dictoffset*/ 0, /*tp_init*/ 0, /*tp_alloc*/ 0, /*tp_new*/ 0, /*tp_free*/ 0, /*tp_is_gc*/ }; // not static so ifcp_module.c can see it PyObject * manhat_new(PyObject *self, PyObject *args) { ManhatObject *mh; PyObject *tup1; if (!PyArg_ParseTuple(args, "O!", &PyTuple_Type, &tup1)) { return NULL; } // call object create function and return mh = PyObject_New(ManhatObject, &Manhat_Type); if (mh == NULL) return NULL; mh->x = PyInt_AsLong(PyTuple_GET_ITEM(tup1,0)); mh->y = PyInt_AsLong(PyTuple_GET_ITEM(tup1,1)); return (PyObject *)mh; } """ setup.py """ from distutils.core import setup, Extension files = [ 'manhattan.c', 'icfp_module.c', ] libraries = [] #libraries = ["efence"] # uncomment to use ElectricFence includes = [] if (__name__ == '__main__'): setup(name = "icfp", version = "0.2", ext_modules = [Extension("icfp", files, libraries = libraries, include_dirs = includes, ) ] ) Jul 18 '05 #8

 P: n/a On Tue, 04 Nov 2003 17:13:58 +0100, Peter Otten <__*******@web.de> wrote: Alex Martelli wrote: Andrew Dalke wrote: Ron Adam for point in p: distance = math.sqrt((new_point[0]-point[0])**2 \ +(new_point[1]-point[1])**2) I really don't know how you can make this faster. There might be a Hmmm, that's what math.hypot is for, isn't it...? [alex@lancelot Lib]\$ timeit.py -c -s'import math; p=1.6,2.5; np=2.4,1.3' 'math.sqrt((np[0]-p[0])**2 + (np[1]-p[1])**2)' 100000 loops, best of 3: 3 usec per loop [alex@lancelot Lib]\$ timeit.py -c -s'import math; p=1.6,2.5; np=2.4,1.3' 'math.hypot(np[0]-p[0], np[1]-p[1])' 100000 loops, best of 3: 1.9 usec per loop library that has a distance between two points function that could speed it up. An easy way is to move the math.sqrt call outside the loop, since sqrt(d1) < sqrt(d2) iff d1 < d2 (when d1,d2>=0) Yes, omitting the math.sqrt gives the same speed as calling math.hypot, and it's the classic solution to speed up minimum-distance problems. I vaguely suspect you could shave some further fraction of a microsecond by saving those differences as dx and dy and then computing dx*dx+dy*dy -- since another classic tip is that a**2 is slower than a*2. Let's see...: [alex@lancelot Lib]\$ timeit.py -c -s'import math; p=1.6,2.5; np=2.4,1.3' 'dx=np[0]-p[0]; dy=np[1]-p[1]; disq=dx*dx+dy*dy' 1000000 loops, best of 3: 1.39 usec per loop ...yep, another small enhancement. Ain't measuring _FUN_?-) Finally found an application for complex numbers: ...> timeit.py -s"p= 1.6+2.5j; np=2.4+1.3j" "d=abs(p-np)" 1000000 loops, best of 3: 0.436 usec per loop ...> timeit.py -s"p= 1.6,2.5; np=2.4,1.3" "dx=np[0]-p[0]; dy=np[1]-p[1];d=dx*dx+dy*dy" 1000000 loops, best of 3: 1.15 usec per loop This is of course all premature optimization as the most promising approach is to try hard to reduce the number of candidate points, as David Eppstein seems to have done. But then, he could use complex numbers, too. Peter I am new to timeit.py, but this is odd. jim@grendel:~\$ /usr/lib/python2.3/timeit.py -c ' p=1.6+2.5j;np=2.4+1.3j; d=abs(p-np)' 100000 loops, best of 3: 3.1 usec per loop vs jim@grendel:~\$ /usr/lib/python2.3/timeit.py -c -s'import math; p=1.6+2.5j;np=2.4+1.3j; d=abs(p-np)' 10000000 loops, best of 3: 0.141 usec per loop Is it because the builtin math functions are much slower? -- Jim Richardson http://www.eskimo.com/~warlock Televangelists: The Pro Wrestlers of Religion Jul 18 '05 #9

 P: n/a On Mon, 03 Nov 2003 22:15:34 -0600, John Hunter wrote: I had two use cases just yesterday. The one that prompted thequestion arose in making a contour plot. I'm defining a contour as anordered sequence of values over a 2D MxN matrix where the valuesdiffer from some target value by at most some tolerance. I maintain alist of i,j indices into the matrix for a given contour value, andfollow the contour from a given i,j location by examining itsneighbors. In order to close the loop (eg, when the contour finderhas passed once around a level curve of a mountain, I want to testwhether a given point i,j is close to a previously discovered pointk,l. Since I have a list of these 2 tuple coordinates, I want to findthe nearest neighbor in the list and terminate the contour when thenearest neighbor falls within some minimum distance 3 4 5 2 613 1 712 8 11 10 9In the example above, I am traversing a contour identifying points inthe order 1,2,3...; as above each point represents an i,j tuple whichis an index into the matrix I am contouring. I would like toterminate the search at 13 rather than spiral around the existingcontour 1-12. Each time I add a new point to the contour, I would liketo query the existing list (excluding the most recently added pointswhich are close by construction) of locations and find the minimumdistance. If I'm not too close to the preexisting contour, I add thenew point and proceed.As I write this I realize there is an important efficiency. Sincefrom an existing point I add the closest neighbor, the biggest step Ican make is 1,1. If on the last nearest neighbor query I find aminimum distance of d, it will take me d minimum steps to approach theexisting contour. So I don't need to check the distance again for atleast d steps. So the algorithm can proceed 1) obtain the distance dfrom the existing contour to the most recently obtained point 2) maked steps adding points that meet the value criteria 3) repeat.The second use case arose with gdmodule, which can only allocate 256colors, which I cache as a dict from rgb tuples (eg, 0.0, 0.05, 1.0)to color. When the total number of color allocations is made, and anew rgb request comes into the color manager, I pick the alreadyallocated point in rgb space closest to the requested point.I'll try David Eppstein's approach tomorrow and see how this fares.Thanks to all for suggestions,John Hunter Ah, a contour map. Maybe something like this? """ Where pointA and pointB are constitutive points in a list, and pointC is a new point from a list of new points. For each pointC in a list of new points. For each consecutive 2 points in a list of sequential points. If lineAC < lineAB and lineBC < lineAB Insert pointC between pointA and pointB If pointC was not placed in list of sequential points. Where pointA and pointB are the beginning and end points of the list. IF lineAC < lineBC Add pointC to beginning of list. Else add pointC to end of list. When done copy point from beginning of list to end of list to complete polygon. """ Just knowing the closest point isn't quite enough because it doesn't tell you weather to put it in front or behind the point in the list. Storing the distance to the next point along with each point might make it work faster. This method has an advantage in that it doesn't have to go through the whole list. You could start from the closest end of the list and maybe make it quicker. One catch is you need to know in advance that the set of points are not divided by a hill or valley. I'm not sure what this would do with a list of random points. Maybe a long squiggly line where the beginning to end segment cuts across them. I don't think you will have that problem. _Ron Jul 18 '05 #10

 P: n/a Hello John, Can anyone point me to some code or module that provides the appropriate data structures and algorithms to handle this task efficiently? The size of the list will likely be in the range of 10-1000 elements. boost (http://www.boost.org) has a graph library and a good connection to Python using Boost.Python. HTH. Miki Jul 18 '05 #11

 P: n/a Jim Richardson wrote: I am new to timeit.py, but this is odd. jim@grendel:~\$ /usr/lib/python2.3/timeit.py -c ' p=1.6+2.5j;np=2.4+1.3j; d=abs(p-np)' 100000 loops, best of 3: 3.1 usec per loop The above is actually timed. jim@grendel:~\$ /usr/lib/python2.3/timeit.py -c -s'import math; p=1.6+2.5j;np=2.4+1.3j; d=abs(p-np)' 10000000 loops, best of 3: 0.141 usec per loop Here you put everything in the setup parameter. As the timed statement defaults to pass, you are essentially timing an empty loop :-( Is it because the builtin math functions are much slower? You should have used math.abs() or from math import abs to really get the abs() function in the math module. But "abs" in dir(math) False Not there, so we cannot compare. Peter Jul 18 '05 #12

 P: n/a John Hunter wrote: I have a list of two tuples containing x and y coord (x0, y0) (x1, y1) ... (xn, yn) Given a new point x,y, I would like to find the point in the list closest to x,y. I have to do this a lot, in an inner loop, and then I add each new point x,y to the list. I know the range of x and y in advance. Can anyone point me to some code or module that provides the appropriate data structures and algorithms to handle this task efficiently? The size of the list will likely be in the range of 10-1000 elements. The plotting-library dislin (http://www.dislin.com) contains a really fast triangulation subroutine (~1 hour for 700000 points on an 1.5 GHz Athlon). For an example triangulation/nearest-neighbor-search see the attached python-script. Dislin is available for many platforms (Linux, Winxxx, ...) and it can be used for free if you are using free languages like Python, Perl, etc. Happy pythoning Herbert #!/usr/bin/python # (c) by H. Weinhandl 04.Nov.2003 import math import dislin def dist( ia,ib ) : return math.sqrt( (X1[ia]-X1[ib])**2 + (Y1[ia]-Y1[ib])**2 ) def find_nearest_neighb() : print 'extracting neighbours ... ', for i in range( nr_dat+3 ) : # initualize list wit the point i itself neighb.append( [i] ) for i in range( nr_tri ) : # get a indes-triple, dislin seems to start indices with 1, # so I'm subtracting 1 to get a zero-based index U,V,W = I1[i]-1, I2[i]-1, I3[i]-1 # add indices from all triangles, which contain the point itself if ( U in neighb[u] ) : if not (V in neighb[u] ) : neighb[u].append( V ) # do not append V if already in the list if not (W in neighb[u] ) : neighb[u].append( W ) # do not append W if already in the list if ( V in neighb[V] ) : if not (U in neighb[V] ) : neighb[V].append( U ) # do not append U if already in the list if not (W in neighb[V] ) : neighb[V].append( W ) # do not append W if already in the list if ( W in neighb[W] ) : if not (U in neighb[W] ) : neighb[W].append( U ) # do not append U if already in the list if not (V in neighb[W] ) : neighb[W].append( V ) # do not append V if already in the list print ' OK' for i in range( nr_dat ) : neighb[i].remove( i ) # remove the point i itself neighb[i].sort() r_mi = 9.9e9 i_mi = i # search for the nearest neighbor of point i for j in neighb[i] : r = dist( i, j ) if r < r_mi : r_mi = r i_mi = j print 'NB %2d : r_mi=%5.3f, i_mi=%2d '% (i, r_mi, i_mi), neighb[i] if __name__ == '__main__' : X1 = [ 1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0 ] Y1 = [ 1.0, 6.0, 3.0, 3.0, 2.0, 6.0, 0.0 ] nr_dat = len( X1 ) nr_max = 2 * nr_dat + 1 I1 = [ 0 ] * nr_max I2 = [ 0 ] * nr_max I3 = [ 0 ] * nr_max neighb = [ ] # padding the 2 input-lists with 3 additional elements is required by dislin X1.append( 0 ) X1.append( 0 ) X1.append( 0 ) Y1.append( 0 ) Y1.append( 0 ) Y1.append( 0 ) # delaunay triangulation of the input lists # I1,I2,I3 are the indices of the triangular edge-points nr_tri = dislin.triang( X1,Y1, nr_dat, I1,I2,I3, nr_max ) print X1 print Y1 print nr_dat, nr_tri, nr_max print I1 print I2 print I3 find_nearest_neighb() #---- end ---- Jul 18 '05 #13

 P: n/a -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 05 Nov 2003 09:40:23 +0100, Peter Otten <__*******@web.de> wrote: Jim Richardson wrote: I am new to timeit.py, but this is odd. jim@grendel:~\$ /usr/lib/python2.3/timeit.py -c ' p=1.6+2.5j;np=2.4+1.3j; d=abs(p-np)' 100000 loops, best of 3: 3.1 usec per loop The above is actually timed. jim@grendel:~\$ /usr/lib/python2.3/timeit.py -c -s'import math; p=1.6+2.5j;np=2.4+1.3j; d=abs(p-np)' 10000000 loops, best of 3: 0.141 usec per loop Here you put everything in the setup parameter. As the timed statement defaults to pass, you are essentially timing an empty loop :-( Ah! that explains it, thanks. breaking the import off, gets the loop to 3.2 usec per loop, which makes more sense. Is it because the builtin math functions are much slower? You should have used math.abs() or from math import abs to really get the abs() function in the math module. But "abs" in dir(math) False Not there, so we cannot compare. Actually, I just cut'n'pasted, with a tiny alteration (timeit.py isn't in my path) I like the idea of using the complex number there, it's a tad easier that the other method for me to visualize. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (GNU/Linux) iD8DBQE/qbFpd90bcYOAWPYRArgzAKDiNBXw3tIpooWqglmxoqn2Gebyxg CcDEKV u60SPPopUrkJ1Kak0t4eias= =+Guh -----END PGP SIGNATURE----- -- Jim Richardson http://www.eskimo.com/~warlock "Man invented language to satisfy his deep need to complain." -- Lily Tomlin Jul 18 '05 #14

 P: n/a This enquiry has generated a lot of interest. I thought it might be useful to try numarray on the problem. numarray has a compress function, which is used to discard points which are not of interest. The code is below. As would be expected, each scan over the points in the neigbourhood discards, on average, a little more than half the points. I have not tried it, but it should be possible, with numarray, to generalize this from 2D to multimimensional space. Colin W. # Neighbour.py ''' To find the closest neighbour, in a neghbourhood P, to a point p. ''' import math import numarray as N import numarray.numerictypes as _nt import numarray.random_array as R trial= 0 lengthRemaining= [] def find(P, p): ''' In the 2D neighbourhood P, find the point closest to p. The approach is based on the selection of a trial value Pt, from P, and discarding all values further than Pt from p. To avoid repeated sqrt calculations the discard is based on an enclosing square. ''' global lengthRemaining, trial lengthRemaining+= [[P.shape[0]]] Pz= P - p # zero based neighbourhood while len(Pz): Pt= Pz[0] # trial value Pta= N.abs(Pt) Pz= Pz[1:] pd= math.sqrt(N.dot(Pta, Pta)) # distance of p from the trial value goodCases= N.logical_and((Pz < pd), (Pz > -pd))# use the enclosing square goodCases= N.logical_and.reduce(goodCases, 1) Pz= N.compress(goodCases, Pz) # discard points outside the square if len(Pz) == 1: Pt= Pz[0] # We have found the closest Pz= [] lengthRemaining[trial]+= [len(Pz)] z= 100 trial+= 1 return Pt + p if __name__ == '__main__': for sampleSize in range(100, 5000, 100): P= R.random(shape= (sampleSize, 2)) for i in range(20): p= R.random((1, 2)) # point a= find(P, p) ## print 'Closest neighbour:', a[0] ## print 'Check - Point(p):', p[0] ## check= [] ## for i in range(len(P)): ## check+= [(math.sqrt((P[i, 0]-p[0, 0])**2 + (P[i, 1]-p[0, 1])**2), P[i, 0], P[i, 1])] ## print P[i], math.sqrt((P[i, 0]-p[0, 0])**2 + (P[i, 1]-p[0, 1])**2) ## check.sort() ## print check[0] ## print check print 'Number of scans:', sum([len(lst) for lst in lengthRemaining]) print 'Sample size:', P.shape[0], ' Average numner of scans:', sum([len(lst) for lst in lengthRemaining])/float(len(lengthRemaining)) WEINHANDL Herbert wrote: John Hunter wrote: I have a list of two tuples containing x and y coord (x0, y0) (x1, y1) ... (xn, yn) Given a new point x,y, I would like to find the point in the list closest to x,y. I have to do this a lot, in an inner loop, and then I add each new point x,y to the list. I know the range of x and y in advance. Can anyone point me to some code or module that provides the appropriate data structures and algorithms to handle this task efficiently? The size of the list will likely be in the range of 10-1000 elements. The plotting-library dislin (http://www.dislin.com) contains a really fast triangulation subroutine (~1 hour for 700000 points on an 1.5 GHz Athlon). For an example triangulation/nearest-neighbor-search see the attached python-script. Dislin is available for many platforms (Linux, Winxxx, ...) and it can be used for free if you are using free languages like Python, Perl, etc. Happy pythoning Herbert ------------------------------------------------------------------------ #!/usr/bin/python # (c) by H. Weinhandl 04.Nov.2003 import math import dislin def dist( ia,ib ) : return math.sqrt( (X1[ia]-X1[ib])**2 + (Y1[ia]-Y1[ib])**2 ) def find_nearest_neighb() : print 'extracting neighbours ... ', for i in range( nr_dat+3 ) : # initualize list wit the point i itself neighb.append( [i] ) for i in range( nr_tri ) : # get a indes-triple, dislin seems to start indices with 1, # so I'm subtracting 1 to get a zero-based index U,V,W = I1[i]-1, I2[i]-1, I3[i]-1 # add indices from all triangles, which contain the point itself if ( U in neighb[u] ) : if not (V in neighb[u] ) : neighb[u].append( V ) # do not append V if already in the list if not (W in neighb[u] ) : neighb[u].append( W ) # do not append W if already in the list if ( V in neighb[V] ) : if not (U in neighb[V] ) : neighb[V].append( U ) # do not append U if already in the list if not (W in neighb[V] ) : neighb[V].append( W ) # do not append W if already in the list if ( W in neighb[W] ) : if not (U in neighb[W] ) : neighb[W].append( U ) # do not append U if already in the list if not (V in neighb[W] ) : neighb[W].append( V ) # do not append V if already in the list print ' OK' for i in range( nr_dat ) : neighb[i].remove( i ) # remove the point i itself neighb[i].sort() r_mi = 9.9e9 i_mi = i # search for the nearest neighbor of point i for j in neighb[i] : r = dist( i, j ) if r < r_mi : r_mi = r i_mi = j print 'NB %2d : r_mi=%5.3f, i_mi=%2d '% (i, r_mi, i_mi), neighb[i] if __name__ == '__main__' : X1 = [ 1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0 ] Y1 = [ 1.0, 6.0, 3.0, 3.0, 2.0, 6.0, 0.0 ] nr_dat = len( X1 ) nr_max = 2 * nr_dat + 1 I1 = [ 0 ] * nr_max I2 = [ 0 ] * nr_max I3 = [ 0 ] * nr_max neighb = [ ] # padding the 2 input-lists with 3 additional elements is required by dislin X1.append( 0 ) X1.append( 0 ) X1.append( 0 ) Y1.append( 0 ) Y1.append( 0 ) Y1.append( 0 ) # delaunay triangulation of the input lists # I1,I2,I3 are the indices of the triangular edge-points nr_tri = dislin.triang( X1,Y1, nr_dat, I1,I2,I3, nr_max ) print X1 print Y1 print nr_dat, nr_tri, nr_max print I1 print I2 print I3 find_nearest_neighb() #---- end ---- Jul 18 '05 #15

### This discussion thread is closed

Replies have been disabled for this discussion.