473,395 Members | 1,702 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,395 software developers and data experts.

Best match searching

Hi everyone.
I was wondering if anyone might be able to help me out here. I'm currently
looking to find the quickest way to find a best fit match in a large array.
My problem is that I have an array of, say, 600*400, which contains a value
at each point, and I need to find the value in that array which is closest
to the input value. It's basically some euclidean distances that I've
calculated, and I need to be able to find the best matches over a large
number of input values, so I was looking for a speedy solution to this.
The array is implemented simply using a list of lists. The only solution I
could think of is to use some for statements, but that seems to take a
while, even for just one search.
Any hints or tips would be really helpful. :-)

Daniel

__________________________________________________ _______________
Express yourself with cool new emoticons http://www.msn.co.uk/specials/myemo
Jul 18 '05 #1
5 2527
Daniel Pryde wrote:
Hi everyone.
I was wondering if anyone might be able to help me out here. I'm
currently looking to find the quickest way to find a best fit match in a
large array.
My problem is that I have an array of, say, 600*400, which contains a
value at each point, and I need to find the value in that array which is
closest to the input value. It's basically some euclidean distances that
I've calculated, and I need to be able to find the best matches over a
large number of input values, so I was looking for a speedy solution to
this.
The array is implemented simply using a list of lists. The only solution
I could think of is to use some for statements, but that seems to take a
while, even for just one search.
Any hints or tips would be really helpful. :-)

Daniel

__________________________________________________ _______________
Express yourself with cool new emoticons
http://www.msn.co.uk/specials/myemo

I'm not an expert, but I'd use numarray module. The most simple
example can look like:

a = numarray.array(....) # An array to look in
value = x # Value to look for
c = a.copy() # Create a copy
print numarray.abs(a.copy() - value).argmin() # Finds minimal absolute
difference

If performance of numarray is still too slow, I'd resort to simple
hasing scheme: you can split your inputs in reasonable ranges and look
for closest match only in this ranges.

hth,
anton.
Jul 18 '05 #2
On Mon, 29 Dec 2003 12:14:42 +0000, rumours say that "Daniel Pryde"
<ab********@hotmail.com> might have written:
Hi everyone.
I was wondering if anyone might be able to help me out here. I'm currently
looking to find the quickest way to find a best fit match in a large array. [snip]It's basically some euclidean distances that I've
calculated, and I need to be able to find the best matches over a large
number of input values, so I was looking for a speedy solution to this.


These might be helpful:

http://groups.google.com.gr/groups?s....austin.rr.com

or

http://groups.google.com.gr/groups?t...hon.org&rnum=1

groups.google.com is your friend :)
--
TZOTZIOY, I speak England very best,
Ils sont fous ces Redmontains! --Harddix
Jul 18 '05 #3
"Daniel Pryde" <ab********@hotmail.com> wrote in message
news:ma***********************************@python. org...
Hi everyone.
I was wondering if anyone might be able to help me out here. I'm currently
looking to find the quickest way to find a best fit match in a large array. My problem is that I have an array of, say, 600*400, which contains a value at each point, and I need to find the value in that array which is closest
to the input value. <snip> Any hints or tips would be really helpful. :-)

Daniel

I think inverting your matrix to a dictionary may be more "Pythonic". The
included snippet constructs an array of values, then creates a dictionary of
which values are located at which positions in the array (also handles
possibility of duplicates in the array). If you change ROWS and COLS to
your problem values of 400 and 600, the matrix-to-dictionary inversion takes
a bit of time - but the searches are blindingly fast.

-- Paul
# mat2Dict.py
from random import random
import pprint

ROWS = 4
COLS = 6

# create matrix
print "creating data matrix"
matrix = [ [ random() for i in range(COLS) ] for j in range(ROWS) ]

print "create dictionary of values"
valdict = {}
for i,row in enumerate(matrix):
for j,elem in enumerate(row):
if valdict.has_key(elem):
valdict[elem].append( (i,j) )
else:
valdict[elem] = [ (i,j) ]

# uncomment this line to view the dictionary of values
# pprint.pprint( valdict )

matvals = valdict.keys()
matvals.sort()
print len(matvals),"unique values in matrix"

def findClosestVal( searchval ):
if searchval <= matvals[0]:
return matvals[0]
if searchval >= matvals[-1]:
return matvals[-1]

# do binary search - see
http://www.tbray.org/ongoing/When/20...3/03/22/Binary
hi = len(matvals)
lo = -1
while (hi-lo) > 1:
loc = ( hi + lo ) / 2
if matvals[loc] > searchval:
hi = loc
else:
lo = loc

if abs(searchval-matvals[lo]) < abs(searchval-matvals[hi]):
return matvals[lo]
else:
return matvals[hi]

# look up some values
for v in range(10):
vv = v/10.
closestVal = findClosestVal( vv )
print vv, "->", closestVal, "occurring at", valdict[closestVal]

Jul 18 '05 #4

"Daniel Pryde" <ab********@hotmail.com> wrote in message
news:ma***********************************@python. org...
Hi everyone.
I was wondering if anyone might be able to help me out here. I'm currently looking to find the quickest way to find a best fit match in a large array. My problem is that I have an array of, say, 600*400, which contains a value at each point, and I need to find the value in that array which is closest to the input value. It's basically some euclidean distances that I've
calculated, and I need to be able to find the best matches over a large
number of input values, so I was looking for a speedy solution to this.
The array is implemented simply using a list of lists. The only solution I could think of is to use some for statements, but that seems to take a
while, even for just one search.
Any hints or tips would be really helpful. :-)


If the values in the array have any structure or constraints, try to
exploit that. If they are ordered in both directions, use binary search on
one axis and then trace the zigzag frontier between lower and height
values. If there are just a 'few' values (as in 0-255), then a dict might
work, though changing values would complicate this.

Terry J. Reedy
Jul 18 '05 #5
Daniel Pryde wrote:
The array is implemented simply using a list of lists. The only solution
I could think of is to use some for statements, but that seems to take a
while, even for just one search.

Ahh this brings back memories of my image processing days :)

I've implemented a couple of approaches.
1) using just one list to hold the data
2) using a list of lists
3) using numeric

I've optimized the search function for 1 and 2 a little bit by defining
the scoring function on the fly like this and binding input during the
function definition:

input = 0.5
def euclidean(x, input=input): return abs(x-input)

We can also use a bounding box technique such that if a minimum x is
found and the x < input then we can reject all points less than x
without having to perform the euclidean distance. Conversely if x >
input then we can reject all points greater than x. This really only
works if x is a single value and not a vector.

Don't forget that when looking for minimums using a euclidean distance
you don't have to perform the square root. (x1*x1+x2*x2) will find the
correct minimum as well as opposed to the more time consuming
math.sqrt(x1*x1+x2*x2)

Here are the results looking for data closest to an input value of 0.5
using a pretty loaded pentium 4 2ghz. The result is for a single search
against a random array of 600x400.

time row col value
0.130000114441 396 328 0.500001351171 single list
0.119999885559 396 328 0.500001351171 list of lists
0.0600000619888 396 328 0.500001351171 matrix

So numeric is the clear winner here. However, when psyco is used
http://psyco.sourceforge.net/

time row col value
0.0400000810623 396 328 0.500001351171 single list
0.0499999523163 396 328 0.500001351171 list of lists
0.0600000619888 396 328 0.500001351171 matrix

psyco actually wins here but more interesting is that the timing order
is reversed! I'll have to do a lot more trials to generate good
profiling, but I think the results are pretty good here.

Here is the test code, feel free to rip it apart as you will.
import Numeric, random, time

# create data
data = [random.random() for x in xrange(600*400)]

# test list of list
lists = []
for i in range(400): # rows
lists.append(data[i*600:(i+1)*600]) # columns

matrix = Numeric.array(lists)

def testarray(data=data):
t1 = time.time()
# straight forward approach
mn = 10e99
index = None
lx = -10e99 # keep track of the best found
hx = 10e99 # approaching from the bottom and top

input = 0.5
def euclidean(x, input=input):
return abs(x-input)

for i,x in enumerate(data):
if x < lx or x > hx: # can we reject this point?
continue
y = euclidean(x)
if y < mn:
if x < input:
lx = x
else:
hx = x
mn = y
index = i

t2 = time.time()
row = index/600
col = index%600
print t2-t1, row, col, data[index], "single list"

def testlists(list=list):
mn = 10e99
index = None
input = 0.5
lx = -10e99 # keep track of the best found
hx = 10e99 # approaching from the bottom and top

def euclidean(x, input=input):
return abs(x-input)
t1 = time.time()
for r, row in enumerate(lists):
for c, x in enumerate(row):
if x < lx or x > hx: # can we reject this point?
continue
y = euclidean(x)
if y < mn:
if x < input:
lx = x
else:
hx = x
mn = y
index = r,c

t2 = time.time()
r,c = index
print t2-t1, r,c, lists[r][c], "list of lists"

def testmatrix(matrix=matrix):
t1 = time.time()
mins = abs(matrix-0.5)
mn = 10e99
index = None
for r,c in enumerate(Numeric.argmin(mins)):
y = mins[r,c]
if y < mn:
mn = y
index = r,c
t2 = time.time()
r,c = index
print t2-t1, r,c, lists[r][c], "matrix"

# list of lists approach
testarray()
testlists()
testmatrix()

print "*"*44
print "try with psyco"
try:
import psyco
psyco.full()
testarray()
testlists()
testmatrix()
except ImportError:
print "psyco not available"
Jul 18 '05 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: fartsniff | last post by:
i found this code out in the ng, and its seems long and clunky, i am still experimenting with preg_match and _replace, but the syntax is a bit confusing. it seems i always misplace or mistype...
0
by: R U B'n | last post by:
Hi everyone, I have to make a (case-insensitive) search from a form with only one search string, e.g. "Doe Peters english California", which will search in several fields of my table for each...
19
by: Tom Deco | last post by:
Hi, I'm trying to use a regular expression to match a string containing a # (basically i'm looking for #include ...) I don't seem to manage to write a regular expression that matches this. ...
4
by: Johnny Lee | last post by:
Hi, I've met a problem in match a regular expression in python. Hope any of you could help me. Here are the details: I have many tags like this: xxx<a href="http://xxx.xxx.xxx" xxx>xxx xxx<a...
19
by: Johnny Google | last post by:
Here is an example of the type of data from a file I will have: Apple,4322,3435,4653,6543,4652 Banana,6934,5423,6753,6531 Carrot,3454,4534,3434,1111,9120,5453 Cheese,4411,5522,6622,6641 The...
2
by: ShadowOfTheBeast | last post by:
Hello All, i have been cracking my skull on how to seach a particular reg exp match within a string? it does not seem to happen except the whole arbitrary string is the exact match of the regular...
8
by: Art | last post by:
Hi folks, I'm writing a traditional desktop app using VB.NET and am stumbling over what seems like a very basic question: My app does not need to be connected to a server or another computer....
5
by: ngocviet | last post by:
I have a so big DB, It's about 10-100GB of text. Now I want to create a searching function but It take too long when I use LIKE from MySQL Anyone know the best sollution? Can I use Google Desktop...
8
by: Guy | last post by:
Is there a better way to search identical elements in a sorted array list than the following: iIndex = Array.BinarySearch( m_Array, 0, m_Array.Count, aSearchedObject ); aFoundObject= m_Array;...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.