473,769 Members | 1,736 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Best match searching

Hi everyone.
I was wondering if anyone might be able to help me out here. I'm currently
looking to find the quickest way to find a best fit match in a large array.
My problem is that I have an array of, say, 600*400, which contains a value
at each point, and I need to find the value in that array which is closest
to the input value. It's basically some euclidean distances that I've
calculated, and I need to be able to find the best matches over a large
number of input values, so I was looking for a speedy solution to this.
The array is implemented simply using a list of lists. The only solution I
could think of is to use some for statements, but that seems to take a
while, even for just one search.
Any hints or tips would be really helpful. :-)

Daniel

_______________ _______________ _______________ _______________ _____
Express yourself with cool new emoticons http://www.msn.co.uk/specials/myemo
Jul 18 '05 #1
5 2548
Daniel Pryde wrote:
Hi everyone.
I was wondering if anyone might be able to help me out here. I'm
currently looking to find the quickest way to find a best fit match in a
large array.
My problem is that I have an array of, say, 600*400, which contains a
value at each point, and I need to find the value in that array which is
closest to the input value. It's basically some euclidean distances that
I've calculated, and I need to be able to find the best matches over a
large number of input values, so I was looking for a speedy solution to
this.
The array is implemented simply using a list of lists. The only solution
I could think of is to use some for statements, but that seems to take a
while, even for just one search.
Any hints or tips would be really helpful. :-)

Daniel

_______________ _______________ _______________ _______________ _____
Express yourself with cool new emoticons
http://www.msn.co.uk/specials/myemo

I'm not an expert, but I'd use numarray module. The most simple
example can look like:

a = numarray.array( ....) # An array to look in
value = x # Value to look for
c = a.copy() # Create a copy
print numarray.abs(a. copy() - value).argmin() # Finds minimal absolute
difference

If performance of numarray is still too slow, I'd resort to simple
hasing scheme: you can split your inputs in reasonable ranges and look
for closest match only in this ranges.

hth,
anton.
Jul 18 '05 #2
On Mon, 29 Dec 2003 12:14:42 +0000, rumours say that "Daniel Pryde"
<ab********@hot mail.com> might have written:
Hi everyone.
I was wondering if anyone might be able to help me out here. I'm currently
looking to find the quickest way to find a best fit match in a large array. [snip]It's basically some euclidean distances that I've
calculated, and I need to be able to find the best matches over a large
number of input values, so I was looking for a speedy solution to this.


These might be helpful:

http://groups.google.com.gr/groups?s....austin.rr.com

or

http://groups.google.com.gr/groups?t...hon.org&rnum=1

groups.google.c om is your friend :)
--
TZOTZIOY, I speak England very best,
Ils sont fous ces Redmontains! --Harddix
Jul 18 '05 #3
"Daniel Pryde" <ab********@hot mail.com> wrote in message
news:ma******** *************** ************@py thon.org...
Hi everyone.
I was wondering if anyone might be able to help me out here. I'm currently
looking to find the quickest way to find a best fit match in a large array. My problem is that I have an array of, say, 600*400, which contains a value at each point, and I need to find the value in that array which is closest
to the input value. <snip> Any hints or tips would be really helpful. :-)

Daniel

I think inverting your matrix to a dictionary may be more "Pythonic". The
included snippet constructs an array of values, then creates a dictionary of
which values are located at which positions in the array (also handles
possibility of duplicates in the array). If you change ROWS and COLS to
your problem values of 400 and 600, the matrix-to-dictionary inversion takes
a bit of time - but the searches are blindingly fast.

-- Paul
# mat2Dict.py
from random import random
import pprint

ROWS = 4
COLS = 6

# create matrix
print "creating data matrix"
matrix = [ [ random() for i in range(COLS) ] for j in range(ROWS) ]

print "create dictionary of values"
valdict = {}
for i,row in enumerate(matri x):
for j,elem in enumerate(row):
if valdict.has_key (elem):
valdict[elem].append( (i,j) )
else:
valdict[elem] = [ (i,j) ]

# uncomment this line to view the dictionary of values
# pprint.pprint( valdict )

matvals = valdict.keys()
matvals.sort()
print len(matvals),"u nique values in matrix"

def findClosestVal( searchval ):
if searchval <= matvals[0]:
return matvals[0]
if searchval >= matvals[-1]:
return matvals[-1]

# do binary search - see
http://www.tbray.org/ongoing/When/20...3/03/22/Binary
hi = len(matvals)
lo = -1
while (hi-lo) > 1:
loc = ( hi + lo ) / 2
if matvals[loc] > searchval:
hi = loc
else:
lo = loc

if abs(searchval-matvals[lo]) < abs(searchval-matvals[hi]):
return matvals[lo]
else:
return matvals[hi]

# look up some values
for v in range(10):
vv = v/10.
closestVal = findClosestVal( vv )
print vv, "->", closestVal, "occurring at", valdict[closestVal]

Jul 18 '05 #4

"Daniel Pryde" <ab********@hot mail.com> wrote in message
news:ma******** *************** ************@py thon.org...
Hi everyone.
I was wondering if anyone might be able to help me out here. I'm currently looking to find the quickest way to find a best fit match in a large array. My problem is that I have an array of, say, 600*400, which contains a value at each point, and I need to find the value in that array which is closest to the input value. It's basically some euclidean distances that I've
calculated, and I need to be able to find the best matches over a large
number of input values, so I was looking for a speedy solution to this.
The array is implemented simply using a list of lists. The only solution I could think of is to use some for statements, but that seems to take a
while, even for just one search.
Any hints or tips would be really helpful. :-)


If the values in the array have any structure or constraints, try to
exploit that. If they are ordered in both directions, use binary search on
one axis and then trace the zigzag frontier between lower and height
values. If there are just a 'few' values (as in 0-255), then a dict might
work, though changing values would complicate this.

Terry J. Reedy
Jul 18 '05 #5
Daniel Pryde wrote:
The array is implemented simply using a list of lists. The only solution
I could think of is to use some for statements, but that seems to take a
while, even for just one search.

Ahh this brings back memories of my image processing days :)

I've implemented a couple of approaches.
1) using just one list to hold the data
2) using a list of lists
3) using numeric

I've optimized the search function for 1 and 2 a little bit by defining
the scoring function on the fly like this and binding input during the
function definition:

input = 0.5
def euclidean(x, input=input): return abs(x-input)

We can also use a bounding box technique such that if a minimum x is
found and the x < input then we can reject all points less than x
without having to perform the euclidean distance. Conversely if x >
input then we can reject all points greater than x. This really only
works if x is a single value and not a vector.

Don't forget that when looking for minimums using a euclidean distance
you don't have to perform the square root. (x1*x1+x2*x2) will find the
correct minimum as well as opposed to the more time consuming
math.sqrt(x1*x1 +x2*x2)

Here are the results looking for data closest to an input value of 0.5
using a pretty loaded pentium 4 2ghz. The result is for a single search
against a random array of 600x400.

time row col value
0.130000114441 396 328 0.500001351171 single list
0.119999885559 396 328 0.500001351171 list of lists
0.0600000619888 396 328 0.500001351171 matrix

So numeric is the clear winner here. However, when psyco is used
http://psyco.sourceforge.net/

time row col value
0.0400000810623 396 328 0.500001351171 single list
0.0499999523163 396 328 0.500001351171 list of lists
0.0600000619888 396 328 0.500001351171 matrix

psyco actually wins here but more interesting is that the timing order
is reversed! I'll have to do a lot more trials to generate good
profiling, but I think the results are pretty good here.

Here is the test code, feel free to rip it apart as you will.
import Numeric, random, time

# create data
data = [random.random() for x in xrange(600*400)]

# test list of list
lists = []
for i in range(400): # rows
lists.append(da ta[i*600:(i+1)*600]) # columns

matrix = Numeric.array(l ists)

def testarray(data= data):
t1 = time.time()
# straight forward approach
mn = 10e99
index = None
lx = -10e99 # keep track of the best found
hx = 10e99 # approaching from the bottom and top

input = 0.5
def euclidean(x, input=input):
return abs(x-input)

for i,x in enumerate(data) :
if x < lx or x > hx: # can we reject this point?
continue
y = euclidean(x)
if y < mn:
if x < input:
lx = x
else:
hx = x
mn = y
index = i

t2 = time.time()
row = index/600
col = index%600
print t2-t1, row, col, data[index], "single list"

def testlists(list= list):
mn = 10e99
index = None
input = 0.5
lx = -10e99 # keep track of the best found
hx = 10e99 # approaching from the bottom and top

def euclidean(x, input=input):
return abs(x-input)
t1 = time.time()
for r, row in enumerate(lists ):
for c, x in enumerate(row):
if x < lx or x > hx: # can we reject this point?
continue
y = euclidean(x)
if y < mn:
if x < input:
lx = x
else:
hx = x
mn = y
index = r,c

t2 = time.time()
r,c = index
print t2-t1, r,c, lists[r][c], "list of lists"

def testmatrix(matr ix=matrix):
t1 = time.time()
mins = abs(matrix-0.5)
mn = 10e99
index = None
for r,c in enumerate(Numer ic.argmin(mins) ):
y = mins[r,c]
if y < mn:
mn = y
index = r,c
t2 = time.time()
r,c = index
print t2-t1, r,c, lists[r][c], "matrix"

# list of lists approach
testarray()
testlists()
testmatrix()

print "*"*44
print "try with psyco"
try:
import psyco
psyco.full()
testarray()
testlists()
testmatrix()
except ImportError:
print "psyco not available"
Jul 18 '05 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
4075
by: fartsniff | last post by:
i found this code out in the ng, and its seems long and clunky, i am still experimenting with preg_match and _replace, but the syntax is a bit confusing. it seems i always misplace or mistype something. this is the code that i am trying to replace with better, optimized code. i would like to make a preg_match and then preg_replace for this, and if a match is found, just removing it. i started with this:
0
2189
by: R U B'n | last post by:
Hi everyone, I have to make a (case-insensitive) search from a form with only one search string, e.g. "Doe Peters english California", which will search in several fields of my table for each word. My fields that are searchable by this query are: nr smallint(5) company varchar(100) lastname(100)
19
2159
by: Tom Deco | last post by:
Hi, I'm trying to use a regular expression to match a string containing a # (basically i'm looking for #include ...) I don't seem to manage to write a regular expression that matches this. My (probably to naive) approach is: p = re.compile(r'\b#include\b) I also tried p = re.compile(r'\b\#include\b) in a futile attempt to use a backslash as escape character before the #
4
2150
by: Johnny Lee | last post by:
Hi, I've met a problem in match a regular expression in python. Hope any of you could help me. Here are the details: I have many tags like this: xxx<a href="http://xxx.xxx.xxx" xxx>xxx xxx<a href="wap://xxx.xxx.xxx" xxx>xxx xxx<a href="http://xxx.xxx.xxx" xxx>xxx ..... And I want to find all the "http://xxx.xxx.xxx" out, so I do it
19
3220
by: Johnny Google | last post by:
Here is an example of the type of data from a file I will have: Apple,4322,3435,4653,6543,4652 Banana,6934,5423,6753,6531 Carrot,3454,4534,3434,1111,9120,5453 Cheese,4411,5522,6622,6641 The first position is the info (the product) I want to retreive for the corresponding code. Assuming that the codes are unique for each product and all code data is on one line.
2
1898
by: ShadowOfTheBeast | last post by:
Hello All, i have been cracking my skull on how to seach a particular reg exp match within a string? it does not seem to happen except the whole arbitrary string is the exact match of the regular expression specified....but the question is how do i search for and get all or a number of specified matches within an arbitrary long string. e.g for example this is my reg exp i am using: @"^-*-$" which will match any string within two...
8
3771
by: Art | last post by:
Hi folks, I'm writing a traditional desktop app using VB.NET and am stumbling over what seems like a very basic question: My app does not need to be connected to a server or another computer. It just runs locally. So what is the best way to store user-entered data? The app is like an address book, where users can enter contacts and save the data.
5
2834
by: ngocviet | last post by:
I have a so big DB, It's about 10-100GB of text. Now I want to create a searching function but It take too long when I use LIKE from MySQL Anyone know the best sollution? Can I use Google Desktop search or an others? Please help!
8
2596
by: Guy | last post by:
Is there a better way to search identical elements in a sorted array list than the following: iIndex = Array.BinarySearch( m_Array, 0, m_Array.Count, aSearchedObject ); aFoundObject= m_Array; m_ResultArray.Add ( aFoundObject);
0
9589
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
10047
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
9863
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8872
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5304
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5447
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3962
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3563
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2815
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.