Hi Bob,

No I haven't published anything on the algorithm except for the brief

description on how the software works on the webpage mentioned. However,

it's not rocket science :)

People that have to do a comparison can simply try out the software (30-use

functional trial, sales price $29). Software engineers/developers wanting to

make use of the software can always buy the source code from my company. I

have sold it already to a few companies creating image cataloguers and image

processing software. I think anyone could write such a thing themself,

however it might make sense to buy it to save yourself a few weeks of work.

Here is the basic idea with these assumptions:

a) We only compare the grayscale version

b) We are not interested in aspect ratio

1. Start with an image of dimensions WxH

2. Scale down into thumbnail of 16x16 pixels, only grayscale, 256 levels

3. Normalize the thumbnail (so it contains values 0..255 instead of eg.

25..230)

4. Create a subthumbnail of 8x8, 4x4, 2x2 and 1x1

5. Store these thumbnails such that 1x1 comes first, then 2x2, etc

Now the comparison. Realise that when comparing two images, we are only

interested in images that are close. So if there's a big difference between

them, we can abort the comparison quite soon.

Comparing two images with metrics A and B, metric consisting of

A = {a1..aN}, N = 341 (1 + 4 + 16 + 64 + 256 = 341)

B = {b1..bN}, N = 341

Weighting: w1..wN, where

w1 = 256

w2 ..w5 = 64

w6 ..w21 = 16

w21..w85 = 4

w21..w277 = 1

comparison value between A and B is

Cp = sum_i,i=1..p{ma x(0, abs(ai - bi) - 1)) * wi}, p can be 1..N

Note the term max(0, abs(ai - bi) - 1): We compare two pixels, and use the

"difference - 1", because often a difference of 1 occurs through

resampling/normalization.

When comparing two metrics we define a threshold T, so we can stop

comparison if Cp T, and just store the value Cp (p <= N) up to that point.

If T is low enough we often can stop comparison after just comparing one

byte!

Now.. when comparing a large list of metrics, one can simply sort them, then

take one as start S, put a sliding window {-T/256, T/256} on the sorted list

around S and compare all the metrics in that group with S, to find any

matching metrics to S.

this way we can build a new list, beginning with S, then the one closest

matching that one, then find again the closest match to this one, etc. Each

time we remove the metric from the original list. In the end we have a

sorted list of images, by similarity.

The algorithm is still O(N^2) but nevertheless cuts out a large portion of

work compared to the full N^2 algorithm.

There are some specialities not mentioned here (for colours, for aspect

ratio, etc), but this is the general principle.

Note: I looked into a lot of different techniques (Fourier-transformations ,

Gabor wavelets, feature point extraction, etc) but more complex is not

always better. In this case, simplicity seems to favour.

Hope that helps,

Nils Haeck

www.simdesign.nl
"Bob" <ra******@spamb ob.netschreef in bericht

news:11******** **************@ b28g2000cwb.goo glegroups.com.. .

Nils

That sounds interesting. Have you published anything on the algorithm

for computing the similarity metric?

Bob

Nils wrote:
>I wrote exactly this quite a few years back, and it is still available in

the form of a shareware image browser having a special "Similar Images"

filter. It works quite well if one wants to find similar images in large

databases (up to e.g. 20.000 files).

Info here:

http://www.abc-view.com/articles/article3.html