Hi Bob,
No, I haven't published anything on the algorithm except for the brief
description of how the software works on the webpage mentioned. However,
it's not rocket science :)
People who want to do a comparison can simply try out the software (30-use
functional trial, sales price $29). Software engineers/developers wanting to
make use of the software can always buy the source code from my company. I
have already sold it to a few companies creating image cataloguers and image
processing software. I think anyone could write such a thing themselves,
however it might make sense to buy it to save yourself a few weeks of work.
Here is the basic idea with these assumptions:
a) We only compare the grayscale version
b) We are not interested in aspect ratio
1. Start with an image of dimensions WxH
2. Scale down into thumbnail of 16x16 pixels, only grayscale, 256 levels
3. Normalize the thumbnail (so it contains values 0..255 instead of e.g.
25..230)
4. Create subthumbnails of 8x8, 4x4, 2x2 and 1x1 pixels
5. Store these thumbnails such that 1x1 comes first, then 2x2, etc
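The steps above could be sketched roughly like this in Python (a hypothetical illustration, not the actual shareware code; it assumes the WxH image has already been resampled to a 16x16 grayscale thumbnail, and the names normalize/halve/build_metric are mine):

```python
def normalize(thumb):
    """Stretch values so they span the full 0..255 range (step 3)."""
    lo = min(min(row) for row in thumb)
    hi = max(max(row) for row in thumb)
    if hi == lo:
        return [[0] * len(row) for row in thumb]  # flat image
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in thumb]

def halve(thumb):
    """Average 2x2 blocks to get the next smaller subthumbnail (step 4)."""
    n = len(thumb) // 2
    return [[(thumb[2*y][2*x] + thumb[2*y][2*x+1] +
              thumb[2*y+1][2*x] + thumb[2*y+1][2*x+1]) // 4
             for x in range(n)] for y in range(n)]

def build_metric(thumb16):
    """Flatten 1x1, 2x2, 4x4, 8x8, 16x16 into one 341-byte metric (step 5)."""
    levels = [normalize(thumb16)]
    while len(levels[-1]) > 1:
        levels.append(halve(levels[-1]))
    metric = []
    for level in reversed(levels):  # 1x1 comes first, 16x16 last
        for row in level:
            metric.extend(row)
    return metric
```

The 341 bytes per image come out of 1 + 4 + 16 + 64 + 256, matching the formula below.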
Now the comparison. Realise that when comparing two images, we are only
interested in images that are close. So if there's a big difference between
them, we can abort the comparison quite soon.
Comparing two images with metrics A and B, each metric consisting of
A = {a1..aN}, N = 341 (1 + 4 + 16 + 64 + 256 = 341)
B = {b1..bN}, N = 341
Weighting: w1..wN, where
w1 = 256
w2 ..w5 = 64
w6 ..w21 = 16
w22..w85 = 4
w86..w341 = 1
The comparison value between A and B is
Cp = sum{i=1..p} max(0, abs(ai - bi) - 1) * wi, where p can be 1..N
Note the term max(0, abs(ai - bi) - 1): We compare two pixels, and use the
"difference - 1", because often a difference of 1 occurs through
resampling/normalization.
When comparing two metrics we define a threshold T, so we can stop the
comparison as soon as Cp > T, and just store the value Cp (p <= N) up to that
point. If T is low enough we can often stop the comparison after comparing
just one byte!
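As an illustrative sketch (again hypothetical, not the shipped implementation), the weighted comparison with early abort could look like:

```python
# Weights follow the level sizes: the single 1x1 byte weighs 256,
# each 2x2 byte 64, each 4x4 byte 16, each 8x8 byte 4, each 16x16 byte 1.
WEIGHTS = [256] * 1 + [64] * 4 + [16] * 16 + [4] * 64 + [1] * 256  # 341 entries

def compare(a, b, threshold):
    """Return the accumulated difference Cp, or None once it exceeds threshold.

    The '- 1' forgives an off-by-one pixel difference, which often
    occurs through resampling/normalization.
    """
    c = 0
    for ai, bi, wi in zip(a, b, WEIGHTS):
        c += max(0, abs(ai - bi) - 1) * wi
        if c > threshold:
            return None  # abort early: the images are not close
    return c
```

With a low threshold, a large difference in the first (1x1) byte alone already exceeds T and aborts after one comparison, as mentioned above.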
Now.. when comparing a large list of metrics, one can simply sort them, then
take one as start S, put a sliding window [S - T/256, S + T/256] on the
sorted list around S and compare all the metrics in that window with S, to
find any metrics matching S.
This way we can build a new list, beginning with S, then the one most
closely matching it, then again the closest match to that one, etc. Each
time we remove the matched metric from the original list. In the end we have
a list of images, sorted by similarity.
The algorithm is still O(N^2) in the worst case, but it nevertheless cuts
out a large portion of the work compared to the full N^2 comparison.
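A hypothetical sketch of the windowed search (my own illustration, assuming the metrics are sorted lexicographically; the window is widened by one byte to account for the forgiven off-by-one difference):

```python
import bisect

def weighted_diff(a, b, weights, threshold):
    """Accumulated weighted difference, or None once it exceeds threshold."""
    c = 0
    for ai, bi, wi in zip(a, b, weights):
        c += max(0, abs(ai - bi) - 1) * wi
        if c > threshold:
            return None
    return c

def find_matches(metrics, s, threshold, weights):
    """Return (index, distance) pairs for metrics within threshold of s.

    metrics must be sorted lexicographically. Since the first byte has
    weight weights[0], any match can differ in it by at most
    threshold // weights[0] (+1 for the forgiven difference of 1),
    which bounds the window of candidates to examine.
    """
    window = threshold // weights[0] + 1
    lo = bisect.bisect_left(metrics, [s[0] - window])
    hi = bisect.bisect_left(metrics, [s[0] + window + 1])
    matches = []
    for i in range(lo, hi):
        d = weighted_diff(s, metrics[i], weights, threshold)
        if d is not None:
            matches.append((i, d))
    return matches
```

Building the similarity-sorted list then amounts to repeatedly taking the closest match found here, removing it from the list, and searching again from that match.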
There are some special cases not mentioned here (for colours, for aspect
ratio, etc), but this is the general principle.
Note: I looked into a lot of different techniques (Fourier transforms,
Gabor wavelets, feature point extraction, etc) but more complex is not
always better. In this case, simplicity seems to win.
Hope that helps,
Nils Haeck
www.simdesign.nl
"Bob" <ra******@spambob.netschreef in bericht
news:11**********************@b28g2000cwb.googlegr oups.com...
Nils
That sounds interesting. Have you published anything on the algorithm
for computing the similarity metric?
Bob
Nils wrote:
>I wrote exactly this quite a few years back, and it is still available in
the form of a shareware image browser having a special "Similar Images"
filter. It works quite well if one wants to find similar images in large
databases (up to e.g. 20.000 files).
Info here:
http://www.abc-view.com/articles/article3.html