Bytes IT Community

Image comparison tool

I am looking for a way to take a large number of images and find
matches among them. These images may not be exact replicas. Images
may have been resized, cropped, faded, color corrected, etc.

Approach 1
Programmatically extract the information (such as Eigen Vectors/Eigen
Spaces) and store them in a database. Then apply a comparison
algorithm between the database entries to find like images.

Approach 2
Store all the images and run the comparison tool against the individual
images

Approach 1 is preferred since I believe the comparison could be
performed much more rapidly once the comparison information has been
extracted. However, this requires a library capable of independent
extraction and comparison.

Does anybody have any suggestions for a DLL library that can perform
the above tasks?

Any suggestions on how to store the information in the database? More
specifically, what would the schema look like?
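For concreteness, here is one possible shape such a schema could take. This is only a sketch using SQLite; the table name, column names, and the placeholder signature bytes are my own invention, not anything a particular library prescribes:

```python
import sqlite3

# Hypothetical schema: one row per image, with the extracted comparison
# signature stored as a small binary blob so candidate matching can be
# done without touching the original image files.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE image_metric (
        image_id   INTEGER PRIMARY KEY,
        path       TEXT NOT NULL,     -- where the original file lives
        width      INTEGER NOT NULL,
        height     INTEGER NOT NULL,
        signature  BLOB NOT NULL      -- extracted feature vector, a few hundred bytes
    )
""")

# Store a signature for one image (placeholder feature bytes).
sig = bytes([128, 64, 64, 64, 64])
conn.execute(
    "INSERT INTO image_metric (path, width, height, signature) VALUES (?, ?, ?, ?)",
    ("/images/example.jpg", 800, 600, sig),
)

# Candidate retrieval: pull all signatures into memory and compare there;
# since each is small, this is feasible even for large collections.
rows = conn.execute("SELECT image_id, signature FROM image_metric").fetchall()
```

The point of keeping the signature in its own column is that the pairwise comparison step only ever reads this table, never the image files themselves.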

Any help is appreciated

Jul 21 '06 #1
7 Replies


bcutting,
There is an author at codeproject.com who has at least one article on motion
detection algorithms that should come very close to what you need.
Peter

--
Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com


"bc******@gmail.com" wrote:
I am looking for a way to take a large number of images and find
matches among them. These images may not be exact replicas. Images
may have been resized, cropped, faded, color corrected, etc.

Approach 1
Programmatically extract the information (such as Eigen Vectors/Eigen
Spaces) and store them in a database. Then apply a comparison
algorithm between the database entries to find like images.

Approach 2
Store all the images and run the comparison tool against the individual
images

Approach 1 is preferred since I believe the comparison could be
performed much more rapidly once the comparison information has been
extracted. However, this requires a library capable of independent
extraction and comparison.

Does anybody have and suggestions for a dll library that can perform
the above stated tasks?

Any suggestions on how to store the information in the database? More
specifically, what would the schema look like.

Any help is appreciated

Jul 21 '06 #2

If you want to extract shape info or some other kind of metric, you must
first know exactly what to extract. E.g. for face recognition, eigenvalues
are used; for other types of recognition, feature points are used; etc.

If you have no a priori info, you can still compare thumbnails (create
mini-thumbnails of the same size for each image) and use something like a
Hamming distance to find the matches. If you convert them all to
normalized grayscale images of the same size, you can also match images
that differ only by slight colour shifts.
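A minimal sketch of that idea, assuming each image has already been decoded and resampled to a small grayscale thumbnail held as a flat list of pixel values (the decoding/resizing step, done with whatever imaging library you have, is omitted):

```python
def normalize(pixels):
    """Stretch grayscale values to the full 0..255 range, so that a
    uniformly faded or brightened copy compares as nearly identical."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)
    return [(p - lo) * 255 // (hi - lo) for p in pixels]

def distance(a, b):
    """Sum of per-pixel absolute differences between two equal-size
    normalized thumbnails; 0 means identical, larger means less alike."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Two toy 2x2 "thumbnails": the second is a uniformly faded copy
# of the first (roughly bright * 0.8 + 25 per pixel).
bright = [0, 255, 128, 64]
faded  = [25, 230, 127, 76]
```

Without normalization the raw distance between these two is large; after normalization it collapses to almost zero, which is exactly the property you want for matching faded or colour-corrected duplicates.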

I wrote exactly this quite a few years back, and it is still available in
the form of a shareware image browser with a special "Similar Images"
filter. It works quite well for finding similar images in large
collections (up to e.g. 20,000 files).

Info here:
http://www.abc-view.com/articles/article3.html

I termed the information I store "image metrics"; they are nothing more
than a smart, wavelet-like way of storing the mini-thumbnails. Since each
metric is small (a couple of hundred bytes), they can all be kept in
memory, which speeds up the comparison process enormously.

The procedure to find duplicates consists of:
1. Calculate an image metric for each image
2. Compare the list, using Hamming distance with a smart similarity sorter
3. Output the list to a thumbnail viewer, sorted by similarity, showing only
similar images as colour-coded groups.

#1 can take quite some time, but the image database can be stored, so this
only needs to be done once.

Hope that helps,

Nils Haeck
www.simdesign.nl
<bc******@gmail.com> wrote:
[original question quoted in full; snipped]

Jul 21 '06 #3

Bob
Nils

That sounds interesting. Have you published anything on the algorithm
for computing the similarity metric?

Bob
Nils wrote:
[description of the "Similar Images" filter quoted; snipped]
Jul 21 '06 #4

On 21 Jul 2006 09:51:10 -0700, bc******@gmail.com wrote:
>[original question quoted; snipped]
There is a program that does this, called DupDetector. I've used it,
and it's pretty good at finding similar images. It's freeware.

However, it was made and distributed by Prismatic Software, at
www.prismatic.com, and when you go there now you get a page saying they
closed as of 5/28/06. You might still be able to find it somewhere else
by googling.

Also, it's an executable, not a library you can use, and as I recall no
source code was released, so it may not be of any use to you.

Terry
Jul 21 '06 #5

P: n/a
Thumbs Plus (shareware, or it used to be) has done this for years.
It's reasonably effective, and maybe they will share their secrets.

Jul 22 '06 #6

Hi Bob,

No, I haven't published anything on the algorithm except for the brief
description of how the software works on the webpage mentioned. However,
it's not rocket science :)

People who just need to do a comparison can simply try out the software
(30-use functional trial, sales price $29). Software engineers/developers
wanting to build on it can always buy the source code from my company; I
have already sold it to a few companies creating image cataloguers and
image processing software. I think anyone could write such a thing
themselves, but buying it might make sense to save yourself a few weeks
of work.

Here is the basic idea with these assumptions:

a) We only compare the grayscale version
b) We are not interested in aspect ratio

1. Start with an image of dimensions WxH
2. Scale down into thumbnail of 16x16 pixels, only grayscale, 256 levels
3. Normalize the thumbnail (so it contains values 0..255 instead of e.g.
25..230)
4. Create sub-thumbnails of 8x8, 4x4, 2x2 and 1x1
5. Store these thumbnails such that 1x1 comes first, then 2x2, etc
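The five steps could look roughly like this. This is my sketch of the description, not Nils' actual code; it assumes the 16x16 grayscale thumbnail is already available as a flat row-major list of 256 values (step 2, the actual image resampling, is left to an imaging library):

```python
def shrink(pixels, size):
    """Average 2x2 blocks of a size x size grayscale thumbnail
    (flat row-major list) down to (size//2) x (size//2)."""
    half = size // 2
    out = []
    for y in range(half):
        for x in range(half):
            i = (2 * y) * size + 2 * x
            block = pixels[i] + pixels[i + 1] + pixels[i + size] + pixels[i + size + 1]
            out.append(block // 4)
    return out

def normalize(pixels):
    """Step 3: stretch values to the full 0..255 range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)
    return [(p - lo) * 255 // (hi - lo) for p in pixels]

def build_metric(thumb16):
    """Steps 3-5: normalize, derive the 8x8/4x4/2x2/1x1 sub-thumbnails,
    and store them coarsest first (1x1, then 2x2, up to 16x16)."""
    levels = [normalize(thumb16)]       # 16x16
    size = 16
    while size > 1:
        levels.append(shrink(levels[-1], size))
        size //= 2
    metric = []
    for level in reversed(levels):      # 1x1 comes first
        metric.extend(level)
    return metric                       # 1 + 4 + 16 + 64 + 256 = 341 values
```

Storing the coarse levels first is what later lets the comparison abort after reading only a few bytes of each metric.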

Now the comparison. Realise that when comparing two images, we are only
interested in images that are close. So if there's a big difference between
them, we can abort the comparison quite soon.

Comparing two images with metrics A and B, each metric consisting of

A = {a1..aN}, N = 341 (1 + 4 + 16 + 64 + 256 = 341)
B = {b1..bN}, N = 341

Weighting: w1..wN, where

w1 = 256
w2 ..w5 = 64
w6 ..w21 = 16
w22..w85 = 4
w86..w341 = 1

the comparison value between A and B is

Cp = sum_{i=1..p} max(0, abs(ai - bi) - 1) * wi, where p can be 1..N

Note the term max(0, abs(ai - bi) - 1): when comparing two pixels we use
"difference - 1", because a difference of 1 often arises through
resampling/normalization.

When comparing two metrics we define a threshold T, so we can stop the
comparison as soon as Cp > T, and just store the value Cp (p <= N) up to
that point. If T is low enough, we can often stop after comparing just
one byte!
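Put together, the weighted comparison with early abort might be sketched like this (my own rendering of the description above, with the per-element weights written out; `a` and `b` are 341-element metrics stored coarsest level first):

```python
# Per-element weights: the single 1x1 pixel weighs 256, the four 2x2
# pixels 64 each, and so on, so every resolution level carries the same
# total weight (256) and coarse differences dominate early in the scan.
WEIGHTS = [256] * 1 + [64] * 4 + [16] * 16 + [4] * 64 + [1] * 256

def compare(a, b, threshold):
    """Weighted difference Cp between two 341-element metrics. Returns
    the running sum, stopping as soon as it exceeds threshold, so
    clearly-different images are rejected after reading a few bytes."""
    total = 0
    for ai, bi, wi in zip(a, b, WEIGHTS):
        # "difference - 1": a difference of 1 is just resampling noise.
        total += max(0, abs(ai - bi) - 1) * wi
        if total > threshold:
            break
    return total
```

Because the 1x1 average comes first with weight 256, two images whose overall brightness differs by more than threshold/256 are rejected on the very first element.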

Now, when comparing a large list of metrics, one can simply sort them,
take one as start S, put a sliding window {-T/256, +T/256} on the sorted
list around S, and compare all the metrics in that window with S to find
any matches to S.

This way we can build a new list, beginning with S, then the closest
match to it, then the closest match to that one, etc. Each time we remove
the chosen metric from the original list. In the end we have a list of
images sorted by similarity.

The algorithm is still O(N^2), but it nevertheless cuts out a large
portion of the work compared to the full N^2 algorithm.
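One way to read that greedy chain (my interpretation: the list is sorted on the first metric byte, the 1x1 value, because its weight of 256 means any pair within overall threshold T must have first bytes within T/256 of each other; the `distance` function here is a plain stand-in for the weighted early-abort comparison):

```python
def distance(a, b):
    # Stand-in for the weighted early-abort comparison described above.
    return sum(abs(x - y) for x, y in zip(a, b))

def similarity_order(metrics, threshold):
    """Greedy similarity chain: sort on the 1x1 value (metric[0]), then
    repeatedly append the closest remaining metric, only fully comparing
    candidates whose first byte lies inside the sliding window."""
    remaining = sorted(metrics, key=lambda m: m[0])
    ordered = [remaining.pop(0)]
    window = threshold // 256
    while remaining:
        cur = ordered[-1]
        best_i, best_d = 0, None
        for i, cand in enumerate(remaining):
            if abs(cand[0] - cur[0]) > window:
                continue                 # outside the window: skip cheaply
            d = distance(cur, cand)
            if best_d is None or d < best_d:
                best_i, best_d = i, d
        # If nothing fell inside the window, fall back to the nearest
        # remaining metric by sort key (index 0 of the sorted list).
        ordered.append(remaining.pop(best_i))
    return ordered
```

With short toy metrics, two near-duplicate pairs end up adjacent in the output, which is what lets a viewer show them as colour-coded similarity groups.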

There are some specialities not mentioned here (for colours, for aspect
ratio, etc), but this is the general principle.

Note: I looked into a lot of different techniques (Fourier transforms,
Gabor wavelets, feature point extraction, etc.), but more complex is not
always better. In this case, simplicity seems to win.

Hope that helps,

Nils Haeck
www.simdesign.nl
"Bob" <ra******@spambob.netschreef in bericht
news:11**********************@b28g2000cwb.googlegr oups.com...
Nils

That sounds interesting. Have you published anything on the algorithm
for computing the similarity metric?

Bob
Nils wrote:
>I wrote exactly this quite a few years back, and it is still available in
the form of a shareware image browser having a special "Similar Images"
filter. It works quite well if one wants to find similar images in large
databases (up to e.g. 20.000 files).

Info here:
http://www.abc-view.com/articles/article3.html

Jul 22 '06 #7

Bob
Hi Nils

Yes that does help. Thanks for the explanation.

Bob

Jul 22 '06 #8

This discussion thread is closed.