Image comparison tool

I am looking for a way to take a large number of images and find
matches among them. These images may not be exact replicas. Images
may have been resized, cropped, faded, color corrected, etc.

Approach 1
Programmatically extract the information (such as eigenvectors/eigenspaces)
and store it in a database. Then apply a comparison
algorithm between the database entries to find like images.

Approach 2
Store all the images and run the comparison tool against the individual
images.

Approach 1 is preferred since I believe the comparison could be
performed much more rapidly once the comparison information has been
extracted. However, this requires a library capable of independent
extraction and comparison.

Does anybody have any suggestions for a DLL that can perform
the above-stated tasks?

Any suggestions on how to store the information in the database? More
specifically, what would the schema look like?

Any help is appreciated.

Jul 21 '06 #1
bcutting,
There is an author at codeproject.com who has at least one article on motion
detection algorithms that should be very close.
Peter

--
Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com



Jul 21 '06 #2
If you want to extract shape info or some other kind of metric, you must
first know exactly what you are looking for. E.g. for face recognition,
eigenvectors ("eigenfaces") are used, for other types of recognition
feature points are used, etc.

If you have no a priori info, you can still compare the thumbnails (create
mini-thumbnails of the same size for each image), and use something like a
Hamming distance to find the matches. If you convert them all to the same
normalized grayscale images, you can also detect slight colour mismatches.
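
The thumbnail idea above can be sketched in a few lines of Python. This is only an illustration under assumptions, not code from any tool mentioned in this thread: images are plain nested lists of (r, g, b) tuples so that no imaging library is needed, the function names are hypothetical, and each thumbnail pixel is thresholded against the thumbnail's mean brightness (an "average hash") so that the Hamming distance is a literal count of differing bits.

```python
def to_gray(pixel):
    """Convert an (r, g, b) tuple to a 0..255 luminance value."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def mini_thumbnail(image, size=8):
    """Box-average an image (list of rows of RGB tuples) down to
    size*size grayscale values, returned as a flat list."""
    h, w = len(image), len(image[0])
    thumb = []
    for ty in range(size):
        y0 = ty * h // size
        y1 = max((ty + 1) * h // size, y0 + 1)
        for tx in range(size):
            x0 = tx * w // size
            x1 = max((tx + 1) * w // size, x0 + 1)
            cells = [to_gray(image[y][x])
                     for y in range(y0, y1) for x in range(x0, x1)]
            thumb.append(sum(cells) // len(cells))
    return thumb


def fingerprint(image, size=8):
    """One bit per thumbnail pixel: 1 if brighter than the thumbnail mean."""
    thumb = mini_thumbnail(image, size)
    mean = sum(thumb) / len(thumb)
    return [1 if v > mean else 0 for v in thumb]


def hamming(a, b):
    """Number of positions at which two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))
```

Because each bit is relative to the image's own mean, a uniformly faded or brightened copy tends to produce the same fingerprint, while an unrelated image differs in many bits. The cut-off below which two images count as "similar" has to be tuned on real data.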

I wrote exactly this quite a few years back, and it is still available in
the form of a shareware image browser having a special "Similar Images"
filter. It works quite well if one wants to find similar images in large
databases (up to e.g. 20,000 files).

Info here:
http://www.abc-view.com/articles/article3.html

I termed the information I store "image metrics". They are nothing more
than a smart wavelet-like way of storing the mini-thumbnails. Since the
metrics information is small (a couple of hundred bytes each), it can be
kept in memory, which speeds up the comparison process enormously.

The procedure to find duplicates consists of:
1. Calculate an image metric for each image
2. Compare the list, using Hamming distance with a smart similarity sorter
3. Output the list to a thumbnail viewer, sorted by similarity, showing only
similar images as colour-coded groups.

Step 1 can take quite some time, but the image database can be stored, so this
only needs to be done once.
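
As for storing the metrics so that step 1 only runs once, one possible layout is a single table with the file path and the metric as a small binary blob. The sketch below uses Python's built-in sqlite3 module. The table name, column names and helper functions are assumptions made for illustration, not the schema of any product mentioned in this thread.

```python
import sqlite3


def open_store(db_path=":memory:"):
    """Open (or create) the metric store. ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_metric ("
        " path   TEXT PRIMARY KEY,"   # hypothetical column: file location
        " metric BLOB NOT NULL)"      # hypothetical column: raw metric bytes
    )
    return conn


def save_metric(conn, path, metric):
    """Insert or update the metric (an iterable of 0..255 ints) for a file."""
    conn.execute(
        "INSERT OR REPLACE INTO image_metric (path, metric) VALUES (?, ?)",
        (path, bytes(metric)),
    )
    conn.commit()


def load_metrics(conn):
    """Read every stored metric back into memory as {path: bytes}."""
    rows = conn.execute("SELECT path, metric FROM image_metric ORDER BY path")
    return {path: bytes(blob) for path, blob in rows}
```

Since each metric is only a few hundred bytes, loading the whole table into a dictionary at start-up and comparing in memory is usually simpler than trying to do the comparison in SQL.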

Hope that helps,

Nils Haeck
www.simdesign.nl

Jul 21 '06 #3
Bob
Nils

That sounds interesting. Have you published anything on the algorithm
for computing the similarity metric?

Bob
Jul 21 '06 #4
On 21 Jul 2006 09:51:10 -0700, bc******@gmail.com wrote:
> I am looking for a way to take a large number of images and find
> matches among them. These images may not be exact replicas. Images
> may have been resized, cropped, faded, color corrected, etc.
>
> Approach 1
> Programmatically extract the information (such as Eigen Vectors/Eigen
> Spaces) and store them in a database. Then apply a comparison
> algorithm between the database entries to find like images.

There is a program that does this, called DupDetector. I've used it,
and it's pretty good at finding similar images. It's freeware.

However, it is (was) made and distributed by Prismatic Software, at
www.prismatic.com, and when you go there now, you get a page saying
it closed as of 5/28/06. You might still be able to find it someplace
else by googling.

Also, it's an executable, not a library that you can use, nor was
source code released as I recall, so it may not be of any use to you.

Terry
Jul 21 '06 #5
Thumbs Plus (shareware, or it used to be) has done this for years.
It's reasonably effective, and maybe they will share their secrets.

Jul 22 '06 #6
Hi Bob,

No, I haven't published anything on the algorithm except for the brief
description of how the software works on the webpage mentioned. However,
it's not rocket science :)

People who have to do a comparison can simply try out the software (30-use
functional trial, sales price $29). Software engineers/developers wanting to
make use of the software can always buy the source code from my company. I
have sold it already to a few companies creating image cataloguers and image
processing software. I think anyone could write such a thing themselves,
however it might make sense to buy it to save yourself a few weeks of work.

Here is the basic idea with these assumptions:

a) We only compare the grayscale version
b) We are not interested in aspect ratio

1. Start with an image of dimensions WxH
2. Scale down into thumbnail of 16x16 pixels, only grayscale, 256 levels
3. Normalize the thumbnail (so it contains values 0..255 instead of e.g.
25..230)
4. Create subthumbnails of 8x8, 4x4, 2x2 and 1x1
5. Store these thumbnails such that 1x1 comes first, then 2x2, etc
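
Steps 1 to 5 can be sketched as follows. This is a minimal illustration under assumptions rather than the actual shareware code: the input is already grayscale (a list of rows of 0..255 integers), scaling is done by plain box averaging, each subthumbnail is assumed to be the average of 2x2 blocks of the level above it (the post does not say how they are derived), and all function names are hypothetical.

```python
def downscale(gray, size=16):
    """Box-average a grayscale image (list of rows) to size x size."""
    h, w = len(gray), len(gray[0])
    out = []
    for ty in range(size):
        y0 = ty * h // size
        y1 = max((ty + 1) * h // size, y0 + 1)
        row = []
        for tx in range(size):
            x0 = tx * w // size
            x1 = max((tx + 1) * w // size, x0 + 1)
            cells = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cells) // len(cells))
        out.append(row)
    return out


def normalize(thumb):
    """Stretch the values so the darkest is 0 and the brightest is 255."""
    flat = [v for row in thumb for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:                      # flat image: nothing to stretch
        return [[0] * len(row) for row in thumb]
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in thumb]


def halve(thumb):
    """Average each 2x2 block to get the next, coarser level."""
    n = len(thumb) // 2
    return [[(thumb[2 * y][2 * x] + thumb[2 * y][2 * x + 1]
              + thumb[2 * y + 1][2 * x] + thumb[2 * y + 1][2 * x + 1]) // 4
             for x in range(n)] for y in range(n)]


def image_metric(gray):
    """Return the 341-byte metric: 1x1 first, then 2x2, 4x4, 8x8, 16x16."""
    level = normalize(downscale(gray, 16))
    levels = [level]
    while len(level) > 1:
        level = halve(level)
        levels.append(level)
    return bytes(v for lvl in reversed(levels) for row in lvl for v in row)
```

Putting the coarsest level first is what makes an early exit possible during comparison: the first byte already summarises the whole image.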

Now the comparison. Realise that when comparing two images, we are only
interested in images that are close. So if there's a big difference between
them, we can abort the comparison quite soon.

To compare two images with metrics A and B, each metric consists of

A = {a1..aN}, N = 341 (1 + 4 + 16 + 64 + 256 = 341)
B = {b1..bN}, N = 341

Weighting: w1..wN, where

w1 = 256
w2..w5 = 64
w6..w21 = 16
w22..w85 = 4
w86..w341 = 1

The comparison value between A and B is

Cp = sum_{i=1..p} max(0, abs(ai - bi) - 1) * wi, where p can be 1..N

Note the term max(0, abs(ai - bi) - 1): We compare two pixels, and use the
"difference - 1", because often a difference of 1 occurs through
resampling/normalization.

When comparing two metrics we define a threshold T, so we can stop the
comparison as soon as Cp > T, and just store the value Cp (p <= N) up to that point.
If T is low enough we often can stop comparison after just comparing one
byte!
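
The weighted comparison with early exit can be sketched like this. It assumes each metric is a sequence of 341 values ordered 1x1, 2x2, 4x4, 8x8, 16x16, so the weights run 256, 64, 16, 4, 1 over blocks of 1, 4, 16, 64 and 256 values. The names WEIGHTS and compare are hypothetical.

```python
# One weight per metric value: 1 value at 256, 4 at 64, 16 at 16,
# 64 at 4 and 256 at 1, giving 341 weights in total.
LEVEL_SIZES = (1, 4, 16, 64, 256)
LEVEL_WEIGHTS = (256, 64, 16, 4, 1)
WEIGHTS = [w for size, w in zip(LEVEL_SIZES, LEVEL_WEIGHTS) for _ in range(size)]


def compare(a, b, threshold):
    """Return (cp, p): the weighted difference accumulated over the
    first p values. Stops as soon as cp exceeds the threshold."""
    cp = 0
    p = 0
    for p, (ai, bi, wi) in enumerate(zip(a, b, WEIGHTS), start=1):
        # A difference of 1 is forgiven, as it often comes from resampling.
        cp += max(0, abs(ai - bi) - 1) * wi
        if cp > threshold:
            break
    return cp, p
```

For example, two metrics whose first values differ by 10 already accumulate (10 - 1) * 256 = 2304 after one step, so with T = 1000 the comparison stops after a single byte.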

Now, when comparing a large list of metrics, one can simply sort them, then
take one as start S, put a sliding window [-T/256, +T/256] on the sorted list
around S and compare all the metrics in that group with S, to find any
metrics matching S.

This way we can build a new list, beginning with S, then the one closest
matching that one, then find again the closest match to this one, etc. Each
time we remove the metric from the original list. In the end we have a
sorted list of images, by similarity.
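
A sketch of the sliding window and the chained ordering follows, under several assumptions: the list is sorted by the first metric value (the 1x1 thumbnail), which the post does not state explicitly; the window is T // 256 + 1 wide because a difference of 1 is forgiven by the comparison; and when nothing in the window matches, the next unused metric simply starts a new group. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right

WEIGHTS = [w for n, w in zip((1, 4, 16, 64, 256), (256, 64, 16, 4, 1))
           for _ in range(n)]


def difference(a, b, threshold):
    """Weighted difference, or None once it exceeds the threshold."""
    total = 0
    for ai, bi, wi in zip(a, b, WEIGHTS):
        total += max(0, abs(ai - bi) - 1) * wi
        if total > threshold:
            return None
    return total


def similarity_order(metrics, threshold):
    """metrics: {name: 341-value metric}. Return the names chained so
    that each one is followed by its closest remaining match."""
    remaining = sorted(metrics.items(), key=lambda item: item[1][0])
    if not remaining:
        return []
    keys = [metric[0] for _, metric in remaining]   # sorted first values
    reach = threshold // 256 + 1
    name, current = remaining.pop(0)
    keys.pop(0)
    order = [name]
    while remaining:
        # Only metrics whose first value lies inside the window can match.
        lo = bisect_left(keys, current[0] - reach)
        hi = bisect_right(keys, current[0] + reach)
        best = None
        for idx in range(lo, hi):
            d = difference(current, remaining[idx][1], threshold)
            if d is not None and (best is None or d < best[0]):
                best = (d, idx)
        idx = best[1] if best else 0    # no match: start a new group
        name, current = remaining.pop(idx)
        keys.pop(idx)
        order.append(name)
    return order
```

The window only prunes candidates. Every metric inside it is still compared, which is why the worst case stays quadratic when many images have a similar overall brightness.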

The algorithm is still O(n^2) in the number of images n, but nevertheless
cuts out a large portion of the work compared to comparing every pair in full.

There are some refinements not mentioned here (for colours, for aspect
ratio, etc), but this is the general principle.

Note: I looked into a lot of different techniques (Fourier transformations,
Gabor wavelets, feature point extraction, etc) but more complex is not
always better. In this case, simplicity seems to win.

Hope that helps,

Nils Haeck
www.simdesign.nl

Jul 22 '06 #7
Bob
Hi Nils

Yes that does help. Thanks for the explanation.

Bob

Jul 22 '06 #8
