Image comparison tool

I am looking for a way to take a large number of images and find
matches among them. These images may not be exact replicas. Images
may have been resized, cropped, faded, color corrected, etc.

Approach 1
Programmatically extract the information (such as Eigen Vectors/Eigen
Spaces) and store them in a database. Then apply a comparison
algorithm between the database entries to find like images.

Approach 2
Store all the images and run the comparison tool against the individual
images

Approach 1 is preferred since I believe the comparison could be
performed much more rapidly once the comparison information has been
extracted. However, this requires a library capable of independent
extraction and comparison.

Does anybody have any suggestions for a DLL library that can perform
the above-stated tasks?

Any suggestions on how to store the information in the database? More
specifically, what would the schema look like?

Any help is appreciated.

Jul 21 '06 #1
bcutting,
There is an author at codeproject.com who has at least one article on motion
detection algorithms that should be very close.
Peter

--
Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com



Jul 21 '06 #2
If you want to extract shape info or some other kind of metric, you must
first know what exactly. E.g. for face recognition, eigenvalues are used,
for other types of recognition feature points are used, etc.

If you have no a-priori info, you can still compare the thumbnails (create
mini-thumbnails of the same size for each image), and use something like a
Hamming distance to find the matches. If you convert them all to the same
normalized grayscale images, you can also detect slight colour mismatches.

I wrote exactly this quite a few years back, and it is still available in
the form of a shareware image browser having a special "Similar Images"
filter. It works quite well if one wants to find similar images in large
databases (up to e.g. 20,000 files).

Info here:
http://www.abc-view.com/articles/article3.html

I termed the information I store "image metrics". They are nothing more
than a smart wavelet-like way of storing the mini-thumbnails. Since the
metrics information is small (a couple of hundred bytes each), the metrics
can be kept in memory, which speeds up the comparison process enormously.

The procedure to find duplicates consists of:
1. Calculate an image metric for each image
2. Compare the list, using Hamming distance with a smart similarity sorter
3. Output the list to a thumbnail viewer, sorted by similarity, showing only
similar images as colour-coded groups.

#1 can take quite some time, but the image database can be stored, so this
only needs to be done once.
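
For the schema question, one possible way to keep the metrics between runs is a single table keyed by file path, with the metric stored as a small blob. This is only a minimal sketch using SQLite; the table name, the column names and the helper functions are made up for illustration and are not how the shareware product stores its data.

```python
import sqlite3


def open_metric_store(path=":memory:"):
    """Open (or create) a small SQLite store for precomputed metrics."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_metric ("
        " file_path TEXT PRIMARY KEY,"   # one row per image file
        " file_mtime REAL NOT NULL,"     # lets a caller detect changed files
        " metric BLOB NOT NULL)"         # the metric bytes, a few hundred each
    )
    return conn


def save_metric(conn, file_path, file_mtime, metric):
    """Insert or overwrite the metric (a sequence of ints 0..255) for a file."""
    conn.execute(
        "INSERT OR REPLACE INTO image_metric VALUES (?, ?, ?)",
        (file_path, file_mtime, bytes(metric)),
    )
    conn.commit()


def load_all_metrics(conn):
    """Load every stored metric into memory as {file_path: [ints]}."""
    rows = conn.execute("SELECT file_path, metric FROM image_metric")
    return {file_path: list(blob) for file_path, blob in rows}
```

Because each metric is tiny, loading the whole table into memory at start-up and comparing there is probably simpler than trying to do the comparison in SQL.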

Hope that helps,

Nils Haeck
www.simdesign.nl

Jul 21 '06 #3
Bob
Nils

That sounds interesting. Have you published anything on the algorithm
for computing the similarity metric?

Bob
Jul 21 '06 #4
On 21 Jul 2006 09:51:10 -0700, bc******@gmail.com wrote:
> I am looking for a way to take a large number of images and find
> matches among them. These images may not be exact replicas. Images
> may have been resized, cropped, faded, color corrected, etc.
>
> Approach 1
> Programmatically extract the information (such as Eigen Vectors/Eigen
> Spaces) and store them in a database. Then apply a comparison
> algorithm between the database entries to find like images.

There is a program that does this, called DupDetector. I've used it,
and it's pretty good at finding similar images. It's freeware.

However, it is (was) made and distributed by prismatic software, at
www.prismatic.com, and when you go there now, you get a page saying
closed as of 5/28/06. You might be able to find it still someplace
else by googling.

Also, it's an executable, not a library that you can use, nor was
source code released as I recall, so it may not be of any use to you.

Terry
Jul 21 '06 #5
Thumbs Plus (shareware, or it used to be) has done this for years.
It's reasonably effective, and maybe they will share their secrets.

Jul 22 '06 #6
Hi Bob,

No I haven't published anything on the algorithm except for the brief
description on how the software works on the webpage mentioned. However,
it's not rocket science :)

People that have to do a comparison can simply try out the software (30-use
functional trial, sales price $29). Software engineers/developers wanting to
make use of the software can always buy the source code from my company. I
have sold it already to a few companies creating image cataloguers and image
processing software. I think anyone could write such a thing themselves;
however, it might make sense to buy it to save yourself a few weeks of work.

Here is the basic idea with these assumptions:

a) We only compare the grayscale version
b) We are not interested in aspect ratio

1. Start with an image of dimensions WxH
2. Scale down into thumbnail of 16x16 pixels, only grayscale, 256 levels
3. Normalize the thumbnail (so it contains values 0..255 instead of e.g.
25..230)
4. Create a subthumbnail of 8x8, 4x4, 2x2 and 1x1
5. Store these thumbnails such that 1x1 comes first, then 2x2, etc
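
Steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration, not the shareware's actual code. It assumes the 16x16 grayscale thumbnail from step 2 already exists as a flat row-major list of 256 values (any image library can produce that), that each sub-thumbnail is made by averaging 2x2 blocks of the next larger one, and that a completely flat image normalizes to all zeros. The function names are hypothetical.

```python
def normalize(pixels):
    """Stretch values so the darkest pixel maps to 0 and the brightest to 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)  # flat image (assumption: map everything to 0)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


def halve(pixels, size):
    """Average each 2x2 block of a size x size image stored row-major."""
    half = size // 2
    out = []
    for y in range(half):
        for x in range(half):
            i = 2 * y * size + 2 * x
            block = pixels[i] + pixels[i + 1] + pixels[i + size] + pixels[i + size + 1]
            out.append(block // 4)
    return out


def build_metric(thumb16):
    """Build the 341-value metric from a 16x16 grayscale thumbnail (256 ints)."""
    if len(thumb16) != 256:
        raise ValueError("expected 16x16 = 256 values")
    levels = [normalize(thumb16)]      # 16x16 first, smaller levels appended
    size = 16
    while size > 1:
        levels.append(halve(levels[-1], size))
        size //= 2
    metric = []
    for level in reversed(levels):     # store 1x1 first, then 2x2, ..., 16x16
        metric.extend(level)
    return metric
```

Putting the coarsest level first is what lets the comparison give up after looking at only a few values.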

Now the comparison. Realise that when comparing two images, we are only
interested in images that are close. So if there's a big difference between
them, we can abort the comparison quite soon.

Comparing two images with metrics A and B, metric consisting of

A = {a1..aN}, N = 341 (1 + 4 + 16 + 64 + 256 = 341)
B = {b1..bN}, N = 341

Weighting: w1..wN, where

w1 = 256
w2..w5 = 64
w6..w21 = 16
w22..w85 = 4
w86..w341 = 1

comparison value between A and B is

Cp = sum over i = 1..p of max(0, abs(ai - bi) - 1) * wi, where p can be 1..N

Note the term max(0, abs(ai - bi) - 1): We compare two pixels, and use the
"difference - 1", because often a difference of 1 occurs through
resampling/normalization.

When comparing two metrics we define a threshold T, so we can stop the
comparison as soon as Cp > T, and just store the value Cp (p <= N) reached at
that point. If T is low enough, we can often stop the comparison after
comparing just one byte!
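
The weighted comparison with early abort might look like this in code. It is a sketch under the assumptions stated above: both metrics are lists of 341 values ordered 1x1 first, and the per-level weights are 256, 64, 16, 4 and 1. The names `WEIGHTS` and `compare` are made up for illustration.

```python
# One weight per metric value: 1 value at 256, 4 at 64, 16 at 16, 64 at 4, 256 at 1.
LEVEL_WEIGHTS = ((1, 256), (4, 64), (16, 16), (64, 4), (256, 1))
WEIGHTS = [weight for count, weight in LEVEL_WEIGHTS for _ in range(count)]


def compare(a, b, threshold):
    """Return (Cp, p): the weighted difference and how many values were compared.

    Stops as soon as the running sum exceeds the threshold, so clearly
    different images are rejected after only a few values.
    """
    total = 0
    p = 0
    for ai, bi, wi in zip(a, b, WEIGHTS):
        p += 1
        # "difference - 1": a difference of 1 is treated as resampling noise
        total += max(0, abs(ai - bi) - 1) * wi
        if total > threshold:
            break
    return total, p
```

For example, two metrics whose first values differ by 10 already score (10 - 1) * 256 = 2304 after one value, so with T = 1000 the comparison stops right there.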

Now, when comparing a large list of metrics, one can simply sort them, then
take one as start S, put a sliding window {-T/256, T/256} on the sorted list
around S and compare all the metrics in that group with S, to find any
matching metrics to S.

This way we can build a new list, beginning with S, then the one closest
matching that one, then find again the closest match to this one, etc. Each
time we remove the metric from the original list. In the end we have a
sorted list of images, by similarity.
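
A rough sketch of the sliding-window lookup follows. The sort key is not spelled out above, so this assumes the list is sorted on the first metric value (the 1x1 average). The window is taken as T/256 + 1 so that the "difference - 1" allowance cannot cause a miss. Chaining the results into one similarity-ordered list is left out, and all names are hypothetical.

```python
from bisect import bisect_left, bisect_right

LEVEL_WEIGHTS = ((1, 256), (4, 64), (16, 16), (64, 4), (256, 1))
WEIGHTS = [weight for count, weight in LEVEL_WEIGHTS for _ in range(count)]


def difference(a, b, threshold):
    """Weighted difference with early abort; None once it exceeds threshold."""
    total = 0
    for ai, bi, wi in zip(a, b, WEIGHTS):
        total += max(0, abs(ai - bi) - 1) * wi
        if total > threshold:
            return None
    return total


def find_matches(metrics, threshold):
    """metrics: {name: metric}. Returns {name: sorted list of (score, other)}."""
    # Assumption: sort on the first value, i.e. the 1x1 average brightness.
    ordered = sorted(metrics.items(), key=lambda item: item[1][0])
    keys = [metric[0] for _, metric in ordered]
    window = threshold // 256 + 1
    matches = {}
    for name, metric in ordered:
        # Only entries whose first value lies inside the window can match.
        lo = bisect_left(keys, metric[0] - window)
        hi = bisect_right(keys, metric[0] + window)
        found = []
        for other, other_metric in ordered[lo:hi]:
            if other == name:
                continue
            score = difference(metric, other_metric, threshold)
            if score is not None:
                found.append((score, other))
        matches[name] = sorted(found)
    return matches
```

Anything outside the window would exceed T on the first value alone, so skipping it loses no matches.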

The algorithm is still O(n^2) in the number of images, but it nevertheless
cuts out a large portion of the work compared to fully comparing every pair.

There are some specialities not mentioned here (for colours, for aspect
ratio, etc), but this is the general principle.

Note: I looked into a lot of different techniques (Fourier transformations,
Gabor wavelets, feature point extraction, etc.), but more complex is not
always better. In this case, the simple approach seems to work best.

Hope that helps,

Nils Haeck
www.simdesign.nl

Jul 22 '06 #7
Bob
Hi Nils

Yes that does help. Thanks for the explanation.

Bob

Jul 22 '06 #8
