search for duplicated via IPTC field (OS X)

I am looking to filter a folder of images by an IPTC value so that I can find the duplicates that riddle my collection. Sometimes five versions of a single image exist, all with different file names. The key IPTC field is "Headline" (or "IPTC:By-line"); duplicates share the same value in it. When a duplicate is found, the image is moved to a separate folder, away from the main collection.

The following code works with a program called ExifTool (http://www.sno.phy.queensu.ca/~phil/exiftool/), which is how it is able to read the IPTC data:

    #!/usr/bin/perl -w
    use strict;
    BEGIN { unshift @INC, '/usr/bin/lib' }
    use Image::ExifTool;
    print "Using ExifTool version $Image::ExifTool::VERSION\n";

    # need at least a tag name, a destination directory and one file
    @ARGV > 2 or die "Syntax: script TAG DIR FILE [FILE...]\n";

    my $tag = shift;
    my $dstdir = shift;

    my $exifTool = Image::ExifTool->new;
    my $moved = 0;
    my ($file, %foundValue);
    foreach $file (@ARGV) {
        # extract only the requested tag from this file
        my $info = $exifTool->ImageInfo($file, $tag);
        unless (%$info) {
            warn "$tag not found in $file\n";
            next;
        }
        my ($val) = values %$info;
        if ($foundValue{$val}) {
            # duplicate value, so move to destination directory
            print "$tag is the same in $file as $foundValue{$val}\n";
            my $dst = $file;
            $dst =~ s{.*/}{};  # remove directory name
            $dst = "$dstdir/$dst";
            if (-e $dst) {
                warn "$dst already exists!\n";
            } elsif (not rename($file, $dst)) {
                warn "Error moving $file to $dst\n";
            } else {
                print "  --> moved\n";
                ++$moved;
            }
        } else {
            $foundValue{$val} = $file;  # save first file with this value
        }
    }  # end of foreach (this closing brace was missing)
    printf "%5d files processed\n", scalar(@ARGV);
    printf "%5d files moved\n", $moved;
    # end

However, that would still leave many files to sort, so I would like the code modified so that each group of images (for example, all the ones with "Henry Ford Clinic" in the Headline field) is placed in a single folder named after the IPTC field value; in this case the folder would be called "Henry Ford Clinic".
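
A minimal sketch of the folder-naming step, using only core modules. The helper name `folder_for_value` and the choice of characters to replace are assumptions made for illustration; they are not part of the original script or of ExifTool.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Turn a tag value such as "Henry Ford Clinic" into a directory under $root.
# folder_for_value is a hypothetical helper, not part of the original script.
sub folder_for_value {
    my ($root, $value) = @_;
    (my $name = $value) =~ s{[/:\0]}{_}g;   # assumption: replace path-like characters
    $name =~ s/^\s+|\s+$//g;                # trim surrounding whitespace
    $name = 'untitled' if $name eq '' or $name =~ /^\.+$/;
    my $dir = File::Spec->catdir($root, $name);
    make_path($dir) unless -d $dir;         # also creates missing parent directories
    return $dir;
}
```

In the script above, the result could stand in for `$dstdir` when building `$dst`, for example `my $dir = folder_for_value($dstdir, $val);`.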

On top of that, some images are the same except that the value ends in "-framed in mahogany"; these have a faux frame but are essentially the same image. I thought that if I restricted the comparison to, say, the first 15 characters, I would catch all the duplicates. This has not been programmed in either.
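
The prefix comparison described here could look roughly like the following. The helper name `dup_key` and the default length of 15 are assumptions taken from the wording of the post; the two sample headlines are the ones mentioned above.

```perl
use strict;
use warnings;

# Build the comparison key: the first $len characters of the tag value.
# dup_key is a hypothetical helper name.
sub dup_key {
    my ($value, $len) = @_;
    $len = 15 unless defined $len;       # length suggested in the post
    my $key = substr($value, 0, $len);   # shorter values are returned whole
    $key =~ s/\s+$//;                    # drop trailing spaces left by the cut
    return $key;
}

my %first_seen;
for my $headline ('Henry Ford Clinic', 'Henry Ford Clinic-framed in mahogany') {
    my $key = dup_key($headline);
    if (exists $first_seen{$key}) {
        print "'$headline' duplicates '$first_seen{$key}'\n";
    } else {
        $first_seen{$key} = $headline;   # remember the first value with this key
    }
}
```

Both headlines reduce to the key "Henry Ford Clin", so the framed variant is reported as a duplicate. A short prefix can also merge unrelated headlines that happen to start the same way, so the length is worth tuning against the real collection.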
Aug 9 '10 #1
numberwhun
What you want to do is look through $info, find the IPTC field, and then use a regular expression to grab the first 15 characters of its value. If you could post:

1. a print out of $info
2. what the IPTC entry itself looks like

we can help you. Otherwise, you can try it yourself. Either way, once you have the value you want, you can use it to create a directory name and put the files in there.
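
As a rough illustration of both requests, the sketch below dumps a hash reference with the core Data::Dumper module and then takes the prefix with a regular expression. The contents of `$info` here are a made-up stand-in; the real keys and values returned by ImageInfo depend on the file and the tag requested.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the hash reference that ImageInfo returns.
my $info = { 'Headline' => 'Henry Ford Clinic-framed in mahogany' };

$Data::Dumper::Sortkeys = 1;            # stable key order in the printout
print Dumper($info);                    # item 1: a printout of $info

my ($val)    = values %$info;
my ($prefix) = $val =~ /^(.{1,15})/s;   # up to the first 15 characters
print "prefix: $prefix\n";
```

If the value were empty, the match would fail and `$prefix` would be undefined, so real code should check for that before using it as a directory name.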

Regards,

Jeff
Aug 10 '10 #2
Thanks for the reply. I was looking into substr as a way to take the first n characters, so I was wondering whether this would go from character 0 to the 15th?

    my $oneName = substr($names, 0, 15);
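
For reference, the second argument of substr is the starting offset and the third is a length, so the call above yields the characters at offsets 0 through 14, which is 15 characters in total. A small check, with a sample string chosen only for illustration:

```perl
use strict;
use warnings;

my $names   = 'abcdefghijklmnopqrstuvwxyz';   # sample data, not from the post
my $oneName = substr($names, 0, 15);          # offset 0, length 15

print "$oneName\n";                           # characters at offsets 0 through 14
print length($oneName), "\n";                 # number of characters kept
print substr('short', 0, 15), "\n";           # a shorter string comes back whole
```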

I'm not sure what you mean by the IPTC entry. All of the data in that By-line field will be different, so I probably just don't understand. Do you mean to print the info so that it can be used as the folder name?


Thanks.
Aug 10 '10 #3
numberwhun
Well, if you are sure that the name is 15 characters in length, the substr() function should do the job. I am just a regex hound, so I tend to recommend regular expressions.

Test it and see if it works to create the directories you expect, but don't move files.
Aug 10 '10 #4
I'm not entirely sure it will be 15 characters, but of course I can modify it. I got this example, but I'm not sure where to place it in my code:

    $ perl -e 'my $names="12345678901234567890"; my $oneName = substr($names, 0, 15); print "$oneName\n";'
    123456789012345
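
One possible placement, sketched under several assumptions: the substr call goes right after the tag value is read (after `my ($val) = values %$info;` in the original script), and the shortened value is then used both as the hash key and as the folder name. The function name `group_duplicates` and the `$read_value` callback are hypothetical; the callback stands in for the ExifTool lookup so the sketch runs without the module installed.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Basename qw(basename);

# $read_value is a code reference returning the tag value for a file (or undef).
# With ExifTool it could wrap the two lines from the original script:
#   my $info = $exifTool->ImageInfo($file, $tag);  my ($val) = values %$info;
sub group_duplicates {
    my ($read_value, $dstdir, $len, @files) = @_;
    my %first_file;
    my $moved = 0;
    for my $file (@files) {
        my $val = $read_value->($file);
        next unless defined $val and length $val;
        my $key = substr($val, 0, $len);   # <-- the substr call goes here
        $key =~ s/\s+$//;                  # no trailing space in the folder name
        $key =~ s{[/:]}{_}g;               # assumption: keep the key path-safe
        if (!exists $first_file{$key}) {
            $first_file{$key} = $file;     # first file with this key stays put
            next;
        }
        my $dir = "$dstdir/$key";          # one folder per shortened value
        make_path($dir) unless -d $dir;
        my $dst = "$dir/" . basename($file);
        if (-e $dst) {
            warn "$dst already exists!\n";
        } elsif (!rename($file, $dst)) {
            warn "Error moving $file to $dst: $!\n";
        } else {
            ++$moved;
        }
    }
    return $moved;
}
```

As in the original script, the first file seen for each key is left where it is and only later matches are moved. The folder takes the shortened key as its name (for example "Henry Ford Clin"), not the full field value.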
Aug 10 '10 #5
