sam wrote:
same as subject?
How can I write a short reply to a mere snippet of a question like that?
<minor preamble rant>Open an image file in a text editor to get a feel for
how easy it's going to be to extract the data. Actually manage to find
documentation that doesn't skip the important bits (the Photoshop 6 SDK has
a doc with the tag numbers you'll need - in the docs about tiffs). Read that
and wonder why some length data includes the length of the length data and
some doesn't. Realize that all you have to do is get to the xmp stuff, so
all the weird iptc etc metadata format doesn't matter. Figure out how to
extract the xmp data into a string via two encodings. Marvel at the
stupidity of making xml appear to be human-readable when it's only really
meant for computers, except that you're the one who has to tell the computer
how to understand it. Fiddle around with it until you coerce the data into
the format you want.</rant>
Read in the file (actually just the first 32KB will hopefully contain the
XMP packet).
Parse through the tags until you find the appropriate section. Extract that
(XMP) section into a StringBuilder, remembering that it is encoded as UTF-8.
Put that into a String, which will magically take it into the .NET Framework
world of UTF-16.
Get rid of the <?xpacket stuff as that only messes things up.
Similarly, get rid of the namespacey stuff.
[See below for a function to do all that.]
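The function further down expects src to already hold the start of the file. A minimal sketch of that first step, assuming a hypothetical helper called ReadFirstBytes (the name and the Try/Finally cleanup are choices made for illustration, not part of the original code):

```vbnet
Imports System
Imports System.IO

Module ReadHeaderSketch
    ' Hypothetical helper: returns up to maxBytes from the start of the file.
    Function ReadFirstBytes(ByVal path As String, ByVal maxBytes As Integer) As Byte()
        Dim fs As New FileStream(path, FileMode.Open, FileAccess.Read)
        Try
            ' Never ask for more than the file actually contains.
            Dim toRead As Integer = CInt(Math.Min(CLng(maxBytes), fs.Length))
            Dim buffer(toRead - 1) As Byte
            Dim total As Integer = 0
            ' Stream.Read may return fewer bytes than requested, so loop.
            While total < toRead
                Dim n As Integer = fs.Read(buffer, total, toRead - total)
                If n = 0 Then Exit While
                total += n
            End While
            Return buffer
        Finally
            fs.Close()
        End Try
    End Function
End Module
```

With that in place, something like src = ReadFirstBytes(fileName, 32 * 1024) would fill the class-level array, where fileName is whatever path you are working with.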
Now you can use Microsoft.Xml.XQuery to run XQuery expressions against the data.
(' XQuery info at
http://aspnet.4guysfromrolla.com/articles/071603-1.aspx
' XQuery.msi available from
http://aspnet.4guysfromrolla.com/code/XQueryStuff.zip )
///////////////////
' Needs Imports System.IO, System.Xml.XPath and Microsoft.Xml.XQuery
Dim doc As New XPathDocument(New StringReader(getXMPdata()))
Dim nav As XPathNavigator = doc.CreateNavigator()
Dim col As New XQueryNavigatorCollection
col.AddNavigator(nav, "xmp")
\\\\\\\\\\\\\\\\\
Now, say that i holds a MetaDataType enum value that selects which metadata
to extract:
/////////////////////
Select Case i
    Case MetaDataType.Caption
        query = "FOR $x IN document(""xmp"")//description RETURN $x//li"
    Case MetaDataType.Keywords
        query = "FOR $x IN document(""xmp"")//subject RETURN $x//li"
    Case ....
\\\\\\\\\\\\\\\\\\\\\
Now perform the xquery to get some raw xml:
//////////////////////
Dim expr As New XQueryExpression(query)
Dim rawXML As String = (expr.Execute(col)).ToXml()
\\\\\\\\\\\\\\\\\\\\
And depending on the type of the metadata, we may have to process it in
different ways to get the value/values:
//////////////////
' The item needs to be extracted from the surrounding XML tags.
' RegexOptions.Singleline is used because trailing CRLFs sometimes get in.
Select Case i
    Case MetaDataType.Caption
        myCaption = Regex.Replace(rawXML, "<li.*>(.+)</li>", "$1", _
            RegexOptions.Singleline).Trim
    Case MetaDataType.Keywords
        Dim mc As MatchCollection = Regex.Matches(rawXML, _
            "(?:<li>)(?<data>.+?)(?:</li>)")
        For j As Integer = 0 To mc.Count - 1
            myKeywords.Add(mc(j).Groups("data").Value)
        Next
    Case ...
\\\\\\\\\\\\\\\\\\\\
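If picking XML apart with regexes feels wrong, the same values could probably be read with System.Xml's XmlDocument.SelectNodes instead. A rough sketch, assuming the namespace prefixes have already been stripped from the element names; the sample XML string is made up for illustration and is not real camera output:

```vbnet
Imports System
Imports System.Xml

Module XPathSketch
    Sub Main()
        ' Hypothetical sample: a cut-down XMP packet with the prefixes removed.
        Dim xml As String = "<xmpmeta><RDF><Description><subject><Bag>" & _
            "<li>beach</li><li>sunset</li>" & _
            "</Bag></subject></Description></RDF></xmpmeta>"

        Dim doc As New XmlDocument
        doc.LoadXml(xml)

        ' Same idea as the //subject ... //li query, expressed as XPath.
        For Each node As XmlNode In doc.SelectNodes("//subject//li")
            Console.WriteLine(node.InnerText)
        Next
    End Sub
End Module
```

For the sample string this prints beach and then sunset, one per line. InnerText also takes care of the surrounding tags, so the caption case would not need the Singleline regex either.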
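' Function to extract the XMP (xap) packet.
' Needs Imports System.Text and System.Text.RegularExpressions.
' Note: the following variables need to be declared at the class level:
' Dim src As Byte() ' holds the first 32KB of a JFIF file
' Dim ptr As Int32 = 0 ' pointer to current position in data
' reset() is not shown here; it is assumed to set ptr back to 0.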
///////////////////////////////////
Friend Function getXMPdata() As String
    reset()
    Dim s As New StringBuilder
    ' Start of image tag
    Dim SOI() As Byte = {&HFF, &HD8}
    If Not (checkBytes(src, SOI)) Then
        Throw New Exception("Wrong SOI")
    End If
    ' application marker
    Dim AppMarker() As Byte = {&HFF, &HE0}
    If Not (checkBytes(src, AppMarker)) Then
        Throw New Exception("Wrong AppMarker")
    End If
    ' skip the appmarker segment: its two-byte big-endian length counts
    ' the length bytes themselves, so this lands on the next marker
    ' s.Append("ptr=&" & Hex(ptr) & vbCrLf)
    ' s.Append("src(ptr)=" & src(ptr).ToString & " src(ptr) *256=" & (src(ptr) * 256).ToString & " src(ptr + 1)=" & src(ptr + 1).ToString & vbCrLf)
    ptr += (src(ptr) * 256) + src(ptr + 1)
    ' s.Append("ptr=&" & Hex(ptr) & vbCrLf)
    ' Now go through the segments until we find the APP1 one with the
    ' namespace field containing http://ns.adobe.com/xap/1.0/
    Dim APP1() As Byte = {&HFF, &HE1}
    Dim markerFound As Boolean = False
    'Dim segmentMarker As String
    Dim namespaceFieldEncoding As New System.Text.ASCIIEncoding
    's.Append("Seek APP1 0xFF 0xE1" & vbCrLf)
    While Not (markerFound)
        If checkBytes(src, APP1) AndAlso namespaceFieldEncoding.GetString(src, _
                ptr + 2, 28) = "http://ns.adobe.com/xap/1.0/" Then
            's.Append("Found at &" & Hex(ptr) & vbCrLf)
            markerFound = True
        Else
            's.Append("Segment size: &" & Hex(src(ptr)) & " &" & Hex(src(ptr + 1)) & " " & ((src(ptr) * 256) + src(ptr + 1)).ToString & vbCrLf)
            ' not the segment we want: skip to the next marker
            ptr += (readBytes(src, 1)(0) * 256) + readBytes(src, 1)(0)
            's.Append("ptr=" & ptr.ToString & vbCrLf)
            If ptr >= UBound(src) Then
                Throw New Exception("Out of data seeking APP1 marker.")
                's.Append("Fallen off end." & vbCrLf)
                'markerFound = True
            End If
        End If
    End While
    ' Return s.ToString
    ' ptr is now just past the APP1 marker, at the Lp field
    ' APP1 segment format:
    ' [Byte offset] [Field value]                 [Field name] [Length (bytes)] Comments
    ' 0             0xFFE1                        APP1         2                APP1 marker.
    ' 2             2 + length of namespace (29)  Lp           2                Size in bytes of this count plus
    '               + length of XMP packet                                      the following two portions.
    ' 4             Null-terminated ASCII string  namespace    29               XMP namespace URI, used as unique ID:
    '               without quotation marks                                     http://ns.adobe.com/xap/1.0/
    ' 33            <XMP packet>                  XMP packet   Lp - 31          Must be encoded as UTF-8.
    Dim APP1SegSize As Int32 = readBytes(src, 1)(0) * 256 + readBytes(src, 1)(0)
    Dim XMPpacketSize As Int32 = APP1SegSize - 29 - 2
    ptr += 31 ' skip fields "Lp" and "namespace"
    Dim encoding As System.Text.Encoding = System.Text.Encoding.UTF8
    If XMPpacketSize > 2 Then
        ' readBytes has already moved ptr past Lp, hence the - 2
        s.Append(encoding.GetString(src, ptr - 2, XMPpacketSize))
    End If
    Dim r As String = s.ToString ' this will convert it to Unicode (UTF-16)
    ' get rid of the <?xpacket stuff
    Dim re As New Regex(".*\?>")
    r = re.Replace(r, "").Trim
    ' get rid of the namespaces in <namespace:name> and </namespace:name>
    re = New Regex("(</{0,1})[A-Za-z]*?:")
    r = re.Replace(r, "$1")
    Return r
End Function
' Functions to help getXMPdata()
<System.Diagnostics.DebuggerStepThrough()> _
Private Function checkBytes(ByVal src() As Byte, ByVal cf() As Byte) As Boolean
    ' compare bytes in the source to the given array of bytes
    ' and advance the pointer
    Dim nBytes As Int32 = 0
    ' if it goes past the end of src, it is not a match
    ' (checks the last byte to be compared, not just the first)
    If ptr + UBound(cf) > UBound(src) Then
        Return False
    End If
    Dim a As Boolean = True
    'Debug.WriteLine("Checkbytes starting at ptr=&" & Hex(ptr) & vbCrLf)
    For i As Int32 = 0 To UBound(cf)
        'Debug.WriteLine(Hex(src(ptr + i)) & " ")
        'b = src(ptr + i) : c = cf(i) ' just to make debugger show them
        nBytes += 1
        If src(ptr + i) <> cf(i) Then
            a = False
            Exit For
        End If
    Next
    ptr += nBytes
    Return a
End Function
<System.Diagnostics.DebuggerStepThrough()> _
Private Function peekCheckBytes(ByVal src() As Byte, ByVal cf() As Byte) As Boolean
    ' compare bytes in the source to the given array of bytes
    ' do not advance the pointer
    Dim nBytes As Int32 = UBound(cf) + 1
    ' if it goes past the end of src, it is not a match
    If ptr + nBytes > UBound(src) Then
        Return False
    End If
    Dim a As Boolean = True
    For i As Int32 = 0 To UBound(cf)
        If src(ptr + i) <> cf(i) Then
            a = False
            Exit For
        End If
    Next
    Return a
End Function
<System.Diagnostics.DebuggerStepThrough()> _
Private Function readBytes(ByVal src() As Byte, ByVal nBytes As Int32) As Byte()
    ' return an array of bytes from the source
    ' and advance the pointer
    'Debug.WriteLine("Readbytes at ptr=&" & Hex(ptr) & vbCrLf)
    If ptr + nBytes > UBound(src) Then
        Throw New Exception("Attempted read past end of source in readBytes.")
    End If
    Dim dest(nBytes - 1) As Byte
    Array.Copy(src, ptr, dest, 0, nBytes)
    ptr += nBytes
    Return dest
End Function
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
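The segment-skipping arithmetic is the fiddly part, so here it is on its own. This is only a sketch: ReadSegmentLength and NextMarkerOffset are hypothetical names that do not appear in the code above, and the byte array in Main is a made-up header rather than a real file.

```vbnet
Imports System

Module SegmentLengthSketch
    ' Hypothetical helper: read the two-byte, big-endian length field at pos.
    Function ReadSegmentLength(ByVal data() As Byte, ByVal pos As Integer) As Integer
        If pos < 0 OrElse pos + 1 > UBound(data) Then
            Throw New ArgumentOutOfRangeException("pos")
        End If
        Return CInt(data(pos)) * 256 + CInt(data(pos + 1))
    End Function

    ' The length counts its own two bytes, so adding it to the offset of the
    ' length field gives the offset of the next marker.
    Function NextMarkerOffset(ByVal data() As Byte, ByVal lengthPos As Integer) As Integer
        Return lengthPos + ReadSegmentLength(data, lengthPos)
    End Function

    Sub Main()
        ' SOI, then an APP0 marker, then a length field of 0x0010 (16).
        Dim data(23) As Byte
        data(0) = &HFF : data(1) = &HD8
        data(2) = &HFF : data(3) = &HE0
        data(4) = &H0 : data(5) = &H10
        Console.WriteLine(ReadSegmentLength(data, 4)) ' 16
        Console.WriteLine(NextMarkerOffset(data, 4)) ' 20
    End Sub
End Module
```

Keeping the read and the pointer update as two separate steps avoids having to think about when ptr gets advanced inside a single expression.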
I did that with .NET 1.1, so XPath stuff wasn't available, hence using
Microsoft.Xml.XQuery. Either that or it was so convoluted I went with
XQuery. The pointer stuff would have looked neater with C#, I suppose.
Go on then you lot, laugh at my coding :-) But it does work.
HTH
Andrew