
Parse OLE Object - C#

P: 2
I have an Access database (images.mdb) with two columns: the id of the picture (an integer) and a column named picture of type OLE Object, which contains an image stored as an OLE object. It can hold a jpg, bmp, or gif, and I don't know which kind of image is stored inside.
I want to retrieve the picture identified by a given id and display it in a web page (.aspx).
I write in Visual C#, but it does not matter; VB answers are just as welcome.
My problem is that the OLE Object field does not contain just the raw bytes of the image, so I cannot simply read the byte array and output it to the browser.
The OLE object also contains some extra information about the type of the stored file (which would be good to have, so I can tell what kind of image it is), but I don't know how to get at this information, or how to separate it from the actual image.
Does anyone know how to solve this?
Nov 22 '07 #1
3 Replies


kenobewan
Expert 2.5K+
P: 4,871
I notice that you copied a post from 2005:

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Data.OleDb;
    using System.Drawing;
    using System.Drawing.Imaging;
    using System.Text;
    using System.IO;
    using System.Windows.Forms;

    namespace OleImages
    {
        public partial class Form1 : Form
        {
            public Form1()
            {
                InitializeComponent();
            }

            private void Form1_Load(object sender, EventArgs e)
            {
                String strConn = @"Provider = Microsoft.Jet.OLEDB.4.0;Data Source = C:\Nwind.mdb;";
                String strCmd = "Select Picture From Categories where CategoryID=1";
                try
                {
                    Byte[] byPicture;
                    // Dispose the connection and command even if the query throws.
                    using (OleDbConnection conn = new OleDbConnection(strConn))
                    using (OleDbCommand cmd = new OleDbCommand(strCmd, conn))
                    {
                        conn.Open();
                        byPicture = (Byte[]) cmd.ExecuteScalar();
                    }

                    // Skip the first 78 bytes, which this sample treats as the OLE header.
                    // That length is not guaranteed for other OLE object types.
                    // Wrapping the slice directly leaves the stream positioned at its start.
                    MemoryStream ms = new MemoryStream(byPicture, 78, byPicture.Length - 78);

                    // The stream must stay open for as long as the Bitmap is in use.
                    Bitmap bm = new Bitmap(ms);
                    pictureBox1.Image = bm;
                    String strPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.jpg");
                    bm.Save(strPath, ImageFormat.Jpeg);
                }
                catch (Exception ex)
                {
                    // Report the failure instead of silently swallowing it.
                    MessageBox.Show(ex.Message);
                }
            }
        }
    }
Nov 22 '07 #2

P: 2
Hi,

Thanks for the reply !!!

Yes, I copied that post because I could not find any solution there, and I am currently facing the same issue.

I have tried the solution you provided, but I get a "Parameter is not valid" error while initializing the Bitmap. Moreover, the OLE Object field in my case can contain any file: Word, Excel, jpg, bmp, gif. So I'm really not sure whether the offset will be 78 in every case.

I can't see any way forward and would appreciate any help on this.
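
One possible way around the fixed offset, offered as a sketch under assumptions rather than a tested fix: instead of skipping a hard-coded 78 bytes, scan the blob for the signature of a known image format and treat everything from that position on as the image. `ImageSniffer` and `FindImage` are hypothetical names introduced for illustration; the signatures are the standard magic numbers for PNG, GIF, JPEG and BMP. This can only work when the original file bytes are embedded unmodified, and the two-byte `BM` signature may match by accident, so it is tried last.

```csharp
using System;

public static class ImageSniffer
{
    // Standard magic numbers, most specific first. "BM" comes last because
    // a two-byte pattern can easily occur by chance inside header data.
    private static readonly (string Mime, byte[] Magic)[] Signatures =
    {
        ("image/png",  new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }),
        ("image/gif",  new byte[] { 0x47, 0x49, 0x46, 0x38 }),   // "GIF8"
        ("image/jpeg", new byte[] { 0xFF, 0xD8, 0xFF }),
        ("image/bmp",  new byte[] { 0x42, 0x4D }),               // "BM"
    };

    // Returns the MIME type and start offset of the first signature found,
    // trying signatures in the order above, or (null, -1) if none occurs.
    public static (string Mime, int Offset) FindImage(byte[] blob)
    {
        foreach (var (mime, magic) in Signatures)
        {
            int at = IndexOf(blob, magic);
            if (at >= 0) return (mime, at);
        }
        return (null, -1);
    }

    // Naive byte-pattern search; fine for blobs of a few megabytes.
    private static int IndexOf(byte[] haystack, byte[] needle)
    {
        for (int i = 0; i + needle.Length <= haystack.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }
}
```

With a non-negative offset, `new MemoryStream(blob, offset, blob.Length - offset)` could replace the fixed 78-byte skip, and the returned MIME type could be used as the response content type. Any trailing bytes after the image are left in place, which most decoders tolerate but which is not guaranteed.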
Nov 23 '07 #3

P: 1
I recently had to fight this battle, and found no answers online. I normally don't write code at the bit level, so excuse any novice mistakes, but lacking a library to deal with OLE Files from Access I had to parse it as well as I could.

In C# I retrieve my document from Access (or SQL Server if the Access Database has been upsized) as a byte array. Access encapsulates files with its own header, which isn't an OLE file structure. It's something different...

    byte[] doc = ld.GetSupportingDocument(docID);

    MemoryStream ms = new MemoryStream();
    ms.Write(doc, 0, doc.Length);
    int firstByte;
    int secondByte;
    ms.Seek(0, SeekOrigin.Begin);
    firstByte = ms.ReadByte();
    secondByte = ms.ReadByte();

    // Reject the data if either signature byte is wrong.
    if (firstByte != 0x15 || secondByte != 0x1C) {
        ErrorResponse("Stored object is not an Access File.");
        return;
    }

The first two bytes are a signature; if they are not 0x15 followed by 0x1C, it's not an Access OLE file. The next short (two bytes) is the offset at which the file type name ends:

    int fileTypeLoc = 20; // begin of the file type
    short offset; // end of the file type

    byte[] buffer = new byte[2];
    ms.Read(buffer, 0, 2);
    offset = BitConverter.ToInt16(buffer, 0);

Keeping track of how far I've read into the file, I store a portion of the bytes as a string, starting at offset 0x14 (decimal 20) and running up to the offset I retrieved in the previous block.

    long seekTotal = 0;
    seekTotal += offset;

    string docType = String.Empty;
    for (int i = fileTypeLoc; i < offset; i++) {
        docType += (char)doc[i];
    }

The next bit is how I figure out what type of file it is, so that when I serve it via HTTP I can set the file name and content type properly. There's no real parsing going on in this block, with the exception of the Package type. A package can be anything: a zip file, a gif, a pdf, whatever. When you have a package, the original file name is stored in the Access header, so I scan the bytes up to position 256 (an arbitrary number I selected based on trial and error) and pluck the original file extension from them. Because of my database I have no worries that it's anything but a pdf, but if you can't guarantee that, you need to do a better job of parsing than I am.

    bool packageIsPdf = false;
    string ext = "dat";
    string filename = "supporting-document";
    string contentType = "application/octet-stream";
    if (docType.Contains("Word.Document.8")) {
        ext = "doc";
        contentType = "application/msword";
    } else if (docType.Contains("AcroExch.Document.7")) {
        contentType = "application/pdf";
        ext = "pdf";
    } else if (docType.Contains("Package")) {
        // packages are generic and require more processing
        string packageBuffer = String.Empty;
        // never read past the end of a short blob
        int limit = Math.Min(256, doc.Length);
        for (int i = 20; i < limit; i++) {
            packageBuffer += (char)doc[i];
        }
        if (packageBuffer.Contains(".pdf")) {
            contentType = "application/pdf";
            ext = "pdf";
            packageIsPdf = true;
        } else if (packageBuffer.Contains(".zip")) {
            contentType = "application/zip";
            ext = "zip";
        }
        // anything else keeps the "dat" / octet-stream defaults
    } else if (docType.Contains("Excel.Sheet.8")) {
        ext = "xls";
        contentType = "application/vnd.ms-excel";
    } else if (docType.Contains("PowerPoint.Show.8")) {
        ext = "ppt";
        contentType = "application/vnd.ms-powerpoint";
    } else if (docType.Contains("Word.Document.12")) {
        ext = "docx";
        contentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    } else if (docType.Contains("PowerPoint.Show.12")) {
        ext = "pptx";
        contentType = "application/vnd.openxmlformats-officedocument.presentationml.presentation";
    } else if (docType.Contains("Excel.Sheet.12")) {
        ext = "xlsx";
        contentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
    }

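The if/else chain above could also be written as a lookup table, which makes adding another ProgID a one-line change. This is only a sketch: `OleTypeMap` and `Lookup` are hypothetical names, the ProgIDs are the ones tested in the post, and the MIME types are the IANA-registered ones for each format. The Package case is deliberately left out because it still needs the file-name scan described above.

```csharp
using System;
using System.Collections.Generic;

public static class OleTypeMap
{
    // ProgID fragment -> (file extension, MIME type).
    private static readonly Dictionary<string, (string Ext, string Mime)> Known =
        new Dictionary<string, (string Ext, string Mime)>
        {
            ["Word.Document.8"]     = ("doc",  "application/msword"),
            ["Excel.Sheet.8"]       = ("xls",  "application/vnd.ms-excel"),
            ["PowerPoint.Show.8"]   = ("ppt",  "application/vnd.ms-powerpoint"),
            ["AcroExch.Document.7"] = ("pdf",  "application/pdf"),
            ["Word.Document.12"]    = ("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
            ["Excel.Sheet.12"]      = ("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
            ["PowerPoint.Show.12"]  = ("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation"),
        };

    // Returns the entry whose ProgID occurs in docType, or a generic
    // binary fallback when the type is unknown (including "Package").
    public static (string Ext, string Mime) Lookup(string docType)
    {
        foreach (var pair in Known)
        {
            if (docType.Contains(pair.Key)) return pair.Value;
        }
        return ("dat", "application/octet-stream");
    }
}
```

A call such as `var (ext, contentType) = OleTypeMap.Lookup(docType);` would then replace the non-Package branches.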
Read 8 more bytes. These bytes should always be 01 05 00 00 02 00 00 00.

    // magic eight bytes 01 05 00 00 02 00 00 00
    ms.Seek(seekTotal, SeekOrigin.Begin);
    buffer = new byte[8];
    ms.Read(buffer, 0, 8);
    seekTotal += 8;

Read the next 4-byte integer and skip forward that many bytes.

    // Second offset to move to
    buffer = new byte[4];
    ms.Read(buffer, 0, 4);
    seekTotal += 4;
    long offset2 = BitConverter.ToInt32(buffer, 0);
    seekTotal += offset2;
    ms.Seek(seekTotal, SeekOrigin.Begin);

Read 8 empty bytes.

    // eight empty bytes
    buffer = new byte[8];
    ms.Read(buffer, 0, 8);
    seekTotal += 8;

The next 4-byte integer tells you how many bytes long your encapsulated file is:

    // next 4 bytes are the length of the file
    buffer = new byte[4];
    ms.Read(buffer, 0, 4);
    seekTotal += 4;
    long fileByteLength = BitConverter.ToInt32(buffer, 0);

The next N bytes consist of your file. Create a new buffer of this length and read from your memory stream into the buffer.

    // next N bytes are the file
    byte[] data = new byte[fileByteLength];

    // store file bytes in data buffer
    ms.Read(data, 0, Convert.ToInt32(fileByteLength));

If your file is a PDF that was not stored as a Package, you have another headache to deal with: OLE2 Compound Files. I extract the pdf from the OLE2 file in another method, using the Gembox Compound File 1.1 library.

    if (contentType == "application/pdf" && !packageIsPdf) {
        data = GetPdfFromOle(data, Convert.ToInt32(seekTotal), Convert.ToInt32(fileByteLength));
    }

If everything went well, I can return the byte array to my Response object

    if (data == null) {
        Response.Write("Unable to retrieve file");
        Response.End();
    }

    string contentDisposition = String.Format("attachment; filename={0}.{1}", filename, ext);
    Response.ContentType = contentType;
    Response.AppendHeader("Content-Disposition", contentDisposition);
    Response.BinaryWrite(data);
    Response.End();

A code sample of extracting the pdf from an OLE2 Compound file using Gembox's library:

    private byte[] GetPdfFromOle(byte[] data, int offset, int length) {
        // offset and length are not used; the whole buffer goes to a temp file.
        string tmpFileName = Path.GetTempFileName();

        FileStream fstmp = new FileStream(tmpFileName, FileMode.Create, FileAccess.Write, FileShare.None);
        fstmp.Write(data, 0, data.Length);
        fstmp.Close();

        byte[] pdfData = null;
        Ole2CompoundFile ole2file = new Ole2CompoundFile();

        try {
            ole2file.Load(tmpFileName, false);
            foreach (Ole2Stream entry in ole2file.Root) {
                log.Debug(entry.Name);

                // take the stream whose name is "contents", ignoring case
                if (entry.Name.ToLower() == "contents") {
                    pdfData = entry.GetData();
                    break;
                }
            }
        } catch (Exception ex) {
            ErrorResponse(ex.Message);
        } finally {
            // close the compound file before deleting the temp file it was loaded from
            ole2file.Close();
            File.Delete(tmpFileName);
        }
        return pdfData;
    }

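For reference, the header-walking steps above can be collected into one self-contained method. This is a sketch that assumes the layout is exactly as described in this post (signature, end-of-type offset, type name from byte 20, eight magic bytes, a skipped block, eight more bytes, then a length-prefixed payload); it is not based on a documented format. `AccessOleHeader` and `Extract` are hypothetical names, and `BinaryReader` is used because it always reads integers as little-endian.

```csharp
using System;
using System.IO;
using System.Text;

public static class AccessOleHeader
{
    // Returns the type name stored in the header and the embedded file bytes.
    public static (string DocType, byte[] Payload) Extract(byte[] doc)
    {
        using (var reader = new BinaryReader(new MemoryStream(doc)))
        {
            // Signature: 0x15 followed by 0x1C.
            if (reader.ReadByte() != 0x15 || reader.ReadByte() != 0x1C)
                throw new InvalidDataException("Not an Access OLE field.");

            // Little-endian short: the offset at which the type name ends.
            const int typeStart = 20;
            int headerEnd = reader.ReadInt16();
            if (headerEnd < typeStart || headerEnd > doc.Length)
                throw new InvalidDataException("Header offset out of range.");
            string docType = Encoding.ASCII.GetString(doc, typeStart, headerEnd - typeStart);

            // Eight bytes, expected to be 01 05 00 00 02 00 00 00 (not verified here).
            reader.BaseStream.Seek(headerEnd, SeekOrigin.Begin);
            reader.ReadBytes(8);

            // Length of a block to skip, then eight bytes described as empty.
            int skip = reader.ReadInt32();
            if (skip < 0) throw new InvalidDataException("Negative skip length.");
            reader.BaseStream.Seek(skip, SeekOrigin.Current);
            reader.ReadBytes(8);

            // Length of the embedded file, followed by the file itself.
            int length = reader.ReadInt32();
            long remaining = reader.BaseStream.Length - reader.BaseStream.Position;
            if (length < 0 || length > remaining)
                throw new InvalidDataException("Payload length out of range.");
            return (docType, reader.ReadBytes(length));
        }
    }
}
```

Throwing on out-of-range lengths keeps a truncated or unexpected blob from producing a silently short file; the caller can catch `InvalidDataException` and `EndOfStreamException` and report the error.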
Hope this helps.
Nov 29 '07 #4
