Bytes IT Community

Searching for a small byte array in a large binary file (Quickly!)

I'm trying to quickly grab the dimensions out of the headers of JPEG files.
I need to look for the hex string FFC0001108 in a file that will be from 1-40
megs in size. If someone has experience doing a similar type of search, what
is the most efficient method you have found for doing this? I can see doing
this by comparing the first byte of the string to each consecutive byte of
the file until it is found, and then checking for the rest of the string, but
that sounds a bit slow. Thanks in advance for suggestions!

Josh
Jul 21 '05 #1
5 Replies


You can do slightly better than that - examine every 5th byte. If it's
one of the bytes in the string, examine the appropriate bytes around it
(so if it's 0xc0, examine the previous one to check whether or not it's
0xff, then the subsequent three to see whether they're 0x00, 0x11, 0x08).
If the byte you first examine isn't one of the five you're looking for,
you know the string can't overlap that position, so you can move to
the position 5 bytes along.

It probably won't save *very* much time, as you'll still do the same
amount of IO, but there'll be less CPU time used examining the memory.
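
A minimal Python sketch of the every-5th-byte idea, assuming the whole file (or a chunk of it) is already in memory as `data`. The names `PATTERN`, `OFFSET_OF` and `find_marker` are made up for illustration, and the lookup table relies on the five pattern bytes being distinct, which happens to hold for FF C0 00 11 08:

```python
PATTERN = bytes.fromhex("FFC0001108")  # the five bytes from the question

# Map each pattern byte to its offset within the pattern. This only works
# because the five bytes are all different from one another.
OFFSET_OF = {b: i for i, b in enumerate(PATTERN)}


def find_marker(data: bytes, start: int = 0) -> int:
    """Return the index of the first PATTERN at or after start, or -1."""
    n = len(PATTERN)
    # Any occurrence that begins at or after `start` covers exactly one of
    # the sampled positions start+n-1, start+2n-1, ...
    for pos in range(start + n - 1, len(data), n):
        offset = OFFSET_OF.get(data[pos])
        if offset is None:
            continue  # not a pattern byte, so no match overlaps this position
        begin = pos - offset
        if begin >= start and data[begin:begin + n] == PATTERN:
            return begin
    return -1
```

In Python itself the built-in `data.find(PATTERN)` is likely to be faster than this loop; the sketch is only meant to show the skipping logic.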

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #2

guy
load the file into an array and use Array.BinarySearch?

hth

guy

Jul 21 '05 #3

Array.BinarySearch only works on sorted arrays, and the bytes of a file are not sorted.
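
A small Python illustration of the same point, using the standard `bisect` module in place of Array.BinarySearch (the sample bytes are made up):

```python
import bisect

# Made-up, unsorted bytes that contain the marker from the question.
data = bytes([0x10, 0xFF, 0xC0, 0x00, 0x11, 0x08, 0x20])

# Binary search assumes ascending order, so on raw file bytes it probes the
# wrong half and can miss a value that is actually present.
i = bisect.bisect_left(data, 0xC0)
found_by_bisect = i < len(data) and data[i] == 0xC0

# A plain subsequence search has no ordering requirement.
found_by_scan = data.find(bytes([0xFF, 0xC0, 0x00, 0x11, 0x08]))
```

Here the binary search misses 0xC0 even though it is in the data, while the linear scan locates the five-byte sequence.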

--
cody

Freeware Tools, Games and Humour
http://www.deutronium.de.vu || http://www.deutronium.tk

Jul 21 '05 #4

It's quite surprising that you would have to scan the whole file to find
this information.

I would check the file format specification; it's likely you can quickly
locate the header and this information. Try for example:
http://www.funducode.com/freec/Filef...3/format3b.htm

Patrice


Jul 21 '05 #5

Yes Patrice, you are correct that I should not need to scan the entire
file. The problem is that I don't always know that a .jpg image file will
actually BE a JPEG. The other problem is that the various internal
thumbnails that Photoshop, ACDSee etc. embed in the file come before the
actual image and are marked the same way, since they are in effect full JPEG
files inside the JPEG. I haven't yet found a really quick method to hop right
to the actual image and skip all the thumbnails. Hopping through the file
using the length bytes might be the best way. Thanks!
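
A rough Python sketch of the hop-by-length idea, not taken from this thread. It assumes the usual JPEG layout: the file starts with FFD8, and most segments are FF, a marker byte, then a 2-byte big-endian length that counts itself. Embedded thumbnails normally sit inside the payload of an application segment, so skipping by length steps over them. The function name `jpeg_dimensions` and the choice of which frame markers to accept are illustrative assumptions:

```python
import io
import struct
from typing import BinaryIO, Optional, Tuple

# Assumption: accept baseline, extended sequential and progressive frames.
SOF_MARKERS = {0xC0, 0xC1, 0xC2}
# TEM and RST0-RST7 are bare markers with no length field.
NO_LENGTH = {0x01} | set(range(0xD0, 0xD8))


def jpeg_dimensions(f: BinaryIO) -> Optional[Tuple[int, int]]:
    """Return (width, height) of the main image, or None if not found."""
    if f.read(2) != b"\xFF\xD8":  # not a JPEG at all, whatever the extension
        return None
    while True:
        if f.read(1) != b"\xFF":
            return None  # lost sync or reached end of file
        marker = f.read(1)
        while marker == b"\xFF":  # skip optional fill bytes
            marker = f.read(1)
        if not marker:
            return None
        code = marker[0]
        if code in NO_LENGTH:
            continue
        if code in (0xD9, 0xDA):  # end of image or start of scan data
            return None
        raw = f.read(2)
        if len(raw) < 2:
            return None
        (length,) = struct.unpack(">H", raw)  # includes these two bytes
        if length < 2:
            return None  # corrupt segment
        if code in SOF_MARKERS:
            body = f.read(5)  # precision (1), height (2), width (2)
            if len(body) < 5:
                return None
            _precision, height, width = struct.unpack(">BHH", body)
            return width, height
        f.seek(length - 2, io.SEEK_CUR)  # hop over the payload
```

Usage would be along the lines of `with open(path, "rb") as f: size = jpeg_dimensions(f)`, where `path` is whatever file is being checked. Only the segment headers before the first frame header get read, so the cost should not depend much on whether the file is 1 or 40 megs.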

Jul 21 '05 #6
