Searching for a small byte array in a large binary file (Quickly!)

I'm trying to quickly grab the dimensions out of the headers of JPEG files.
I need to look for the hex string FFC0001108 in a file that will be from 1-40
megs in size. If someone has experience doing a similar type of search, what
is the most efficient method you have found for doing this? I can see doing
this by comparing the first byte of the string to each consecutive byte of
the file until it is found, and then checking for the rest of the string, but
that sounds a bit slow. Thanks in advance for suggestions!

Josh
Jul 21 '05 #1
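
For reference, a minimal baseline sketch of the straightforward search described above. It is written in Python rather than the .NET the thread is about, and the name `find_pattern_in_file` is made up for illustration. It memory-maps the file and lets the built-in `find` do the scan, which is worth measuring before trying anything cleverer.

```python
import mmap

# The byte sequence from the question: FF C0 00 11 08
PATTERN = bytes.fromhex("FFC0001108")


def find_pattern_in_file(path, pattern=PATTERN):
    """Return the offset of the first occurrence of pattern in the file, or -1."""
    with open(path, "rb") as f:
        try:
            # Map the file read-only so the OS pages it in on demand instead
            # of copying the whole 1-40 MB into a bytes object up front.
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            return -1  # an empty file cannot be memory-mapped
        with mm:
            return mm.find(pattern)
```

The search stops at the first hit, so a match near the start of the file only touches the first few pages.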
You can do slightly better than that - examine every 5th byte. If it's
one of the bytes in the string, examine the appropriate bytes around it
(so if it's 0xc0, examine the previous one to check whether or not it's
0xff, then the subsequent three to see whether they're 0x00, 0x11 and
0x08). If the byte you first examine isn't one of the five you're looking
for, you know the string can't be found in the vicinity of it, so you can
move to the position 5 bytes along.

It probably won't save *very* much time, as you'll still do the same
amount of IO, but there'll be less CPU time used examining the memory.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #2
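
A rough sketch of the every-fifth-byte idea from the post above, in Python rather than .NET. The function name `find_by_stride` is hypothetical. It relies on two facts: any occurrence of a 5-byte string covers exactly one of the probed positions, and the five bytes in FFC0001108 are all different, so the probed byte says exactly where the string would have to start. The sketch assumes the whole file is already in memory as `data`.

```python
PATTERN = bytes.fromhex("FFC0001108")


def find_by_stride(data, pattern=PATTERN):
    """Return the index of the first occurrence of pattern in data, or -1.

    Only valid when every byte value in pattern is distinct (true for
    FF C0 00 11 08); otherwise a probed byte would be ambiguous.
    """
    m = len(pattern)
    if m == 0 or len(set(pattern)) != m:
        raise ValueError("pattern must be non-empty with distinct byte values")

    # Map each pattern byte to its offset inside the pattern.
    offset_of = {value: i for i, value in enumerate(pattern)}

    # Probe positions 0, m, 2m, ...; every m-byte window contains one of them.
    for i in range(0, len(data), m):
        k = offset_of.get(data[i])
        if k is None:
            continue  # this byte is not in the pattern: skip ahead m bytes
        start = i - k  # where the pattern would begin if this is a match
        if start >= 0 and data[start:start + m] == pattern:
            return start
    return -1
```

In CPython the built-in `bytes.find` runs in C and will probably still be faster than this loop; the sketch is meant to show the logic, which pays off more in a compiled language.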
guy
Load the file into an array and use Array.BinarySearch?

hth

guy

Jul 21 '05 #3
Array.BinarySearch only works on sorted arrays.

--
cody

Freeware Tools, Games and Humour
http://www.deutronium.de.vu || http://www.deutronium.tk

Jul 21 '05 #4
It's quite surprising that you would have to scan the whole file to find
this information.

I would check the file format specification; it's likely you can quickly
locate the header and this information. Try, for example:
http://www.funducode.com/freec/Filef...3/format3b.htm

Patrice

Jul 21 '05 #5
Yes Patrice, you are correct that I should not need to scan the entire
file. The problem is that I don't always know that a .jpg image file will
actually BE a JPEG. The other problem is that the various internal
thumbnails that Photoshop, ACDSee etc. embed in the file come before the
actual image and are marked the same way, since they are in effect full JPEG
files inside the JPEG. I haven't yet found a really quick method to hop right
to the actual image and skip all the thumbnails. Hopping through the file
using the length bytes might be the best way. Thanks!

Jul 21 '05 #6
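
A hedged sketch of the "hop through the file using the length bytes" idea from the last post, in Python rather than .NET. The name `jpeg_dimensions` is made up for illustration. It assumes the usual JPEG layout: an FFD8 start marker, then segments of the form FF, marker byte, two-byte big-endian length (which counts the length bytes themselves), payload. It also assumes the embedded thumbnails sit inside APPn segments, so hopping over each segment skips them without looking inside. Unlike the literal FFC0001108 search, it accepts any start-of-frame marker and any segment length or precision.

```python
import os
import struct

# Start-of-frame markers SOF0-SOF15, minus 0xC4 (DHT), 0xC8 (JPG), 0xCC (DAC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}
# Markers that carry no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01} | set(range(0xD0, 0xDA))


def jpeg_dimensions(f):
    """Return (width, height) of the main image, or None if parsing fails.

    f is a seekable binary file object positioned at the start of the data.
    """
    if f.read(2) != b"\xFF\xD8":  # not a JPEG at all (no SOI marker)
        return None
    while True:
        if f.read(1) != b"\xFF":  # also covers end of file
            return None
        marker = f.read(1)
        while marker == b"\xFF":  # skip optional fill bytes
            marker = f.read(1)
        if not marker or marker[0] == 0x00:
            return None
        code = marker[0]
        if code in STANDALONE:
            if code == 0xD9:  # EOI reached without seeing a frame header
                return None
            continue
        header = f.read(2)
        if len(header) < 2:
            return None
        (length,) = struct.unpack(">H", header)
        if length < 2:
            return None
        if code in SOF_MARKERS:
            body = f.read(5)  # precision (1), height (2), width (2)
            if len(body) < 5:
                return None
            _precision, height, width = struct.unpack(">BHH", body)
            return width, height
        if code == 0xDA:  # SOS: compressed data follows, no frame header seen
            return None
        # Hop over the payload, including any thumbnail embedded in it.
        f.seek(length - 2, os.SEEK_CUR)
```

Called as `jpeg_dimensions(open(path, "rb"))`, this reads only a few bytes per segment, and returning None for a missing FFD8 also covers the case where a .jpg file is not really a JPEG.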
