Searching for a small byte array in a large binary file (Quickly!)

I'm trying to quickly grab the dimensions out of the headers of JPEG files.
I need to look for the hex string FFC0001108 in a file that will be from 1-40
megs in size. If someone has experience doing a similar type of search, what
is the most efficient method you have found for doing this? I can see doing
this by comparing the first byte of the string to each consecutive byte of
the file until it is found, and then checking for the rest of the string, but
that sounds a bit slow. Thanks in advance for suggestions!
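
For reference, the byte-by-byte approach described above might look like the following minimal sketch. Python is used here only for brevity, and the name `find_naive` is hypothetical; it assumes the file contents have already been read into a `bytes` object.

```python
def find_naive(data: bytes, pattern: bytes) -> int:
    """Return the index of the first occurrence of pattern in data, or -1."""
    if not pattern:
        return 0
    first = pattern[0]
    m = len(pattern)
    # Compare the first pattern byte against each consecutive byte of the data;
    # only when it matches is the rest of the pattern checked.
    for i in range(len(data) - m + 1):
        if data[i] == first and data[i:i + m] == pattern:
            return i
    return -1


pattern = bytes.fromhex("FFC0001108")
print(find_naive(b"\x00\x01" + pattern + b"\x02", pattern))  # 2
```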

Josh
Jul 21 '05 #1
You can do slightly better than that: examine every 5th byte. If it's
one of the bytes in the string, examine the appropriate bytes around it
(so if it's 0xC0, check whether the previous byte is 0xFF, then whether
the subsequent three are 0x00, 0x11 and 0x08). If the byte you first
examine isn't one of the five you're looking for, you know the string
can't overlap that position, so you can move to the position 5 bytes
along.

It probably won't save *very* much time, as you'll still do the same
amount of IO, but there'll be less CPU time used examining the memory.
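
One possible reading of the every-5th-byte idea, as a rough sketch rather than a tuned implementation: any occurrence of an m-byte pattern must cover exactly one of the sampled positions m-1, 2m-1, 3m-1 and so on, so only those bytes need to be inspected first. The name `find_by_sampling` is hypothetical, and Python is used only for brevity.

```python
def find_by_sampling(data: bytes, pattern: bytes) -> int:
    """Return the index of the first occurrence of pattern in data, or -1."""
    m = len(pattern)
    if m == 0:
        return 0
    # Map each byte value to the offsets where it occurs in the pattern,
    # largest offset first so the earliest candidate start is tried first.
    offsets = {}
    for i in reversed(range(m)):
        offsets.setdefault(pattern[i], []).append(i)
    pos = m - 1                      # first sampled position
    while pos < len(data):
        # If the sampled byte is not in the pattern, no match can cover pos.
        for off in offsets.get(data[pos], ()):
            start = pos - off        # never negative, since pos >= m - 1 >= off
            if data[start:start + m] == pattern:
                return start
        pos += m                     # jump a whole pattern length ahead
    return -1


pattern = bytes.fromhex("FFC0001108")
print(find_by_sampling(bytes(1000) + pattern, pattern))  # 1000
```

For the 5-byte string in question this inspects roughly one byte in five, which matches the point above that the saving is in CPU time rather than IO.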

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #2
guy
Load the file into an array and use Array.BinarySearch?

hth

guy
Jul 21 '05 #3
Array.BinarySearch only works on sorted arrays, and it looks up a single
element rather than a sequence of bytes.
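
A small illustration of why a binary search does not help here, using Python's `bisect` module as a stand-in for the .NET method (the helper `binary_search` is hypothetical): on data in file order the search can miss a value that is present, and sorting the data first destroys the positions being looked for.

```python
from bisect import bisect_left


def binary_search(values, target):
    """Return an index of target in values, or -1. Assumes values is sorted."""
    i = bisect_left(values, target)
    return i if i < len(values) and values[i] == target else -1


header = [0xFF, 0xC0, 0x00, 0x11, 0x08]      # bytes in file order, not sorted

print(binary_search(header, 0xC0))           # -1: missed, although 0xC0 is at index 1
print(binary_search(sorted(header), 0xC0))   # 3: found, but no longer a file position
```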

--
cody

Freeware Tools, Games and Humour
http://www.deutronium.de.vu || http://www.deutronium.tk
Jul 21 '05 #4
It's quite surprising that you would have to scan the whole file to find
this information.

I would check the file format specification; it's likely you can quickly
locate the header and read the dimensions from it. See for example:
http://www.funducode.com/freec/Filef...3/format3b.htm

Patrice

Jul 21 '05 #5
Yes Patrice, you are correct that I should not need to scan the entire
file. The problem is that I don't always know that a .jpg image file will
actually BE a JPEG. The other problem is that the various internal
thumbnails that Photoshop, ACDSee etc. embed in the file come before the
actual image and are marked the same way, since they are in effect full JPEG
files inside the JPEG. I haven't yet found a really quick method to hop right
to the actual image and skip all the thumbnails. Hopping through the file
using the length bytes might be the best way. Thanks!
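
A minimal sketch of hopping through the file by the length bytes, under a few stated assumptions: each JPEG segment after the FFD8 start marker is a 2-byte marker followed by a 2-byte big-endian length that counts itself; the thumbnails sit inside application segments (as EXIF thumbnails do), so skipping a segment skips its thumbnail; and only the baseline frame header FFC0 from the question is handled. In FFC0001108, 0011 is the segment length and 08 the sample precision, and the next four bytes hold the height and then the width. The name `jpeg_dimensions` is hypothetical, and Python is used only for brevity.

```python
import struct


def jpeg_dimensions(data: bytes):
    """Return (width, height) from the first top-level FFC0 segment, or None."""
    if data[:2] != b"\xff\xd8":           # no start-of-image marker: not a JPEG
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:             # not at a marker: give up
            return None
        marker = data[pos + 1]
        if marker == 0xFF:                # padding byte before a marker
            pos += 1
            continue
        if marker in (0xD9, 0xDA):        # end of image or start of scan data
            return None
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xC0:                # baseline frame header
            if pos + 9 > len(data):
                return None
            # Layout: marker(2) length(2) precision(1) height(2) width(2)
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length                 # hop over the whole segment
    return None
```

Because only a handful of segment headers are read, this touches a tiny part of a 40 MB file, and the start-marker check also rejects .jpg files that are not JPEGs at all.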
Jul 21 '05 #6