Searching for a small byte array in a large binary file (Quickly!)

I'm trying to quickly grab the dimensions out of the headers of JPEG files.
I need to look for the hex string FFC0001108 in a file that will be from 1-40
megs in size. If someone has experience doing a similar type of search, what
is the most efficient method you have found for doing this? I can see doing
this by comparing the first byte of the string to each consecutive byte of
the file until it is found, and then checking for the rest of the string, but
that sounds a bit slow. Thanks in advance for suggestions!

Josh
Jul 21 '05 #1
You can do slightly better than that - examine every 5th byte. If it's
one of the bytes in the string, examine the appropriate bytes around it
(so if it's 0xc0, examine the previous one to check whether or not it's
0xff, then the subsequent three to see whether they're 0x00, 0x11 and
0x08). If the byte you first examine isn't one of the five you're
looking for, you know the string can't be found in the vicinity of it,
so you can move to the position 5 bytes along.

It probably won't save *very* much time, as you'll still do the same
amount of IO, but there'll be less CPU time used examining the memory.
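
The every-5th-byte idea above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not code from the thread: the function name `find_stride` is hypothetical, the whole file is assumed to be in memory as `bytes`, and repeated bytes in the pattern are handled by trying every offset at which the sampled byte occurs.

```python
def find_stride(data: bytes, pattern: bytes) -> int:
    """Return the index of the first occurrence of pattern in data, or -1.

    Hypothetical helper: samples only every len(pattern)-th byte, as
    suggested in the post above, and verifies candidates around each hit.
    """
    m = len(pattern)
    if m == 0:
        return 0
    # For each byte value, the offsets where it occurs in the pattern,
    # largest offset first so that the earliest candidate start is tried first.
    offsets = {}
    for k, b in enumerate(pattern):
        offsets.setdefault(b, []).insert(0, k)
    n = len(data)
    # Any match of length m must cover exactly one sampled position.
    for i in range(m - 1, n, m):
        for k in offsets.get(data[i], ()):
            start = i - k
            if start + m <= n and data[start:start + m] == pattern:
                return start
    return -1


sof0 = bytes.fromhex("FFC0001108")
sample = bytes(7) + sof0 + b"\x03\x00\x04\x00"
print(find_stride(sample, sof0))
```

As the post says, this only reduces the number of bytes inspected; the file still has to be read.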

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #2
guy
Load the file into an array and use Array.BinarySearch?

hth

guy

Jul 21 '05 #3
Array.BinarySearch only works on sorted arrays.
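
A small Python illustration of the point, offered as a sketch only: the sample bytes are made up, and Python's `bisect` stands in for a binary search routine. Sorting makes a single value findable but discards file positions, whereas a plain subsequence search on the unsorted data returns the offset that is actually wanted.

```python
import bisect

pattern = bytes.fromhex("FFC0001108")
data = bytes.fromhex("0102FFC00011080300")  # made-up sample bytes

# Binary search needs sorted input and locates one value, not a run of bytes.
sorted_bytes = sorted(data)
i = bisect.bisect_left(sorted_bytes, 0xFF)
found_ff = i < len(sorted_bytes) and sorted_bytes[i] == 0xFF
# found_ff says 0xFF exists somewhere, but its position in the file is lost.

# A subsequence search on the original data keeps positions intact.
offset = data.find(pattern)
print(found_ff, offset)
```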

--
cody

Freeware Tools, Games and Humour
http://www.deutronium.de.vu || http://www.deutronium.tk
Jul 21 '05 #4
It's quite surprising you would have to browse the whole file to find out
this information.

I would check the file format specification. It's likely you can quickly
locate the header and this information. Try for example:
http://www.funducode.com/freec/Filef...3/format3b.htm

Patrice

Jul 21 '05 #5
Yes Patrice, you are correct in that I should not need to scan the entire
file. The problem is that I don't always know that a .jpg image file will
actually BE a JPEG. The other problem is that the various internal
thumbnails in the file that Photoshop, ACDSee etc. make come before the
actual image and are marked the same way since they are in effect full JPEG
files inside the JPEG. I haven't found a really quick method to hop right to
the actual image and skip all the thumbnails yet. Hopping through the file
using the length bytes might be the best way. Thanks!
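
Hopping by the length bytes could look roughly like the Python sketch below. It is an illustration under stated assumptions rather than a tested production parser: the function name `jpeg_dimensions` is hypothetical, the input is any seekable binary stream, and any start-of-frame marker (0xC0 to 0xCF except 0xC4, 0xC8 and 0xCC) is accepted, not only the baseline 0xFFC0 from the question. Because thumbnails stored inside APPn segments sit within those segments' payloads, seeking past each payload skips them without scanning.

```python
import io
import struct

# Start-of-frame markers; 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC) are not frames.
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}
# Standalone markers that carry no length field: TEM and RST0-RST7.
NO_LENGTH = {0x01} | set(range(0xD0, 0xD8))


def jpeg_dimensions(f):
    """Return (width, height) of the main image, or None if not found."""
    if f.read(2) != b"\xff\xd8":          # every JPEG starts with SOI
        return None
    while True:
        if f.read(1) != b"\xff":          # lost sync or end of file
            return None
        marker = f.read(1)
        while marker == b"\xff":          # skip optional fill bytes
            marker = f.read(1)
        if not marker:
            return None
        m = marker[0]
        if m in NO_LENGTH:
            continue
        if m in (0xD9, 0xDA):             # EOI or start of scan: give up
            return None
        raw = f.read(2)
        if len(raw) < 2:
            return None
        (length,) = struct.unpack(">H", raw)   # length includes these 2 bytes
        if length < 2:
            return None
        if m in SOF_MARKERS:
            body = f.read(5)              # precision, height, width
            if len(body) < 5:
                return None
            _precision, height, width = struct.unpack(">BHH", body)
            return width, height
        f.seek(length - 2, io.SEEK_CUR)   # hop over the segment payload


# Synthetic example: an APP1 segment hiding a thumbnail frame header,
# followed by the real frame header (1024 x 768).
thumb = b"Exif\x00\x00\xff\xd8\xff\xc0\x00\x11\x08\x00\x10\x00\x20" + bytes(8)
app1 = b"\xff\xe1" + struct.pack(">H", len(thumb) + 2) + thumb
sof0 = b"\xff\xc0\x00\x11\x08" + struct.pack(">HH", 768, 1024) + b"\x03" + bytes(9)
fake_jpeg = b"\xff\xd8" + app1 + sof0 + b"\xff\xd9"
print(jpeg_dimensions(io.BytesIO(fake_jpeg)))
```

The initial SOI check also covers the case where a .jpg file is not actually a JPEG: the function returns None instead of scanning the whole file.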

Jul 21 '05 #6
