Searching for a small byte array in a large binary file (Quickly!)

Skwerl

I'm trying to quickly grab the dimensions out of the headers of JPEG files.
I need to look for the hex string FFC0001108 in a file that will be from 1 to 40
MB in size. If someone has experience doing a similar type of search, what
is the most efficient method you have found for doing this? I can see doing
this by comparing the first byte of the string to each consecutive byte of
the file until it is found, and then checking for the rest of the string, but
that sounds a bit slow. Thanks in advance for suggestions!

Josh

Jul 21 '05 #1

Jon Skeet [C# MVP]

You can do slightly better than that - examine every 5th byte. If it's
one of the bytes in the string, examine the appropriate bytes around it
(so if it's 0xc0, examine the previous one to check whether or not it's
0xff, then the subsequent three to see whether they're 0x00, 0x11 and
0x08). If the byte you first examine isn't one of the five you're looking
for, you know the string can't be found in the vicinity of it, so you can
move to the position 5 bytes along.

It probably won't save *very* much time, as you'll still do the same
amount of IO, but there'll be less CPU time used examining the memory.
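A minimal sketch of the every-5th-byte idea, written in Python only for brevity (the same loop carries over to a C# byte[] more or less line for line). The function name find_stride is made up for illustration, and the sketch assumes the file has already been read into memory and that all bytes of the pattern are distinct, which holds for FF C0 00 11 08.

```python
PATTERN = bytes.fromhex("FFC0001108")


def find_stride(data: bytes, pattern: bytes = PATTERN) -> int:
    """Return the index of the first occurrence of pattern in data, or -1.

    Hypothetical helper: only valid when every byte in pattern is distinct,
    so a sampled byte identifies exactly one possible alignment.
    """
    m = len(pattern)
    if m == 0 or len(set(pattern)) != m:
        raise ValueError("this sketch needs a non-empty pattern of distinct bytes")

    # Map each pattern byte to its offset inside the pattern.
    offset_of = {b: i for i, b in enumerate(pattern)}
    n = len(data)

    # Any m consecutive bytes contain exactly one of the sampled positions
    # m-1, 2m-1, 3m-1, ..., so no occurrence can slip between samples.
    for i in range(m - 1, n, m):
        k = offset_of.get(data[i])
        if k is None:
            continue  # byte is not in the pattern: skip ahead m bytes
        start = i - k
        if start + m <= n and data[start:start + m] == pattern:
            return start
    return -1
```

In Python itself the built-in data.find(pattern) is likely to be faster, so treat this purely as an illustration of the skipping logic rather than something to benchmark.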

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Jul 21 '05 #2

guy

Load the file into an array and use Array.BinarySearch?

hth

guy

Jul 21 '05 #3

cody

BinarySearch only works on sorted arrays, and it looks up a single
element rather than a sequence of bytes.

--
cody

Freeware Tools, Games and Humour
http://www.deutronium.de.vu || http://www.deutronium.tk

Jul 21 '05 #4

Patrice

It's quite surprising that you would have to scan the whole file to find
this information.

I would check the file format specification; it's likely you can quickly
locate the header and this information. Try, for example:
http://www.funducode.com/freec/Filef...3/format3b.htm

Patrice

Jul 21 '05 #5

Skwerl

Yes Patrice, you are correct that I should not need to scan the entire
file. The problem is that I don't always know that a .jpg image file will
actually BE a JPEG. The other problem is that the various internal
thumbnails that Photoshop, ACDSee etc. put in the file come before the
actual image and are marked the same way, since they are in effect full JPEG
files inside the JPEG. I haven't found a really quick method to hop right to
the actual image and skip all the thumbnails yet. Hopping through the file
using the length bytes might be the best way. Thanks!
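A rough sketch of the hop-by-length idea, again in Python for brevity. It assumes the usual JPEG layout: an FF D8 start marker, then segments that each begin with FF, a marker byte, and a two-byte big-endian length that counts itself. It also assumes embedded thumbnails sit inside APPn segments, so skipping a segment by its length skips the thumbnail too. The name jpeg_dimensions is hypothetical, and unusual or damaged files may need more care than this.

```python
import io
import struct
from typing import BinaryIO, Optional, Tuple

# Start-of-frame markers: C0..CF except C4 (DHT), C8 (JPG) and CC (DAC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}
# Markers that carry no length field: TEM and RST0..RST7.
STANDALONE = {0x01} | set(range(0xD0, 0xD8))


def jpeg_dimensions(f: BinaryIO) -> Optional[Tuple[int, int]]:
    """Return (width, height) of the main image, or None if not found."""
    if f.read(2) != b"\xff\xd8":
        return None  # no start-of-image marker, so not a JPEG
    while True:
        if f.read(1) != b"\xff":
            return None  # end of file, or not positioned on a marker
        marker = f.read(1)
        while marker == b"\xff":  # skip optional fill bytes
            marker = f.read(1)
        if not marker:
            return None
        code = marker[0]
        if code in STANDALONE:
            continue
        if code in (0xD9, 0xDA):
            return None  # end of image or start of scan: no frame header seen
        raw = f.read(2)
        if len(raw) < 2:
            return None
        (length,) = struct.unpack(">H", raw)
        if length < 2:
            return None
        if code in SOF_MARKERS:
            payload = f.read(5)
            if len(payload) < 5:
                return None
            # Frame header: sample precision, then height, then width.
            _precision, height, width = struct.unpack(">BHH", payload)
            return width, height
        # Any other segment (including APPn with thumbnails): hop over it.
        f.seek(length - 2, io.SEEK_CUR)
```

Opening the file in binary mode and passing the file object in should touch only a handful of small reads near the start of the file, and the initial FF D8 check doubles as a test for whether a .jpg file really is a JPEG.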

Jul 21 '05 #6
