Opening Large Binary file efficiently

Hey all,

I need to validate a large binary file. I just need to read the last
100 or so bytes of the file.

Here is what I am currently doing but it seems slow:

private bool ValidFile(string afilename)
{
    byte ch;
    bool bGraybar = true;
    FileStream fs = null;
    BinaryReader reader = null;

    try
    {
        fs = File.Open(afilename, FileMode.Open);
        reader = new BinaryReader(fs);

        fs.Seek(-101, SeekOrigin.End);

        for (int i = 0; i <= 100; i++)
        {
            ch = reader.ReadByte();
            if (ch != 0)
            {
                bGraybar = false;
                break;
            }
        }

        if (bGraybar)
            return false;
        else
            return true;
    }
    finally
    {
        if (reader != null)
            reader.Close();
        if (fs != null)
        {
            fs.Close();
            fs.Dispose();
        }
    }
}

~Gina~

Nov 30 '06 #1
Hey guys,

Actually, it is slow because of my network.

Can I get a code review anyhow?

~Gina~

Nov 30 '06 #2
"Gina_Marano" <gi*******@gmail.comwrote in message
news:11*********************@j44g2000cwa.googlegro ups.com...
Hey guys,

Actually, it is slow because of my network.

Can I get a code review anyhow?
Well, the other reason it's slow is that you read one byte at a time. If
you know you want to read 100 bytes, then read 100 bytes with a single call
to ReadBytes and then process the data from the byte array directly.

Of course, waiting on the network will slow you down too, but no reason to
make things worse than necessary.
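
For example, something along these lines (untested, just a sketch of what I
mean, reusing your method and parameter names):

private bool ValidFile(string afilename)
{
    // Pull the last 100 bytes in a single read, then scan the buffer in memory.
    using (FileStream fs = File.Open(afilename, FileMode.Open, FileAccess.Read))
    using (BinaryReader reader = new BinaryReader(fs))
    {
        fs.Seek(-100, SeekOrigin.End);
        byte[] tail = reader.ReadBytes(100);   // one call instead of 100 ReadByte calls

        foreach (byte b in tail)
        {
            if (b != 0)
                return true;    // found non-zero data, same result as your original logic
        }

        return false;           // last 100 bytes were all zero
    }
}

The using blocks also take care of closing the reader and the stream, so the
explicit finally block goes away.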

As far as the rest of the code goes, I don't think you need to call Dispose
on the filestream, but otherwise I don't see anything obvious that I'd
change. It's not clear to me why you read the last 101 bytes instead of
100, but you did say "100 or so bytes", so I guess there's probably nothing
wrong with that. :)

Pete
Nov 30 '06 #3
KH
For safety sake you should probably check that the file length is >= the
number of bytes you want to read.
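Something like this before the Seek (assuming fs is the already-opened
FileStream from the original code; tailLength is just a local name for
illustration):

const int tailLength = 101;      // the original code reads the last 101 bytes
if (fs.Length < tailLength)
    return false;                // treat a too-short file as invalid here

fs.Seek(-tailLength, SeekOrigin.End);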
"Peter Duniho" wrote:
"Gina_Marano" <gi*******@gmail.comwrote in message
news:11*********************@j44g2000cwa.googlegro ups.com...
Hey guys,

Actually, it is slow because of my network.

Can I get a code review anyhow?

Well, the other reason it's slow is that you read one byte at a time. If
you know you want to read 100 bytes, then read 100 bytes with a single call
to ReadBytes and then process the data from the byte array directly.

Of course, waiting on the network will slow you down too, but no reason to
make things worse than necessary.

As far as the rest of the code goes, I don't think you need to call Dispose
on the filestream, but otherwise I don't see anything obvious that I'd
change. It's not clear to me why you read the last 101 bytes instead of
100, but you did say "100 or so bytes", so I guess there's probably nothing
wrong with that. :)

Pete
Dec 1 '06 #4
"KH" <KH@discussions.microsoft.comwrote in message
news:9F**********************************@microsof t.com...
For safety sake you should probably check that the file length is >= the
number of bytes you want to read.
Perhaps. Though, presumably the files the person is talking about are
assured of being large enough if valid, and an exception will be thrown (and
handled) by the code if they are not valid (in that way, or perhaps other
ways, such as being locked for reads). Such a check may be superfluous.
Dec 1 '06 #5
Kev
Relying on exceptions to be thrown is sloppy coding in my mind - and not
what they are intended for.

If you can do a simple check to prevent an exception being raised then do
it. Exceptions trigger the CPU interrupt line and this causes the CPU to
stop what it is doing, store current info to the stack, handle the
exception, then reload data back off the stack and continue what it was
doing (ok, that was a really rough description - don't quote me exactly).
Why halt the CPU when you can do a simple check that does not have this side
effect?

I am not saying do not use try catch - you should use it often, just use it
to handle error situations you can't necessarily check for before the
operation in question.

"Presumably", and "assured of" are not terms I associate with good reliable
software design.

Cheers

"Peter Duniho" <Np*********@NnOwSlPiAnMk.comwrote in message
news:12*************@corp.supernews.com...
"KH" <KH@discussions.microsoft.comwrote in message
news:9F**********************************@microsof t.com...
>For safety sake you should probably check that the file length is >= the
number of bytes you want to read.

Perhaps. Though, presumably the files the person is talking about are
assured of being large enough if valid, and an exception will be thrown
(and handled) by the code if they are not valid (in that way, or perhaps
other ways, such as being locked for reads). Such a check may be
superfluous.

Dec 1 '06 #6
"Kev" <sp************@nochanceinhelloffindingmehere.comw rote in message
news:uP**************@TK2MSFTNGP06.phx.gbl...
Relying on exceptions to be thrown is sloppy coding in my mind - and not
what they are intended for.
IMHO, you are making too much of this. But since you brought it up, let's
look at your comments...
> If you can do a simple check to prevent an exception being raised then do
> it.
And do what? The OP's code already has a try/finally. There are a wide
variety of things that could go wrong in just the few lines of code he has,
especially since he's reading the file over a network. How is it better to
add an extra check, just to avoid having the Seek throw an exception, when
all he's likely to do is fall out of the code to the finally anyway?

Why not check for all the other things that could cause an exception?
> Exceptions trigger the CPU interrupt line and this causes the CPU to stop
> what it is doing, store current info to the stack, handle the exception,
> then reload data back off the stack and continue what it was doing (ok,
> that was a really rough description - don't quote me exactly). Why halt
> the CPU when you can do a simple check that does not have this side
> effect?
Because exception handling is for exceptional situations. Not that I really
agree with your characterization of "halting the CPU" anyway, but why would
you slow down the common case, just to save some time in the exceptional
case?

In fact, that's one of the nice things about exception handling. You can
write all of the code as if everything will work fine, not wasting code or
time on expensive checks like retrieving the file length and comparing it to
the minimum required length. After all, the code underlying Seek is going
to have to make that check anyway.

So, you're suggesting that we write the code in a way that forces the exact
same check to happen twice each time through the code, just so that in the
rare case when an exception happens, the exception can be handled more
quickly?
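
To put it concretely, the two styles look roughly like this (untested sketch,
assuming fs is already open):

// Pre-check style: fetch the length over the network first, even though
// Seek will reject a position before the start of the file anyway.
if (fs.Length >= 101)
{
    fs.Seek(-101, SeekOrigin.End);
    // ... read and validate ...
}

// Exception style: just Seek; a file that is too short (or unreadable for
// some other reason) throws and ends up in the existing finally, or in
// whatever catch the caller has, like any other failure.
fs.Seek(-101, SeekOrigin.End);
// ... read and validate ...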
> I am not saying do not use try catch - you should use it often, just use
> it to handle error situations you can't necessarily check for before the
> operation in question.
Well, I simply disagree. IMHO, it's a waste of time checking things that
the code you're calling is going to have to check anyway, especially if your
handling of a failure of the check is identical to how you'd handle an
exception.
"Presumably", and "assured of" are not terms I associate with good
reliable software design.
I used those terms because the code is not mine, and I don't have the full
information regarding the situation in which the code will be used (or even
of other code related to the problem). It makes no sense for you to assume
I'm using those terms as a programming concept, when in fact my use of those
terms has to do with my relationship (or rather, lack thereof) with the OP
and his code.

(Not that I think "assured of" is in any way a negative thing to consider
with respect to code anyway... it seems to me that being "assured of" something
is a *good* thing. As in, "I am assured that the compiler will generate the
correct output given my source code".)

Pete
Dec 1 '06 #7
"Peter Duniho" <Np*********@NnOwSlPiAnMk.comwrote in message
news:12*************@corp.supernews.com...
>[...] Why halt the CPU when you can do a simple check that does not have
this side effect?

Because exception handling is for exceptional situations. Not that I
really agree with your characterization of "halting the CPU" anyway, but
why would you slow down the common case, just to save some time in the
exceptional case?
And, by the way, I'll point out that while branch prediction in CPUs is a
fairly mature technology, branches can still be mispredicted. Exception
handling allows you to take branches out of the code entirely, ensuring that
your execution pipeline won't get flushed in the common case (well, not any
more often than is strictly necessary, anyway).
Dec 1 '06 #8
"Gina_Marano" <gi*******@gmail.comwrote in message
news:11*********************@j44g2000cwa.googlegro ups.com...
Hey guys,

Actually, it is slow because of my network.

Can I get a code review anyhow?

~Gina~
What kind of network are you talking about, and what do you call *slow*? The
size of the file doesn't matter at all; reading the last 100 bytes of a giant
file over the network should be as fast as reading a tiny 100-byte file. Over
10 Mbit Ethernet it should take less than, say, 100 msec.
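
If you want to see where the time goes, something like this would do (a rough
sketch; the UNC path is just a placeholder, and it needs using System.Diagnostics):

// Time only the open/seek/read of the last 100 bytes.
Stopwatch sw = Stopwatch.StartNew();
using (FileStream fs = File.Open(@"\\server\share\bigfile.bin", FileMode.Open, FileAccess.Read))
{
    fs.Seek(-100, SeekOrigin.End);
    byte[] tail = new byte[100];
    int bytesRead = fs.Read(tail, 0, tail.Length);
}
sw.Stop();
Console.WriteLine("Read tail in {0} ms", sw.ElapsedMilliseconds);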

Willy.
Dec 1 '06 #9
Hey Willy,

I will have to check this out again.

I am running over a VPN. I would have thought it would have been zippy
as well, but it isn't 100 ms. The file sizes are typically 10-15 MB.

Since the production environment is all local, there is no problem. But
I too thought it should be much faster.

~Gina~
Dec 11 '06 #10
Now, now, boys. You're making me blush here. No need to fight over
little ol' me. :)

~Gina~

Dec 11 '06 #11
"Gina_Marano" <gi*******@gmail.comwrote in message
news:11*********************@n67g2000cwd.googlegro ups.com...
Hey Willy,

I will have to check this out again.

I am running over a VPN. I would have thought it would have been zippy
as well but it isn't 100ms. The files sizes are typically 10-15mb
files.

Since the production environment is all local there is no problem. But
I too thought it should be much faster.
Well, while 100 msec is something you could expect over a local switched LAN,
things may be slower over a VPN; anyway, what matters is the network latency,
not the file size.

Willy.

Dec 11 '06 #12


