Bytes | Software Development & Data Engineering Community

IO optimization when copying bytes from one file to another

Hi guys. I've written code to embed an ICC profile in a TIFF image, and I
think my IO operations are slowing things down. It is taking about a second
to embed each tag in 7-meg TIFF files. Doesn't sound too bad until you try
doing it to 500 files. Basically what I am doing is this:

1. Read header from image, add 12-byte tag to header, and write to a new
TIFF file
2. Update all the offset pointers in other tags in TIFF header to reflect
the change that adding the 12-bytes made and write them to the new TIFF file
3. Copy the rest of the original TIFF to the new one.
4. Append ICC profile (around 100K) to new TIFF file.
5. Delete original TIFF and rename new one to the name of the original
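
In outline, steps 1-4 are a splice: write the enlarged header, then the untouched remainder of the original, then the appended profile. Here is a toy sketch of that splice on an in-memory byte array; the header length, tag bytes, and profile sizes are hypothetical stand-ins, and step 2's IFD offset fix-ups are deliberately omitted:

```csharp
using System;
using System.IO;

class TagSpliceSketch
{
    static void Main()
    {
        // Toy stand-ins -- NOT a real TIFF. headerLen, the 12-byte tag and
        // the profile bytes are hypothetical values for illustration only.
        byte[] original = new byte[100];          // pretend 100-byte "TIFF"
        for (int i = 0; i < original.Length; i++) original[i] = (byte) i;
        int headerLen = 20;                       // hypothetical header size
        byte[] newTag  = new byte[12];            // the 12-byte tag to insert
        byte[] profile = new byte[8];             // stand-in for the ~100K ICC profile

        using (MemoryStream dest = new MemoryStream())
        {
            dest.Write(original, 0, headerLen);             // step 1: header...
            dest.Write(newTag, 0, newTag.Length);           // ...plus the new tag
            // step 2 (rewriting the IFD offset pointers) omitted in this sketch
            dest.Write(original, headerLen,
                       original.Length - headerLen);        // step 3: rest of the file
            dest.Write(profile, 0, profile.Length);         // step 4: append profile

            Console.WriteLine(dest.Length);                 // 100 + 12 + 8 = 120
        }
    }
}
```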

I believe the problem is in the method I am using to copy the data from
the original file to the new one. I have pasted it below. Any suggestions
on how to squeeze some more speed out, either in my main algorithm or the
following function? Thanks a bunch!

Josh

private void copyBytes(FileStream source, FileStream destination, long fromIndex, long length)
{
    const int chunkSize = 1024;
    long currentIndex = fromIndex;
    long endIndex = fromIndex + length;
    long bytesToCopy = 0;
    byte[] bytes = new byte[chunkSize];
    byte[] endLump;
    byte[] twoBytes = new byte[2];

    source.Seek(fromIndex, SeekOrigin.Begin);
    //Copy a chunk at a time
    for (bytesToCopy = length; bytesToCopy >= chunkSize; bytesToCopy -= chunkSize)
    {
        source.Read(bytes, 0, chunkSize);
        destination.Write(bytes, 0, chunkSize);
        currentIndex += bytes.Length;
    }

    //Copy the rest now
    endLump = new byte[bytesToCopy];
    source.Read(endLump, 0, endLump.Length);
    destination.Write(endLump, 0, endLump.Length);
    destination.Flush();
}
Jul 21 '05 #1
Skwerl <ju******@anotherretarded.com> wrote:
<snip>

You can make your method simpler and more reliable (by using the return
value of Read) quite easily. I've also increased the chunk size to
possibly speed things up a bit.

const int BufferSize = 32768;

void CopyBytes(Stream source, Stream dest, long fromIndex, long length)
{
    source.Seek(fromIndex, SeekOrigin.Begin);
    byte[] buffer = new byte[BufferSize];

    while (length > 0)
    {
        int read = source.Read(buffer, 0, (int) Math.Min(length, BufferSize));

        if (read <= 0)
        {
            throw new IOException("Insufficient data remaining");
        }

        dest.Write(buffer, 0, read);
        length -= read;
    }
}

Assuming you're going to call Close or Dispose on the destination
stream, chances are you don't need to call Flush by the way.
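
To make that concrete, a minimal self-contained sketch: wrapping both streams in using blocks guarantees Dispose (which flushes and closes) even if an exception is thrown mid-copy, so no explicit Flush call is needed. The copy loop is the same shape as above, with an (int) cast on Math.Min since it otherwise resolves to the (long, long) overload; the temp file names are stand-ins:

```csharp
using System;
using System.IO;

class UsingSketch
{
    static void CopyBytes(Stream source, Stream dest, long fromIndex, long length)
    {
        const int BufferSize = 32768;
        source.Seek(fromIndex, SeekOrigin.Begin);
        byte[] buffer = new byte[BufferSize];
        while (length > 0)
        {
            // cast needed: Math.Min(long, int) resolves to the (long, long) overload
            int read = source.Read(buffer, 0, (int) Math.Min(length, BufferSize));
            if (read <= 0)
                throw new IOException("Insufficient data remaining");
            dest.Write(buffer, 0, read);
            length -= read;
        }
    }

    static void Main()
    {
        string src = Path.GetTempFileName();
        string dst = Path.GetTempFileName();
        File.WriteAllBytes(src, new byte[] { 1, 2, 3, 4, 5, 6, 7, 8 });

        // using guarantees Dispose (flush + close) even if CopyBytes throws
        using (FileStream source = File.OpenRead(src))
        using (FileStream dest = File.Create(dst))
        {
            CopyBytes(source, dest, 2, 4);   // copies bytes 3, 4, 5, 6
        }

        byte[] copied = File.ReadAllBytes(dst);
        Console.WriteLine(copied.Length);    // 4
        Console.WriteLine(copied[0]);        // 3
        File.Delete(src);
        File.Delete(dst);
    }
}
```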

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #2
Thanks, Jon. I appreciate the response. I tried your code, and found it was
about 15% slower than what I tried doing last night, which was to just read
it all in and then write all of it out. I'm guessing it was just because of
the small (7-meg) file size. Here's my code:

private void copyBytes(FileStream source, FileStream destination, long fromIndex, long length)
{
    byte[] buffer = new byte[length];
    source.Seek(fromIndex, SeekOrigin.Begin);
    source.Read(buffer, 0, buffer.Length);
    destination.Write(buffer, 0, buffer.Length);
}
"Jon Skeet [C# MVP]" wrote:
<snip>

Jul 21 '05 #3
Skwerl <ju******@anotherretarded.com> wrote:
<snip>

Did you run mine and then run yours? If so, things would be buffered.
You should either flush all buffers before running either of them, or
run both several times.

I'd be surprised if my method was really 15% slower (a little bit, but
not 15%). Of course, I've been known to be surprised before :)
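
One way to make such comparisons fairer is to time each strategy over several warm iterations, so the OS file cache treats both alike. A sketch (assuming .NET 2.0's Stopwatch; on 1.1, DateTime.Now would do instead, and the copy delegate here is just an example strategy, not either poster's exact code):

```csharp
using System;
using System.Diagnostics;
using System.IO;

class BenchSketch
{
    delegate void CopyMethod(Stream source, Stream dest);

    // Accumulates elapsed time over several runs so one cold run
    // does not dominate the average.
    static long TimeAverageMs(CopyMethod copy, byte[] data, int trials)
    {
        Stopwatch sw = new Stopwatch();
        for (int i = 0; i < trials; i++)
        {
            using (MemoryStream src = new MemoryStream(data))
            using (MemoryStream dst = new MemoryStream())
            {
                sw.Start();
                copy(src, dst);
                sw.Stop();
            }
        }
        return sw.ElapsedMilliseconds / trials;
    }

    static void Main()
    {
        byte[] data = new byte[1 << 20];   // 1 MB stand-in for a 7 MB TIFF
        long avg = TimeAverageMs(delegate(Stream s, Stream d)
        {
            byte[] buf = new byte[32768];
            int read;
            while ((read = s.Read(buf, 0, buf.Length)) > 0)
                d.Write(buf, 0, read);
        }, data, 5);
        Console.WriteLine(avg >= 0);   // True
    }
}
```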

Jul 21 '05 #4
Actually, I ran mine a few times last night, and it consistently ran
around 730 ms per image in trials of 30 images each. I changed the code
altogether to yours this morning, ran it and got about 860 ms per image on
several trials of the same 30 images. I changed it back to mine as it was
previously, and again ended up with about 730 ms per image. I'm a real
novice when it comes to this sort of operation, and I've always seen these
things done in pieces like with your method, so I'm surprised the simplistic
way I did it is actually a little faster. Just FYI, the system I am running
this on is an XP SP2 machine with NTFS partitions. Performance aside, is your method
a better way to do this?
Thanks once again,
Josh

"Jon Skeet [C# MVP]" wrote:
<snip>

Jul 21 '05 #5
Skwerl <ju******@anotherretarded.com> wrote:
<snip>

Yes, in terms of memory consumption. Consider an image which is several
hundred megs in size - with my code, you never need to have more than
32K in memory at a time. With yours, you read the whole thing into
memory, and then write the whole thing out. You're also assuming that
one call to Read will read the whole file, ignoring the return value,
which is never a good idea.

Jul 21 '05 #6
Yes, I am assuming that it will read it all. Yikes, I didn't realize that it
wouldn't always read it all unless an exception were thrown. Under what
conditions would it not read all of the file? Performance is a big issue
here, so I want to try to gauge whether or not I need to worry about this.
The code will never need to handle anything but 3-7 meg TIFF files. Thanks
once again, Jon.

Josh
"Jon Skeet [C# MVP]" wrote:
<snip>

Jul 21 '05 #7
Skwerl <ju******@anotherretarded.com> wrote:
Yes, I am assuming that it will read it all. Yikes, I didn't realize that it
wouldn't always read it all unless an exception were thrown. Under what
conditions would it not read all of the file?
I don't know, for sure. I would imagine that some network file systems
might give data in chunks, like NetworkStreams do. It could be that
FileStreams will always read however much you ask for - but that's not
true for streams in general. Basically it's good practice not to ignore
the return value of Read :)
Performance is a big issue
here, so I want to try to gauge whether or not I need to worry about this.
The code will never need to handle anything but 3-7 meg TIFF files.


Why not try increasing the buffer size of the code I gave you to, say,
1MB. That way you won't need to change the code if you ever get a huge
file, and you don't need to worry about whether or not FileStream will
always return the whole of the data. The performance difference should
be trivial at that stage.
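
The two concerns can also be separated: whatever the buffer size, a small helper that loops until the requested count has arrived removes any dependence on a single Read returning everything. A sketch of that standard pattern (the ReadFully name is mine, not a framework API):

```csharp
using System;
using System.IO;

class ReadFullySketch
{
    // Loops until the requested count has arrived or the stream ends,
    // instead of trusting a single Read call to deliver everything.
    static void ReadFully(Stream stream, byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int read = stream.Read(buffer, offset, count);
            if (read <= 0)
                throw new EndOfStreamException("Stream ended early");
            offset += read;
            count -= read;
        }
    }

    static void Main()
    {
        byte[] data = new byte[] { 10, 20, 30, 40, 50 };
        using (MemoryStream ms = new MemoryStream(data))
        {
            byte[] buffer = new byte[5];
            ReadFully(ms, buffer, 0, 5);
            Console.WriteLine(buffer[4]);   // 50
        }
    }
}
```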

Jul 21 '05 #8
That sounds perfectly reasonable. Thanks a bunch!

Josh

"Jon Skeet [C# MVP]" wrote:
<snip>

Jul 21 '05 #9

This thread has been closed and replies have been disabled. Please start a new discussion.
