get the actual size of a file

Hi

Generally,
FileInfo fi = new FileInfo(path);
long size = fi.Length;

gets you the length of a file in bytes. However, when copying files, even
while the copy operation is still in progress, the filesize, as indicated in
Windows Explorer or derived with the above two lines of code, will be the
size of the file once the copy operation has completed. Is there a way to
get the actual number of bytes written to the harddisk while a copy
operation is under way?

The reason I'm asking is that I have to copy rather large files and I'm
currently using File.Copy(input, output) to do this. For a progress
indication, I have a thread that gets the size of the output via the
above-mentioned code. Once the file has been copied, I append a second
(binary) file, but prior to starting to append, I set the length of the
output file to the total length the output is going to have. So, my progress
indicator has 2 values only and my thread getting the filesize could just as
well not exist.
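
The two-value progress described above follows from what the length property reports: the file's logical length, which jumps as soon as the length is set rather than as bytes are written. A minimal sketch of that effect, assuming a scratch file in the temp directory (the file name and size are made up for illustration):

```csharp
using System;
using System.IO;

static class LengthDemo
{
    // Returns the length reported right after SetLength, before any data is written.
    public static long ReportedLengthAfterSetLength(string path, long targetLength)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            fs.SetLength(targetLength); // reserve the final size up front
        }
        return new FileInfo(path).Length; // already equals targetLength
    }

    static void Main()
    {
        // Hypothetical scratch file, deleted again at the end.
        string path = Path.Combine(Path.GetTempPath(), "length-demo.bin");
        long reported = ReportedLengthAfterSetLength(path, 1024 * 1024);
        Console.WriteLine("Reported length: " + reported + " bytes");
        File.Delete(path);
    }
}
```

So a progress thread that polls the length cannot see how much data has really been written; the byte count has to come from the code doing the writing.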

The only way around this I can imagine is dump File.Copy, create a new file
manually, and copy the binary data from input to output in chunks of a
certain size. Besides the additional complexity, is there any inherent
performance disadvantage of such a mechanism versus the built-in file copy
mechanism? I'm just guessing here but I assume the size of I/O buffers could
have a noticeable effect on performance.

Regards
Stephan
Nov 17 '05 #1
You are absolutely correct. There will be a noticeable effect on
performance.
Nov 17 '05 #2
Stephany Young <noone@localhost> wrote:
You are absolutely correct. There will be a noticeable effect on
performance.


Well, there will be a noticeable effect on performance depending on the
buffer size. There needn't be a noticeable effect on performance
between File.Copy and copying chunk-by-chunk if the buffer size is
chosen appropriately.
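
A chunk-by-chunk copy that also reports progress could look roughly like the sketch below. It is an illustration, not a drop-in replacement for File.Copy: the ProgressHandler delegate and the 64 KB starting buffer size are assumptions, and the right buffer size is something to measure.

```csharp
using System;
using System.IO;

static class ChunkedCopy
{
    // Hypothetical callback type: receives the number of bytes written so far.
    public delegate void ProgressHandler(long bytesWritten);

    // Copies input to output in bufferSize chunks, reporting progress after each chunk.
    public static long Copy(Stream input, Stream output, int bufferSize, ProgressHandler onProgress)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
            total += read;
            if (onProgress != null) onProgress(total);
        }
        return total;
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("usage: ChunkedCopy <source> <destination>");
            return;
        }
        const int bufferSize = 64 * 1024; // starting point only; measure on your own hardware
        using (FileStream src = new FileStream(args[0], FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize))
        using (FileStream dst = new FileStream(args[1], FileMode.Create, FileAccess.Write, FileShare.None, bufferSize))
        {
            long length = src.Length;
            Copy(src, dst, bufferSize, delegate(long written)
            {
                long percent = length == 0 ? 100 : written * 100 / length;
                Console.Write("\r" + percent + "%");
            });
        }
        Console.WriteLine();
    }
}
```

Calling Copy a second time with another input stream on the same output stream appends the second file, so progress can be reported against the combined length of both inputs.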

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 17 '05 #3
So what would be a suitable buffer size? I need to make a copy of one file
and append another file to it. The first file can be anywhere from 100 MB
to 2 GB, whereas the file to be appended will more likely be in the 10-100
MB range.
Nov 17 '05 #4
I think, depending on the OS (and whether File.Copy is calling the right APIs),
File.Copy can be a huge winner, especially if neither file is on the
machine the copy is executed on. (For instance, \\machA
executes File.Copy("\\machB\foo\bar", "\\machC\foo\bar").) This isn't a
common scenario, but I was under the impression that in certain
configurations one could avoid the bits going through \\machA at all.

m
Nov 17 '05 #5
Mike <vi********@yahoo.com> wrote:
I think, depending on the OS (and whether File.Copy is calling the right APIs),
File.Copy can be a huge winner, especially if neither file is on the
machine the copy is executed on. (For instance, \\machA
executes File.Copy("\\machB\foo\bar", "\\machC\foo\bar").) This isn't a
common scenario, but I was under the impression that in certain
configurations one could avoid the bits going through \\machA at all.


I'm not sure, to be honest. I think I'd want to see it working before
saying for certain either way :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 17 '05 #6
Stephan Steiner <st*****@isuisse.com> wrote:
So what would be a suitable buffer size? I need to make a copy of one file
and append another file to it. The first file can be anywhere from a 100 MB
to 2 GB, whereas the file to be appended will more likely be in the 10 - 100
MB area.


I suspect with buffers larger than about 64K you end up with
diminishing returns - and if the buffers are large enough to get on the
large object heap, the memory won't be compacted. (It'll be collected
after a long time, but not compacted, as far as I know.) Of course, if
your app just runs and then exits after doing this copy, that isn't an
issue.
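
Whether a given buffer ends up on the large object heap can be checked at run time. The sketch below relies on the fact that large-object-heap allocations are reported in the oldest generation by GC.GetGeneration. The 85,000-byte threshold usually quoted for Microsoft's runtime is an assumption here, not something stated in this thread, and other runtimes may behave differently.

```csharp
using System;

static class LohDemo
{
    // True when a freshly allocated buffer of this size is reported in the oldest
    // generation, which is how large-object-heap allocations show up.
    public static bool LandsOnLargeObjectHeap(int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        return GC.GetGeneration(buffer) == GC.MaxGeneration;
    }

    static void Main()
    {
        // Sizes picked for illustration: one below and one above the assumed threshold.
        Console.WriteLine("64 KB buffer on LOH:  " + LandsOnLargeObjectHeap(64 * 1024));
        Console.WriteLine("128 KB buffer on LOH: " + LandsOnLargeObjectHeap(128 * 1024));
    }
}
```

If the second line reports True on your runtime, a 128 KB copy buffer is a large object there, while a 64 KB buffer would normally stay on the regular heap.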

I suggest you experiment with buffer sizes to find out what suits
your app best.
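
One rough way to run such an experiment is to time a sequential read of the same file with several buffer sizes. This is a sketch only: the candidate sizes are arbitrary, and Stopwatch needs framework 2.0 or later.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class BufferSizeExperiment
{
    // Reads the whole file sequentially in blockSize chunks and returns the bytes read.
    public static long ReadAll(string path, int blockSize)
    {
        byte[] buffer = new byte[blockSize];
        long total = 0;
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, blockSize))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                total += read;
            }
        }
        return total;
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: BufferSizeExperiment <file>");
            return;
        }
        int[] blockSizes = { 1024, 4096, 16 * 1024, 64 * 1024, 256 * 1024 }; // arbitrary candidates
        foreach (int blockSize in blockSizes)
        {
            Stopwatch timer = Stopwatch.StartNew();
            long bytes = ReadAll(args[0], blockSize);
            timer.Stop();
            double seconds = Math.Max(timer.Elapsed.TotalSeconds, 1e-9); // avoid dividing by zero
            double mbPerSec = bytes / (1024.0 * 1024.0) / seconds;
            Console.WriteLine("{0,8} bytes: {1:F2} MB/s", blockSize, mbPerSec);
        }
    }
}
```

After the first pass the file may be served from the file system cache, so later passes can look faster than the disk really is. Use a file much larger than RAM, or a different file per pass, before trusting the numbers.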

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 17 '05 #7

"Stephan Steiner" <st*****@isuisse.com> wrote in message
news:%2****************@TK2MSFTNGP12.phx.gbl...
So what would be a suitable buffer size? I need to make a copy of one file
and append another file to it. The first file can be anywhere from a 100
MB to 2 GB, whereas the file to be appended will more likely be in the
10 - 100 MB area.


All you can do is measure. However, keep in mind that ALL file IO done
through the Framework IO classes is buffered IO. That means that,
irrespective of the buffer size you specify at the API level, the file system
will buffer reads and writes from/to disk in the FS cache. The number of
bytes buffered depends on the FS type used (NTFS, FAT32, ...) and the usage
pattern (sequential, random, mixed).
So whether you read a byte or 256 KB at a time, the FS will always transfer
a block of data from the disk device to the FS cache. That means that the
transfer rate is theoretically determined by the speed of the physical IO
path. However, transferring the data blocks to the FS cache, and from the FS
cache on to the IO buffer in your application, costs CPU time. The smaller
the buffers at the API level, the larger the overhead; buffers below a
certain size will saturate the CPU, at which point the IO rate becomes CPU
bound.

So basically we have four determining factors for IO transfer rate:
1. Physical IO system (FS type, disk rotational speed, disk cache size, RAID
level ...)
2. CPU speed and number of...
3. Sequential or Random IO.
4. Buffer size.

I wrote a program to measure the impact of buffer size on the sequential IO
rate, recording CPU consumption, (logical) IO count and transfer speed.
Following are the results obtained reading a single large file (10 GB) from a
single 10,000 RPM SATA drive (of course your mileage may vary).

blocksize    cpu       speed         IO
16 bytes     99.99%    10.44 MB/s    684444/s
128 bytes    78.31%    57.89 MB/s    474199/s
256 bytes    43.42%    56.37 MB/s    230906/s
2 KB         16.98%    57.84 MB/s     29613/s
4 KB         14.63%    56.37 MB/s     14432/s
8 KB         12.76%    56.40 MB/s      7219/s
16 KB        13.23%    57.84 MB/s      3702/s
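
As a sanity check on the table above, the speed column appears to be simply block size times IO rate, with 1 MB taken as 1,048,576 bytes. That unit is an inference from the figures, not something the post states. A small sketch of the arithmetic:

```csharp
using System;

static class ThroughputCheck
{
    // Throughput in MB/s (1 MB = 1,048,576 bytes) implied by a block size and an IO rate.
    public static double MegabytesPerSecond(long blockSizeBytes, long iosPerSecond)
    {
        return blockSizeBytes * (double)iosPerSecond / (1024.0 * 1024.0);
    }

    static void Main()
    {
        // Block sizes (bytes) and IO rates (per second) copied from the table above.
        long[,] rows =
        {
            { 16, 684444 }, { 128, 474199 }, { 256, 230906 }, { 2048, 29613 },
            { 4096, 14432 }, { 8192, 7219 }, { 16384, 3702 }
        };
        for (int i = 0; i < rows.GetLength(0); i++)
        {
            Console.WriteLine("{0,6} bytes x {1,6}/s = {2:F2} MB/s",
                rows[i, 0], rows[i, 1], MegabytesPerSecond(rows[i, 0], rows[i, 1]));
        }
    }
}
```

For example, 16 bytes x 684444/s comes to about 10.44 MB/s and 128 bytes x 474199/s to about 57.89 MB/s, matching the reported speeds. The IO count is therefore the number of read calls made by the program, not the number of physical disk operations.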

What does this tell us?
- The transfer speed is already optimal from a buffer size of 128 bytes up;
at 16 bytes it drops to ~10 MB/s due to CPU saturation.
- Going beyond 128 bytes doesn't increase the transfer speed, but it does
reduce the CPU consumption.
- CPU consumption stabilizes with buffers above 4 KB (what remains is the
cache manager's overhead). If you want to reduce CPU consumption further,
you will have to perform unbuffered IO using PInvoke.

Conclusion: anything between 2 KB and 8 KB gives you optimal results for both
IO transfer and CPU consumption. Bigger buffers are only a waste of memory;
too-small buffers are a waste of CPU resources.
Willy.



Nov 17 '05 #8
TJB replied to:
Conclusion, anything between 2KB and 8 KB gives you optimal results for both IO transfer and CPU consumption. Bigger buffers are only a waste of memory, too small buffers are a waste of CPU resources.


That conclusion is a bit too sweeping given the benchmark. For a 10k
RPM drive, those numbers sound underperforming.

It sounds like your reads are synchronous, and we also don't know what
underlying options got passed to CreateFile in that case. I
think you can get different results between sequential and random
access caching. Also, it might be interesting to see what happens if
you do a bunch of asynchronous random reads. Doesn't SATA support a
form of command queuing? The drive might be able to do the SCSI-like
thing of reordering a batch of reads into whatever order it thinks is
fastest.

Nov 17 '05 #9

"stork" <tb******@mightyware.com> wrote in message
news:11*********************@o13g2000cwo.googlegroups.com...
TJB replied to:
Conclusion, anything between 2KB and 8 KB gives you optimal results for both
IO transfer and CPU consumption. Bigger buffers are only a waste of memory,
too small buffers are a waste of CPU resources.

That conclusion is a bit too sweeping given the benchmark. For a 10k
RPM drive, those numbers sound underperforming.

It sounds like your reads are synchronous, and we also don't know what
underlying options got passed to CreateFile in that case. I
think you can get different results between sequential and random
access caching. Also, it might be interesting to see what happens if
you do a bunch of asynchronous random reads. Doesn't SATA support a
form of command queuing? The drive might be able to do the SCSI-like
thing of reordering a batch of reads into whatever order it thinks is
fastest.

Note that the transfer rates are buffer from/to disk, NOT buffer from/to
host. What makes you think that the numbers are underperforming?
Note that this wasn't meant as a benchmark; my only purpose was to show the
impact of the buffer size on CPU consumption and IO throughput for simple
reads.
Anyway, to answer some of your questions:
The reads are synchronous and sequential, from a non-fragmented single disk,
using the buffered FileStream IO .NET API.
// blockSize is the buffer size under test
fs = new FileStream(fileName,
                    FileMode.OpenOrCreate,
                    FileAccess.ReadWrite,
                    FileShare.None,
                    blockSize);

No additional options can be specified when running on v1.1 of the framework.
Running on v2.0 with the FileOptions.SequentialScan option results in a 5%
increase in the transfer rate.
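
For reference, passing that option on framework 2.0 or later might look like the sketch below. It opens the file read-only, which differs from the ReadWrite access used in the test above, and the helper name is made up for illustration.

```csharp
using System;
using System.IO;

static class SequentialScanDemo
{
    // Opens a file for reading with the sequential-scan hint (framework 2.0 and later).
    public static FileStream OpenForSequentialRead(string path, int blockSize)
    {
        return new FileStream(path,
                              FileMode.Open,
                              FileAccess.Read,
                              FileShare.Read,
                              blockSize,
                              FileOptions.SequentialScan);
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: SequentialScanDemo <file>");
            return;
        }
        byte[] buffer = new byte[8 * 1024]; // 8 KB, within the 2 KB to 8 KB range suggested earlier
        long total = 0;
        using (FileStream fs = OpenForSequentialRead(args[0], buffer.Length))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                total += read;
            }
        }
        Console.WriteLine("Read " + total + " bytes");
    }
}
```

The flag is only a hint to the operating system's cache manager, so any gain depends on the system and is worth measuring rather than assuming.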

Running the same test asynchronously didn't result in a higher throughput
(as expected).

Doing sequential synchronous writes gave approx. the same figures for the IO
throughput, with a smaller CPU overhead compared to the reads.

Willy.




Nov 17 '05 #10
Sorry I've been so quiet these recent days.

Thanks a bunch for all your valuable suggestions. I have been using rather
large buffers that I have reduced considerably now.

Regards
Stephan
Nov 17 '05 #11
"Stephan Steiner" <st*****@isuisse.com> wrote in message news:<u#**************@TK2MSFTNGP14.phx.gbl>...
[...]

Thanks a bunch for all your valuable suggestions. I have been using rather
large buffers that I have reduced considerably now.

Regards
Stephan


While working on a similar problem I did a little
benchmarking:

File size approx. 110 MB, 2.4GHz, 1GB RAM, FW 1.1:

Method MM:SS

File.Copy = 1:41
FileStream = 1:17
FS with 10K = 1:24
FS with 1K = 1:46

I guess that using less memory will perform better once swapping starts, and
that smaller chunks will reduce this effect.
Nevertheless, copying with the OS alone took about 1:00, which is a bit
faster indeed.
This was only a short test, not a scientifically proven method.

-Joerg
www.joerg.krause.net
Nov 17 '05 #12
