get the actual size of a file

Hi

Generally,
FileInfo fi = new FileInfo(path);
long size = fi.Length;

gets you the length of a file in bytes. However, when copying files, even
while the copy operation is still in progress, the filesize, as indicated in
Windows Explorer or derived with the above two lines of code, will be the
size of the file once the copy operation has completed. Is there a way to
get the actual number of bytes written to the hard disk while a copy
operation is under way?

The reason I'm asking is that I have to copy rather large files and I'm
currently using File.Copy(input, output) to do this. For a progress
indication, I have a thread that gets the size of the output via the
abovementioned code. Once the file has been copied, I append a second
(binary) file, but prior to starting to append, I set the length of the
output file to the total length the output is going to have. So, my progress
indicator has 2 values only and my thread getting the filesize could just as
well not exist.

The only way around this I can imagine is dump File.Copy, create a new file
manually, and copy the binary data from input to output in chunks of a
certain size. Besides the additional complexity, is there any inherent
performance disadvantage of such a mechanism versus the built-in file copy
mechanism? I'm just guessing here but I assume the size of I/O buffers could
have a noticeable effect on performance.

Regards
Stephan
Nov 17 '05 #1
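
The chunked copy described in the question can be sketched roughly as follows. This is a minimal sketch, not a tuned implementation: the class name `ChunkedCopy`, the method name `Copy`, and the `onProgress` callback are made up for illustration, and the buffer size is left to the caller. Because the loop itself knows how many bytes it has written, it can report progress directly instead of polling `FileInfo.Length` from a second thread.

```csharp
using System;
using System.IO;

public static class ChunkedCopy
{
    // Copies sourcePath to destPath in chunks of bufferSize bytes and
    // reports the running total of bytes written after every chunk.
    // Returns the total number of bytes copied.
    public static long Copy(string sourcePath, string destPath,
                            int bufferSize, Action<long> onProgress)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;

        using (FileStream input = new FileStream(sourcePath, FileMode.Open,
                   FileAccess.Read, FileShare.Read, bufferSize))
        using (FileStream output = new FileStream(destPath, FileMode.Create,
                   FileAccess.Write, FileShare.None, bufferSize))
        {
            int read;
            // Read returns 0 only at end of file.
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
                total += read;
                if (onProgress != null) onProgress(total);
            }
        }
        return total;
    }
}
```

A caller could turn the reported byte count into a percentage by dividing it by `new FileInfo(sourcePath).Length`.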
You are absolutely correct. There will be a noticeable effect on
performance.
Nov 17 '05 #2
Stephany Young <noone@localhost> wrote:
You are absolutely correct. There will be a noticeable effect on
performance.


Well, there will be a noticeable effect on performance depending on the
buffer size. There needn't be a noticeable effect on performance
between File.Copy and copying chunk-by-chunk if the buffer size is
chosen appropriately.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 17 '05 #3
So what would be a suitable buffer size? I need to make a copy of one file
and append another file to it. The first file can be anywhere from 100 MB
to 2 GB, whereas the file to be appended will more likely be in the 10 - 100
MB range.
Nov 17 '05 #4
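
The copy-then-append case from this post could look something like the sketch below. The names `CopyAndAppend`, `Run`, and `onProgress` are hypothetical, and the buffer size is again left to the caller. Summing both input lengths up front gives the progress indicator a fixed total, so the output never has to be pre-sized with SetLength just to make its reported length meaningful.

```csharp
using System;
using System.IO;

public static class CopyAndAppend
{
    // Writes firstPath followed by secondPath into destPath and reports
    // overall progress as (bytesWritten, totalBytes) after every chunk.
    public static void Run(string firstPath, string secondPath, string destPath,
                           int bufferSize, Action<long, long> onProgress)
    {
        long total = new FileInfo(firstPath).Length
                   + new FileInfo(secondPath).Length;
        long written = 0;
        byte[] buffer = new byte[bufferSize];

        using (FileStream output = new FileStream(destPath, FileMode.Create,
                   FileAccess.Write, FileShare.None, bufferSize))
        {
            // Both inputs are streamed through the same buffer, in order.
            foreach (string path in new string[] { firstPath, secondPath })
            {
                using (FileStream input = new FileStream(path, FileMode.Open,
                           FileAccess.Read, FileShare.Read, bufferSize))
                {
                    int read;
                    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        output.Write(buffer, 0, read);
                        written += read;
                        if (onProgress != null) onProgress(written, total);
                    }
                }
            }
        }
    }
}
```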
I think, depending on the OS (and on whether File.Copy calls the right APIs),
File.Copy can be a huge winner, especially if neither file is on the
machine that executes the copy. (For instance, \\machA
executes File.Copy("\\machB\foo\bar", "\\machC\foo\bar").) This isn't a
common scenario, but I was under the impression that in certain
configurations one could avoid the bits going through \\machA at all.

m
Nov 17 '05 #5
Mike <vi********@yahoo.com> wrote:
I think, depending on the OS (and on whether File.Copy calls the right APIs),
File.Copy can be a huge winner, especially if neither file is on the
machine that executes the copy. (For instance, \\machA
executes File.Copy("\\machB\foo\bar", "\\machC\foo\bar").) This isn't a
common scenario, but I was under the impression that in certain
configurations one could avoid the bits going through \\machA at all.


I'm not sure, to be honest. I think I'd want to see it working before
saying for certain either way :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 17 '05 #6
Stephan Steiner <st*****@isuisse.com> wrote:
So what would be a suitable buffer size? I need to make a copy of one file
and append another file to it. The first file can be anywhere from 100 MB
to 2 GB, whereas the file to be appended will more likely be in the 10 - 100
MB range.


I suspect with buffers larger than about 64K you end up with
diminishing returns - and if the buffers are large enough to get on the
large object heap, the memory won't be compacted. (It'll be collected
after a long time, but not compacted, as far as I know.) Of course, if
your app just runs and then exits after doing this copy, that isn't an
issue.

I suggest you experiment with buffer sizes to find out what suits
your app best.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 17 '05 #7
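
One way to run the experiment suggested above is a small timing harness along these lines. It is only a sketch: `BufferSizeTrial`, `TimeCopy`, `PrintTrials`, and the list of candidate sizes are invented for illustration. On the large object heap point, the usual threshold is 85,000 bytes, so a 64 KB buffer (65,536 bytes) stays below it while a 256 KB buffer does not. Repeated runs over the same file will also be served partly from the OS file cache, so the numbers are only indicative.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class BufferSizeTrial
{
    // Hypothetical candidate sizes. Only the last one is large enough
    // (85,000 bytes or more) to be allocated on the large object heap.
    public static readonly int[] CandidateSizes =
        { 4 * 1024, 16 * 1024, 64 * 1024, 256 * 1024 };

    // Copies sourcePath to destPath with the given buffer size and
    // returns the elapsed wall-clock time.
    public static TimeSpan TimeCopy(string sourcePath, string destPath, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        Stopwatch watch = Stopwatch.StartNew();
        using (FileStream input = new FileStream(sourcePath, FileMode.Open,
                   FileAccess.Read, FileShare.Read, bufferSize))
        using (FileStream output = new FileStream(destPath, FileMode.Create,
                   FileAccess.Write, FileShare.None, bufferSize))
        {
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Times one copy per candidate size and prints the results.
    public static void PrintTrials(string sourcePath, string destPath)
    {
        foreach (int size in CandidateSizes)
        {
            TimeSpan elapsed = TimeCopy(sourcePath, destPath, size);
            Console.WriteLine("{0,7} bytes: {1:F0} ms", size, elapsed.TotalMilliseconds);
        }
    }
}
```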

"Stephan Steiner" <st*****@isuisse.com> wrote in message
news:%2****************@TK2MSFTNGP12.phx.gbl...
So what would be a suitable buffer size? I need to make a copy of one file
and append another file to it. The first file can be anywhere from 100
MB to 2 GB, whereas the file to be appended will more likely be in the
10 - 100 MB range.


All you can do is measure. However, keep in mind that all file IO through
the Framework IO classes is buffered IO: irrespective of the buffer size you
specify at the API level, the file system buffers reads and writes from/to
disk in the FS cache. The number of bytes buffered depends on the FS type
(NTFS, FAT32, ...) and the usage pattern (sequential, random, mixed).
So whether you read a byte or 256 KB at a time, the FS always transfers
a block of data from the disk device to the FS cache. That means the
transfer rate is theoretically determined by the speed of the physical IO
path. However, transferring the data blocks to the FS cache, and from the FS
cache on to the IO buffer in your application, costs CPU. The smaller the
buffers at the API level, the larger that overhead; below a certain buffer
size the CPU saturates, at which point the IO rate becomes CPU bound.

So basically we have four determining factors for IO transfer rate:
1. Physical IO system (FS type, disk rotational speed, disk cache size, RAID
level ...)
2. CPU speed and number of...
3. Sequential or Random IO.
4. Buffer size.

I wrote a program to measure the impact of buffer size on the sequential IO
rate and measured the CPU consumption and IO count (logical) and transfer
speed.
Following are the results obtained reading a single large file (10 GB) from a
single 10,000 RPM SATA drive (of course your mileage may vary).

blocksize    cpu       speed         IO
16 bytes     99.99%    10.44 MB/s    684444/s
128 bytes    78.31%    57.89 MB/s    474199/s
256 bytes    43.42%    56.37 MB/s    230906/s
2 KB         16.98%    57.84 MB/s     29613/s
4 KB         14.63%    56.37 MB/s     14432/s
8 KB         12.76%    56.4 MB/s       7219/s
16 KB        13.23%    57.84 MB/s      3702/s

What does this tell us:
- The transfer speed is already optimal from 128 bytes upward; the 16-byte
run drops to ~10 MB/s due to CPU saturation.
- Going beyond 128 bytes doesn't increase the transfer rate but reduces
the CPU consumption.
- CPU consumption stabilizes with > 4 KB buffers (what remains is the cache
manager's overhead). If you want to reduce CPU consumption further, you will
have to perform unbuffered IO using PInvoke.

In conclusion, anything between 2 KB and 8 KB gives you optimal results for
both IO transfer and CPU consumption. Bigger buffers are only a waste of
memory; too-small buffers are a waste of CPU resources.
Willy.



Nov 17 '05 #8
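
The measuring program itself was not posted, so the following is only a guess at the general shape of such a test, with made-up names (`ReadRate`, `ReadAll`, `MegabytesPerSecond`, `Report`). It reads a file sequentially with a given block size, counts the read calls, and derives MB/s. CPU consumption is not measured here. As a sanity check on the figures above, the IO column is roughly the speed divided by the block size: taking 1 MB as 1,048,576 bytes, 57.84 MB/s at 16 KB per read is about 3,702 reads per second.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ReadRate
{
    // Reads the whole file sequentially in blocks of blockSize bytes.
    // Returns the number of bytes read; readCalls receives the number
    // of Read calls that returned data.
    public static long ReadAll(string path, int blockSize, out long readCalls)
    {
        byte[] block = new byte[blockSize];
        long bytes = 0;
        readCalls = 0;
        using (FileStream fs = new FileStream(path, FileMode.Open,
                   FileAccess.Read, FileShare.Read, blockSize))
        {
            int read;
            while ((read = fs.Read(block, 0, block.Length)) > 0)
            {
                bytes += read;
                readCalls++;
            }
        }
        return bytes;
    }

    // Converts a byte count and a duration into MB/s (1 MB = 1,048,576 bytes).
    public static double MegabytesPerSecond(long bytes, TimeSpan elapsed)
    {
        return bytes / (1024.0 * 1024.0) / elapsed.TotalSeconds;
    }

    // Times one sequential pass over the file and prints a summary line.
    public static void Report(string path, int blockSize)
    {
        Stopwatch watch = Stopwatch.StartNew();
        long calls;
        long bytes = ReadAll(path, blockSize, out calls);
        watch.Stop();
        Console.WriteLine("blocksize = {0} bytes, speed = {1:F2} MB/s, reads = {2}",
            blockSize, MegabytesPerSecond(bytes, watch.Elapsed), calls);
    }
}
```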
TJB replied to:
Conclusion, anything between 2KB and 8 KB gives you optimal results for both IO transfer and CPU consumption. Bigger buffers are only a waste of memory, too small buffers are a waste of CPU resources.


That conclusion is a bit too sweeping given the benchmark. For a 10k
RPM drive, those numbers sound underperforming.

It sounds like your reads are synchronous, and we also don't know what
options got passed to CreateFile underneath in that case. I
think you can get different results between sequential and random
access caching. Also, it might be interesting to see what happens if
you do a bunch of asynchronous random reads. Doesn't SATA support a
form of command queuing? The drive might be able to do the SCSI-like
thing of reordering a batch of reads into whatever order it thinks is
fastest.

Nov 17 '05 #9

"stork" <tb******@mightyware.com> wrote in message
news:11*********************@o13g2000cwo.googlegro ups.com...
TJB replied to:
Conclusion, anything between 2KB and 8 KB gives you optimal results for both
IO transfer and CPU consumption. Bigger buffers are only a waste of memory,
too small buffers are a waste of CPU resources.


That conclusion is a bit too sweeping given the benchmark. For a 10k
RPM drive, those numbers sound underperforming.

It sounds like your reads are synchronous, and we also don't know what
options got passed to CreateFile underneath in that case. I
think you can get different results between sequential and random
access caching. Also, it might be interesting to see what happens if
you do a bunch of asynchronous random reads. Doesn't SATA support a
form of command queuing? The drive might be able to do the SCSI-like
thing of reordering a batch of reads into whatever order it thinks is
fastest.

Note that the transfer rates are Buffer from/to Disk, NOT Buffer from/to
Host. What makes you think that the numbers are underperforming?
Note that this wasn't meant as a benchmark; my only purpose was to show the
impact of the buffer sizes on CPU consumption and IO throughput for simple
reads.
Anyway, to answer some of your questions:
The reads are synchronous and sequential, from a non-fragmented single disk,
using the buffered FileStream IO .NET API.
fs = new FileStream(fileName,
FileMode.OpenOrCreate,
FileAccess.ReadWrite,
FileShare.None,
blockSize);

No additional options can be specified running v1.1 of the framework.
Running on v2.0 with the FileOptions.SequentialScan option results in a 5%
increase of the transfer rate.

Running the same test asynchronously didn't result in a higher throughput
(as expected).

Doing sequential synchronous writes gave approx. the same figures for the IO
throughput with a smaller CPU overhead compared to the reads.

Willy.




Nov 17 '05 #10
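
For reference, the v2.0 option mentioned above is passed through the FileStream constructor overload that takes a FileOptions argument. A minimal sketch, assuming a read-only sequential pass; the class and method names are made up:

```csharp
using System.IO;

public static class SequentialOpen
{
    // Opens a file for reading and hints to the OS that it will be
    // read sequentially from beginning to end.
    public static FileStream OpenForSequentialRead(string path, int bufferSize)
    {
        return new FileStream(path, FileMode.Open, FileAccess.Read,
                              FileShare.Read, bufferSize,
                              FileOptions.SequentialScan);
    }
}
```

The flag is only a hint to the cache manager, so any gain (such as the 5% reported above) needs to be measured on the target system.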
Sorry I've been so quiet these recent days.

Thanks a bunch for all your valuable suggestions. I have been using rather
large buffers that I have reduced considerably now.

Regards
Stephan
Nov 17 '05 #11
"Stephan Steiner" <st*****@isuisse.com> wrote in message news:<u#**************@TK2MSFTNGP14.phx.gbl>...
[...]

Thanks a bunch for all your valuable suggestions. I have been using rather
large buffers that I have reduced considerably now.

Regards
Stephan


During some work on a similar problem I did a little
benchmarking:

File size approx. 110 MB, 2.4GHz, 1GB RAM, FW 1.1:

Method MM:SS

File.Copy = 1:41
FileStream = 1:17
FS with 10K = 1:24
FS with 1K = 1:46

I guess that using less memory wins once swapping starts, and that
smaller chunks reduce this effect.
Nevertheless, copying with the OS alone took about 1:00, which is indeed
a bit faster.
This was only a short test, not a scientifically proven method.

-Joerg
www.joerg.krause.net
Nov 17 '05 #12
