Hello everyone,
I am writing a utility. Part of its function is to do a block-mode copy of
files and generate MD5 / SHA1 hashes on the fly. The functionality is
similar to that of the Unix dd / dcfldd utilities. I am using API calls to
get handles to files and FileStreams to perform the copy block by block,
hashing on the way. The relevant portion of the code is below. I hash each
block as it is transferred to avoid copying the file and then re-opening it
to hash it. That would be double the work and would result in poor
performance.
I realize that hashing the files as they are being copied will generate
overhead. I am comparing performance to utilities like robocopy, xcopy,
plain Windows copy, and dcfldd. I am pretty much on par with dcfldd (on
Windows), sometimes better. My throughput is about 90% to 95% of that of
command-line utilities like robocopy and xcopy. Windows copy has too much
overhead and is much slower. dcfldd on Windows is not very fast and is not a
good reference point. I would like to get to 98% / 99% of the performance
of robocopy and xcopy.
I have several questions.
1. Can anyone recommend a good list of block sizes to use for various
environments? For example, what should the block size be for HDD to HDD
copying, LAN to LAN, within the same HDD, LAN to HDD, slow connections, etc.?
I tried setting the block size to 512 bytes when copying from HDD to HDD to
match the NTFS cluster size, but that resulted in much worse transfer speed
than a block size of 32K. I experimented with setting the block size to match
the default LAN MTU size minus the packet header sizes, but that resulted in
poor performance as well. I am just not sure how to determine a good block
size other than by trial and error.
2. Is my code below optimized? Am I wasting any CPU cycles?
3. Is there a better way of doing this? I would not call myself an
experienced programmer. I would appreciate any criticism.
Thank you in advance!
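To make the trial-and-error comparison from question 1 concrete, here is a minimal sketch of a timing harness. It is only an illustration under assumptions: the names BlockSizeTiming and TimeCopy are hypothetical, the candidate sizes are arbitrary, and hashing is left out so that only the copy itself is timed.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO

Module BlockSizeTiming
    ' Hypothetical helper: copies srcPath to dstPath using the given block size
    ' and returns the elapsed wall-clock time.
    Function TimeCopy(ByVal srcPath As String, ByVal dstPath As String, ByVal blockSize As Integer) As TimeSpan
        Dim block(blockSize - 1) As Byte   ' allocated once, reused for every read
        Dim watch As Stopwatch = Stopwatch.StartNew()
        Using src As New FileStream(srcPath, FileMode.Open, FileAccess.Read, FileShare.Read, blockSize)
            Using dst As New FileStream(dstPath, FileMode.Create, FileAccess.Write, FileShare.None, blockSize)
                Dim bytesRead As Integer = src.Read(block, 0, block.Length)
                While bytesRead > 0
                    ' write only the bytes that were actually read
                    dst.Write(block, 0, bytesRead)
                    bytesRead = src.Read(block, 0, block.Length)
                End While
            End Using
        End Using
        watch.Stop()
        Return watch.Elapsed
    End Function

    Sub Main(ByVal args As String())
        If args.Length < 2 Then
            Console.WriteLine("usage: BlockSizeTiming <source> <destination>")
            Return
        End If
        ' Arbitrary candidate block sizes, chosen only for illustration.
        Dim candidates As Integer() = {512, 4096, 32768, 65536, 262144, 1048576}
        For Each size As Integer In candidates
            Dim elapsed As TimeSpan = TimeCopy(args(0), args(1), size)
            Console.WriteLine("{0,8} bytes: {1:F0} ms", size, elapsed.TotalMilliseconds)
        Next
    End Sub
End Module
```

One caveat with a loop like this: after the first pass the source file may be served from the operating system's file cache, so later block sizes can look faster than they really are. Using a test file larger than RAM, or a different source file per run, is one way to reduce that effect.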
'get MAC times of the source file
If GetFileTime(hFlHandle, dtCreated, dtAccessed, dtModified) = False Then
    Logger.writeLN("Copy Error: " & APIErrorMessage(GetLastError))
    Throw New Exception("Unable to get MAC times from sourcefile")
End If
While SourceStream.Position < SourceStream.Length ' write until EOF
    'clear the buffer block
    ReDim transferBlock(iBlockSize - 1)
    If SourceStream.Length - SourceStream.Position > CLng(iBlockSize) Then
        'read a block of data
        iBytesRead = SourceStream.Read(transferBlock, 0, transferBlock.Length)
        'hash the block
        objMD5.TransformBlock(transferBlock, 0, transferBlock.Length, transferBlock, 0)
        'write to destination file
        DestStream.Write(transferBlock, 0, transferBlock.Length)
    Else
        'read the final (possibly shorter) block of data
        ReDim transferBlock(SourceStream.Length - SourceStream.Position - 1)
        iBytesRead = SourceStream.Read(transferBlock, 0, transferBlock.Length)
        'hash final block
        objMD5.TransformFinalBlock(transferBlock, 0, transferBlock.Length)
        'write to destination file
        DestStream.Write(transferBlock, 0, transferBlock.Length)
    End If
    iTotalBytesRead += iBytesRead
    iCurrentFileCopied += iBytesRead
End While
'set MAC times for the destination file
If SetFileTime(hDestHandle, dtCreated, dtAccessed, dtModified) = False Then
    Logger.writeLN("Copy Error: " & APIErrorMessage(GetLastError))
    Throw New Exception("Unable to set MAC times for destination file")
End If
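
For comparison, a possible variant of the loop above is sketched here. It allocates the buffer once instead of calling ReDim on every pass, and it hashes and writes only the number of bytes that Read actually returned, since Stream.Read is allowed to return fewer bytes than requested. This is a sketch under assumptions, not a drop-in replacement: the module name CopyWithHash and the function CopyAndHash are hypothetical, and the MAC time handling and logging are omitted.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module CopyWithHash
    ' Hypothetical helper: copies source to dest in blocks of blockSize bytes
    ' and returns the MD5 digest of the data that was copied.
    Function CopyAndHash(ByVal source As Stream, ByVal dest As Stream, ByVal blockSize As Integer) As Byte()
        Dim block(blockSize - 1) As Byte   ' allocated once and reused
        Using hasher As MD5 = MD5.Create()
            Dim bytesRead As Integer = source.Read(block, 0, block.Length)
            While bytesRead > 0
                ' hash and write exactly the bytes returned by Read
                hasher.TransformBlock(block, 0, bytesRead, block, 0)
                dest.Write(block, 0, bytesRead)
                bytesRead = source.Read(block, 0, block.Length)
            End While
            ' finalize the hash with an empty final block
            hasher.TransformFinalBlock(block, 0, 0)
            Return hasher.Hash
        End Using
    End Function

    Sub Main()
        ' Small in-memory demonstration; "abc" is an MD5 test vector from RFC 1321.
        Dim input As New MemoryStream(System.Text.Encoding.ASCII.GetBytes("abc"))
        Dim output As New MemoryStream()
        Dim digest As Byte() = CopyAndHash(input, output, 2)
        ' prints 900150983cd24fb0d6963f7d28e17f72
        Console.WriteLine(BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant())
    End Sub
End Module
```

Because the loop ends when Read returns 0 rather than by comparing Position to Length, it needs no special case for the last block and also works with streams that do not report a length.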