Hi,
background:
----------
I am the maintainer of a COM C++ dll that is used to read
binary data from a specific file format. I was recently
tasked with speeding up this dll. During analysis I saw
that fseek() calls are very often followed by either
fwrite() or fread() calls. So my optimisation strategy was
to improve file IO by using memory
mapped file IO (using Win32).
The basic access paradigm to these files, in this
implementation, is this:
VARIANT GetData( long recordNumber, long fieldNumber );
void SetData( long recordNumber, long fieldNumber, VARIANT data );
In other words, the file is basically a binary table storage
mechanism.
I defined the following pure virtual "interface" class that
would be the IO mechanism that the dll would use to have raw
access to the file's bytes. The idea is that I have one
implementation wrapping the stdlib's fopen(), fseek(),
fwrite/read() etc. And another class that would wrap the
memory mapped file io mechanism.
class StorageProvider
{
public:
    StorageProvider();
    virtual ~StorageProvider();

    virtual SIZE_T FileSize() = 0;
    virtual ErrorObject CreateFile( std::string FileName ) = 0;
    virtual ErrorObject OpenFile( std::string FileName, bool ReadOnly ) = 0;
    virtual ErrorObject CloseFile() = 0;

    // If Position == -1, the internal pointer must be used as the
    // position. The internal pointer must make this behave like
    // fread, fwrite and fseek do.
    virtual ErrorObject WriteData( const void* Source, SIZE_T Length, long Position = -1 ) = 0;
    virtual ErrorObject ReadData( void* Destination, SIZE_T Length, long Position = -1 ) = 0;

    virtual ErrorObject ChangeSize( SIZE_T Size ) = 0;
    virtual bool IsOpen() = 0;

protected:
    ErrorObject& mError;
};
As you might see, it is meant to be used as a drop-in
replacement for the stdlib file access calls.
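To make the Position == -1 convention concrete, here is a minimal,
portable sketch of those semantics over an in-memory buffer. It is not
one of the two real implementations: BufferStorage is a hypothetical
name, bool stands in for ErrorObject and std::size_t for SIZE_T, and it
assumes that an explicit Position also moves the internal pointer, the
way fseek() followed by fread()/fwrite() would.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for the interface: bool replaces
// ErrorObject and std::size_t replaces SIZE_T so the sketch is portable.
class BufferStorage
{
public:
    // Position == -1 means "start at the internal pointer", like fwrite().
    bool WriteData(const void* source, std::size_t length, long position = -1)
    {
        const std::size_t start = Resolve(position);
        if (start + length > mBytes.size())
            mBytes.resize(start + length);      // grow, like writing past EOF
        if (length > 0)
            std::memcpy(mBytes.data() + start, source, length);
        mCursor = start + length;               // advance the internal pointer
        return true;
    }

    // Position == -1 means "start at the internal pointer", like fread().
    bool ReadData(void* destination, std::size_t length, long position = -1)
    {
        const std::size_t start = Resolve(position);
        if (start > mBytes.size() || length > mBytes.size() - start)
            return false;                       // would read past the end
        if (length > 0)
            std::memcpy(destination, mBytes.data() + start, length);
        mCursor = start + length;               // advance the internal pointer
        return true;
    }

    std::size_t FileSize() const { return mBytes.size(); }

private:
    // Assumption: an explicit position acts like fseek(SEEK_SET).
    std::size_t Resolve(long position) const
    {
        assert(position >= -1);
        return position < 0 ? mCursor : static_cast<std::size_t>(position);
    }

    std::vector<unsigned char> mBytes;
    std::size_t mCursor = 0;
};
```

In this sketch, writing 6 bytes and then calling ReadData(out, 3, 1)
leaves the internal pointer at offset 4, so a following ReadData(out, 2)
without a position returns the last 2 bytes.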
In the dll you have the following 2 lines that determine
whether it will use the stdlib wrapper or the memory mapped
file io wrapper:
//StorageProvider *m_storage = new FileStreamStorage();
StorageProvider *m_storage = new MemoryMappedFile();
Aside from these 2 lines, I have not changed a single other
line for the problem described below.
I am pretty sure that the individual implementations are
correct. During development and testing I used a separate
program to write to and read from random positions within
big files. I saw the expected improvement in speed
when using the memory mapped file io wrapper over the stdlib
wrapper. In the test below I also get the same data back in
both implementations.
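A random-access check of this kind might look roughly like the
following sketch. It is not the actual test program: the function name
RandomAccessMatches is hypothetical, and it compares a stdio temporary
file against a plain in-memory reference buffer rather than the two
StorageProvider implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

// Hypothetical check: replay the same seeded random single-byte writes
// against a stdio temporary file and an in-memory reference buffer,
// then read the whole file back and compare every byte.
bool RandomAccessMatches(unsigned seed, int operations, std::size_t fileSize)
{
    assert(operations >= 0);
    if (fileSize == 0)
        return false;                       // nothing to test

    std::FILE* file = std::tmpfile();       // removed automatically on close
    if (!file)
        return false;

    // Start from a zero-filled file of the requested size.
    std::vector<unsigned char> reference(fileSize, 0);
    bool ok = std::fwrite(reference.data(), 1, fileSize, file) == fileSize;

    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> offset(0, fileSize - 1);
    std::uniform_int_distribution<int> byte(0, 255);

    for (int i = 0; ok && i < operations; ++i) {
        const std::size_t pos = offset(rng);
        const unsigned char value = static_cast<unsigned char>(byte(rng));
        reference[pos] = value;             // expected content
        ok = std::fseek(file, static_cast<long>(pos), SEEK_SET) == 0
          && std::fwrite(&value, 1, 1, file) == 1;
    }

    // Seek back to the start before switching from writing to reading.
    std::vector<unsigned char> readBack(fileSize);
    ok = ok && std::fseek(file, 0, SEEK_SET) == 0
            && std::fread(readBack.data(), 1, fileSize, file) == fileSize
            && readBack == reference;

    std::fclose(file);
    return ok;
}
```

Using a fixed seed keeps the sequence of positions repeatable, so the
same run can be replayed against a second implementation.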
problem:
-------
I have a VB6 app that calls the COM dll
methods to do some test reading from a file that conforms to
the file format.
If I use the stdlib wrapper (FileStreamStorage) the
application takes a total of ~0.02 seconds on average
(determined using the GetTickCount() call in kernel32.dll).
If I use the memory mapped file io wrapper
(MemoryMappedFile) then THE SAME CODE takes ~0.8 seconds, on
average.
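One caveat on the numbers: GetTickCount() typically has a resolution of
roughly 10 to 16 ms, so a figure like ~0.02 seconds is close to a single
tick. Averaging over many repetitions gives a steadier value. A portable
C++ sketch of that idea follows; AverageSeconds is a hypothetical helper,
and it uses std::chrono::steady_clock instead of GetTickCount().

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: average wall-clock seconds per call of `work`,
// measured over `runs` consecutive calls.
template <typename Work>
double AverageSeconds(Work work, int runs)
{
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;

    const Clock::time_point start = Clock::now();
    for (int i = 0; i < runs; ++i)
        work();
    const std::chrono::duration<double> elapsed = Clock::now() - start;

    return elapsed.count() / runs;
}
```

The callable passed in would wrap whatever sequence of reads is being
compared, so that both wrappers are timed over exactly the same work.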
I have a copy of CompuWare devPartner here that says, when I
am using the memory mapped file wrapper implementation, the
time is split between code I have source for (14%) and
system (86%). In the system list the biggest consumer of time
is a call to rpcrt4.dll; IclsDMBinaryFile::FieldValue_Get.
What is interesting about this is that in my dll I have a
class called clsDMBinaryFile with a method
get_FieldValue(VARIANT index, VARIANT *pVal) that is
exported to COM in the IDL.
I do not know enough to do further analysis myself. I have
very little experience in COM, so any help or pointers will
be GREATLY appreciated.
Friendly Regards,
Pieter Breed