I'm converting a program I once wrote in Python to C#, and while I'm at
it, I want to make some performance improvements. The program takes a set
of mailbox files (in human-readable formats, like Eudora and Thunderbird)
and extracts every message that is to or from a particular penpal
(identified by one or more email addresses). Messages in a mailbox file
are typically separated by lines such as these:
    From ???@??? Fri Feb 06 00:26:08 2004
    From - Mon Feb 21 22:33:59 2005
I need to make this really fast and efficient. Most mailboxes are about
5 MB, but some are 50 MB or more, and I may need to process up to 200
files together. The overall workings would be something like this in
pseudo-code:
for each mailbox
{
    while (more messages)
    {
        message = get next message with a matching email
    }
}
It's this "get next message with a matching email" method I'm not sure
how to construct, but I thought that would be the way to do it, since
creating objects for ALL messages in a mailbox would be too heavy.
I have never worked much with the FileStream class, or with buffers and the
Seek() method, but I guess that is what I need here. Ideally, I'd like
to avoid keeping these large files in memory.
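
To make the idea concrete, here is a minimal sketch of the streaming
approach in Python (the language of my original program), not a finished
design. The names iter_messages and matching_messages are made up for
illustration. It assumes that every message starts with a line beginning
with "From ", that body lines never start that way unescaped, and that
finding one of the addresses anywhere in the raw message text counts as a
match. Only one message is held in memory at a time.

```python
def iter_messages(lines):
    """Yield each message as a list of raw lines.

    Assumption: a line starting with b"From " marks the start of a new
    message, as in the separator lines shown above.
    """
    current = []
    for line in lines:
        if line.startswith(b"From ") and current:
            # A new separator closes the message collected so far.
            yield current
            current = []
        current.append(line)
    if current:
        # The last message has no separator after it.
        yield current


def matching_messages(lines, addresses):
    """Yield the raw bytes of every message that mentions any address.

    Matching is a case-insensitive substring search over the whole
    message, which is a simplification: it does not parse headers.
    """
    wanted = [a.lower().encode("ascii") for a in addresses]
    for message in iter_messages(lines):
        raw = b"".join(message)
        lowered = raw.lower()
        if any(addr in lowered for addr in wanted):
            yield raw
```

A file opened in binary mode can be passed directly as lines, since
iterating over it reads one line at a time through a buffer, so no
Seek() calls are needed for a single forward pass. I imagine the C#
version would have the same shape, reading the file line by line.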
Please advise me on what methods and strategy to use, to get this to run
as fast as possible.
Many thanks in advance,
Gustaf