Bytes | Software Development & Data Engineering Community

Reading mailboxes effectively

I'm converting a program I once wrote in Python to C#, and while I'm at
it, I want to make some performance improvements. The program takes a set
of mailbox files (in readable formats, such as Eudora and Thunderbird) and
extracts every message to or from a particular penpal (identified by one
or more email addresses). Messages in a mailbox file are typically
separated by lines such as these:

From ???@??? Fri Feb 06 00:26:08 2004
From - Mon Feb 21 22:33:59 2005
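A minimal C# sketch of recognizing such separator lines, assuming that any line beginning with "From " (with a trailing space) starts a new message. The class name MboxSeparator is hypothetical. The space matters: the "From: ..." header inside a message has a colon there and must not be treated as a separator. Body lines that happen to begin with "From " are commonly escaped as ">From " by mbox writers, but that is not guaranteed, so a stricter check (for example, also requiring a trailing date) may be needed in practice.

```csharp
using System;

// Hypothetical helper: recognizes mbox-style separator lines such as
// "From ???@??? Fri Feb 06 00:26:08 2004" or "From - Mon Feb 21 22:33:59 2005".
public static class MboxSeparator
{
    // Assumption: a separator is any line that starts with "From " (space).
    // A header line like "From: someone@example.com" starts with "From:" and
    // is therefore not matched.
    public static bool IsSeparator(string line) =>
        line != null && line.StartsWith("From ", StringComparison.Ordinal);
}
```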

I need to make this really fast and efficient. Most mailboxes are about
5 MB, but some are 50 MB or more, and I may need to process up to 200
files in one run. The overall logic would be something like this in
pseudo-code:

for each mailbox
{
    while (more messages)
    {
        message = get next message with a matching email
    }
}

It's this "get next message with a matching email" method I'm not sure
how to construct, but I thought that would be the way to do it, since
creating objects for ALL messages in a mailbox would be too heavy.
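One way to build the "get next message" step is a C# iterator (yield return) that reads the mailbox line by line and hands back one message at a time, so only the current message is held in memory. This is a sketch under stated assumptions, not a tested implementation: MailboxReader, ReadMessages and MatchingMessages are hypothetical names, and the separator rule (a line starting with "From ") is assumed from the examples above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: streams messages out of an mbox-style mailbox one at a time.
public static class MailboxReader
{
    // Assumption: a line starting with "From " (space, not colon) begins a new message.
    private static bool IsSeparator(string line) =>
        line.StartsWith("From ", StringComparison.Ordinal);

    // Yields each message (separator line included) as a string. Any text before
    // the first separator is skipped. Only the current message is buffered.
    public static IEnumerable<string> ReadMessages(TextReader reader)
    {
        var current = new StringBuilder();
        bool inMessage = false;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (IsSeparator(line))
            {
                if (inMessage)
                {
                    // Hand back the finished message, then start collecting the next one.
                    yield return current.ToString();
                    current.Clear();
                }
                inMessage = true;
            }
            if (inMessage)
            {
                current.AppendLine(line);
            }
        }
        if (inMessage)
        {
            // The last message has no separator after it.
            yield return current.ToString();
        }
    }

    // "Get next message with a matching email": filters the stream lazily with a
    // caller-supplied test, for example an address check.
    public static IEnumerable<string> MatchingMessages(TextReader reader, Func<string, bool> isMatch) =>
        ReadMessages(reader).Where(isMatch);
}
```

Because the iterator is lazy, the outer "for each mailbox" loop can simply foreach over MatchingMessages for each file, and the address test can be written and tuned separately from the splitting logic.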

I've never worked much with the FileStream class, or with buffers and the
Seek() method, but I guess that is what I need here. Ideally, I'd like
to avoid keeping these large files in memory.

Please advise me on which methods and strategy to use to get this to run
as fast as possible.

Many thanks in advance,

Gustaf
Nov 17 '05 #1
Gustaf,

I think you are on the right track. You would want to use a FileStream
in this case (perhaps wrapped in a StreamReader) and process the file in
chunks. It looks like the file is delimited with CRLF (or at least LF). If
that is the case, you can read the file line by line, which uses far less
memory than reading it all at once.
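A sketch of opening a mailbox for one sequential pass and reading it line by line. The FileStream constructor that takes a buffer size and FileOptions, and FileOptions.SequentialScan itself, are standard .NET APIs; the 64 KB buffer size, the UTF-8 encoding and the MailboxFile / CountLines names are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for reading a mailbox file front to back.
public static class MailboxFile
{
    // Opens a mailbox for a single forward pass. SequentialScan is a hint to the
    // OS that the file will be read from start to end; 64 KB is an arbitrary,
    // untuned buffer size.
    public static StreamReader Open(string path)
    {
        var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                    FileShare.Read, 64 * 1024, FileOptions.SequentialScan);
        // Assumption: UTF-8. Older mailboxes are often in another encoding.
        return new StreamReader(stream, Encoding.UTF8);
    }

    // Reads the whole file line by line, holding only one line at a time.
    public static long CountLines(string path)
    {
        long count = 0;
        using (StreamReader reader = Open(path))
        {
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }
}
```

Disposing the StreamReader also disposes the underlying FileStream, so one using block is enough, and no Seek() call is needed for a straight front-to-back scan.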

You can then parse each line to see whether it has the information you
need, and properly dispose of the FileStream (for example, with a using
block) once you are done with the file.
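For the parsing step in this particular task, one option is to look only at the address headers of each message. The sketch assumes that headers end at the first blank line, that folded header lines start with a space or tab, and that a case-insensitive substring match on From:, To: and Cc: is good enough; PenpalFilter and MentionsAny are hypothetical names. A substring match can give false positives (bob@example.com also matches jimbob@example.com), so a stricter version might parse the addresses properly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: decides whether a message is to/from one of the penpal's addresses.
public static class PenpalFilter
{
    private static readonly string[] AddressHeaders = { "From:", "To:", "Cc:" };

    // Returns true if any From/To/Cc header (including folded continuation lines)
    // contains one of the addresses. The message body is never scanned.
    public static bool MentionsAny(string message, IEnumerable<string> addresses)
    {
        using (var reader = new StringReader(message))
        {
            bool inAddressHeader = false;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0) break; // first blank line ends the headers

                // A line starting with whitespace continues the previous header.
                bool continuation = line[0] == ' ' || line[0] == '\t';
                if (!continuation)
                {
                    inAddressHeader = false;
                    foreach (var header in AddressHeaders)
                    {
                        if (line.StartsWith(header, StringComparison.OrdinalIgnoreCase))
                        {
                            inAddressHeader = true;
                            break;
                        }
                    }
                }

                if (inAddressHeader)
                {
                    foreach (var address in addresses)
                    {
                        if (line.IndexOf(address, StringComparison.OrdinalIgnoreCase) >= 0)
                            return true;
                    }
                }
            }
        }
        return false;
    }
}
```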

Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

Nov 17 '05 #2


