Data Storage for news client

Hi all,

I'm writing a news client (mainly to test out CAB & ClickOnce!), and trying
to decide on what to use for the storage of messages etc. SQL Express seems
like overkill (and is a hefty download for a < 1MB app!). Also, since there
could be thousands of messages (potentially binary), I'm not sure that
serializing my classes to disk would perform at all well.

What would other people use for a small app like this? And why?

Thanks,
Nov 17 '05 #1
I would simply store them sequentially in a single file and then create an
index file which has some header information (perhaps subject, author, date,
etc) and an offset to the message's text in the main file. Similar to my
response to the message just a bit earlier under "squeeze few image file
into on binary file"
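
A minimal sketch of the two-file layout described here, written in Python for brevity (in the .NET app under discussion, `FileStream` and `BinaryWriter` would play the same role). The file names, the record fields and the one-JSON-object-per-line index format are assumptions made for illustration; the thread does not specify a format.

```python
import json
import os
import tempfile


def append_message(data_path, index_path, subject, author, date, body):
    """Append one message body to the data file and one record to the index."""
    with open(data_path, "ab") as data:
        data.seek(0, os.SEEK_END)      # make sure tell() reports the end of the file
        offset = data.tell()           # where this body starts
        data.write(body)
    entry = {"subject": subject, "author": author, "date": date,
             "offset": offset, "length": len(body)}
    with open(index_path, "a", encoding="utf-8") as index:
        index.write(json.dumps(entry) + "\n")   # one index record per line
    return entry


# Small demonstration in a temporary directory (hypothetical file names).
workdir = tempfile.mkdtemp()
data_file = os.path.join(workdir, "messages.dat")
index_file = os.path.join(workdir, "messages.idx")
first = append_message(data_file, index_file, "Hello", "danny",
                       "2005-11-17", b"first body")
second = append_message(data_file, index_file, "Re: Hello", "pete",
                        "2005-11-17", b"second body")
```

Storing the length next to the offset is a design choice of this sketch: it lets a reader fetch exactly one body without looking at the next index record.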

You could also compress the text prior to storing it in the single file
(using SharpZipLib or 7-Zip or something). I suspect it would compress well,
even messages with uuencoded or yEnc-encoded binaries.

I actually need to integrate a newsreader, at some point, into an app I'm
writing and I suspect this is the direction I'll take.

The advantage of this is that access is quick and it easily accommodates
thousands of messages. If you store the messages in separate files, you'll
soon find your directory getting large. Getting to the data in a single
file with an index, using Seek, will be much faster than having the file
system find a match for your file name in a directory with thousands of
files.
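
To illustrate the seek-based access, here is a small Python sketch. The in-memory list of `(offset, length)` pairs stands in for the index file, and the file name is hypothetical.

```python
import os
import tempfile


def read_message(data_path, offset, length):
    """Jump straight to one message body instead of scanning the file."""
    with open(data_path, "rb") as data:
        data.seek(offset)
        return data.read(length)


# Demonstration: three bodies stored back to back, with offsets kept in memory.
bodies = [b"alpha", b"bravo!", b"charlie"]
path = os.path.join(tempfile.mkdtemp(), "messages.dat")
index = []                      # list of (offset, length) pairs
with open(path, "wb") as out:
    for body in bodies:
        index.append((out.tell(), len(body)))
        out.write(body)

third = read_message(path, *index[2])   # reads only the third body
```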

It's also fairly easy to purge lots of contiguous messages (which is likely
how you'd want to handle purging from a newsreader) from the file. For
example, if you want to delete the first 1000 messages, simply look up the
offset of the 1001st message in the index, then copy the data from there to
the end into a new file, delete the original file, and rename the new one to
the name of the old. Do the same with the index file, subtracting the number
of bytes you removed from each remaining offset.
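
The purge can be sketched as follows, again in Python and with the index held as an in-memory list of `(offset, length)` pairs (an assumption of this sketch). Note that the offsets of the surviving messages have to shift down by the number of bytes removed.

```python
import os
import tempfile


def purge_first(data_path, index, count):
    """Drop the first `count` messages; `index` is a list of (offset, length)."""
    if count >= len(index):
        cut = os.path.getsize(data_path)   # everything goes
    else:
        cut = index[count][0]              # offset of the first message we keep
    tmp_path = data_path + ".tmp"
    with open(data_path, "rb") as src, open(tmp_path, "wb") as dst:
        src.seek(cut)
        while True:
            chunk = src.read(64 * 1024)    # copy the tail in 64 KB pieces
            if not chunk:
                break
            dst.write(chunk)
    os.replace(tmp_path, data_path)        # swap the new file in under the old name
    # Remaining offsets shift down by the number of bytes removed.
    return [(offset - cut, length) for offset, length in index[count:]]


# Demonstration: three bodies stored back to back, then the first two purged.
bodies = [b"alpha", b"bravo!", b"charlie"]
data_file = os.path.join(tempfile.mkdtemp(), "messages.dat")
index = []
with open(data_file, "wb") as out:
    for body in bodies:
        index.append((out.tell(), len(body)))
        out.write(body)

index = purge_first(data_file, index, 2)
with open(data_file, "rb") as check:
    remaining = check.read()
```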

Pete
Nov 17 '05 #2
"Pete Davis" <pdavis68@[nospam]hotmail.com> wrote in message
news:Zs********************@giganews.com...

Hi Peter,
> I would simply store them sequentially in a single file and then create an
> index file which has some header information (perhaps subject, author,
> date, etc) and an offset to the message's text in the main file. Similar to
> my response to the message just a bit earlier under "squeeze few image file
> into on binary file"
>
> You could also compress the text prior to storing it in the single file
> (using SharpZipLib or 7-Zip or something). I suspect it would compress
> well, even messages with uuencoded or yEnc-encoded binaries.
>
> I actually need to integrate a newsreader, at some point, into an app I'm
> writing and I suspect this is the direction I'll take.


Interesting response. What about performance though? If the user opens a
folder that has 1,000 messages, either I have to load them all *very*
quickly (I need to display Sender, Subject, Date, etc.), or I fetch them as
the user scrolls (which could be pretty unresponsive if the user is dragging
the scrollbar).

What would you store in the index file? The user will be able to change the
sort order in the display, so unless I maintain a few indexes, it'd be
difficult to get a list in order. The message list will show the Sender,
Date, Subject etc., and so if I have to scan through the data file for
thousands of these things, surely it'll take an age? I've never done this
kind of processing before, so I've no idea of how it would perform. I don't
want to build it and find it's unacceptable, so any experiences anyone can
share would be much appreciated! :)

As for compression - again, without testing it, I wouldn't know - but
although compression would save tons of disk space, wouldn't the overhead of
the compression make it slower than reading more uncompressed data? I assume
the compressed size would be variable, so it'd be difficult to seek within a
compressed stream. Any ideas?

Thanks,

Danny
Nov 17 '05 #3


"Danny Tuppeny" <gr****@dannytuppeny.commmmmm> wrote in message
news:43**********************@ptn-nntp-reader02.plus.net...
> "Pete Davis" <pdavis68@[nospam]hotmail.com> wrote in message
> news:Zs********************@giganews.com...
>
> Hi Peter,

[snip]

> Interesting response. What about performance though? If the user opens a
> folder that has 1,000 messages, either I have to load them all *very*
> quickly (I need to display Sender, Subject, Date, etc.), or I fetch them
> as the user scrolls (which could be pretty unresponsive if the user is
> dragging the scrollbar).

I suspect it will load much faster than you think.

Assuming in the index you store Sender, subject, date, message ID, offset in
main file, and a few other header items, I suspect you're looking at an
average of roughly 100-200 bytes per message. Let's say 200 bytes, but
that's probably on the high side. That works out to only 200K per thousand
messages or 5000 messages per megabyte. That will load into memory pretty
quickly.
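
The estimate can be checked with a few lines of arithmetic. The 200-byte figure is Pete's own deliberately high guess, and the megabyte here is taken as 1,000,000 bytes (with 1,048,576 bytes the count would be slightly higher).

```python
BYTES_PER_ENTRY = 200          # Pete's deliberately generous per-message estimate


def index_size_bytes(message_count, bytes_per_entry=BYTES_PER_ENTRY):
    """Approximate size of the in-memory index for a given number of messages."""
    return message_count * bytes_per_entry


per_thousand = index_size_bytes(1_000)            # 200,000 bytes, about 200K
per_megabyte = 1_000_000 // BYTES_PER_ENTRY       # entries that fit in one megabyte
```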
> What would you store in the index file? The user will be able to change
> the sort order in the display, so unless I maintain a few indexes, it'd be
> difficult to get a list in order. The message list will show the Sender,
> Date, Subject etc., and so if I have to scan through the data file for
> thousands of these things, surely it'll take an age? I've never done this
> kind of processing before, so I've no idea of how it would perform. I
> don't want to build it and find it's unacceptable, so any experiences
> anyone can share would be much appreciated! :)

Well, if they're going to be able to sort them, then it makes sense to load
it all into memory, assuming that's feasible. Given the figures above, that
should be doable on most modern computers, assuming you're just loading
messages from a single group at a time. Load the messages into memory and
then sort them. Leave them sorted in the files however you want. It won't
make much difference.

I don't expect it to be lightning fast, but I think it will be much faster
than you think. If you implement the IComparer interface, sorting should be
a piece of cake, and the built-in sort algorithm is quicksort, I believe.
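
As a rough illustration of sorting the in-memory index by whichever column the user clicks, here is a Python sketch. Python's built-in `sorted` with a key function stands in for `Array.Sort` with an `IComparer`; the entry fields and sample values are hypothetical.

```python
from operator import itemgetter

# Hypothetical index entries already loaded into memory.
entries = [
    {"sender": "pete", "subject": "Re: Data Storage", "date": "2005-11-18"},
    {"sender": "danny", "subject": "Data Storage", "date": "2005-11-17"},
    {"sender": "peter", "subject": "Re: Data Storage", "date": "2005-11-18"},
]


def sort_entries(entries, column, descending=False):
    """Return a new list ordered by one column, as a column-header click would."""
    return sorted(entries, key=itemgetter(column), reverse=descending)


by_sender = sort_entries(entries, "sender")
newest_first = sort_entries(entries, "date", descending=True)
```

Dates are kept as ISO-style strings in this sketch so that plain string comparison orders them chronologically.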
> As for compression - again, without testing it, I wouldn't know - but
> although compression would save tons of disk space, wouldn't the overhead
> of the compression make it slower than reading more uncompressed data? I
> assume compression would be variable, so it'd be difficult to seek within
> a compressed stream. Any ideas?


Compressing data is slow. Decompressing is generally quite fast. I suspect
it'll be faster to read due to the large amount of saved space, particularly
if data is located on a network drive.

Remember, 2 files: Index file and Data File. Leave the index file
uncompressed. Don't compress the entire data file, just compress the
individual messages. That way you have an offset to each compressed message
and just begin decompression at the beginning of the message. Again, look at
the message I posted earlier where I use a simple index file and store a
bunch of thumbnails in a single file. It easily loads 500 thumbnails (and
that includes jpeg decoding of the data) in a matter of maybe 2 seconds.
Without the jpeg decoding, it would be less than half a second, I'm sure.
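
A sketch of the per-message compression idea, in Python. The standard `zlib` module stands in for whatever compression library the real app would use, and the file name and sample messages are made up. Because each message is compressed on its own, the index only needs the offset and stored length of each compressed blob, and seeking still works.

```python
import os
import tempfile
import zlib


def append_compressed(data_path, body):
    """Compress one message on its own and append it; return (offset, stored_length)."""
    packed = zlib.compress(body)
    with open(data_path, "ab") as data:
        data.seek(0, os.SEEK_END)
        offset = data.tell()
        data.write(packed)
    return offset, len(packed)


def read_compressed(data_path, offset, stored_length):
    """Seek to one message and decompress only that message."""
    with open(data_path, "rb") as data:
        data.seek(offset)
        return zlib.decompress(data.read(stored_length))


# Demonstration with two repetitive (and therefore very compressible) messages.
path = os.path.join(tempfile.mkdtemp(), "messages.dat")
first_text = b"Subject: test\r\n\r\n" + b"hello " * 200
second_text = b"Subject: another\r\n\r\n" + b"world " * 200
first_loc = append_compressed(path, first_text)
second_loc = append_compressed(path, second_text)
restored = read_compressed(path, *second_loc)
```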

Nov 18 '05 #4
Why don't you try the SQLite database engine? It's a single small DLL,
requires no installation, has an ADO.NET provider, and it's extremely fast.
There's now a 2.0 version as well. Check it out at SourceForge.net
peter
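
For comparison, a minimal sketch of the SQLite route using Python's standard `sqlite3` module (in the .NET app this would go through the ADO.NET provider mentioned above). The table layout and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # a file path here would give a single-file store
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, date TEXT, body BLOB)"
)
conn.executemany(
    "INSERT INTO messages (sender, subject, date, body) VALUES (?, ?, ?, ?)",
    [
        ("danny", "Data Storage for news client", "2005-11-17", b"original question"),
        ("pete", "Re: Data Storage for news client", "2005-11-17", b"index + data file"),
    ],
)
conn.commit()

# The header list can be fetched in any sort order without touching the bodies.
headers = conn.execute(
    "SELECT sender, subject, date FROM messages ORDER BY sender"
).fetchall()
body = conn.execute(
    "SELECT body FROM messages WHERE sender = ?", ("pete",)
).fetchone()[0]
```

With this approach the database handles both the sorting and the lookup of individual bodies, at the cost of shipping the engine with the app.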

--
Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com


"Danny Tuppeny" wrote:
Hi all,

I'm writing a news client (mainly to test out CAB & ClickOnce!), and trying
to decide on what to use for the storage of messages etc.. SQL Express seems
like overkill (and is a hefty download for a < 1MB app!). Also, since there
could be thousands of messages (potentially binary), I'm not sure that
serializing my classes to disk would perform at all well.

What would other people use for a small app like this? And why?

Thanks,

Nov 18 '05 #5
"Pete Davis" <pdavis68@[nospam]hotmail.com> wrote in message
news:M4******************************@giganews.com ...
> I suspect it will load much faster than you think.
After Googling a little more last night, I think you're right! :)
I ran this:

http://www.codeproject.com/csharp/Fa...yFileInput.asp

Which didn't take long to create 10,000,000 structs in a binary file -
276MB of data :)

> Well, if they're going to be able to sort them, then it makes sense to
> load it all into memory, assuming that's feasible. Given the figures
> above, that should be doable on most modern computers, assuming you're just
> loading messages from a single group at a time. Load the messages into
> memory and then sort them. Leave them sorted in the files however you
> want. It won't make much difference.
I was thinking about this: if I let the user sort once the index is loaded
into memory (probably by clicking column headers), then when they move to
another folder (or close the app) I can write the index back in that order.
That persists their sort order, and also means I never have to load it and
immediately sort afterwards :)

Do you think the index file would perform well as normal Serialized objects?
The smaller things (Folders, user settings etc.) I was going to just
serialize as XML. Since the messages (indexes) won't be huge, I'm wondering
if they can be done the same way, or if I'd need to think about something
slightly different, like the message data..?

> Remember, 2 files: Index file and Data File. Leave the index file
> uncompressed. Don't compress the entire data file, just compress the
> individual messages. That way you have an offset to each compressed
> message and just begin decompression at the beginning of the message.
> Again, look at the message I posted earlier where I use a simple index
> file and store a bunch of thumbnails in a single file. It easily loads 500
> thumbnails (and that includes jpeg decoding of the data) in a matter of
> maybe 2 seconds. Without the jpeg decoding, it would be less than half a
> second, I'm sure.


I forgot to look! Just looked now, and it looks very helpful - thanks! :)

Danny
Nov 18 '05 #6

"Danny Tuppeny" <gr****@dannytuppeny.commmmmm> wrote in message
news:43***********************@ptn-nntp-reader03.plus.net...
> "Pete Davis" <pdavis68@[nospam]hotmail.com> wrote in message
> news:M4******************************@giganews.com ...
>> I suspect it will load much faster than you think.

[snip]

> I was thinking about this - if once loaded into memory, I let the user
> sort (probably by clicking column headers), once they move to another
> folder (or close the app), I can write the index back in this order -
> which persists their sort order, but also means I don't ever have to load
> it and immediately sort afterwards :)
Yes, it will. But again, I don't think sorting is going to be slow at all.

> Do you think the index file would perform well as normal Serialized
> objects? The smaller things (Folders, user settings etc.) I was going to
> just serialize as XML. Since the messages (indexes) won't be huge, I'm
> wondering if they can be done the same way, or if I'd need to think about
> something slightly different, like the message data..?

XML serialization for the headers is probably fine.

Pete
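
A small sketch of what XML serialization of the headers might look like, using Python's standard `xml.etree.ElementTree` in place of .NET's XML serializer. The element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET


def headers_to_xml(headers):
    """Serialize a list of header dicts to an XML string, one element per message."""
    root = ET.Element("headers")
    for h in headers:
        ET.SubElement(root, "message", {
            "sender": h["sender"], "subject": h["subject"],
            "date": h["date"], "offset": str(h["offset"]),
        })
    return ET.tostring(root, encoding="unicode")


def headers_from_xml(text):
    """Rebuild the list of header dicts from the XML produced above."""
    return [
        {"sender": m.get("sender"), "subject": m.get("subject"),
         "date": m.get("date"), "offset": int(m.get("offset"))}
        for m in ET.fromstring(text).findall("message")
    ]


sample = [
    {"sender": "danny", "subject": "Data Storage <news>", "date": "2005-11-17", "offset": 0},
    {"sender": "pete", "subject": "Re: Data Storage", "date": "2005-11-18", "offset": 512},
]
xml_text = headers_to_xml(sample)
round_trip = headers_from_xml(xml_text)
```

The elements are written in list order, so saving the list after a sort also preserves the user's sort order on the next load.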
Nov 18 '05 #7
