Bytes | Software Development & Data Engineering Community

Data Storage for news client

Hi all,

I'm writing a news client (mainly to test out CAB & ClickOnce!), and I'm trying
to decide what to use for storing messages, etc. SQL Express seems like
overkill (and is a hefty download for a < 1 MB app!). Also, since there could
be thousands of messages (potentially binary), I'm not sure that serializing
my classes to disk would perform well at all.

What would other people use for a small app like this? And why?

Thanks,
Nov 17 '05 #1
I would simply store them sequentially in a single file and then create an
index file that holds some header information (perhaps subject, author, date,
etc.) and an offset to the message's text in the main file. This is similar to
my response to the message posted a bit earlier under "squeeze few image file
into on binary file".
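
A minimal Python sketch of this two-file layout, assuming a hypothetical binary index record (8-byte body offset, 4-byte body length, then length-prefixed UTF-8 subject, author and date). The names `append_message`, `load_index` and `read_body` are illustrative, not from any library:

```python
import os
import struct
import tempfile
from dataclasses import dataclass


@dataclass
class IndexEntry:
    subject: str
    author: str
    date: str
    offset: int  # where the message body starts in the data file
    length: int  # how many bytes the body occupies


def _write_str(f, s):
    """Write a UTF-8 string prefixed with its 2-byte length."""
    data = s.encode("utf-8")
    f.write(struct.pack("<H", len(data)))
    f.write(data)


def _read_str(f):
    (n,) = struct.unpack("<H", f.read(2))
    return f.read(n).decode("utf-8")


def append_message(data_path, index_path, subject, author, date, body):
    """Append the raw body to the data file and a header record to the index."""
    with open(data_path, "ab") as data:
        data.seek(0, os.SEEK_END)
        offset = data.tell()
        data.write(body)
    with open(index_path, "ab") as idx:
        idx.write(struct.pack("<QI", offset, len(body)))  # offset, length
        for field in (subject, author, date):
            _write_str(idx, field)
    return IndexEntry(subject, author, date, offset, len(body))


def load_index(index_path):
    """Read every header record; message bodies are not touched."""
    entries = []
    with open(index_path, "rb") as idx:
        while len(head := idx.read(12)) == 12:
            offset, length = struct.unpack("<QI", head)
            subject, author, date = (_read_str(idx) for _ in range(3))
            entries.append(IndexEntry(subject, author, date, offset, length))
    return entries


def read_body(data_path, entry):
    """Seek straight to one message's bytes using its index entry."""
    with open(data_path, "rb") as data:
        data.seek(entry.offset)
        return data.read(entry.length)


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    dat = os.path.join(folder, "group.dat")
    idx = os.path.join(folder, "group.idx")
    append_message(dat, idx, "Storage?", "Danny", "2005-11-17", b"Hi all, ...")
    append_message(dat, idx, "Re: Storage?", "Pete", "2005-11-17", b"I would simply ...")
    for e in load_index(idx):
        print(e.subject, e.author, read_body(dat, e))
```

Only the small index has to be read to fill the message list; a body is fetched on demand with one seek.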

You could also compress the text before storing it in the single file (using
SharpZipLib, 7-Zip, or something similar). I suspect it would compress well,
even messages with uuencoded or yEnc-encoded binaries.
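
Pete names SharpZipLib or 7-Zip; as a stand-in, this sketch uses Python's standard `zlib` module (DEFLATE) to compress a single message body before it is written. The sample text is invented, and how much a real message shrinks depends on its content:

```python
import zlib


def pack_message(text, level=6):
    """Compress one message body (DEFLATE via zlib) before it is stored."""
    return zlib.compress(text.encode("utf-8"), level)


def unpack_message(blob):
    """Inflate a stored body and decode it back to text."""
    return zlib.decompress(blob).decode("utf-8")


if __name__ == "__main__":
    # Invented sample: quoted Usenet replies repeat a lot of text.
    body = "> I would simply store them sequentially in a single file.\n" * 50
    blob = pack_message(body)
    print(len(body.encode("utf-8")), "bytes ->", len(blob), "bytes")
    assert unpack_message(blob) == body
```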

I actually need to integrate a newsreader, at some point, into an app I'm
writing and I suspect this is the direction I'll take.

The advantage of this approach is that access is quick and it easily
accommodates thousands of messages. If you store the messages in separate
files, your directory will soon get large, and seeking to the data in a single
file with an index will be much faster than having the file system find a
match for your file name in a directory with thousands of files.
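
If the index records are fixed-width, even the index lookup becomes a single seek. A sketch under that assumption; the 20-byte `RECORD` layout (offset, length, Unix timestamp) is hypothetical:

```python
import os
import struct
import tempfile

# Hypothetical fixed-width index record: body offset (uint64), body length
# (uint32), posting time as a Unix timestamp (int64). Every record is
# RECORD.size == 20 bytes, so record n always starts at n * RECORD.size.
RECORD = struct.Struct("<QIq")


def write_index(index_path, records):
    with open(index_path, "wb") as f:
        for offset, length, timestamp in records:
            f.write(RECORD.pack(offset, length, timestamp))


def read_nth_message(index_path, data_path, n):
    """Fetch message n with two seeks and no directory lookups."""
    with open(index_path, "rb") as idx:
        idx.seek(n * RECORD.size)  # jump straight to index record n
        offset, length, _ = RECORD.unpack(idx.read(RECORD.size))
    with open(data_path, "rb") as data:
        data.seek(offset)  # jump straight to the message body
        return data.read(length)


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    dat, idx = os.path.join(folder, "g.dat"), os.path.join(folder, "g.idx")
    records, offset = [], 0
    with open(dat, "wb") as f:
        for i, body in enumerate([b"alpha", b"beta", b"gamma"]):
            f.write(body)
            records.append((offset, len(body), 1_132_000_000 + i))  # made-up times
            offset += len(body)
    write_index(idx, records)
    print(read_nth_message(idx, dat, 2))
```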

It's also fairly easy to purge lots of contiguous messages from the file
(which is likely how you'd want to handle purging in a newsreader). For
example, to delete the first 1,000 messages, find the index entry for the
1,001st message, copy the data from that offset to the end into a new file,
delete the original file, and rename the new one to the old name. Do the same
with the index file, subtracting the number of bytes removed from each
remaining offset.
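
A sketch of that purge in Python, assuming a hypothetical index of fixed 12-byte records (offset, length) stored in the same order as the data file. Note the offset rebasing: after the cut, every kept message sits `cut` bytes earlier in the new data file:

```python
import os
import shutil
import struct
import tempfile

RECORD = struct.Struct("<QI")  # hypothetical index record: offset, length


def read_index(index_path):
    with open(index_path, "rb") as f:
        raw = f.read()
    return [RECORD.unpack_from(raw, pos) for pos in range(0, len(raw), RECORD.size)]


def purge_first(data_path, index_path, n):
    """Drop the first n messages by copying everything after them to new files."""
    entries = read_index(index_path)
    if n < len(entries):
        cut = entries[n][0]  # offset of the first message we keep
    else:
        cut = os.path.getsize(data_path)  # purging everything

    tmp_data, tmp_index = data_path + ".tmp", index_path + ".tmp"
    with open(data_path, "rb") as src, open(tmp_data, "wb") as dst:
        src.seek(cut)
        shutil.copyfileobj(src, dst)  # copies from the current position to EOF
    with open(tmp_index, "wb") as dst:
        for offset, length in entries[n:]:
            dst.write(RECORD.pack(offset - cut, length))  # rebase the offsets

    # os.replace swaps each new file in over the old one, which covers the
    # "delete the original, then rename the new one" step.
    os.replace(tmp_data, data_path)
    os.replace(tmp_index, index_path)


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    dat, idx = os.path.join(folder, "g.dat"), os.path.join(folder, "g.idx")
    with open(dat, "wb") as f:
        f.write(b"aaabbcccc")
    with open(idx, "wb") as f:
        for rec in [(0, 3), (3, 2), (5, 4)]:
            f.write(RECORD.pack(*rec))
    purge_first(dat, idx, 1)
    print(read_index(idx))  # offsets start at 0 again
```

If the index has been rewritten in some other order (for example a saved sort order), `entries[n]` no longer marks a contiguous cut point in the data file, so this relies on append order.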

Pete

Nov 17 '05 #2
"Pete Davis" <pdavis68@[nospam]hotmail.com> wrote in message
news:Zs********************@giganews.com...

Hi Peter,

Interesting response. What about performance though? If the user opens a
folder that has 1,000 messages, either I have to load them all *very*
quickly (I need to display Sender, Subject, Date, etc.), or I fetch them as
the user scrolls (which could be pretty unresponsive if the user is dragging
the scrollbar).

What would you store in the index file? The user will be able to change the
sort order in the display, so unless I maintain a few indexes, it'd be
difficult to get a list in order. The message list will show the Sender,
Date, Subject etc., and so if I have to scan through the data file for
thousands of these things, surely it'll take an age? I've never done this
kind of processing before, so I've no idea of how it would perform. I don't
want to build it and find it's unacceptable, so any experiences anyone can
share would be much appreciated! :)

As for compression: again, without testing it I wouldn't know, but although
compression would save tons of disk space, wouldn't the overhead of the
compression make it slower than reading more uncompressed data? I assume the
compression ratio would vary, so it'd be difficult to seek within a compressed
stream. Any ideas?

Thanks,

Danny
Nov 17 '05 #3


"Danny Tuppeny" <gr****@dannytuppeny.commmmmm> wrote in message
news:43**********************@ptn-nntp-reader02.plus.net...
> "Pete Davis" <pdavis68@[nospam]hotmail.com> wrote in message
> news:Zs********************@giganews.com...

> Hi Peter,
> [snip]
> Interesting response. What about performance though? If the user opens a
> folder that has 1,000 messages, either I have to load them all *very*
> quickly (I need to display Sender, Subject, Date, etc.), or I fetch them
> as the user scrolls (which could be pretty unresponsive if the user is
> dragging the scrollbar).

I suspect it will load much faster than you think.

Assuming the index stores the sender, subject, date, message ID, offset into
the main file, and a few other header items, I suspect you're looking at
roughly 100-200 bytes per message on average. Call it 200 bytes, though that's
probably on the high side. That works out to only 200 KB per thousand
messages, or 5,000 messages per megabyte. That will load into memory pretty
quickly.
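
The arithmetic behind that estimate, taking 1 MB as 10^6 bytes (with 1 MB = 2^20 bytes the second figure comes out closer to 5,240):

```latex
\[
1000\ \text{messages} \times 200\ \text{bytes} = 200{,}000\ \text{bytes} \approx 200\ \text{KB}
\]
\[
\frac{10^{6}\ \text{bytes}}{200\ \text{bytes per message}} = 5000\ \text{messages per MB}
\]
```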

> What would you store in the index file? The user will be able to change
> the sort order in the display, so unless I maintain a few indexes, it'd be
> difficult to get a list in order. The message list will show the Sender,
> Date, Subject etc., and so if I have to scan through the data file for
> thousands of these things, surely it'll take an age? I've never done this
> kind of processing before, so I've no idea of how it would perform. I
> don't want to build it and find it's unacceptable, so any experiences
> anyone can share would be much appreciated! :)

Well, if users are going to be able to sort them, then it makes sense to load
it all into memory, assuming that's feasible. Given the figures above, that
should be doable on most modern computers, assuming you're just loading
messages from a single group at a time. Load the messages into memory and
then sort them. Leave them in the files in whatever order you want; it won't
make much difference.
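
A sketch of "load, then sort in memory" in Python; the `Header` fields are assumptions, and `sort_headers` takes the column name a clicked column header would supply:

```python
from dataclasses import dataclass
from datetime import datetime
from operator import attrgetter


@dataclass
class Header:  # hypothetical in-memory copy of one index record
    sender: str
    subject: str
    date: datetime
    offset: int


def sort_headers(headers, column, descending=False):
    """Return a new list ordered by one column, e.g. 'sender' or 'date'.
    The files on disk stay in whatever order they were written."""
    return sorted(headers, key=attrgetter(column), reverse=descending)


if __name__ == "__main__":
    rows = [
        Header("Pete", "Re: Data Storage", datetime(2005, 11, 18), 120),
        Header("Danny", "Data Storage", datetime(2005, 11, 17), 0),
        Header("peter", "Re: Data Storage", datetime(2005, 11, 18, 9), 340),
    ]
    for h in sort_headers(rows, "date", descending=True):
        print(h.date, h.sender, h.subject)
```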

I don't expect it to be lightning fast, but I think it will be much faster
than you think. If you implement the IComparer interface, sorting should be a
piece of cake, and the built-in sort algorithm is quicksort, I believe.
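
`IComparer` is the .NET interface; the closest Python analogue is a three-way compare function wrapped with `functools.cmp_to_key`. This sketch orders by sender (case-insensitive) and then newest date first; the dict keys are hypothetical. Python's `sorted()` is a stable merge sort (Timsort), not quicksort, but the comparer idea carries over:

```python
from functools import cmp_to_key


def compare_headers(a, b):
    """Three-way compare in the spirit of IComparer.Compare: returns a negative
    number, zero, or a positive number. Orders by sender (case-insensitive),
    then newest date first."""
    sa, sb = a["sender"].lower(), b["sender"].lower()
    if sa != sb:
        return -1 if sa < sb else 1
    if a["date"] != b["date"]:
        return -1 if a["date"] > b["date"] else 1  # later date sorts first
    return 0


def sort_with_comparer(headers):
    return sorted(headers, key=cmp_to_key(compare_headers))


if __name__ == "__main__":
    rows = [
        {"sender": "pete", "date": "2005-11-17", "subject": "a"},
        {"sender": "Danny", "date": "2005-11-17", "subject": "b"},
        {"sender": "Pete", "date": "2005-11-18", "subject": "c"},
    ]
    print([r["subject"] for r in sort_with_comparer(rows)])
```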

> As for compression - again, without testing it, I wouldn't know - but
> although compression would save tons of disk space, wouldn't the overhead
> of the compression make it slower than reading more uncompressed data? I
> assume compression would be variable, so it'd be difficult to seek within
> a compressed stream. Any ideas?


Compressing data is slow, but decompressing is generally quite fast. I
suspect reading will be faster because of all the space saved, particularly
if the data is on a network drive.

Remember, two files: an index file and a data file. Leave the index file
uncompressed. Don't compress the entire data file; just compress the
individual messages. That way you have an offset to each compressed message
and can simply begin decompressing at the start of that message. Again, look
at the message I posted earlier where I use a simple index file and store a
bunch of thumbnails in a single file. It easily loads 500 thumbnails
(including JPEG decoding of the data) in maybe 2 seconds. Without the JPEG
decoding, I'm sure it would take less than half a second.
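
A sketch of per-message compression with an uncompressed index, again using Python's `zlib` as a stand-in for SharpZipLib. Each message is its own complete zlib stream, so reading one message means one seek plus one `zlib.decompress` of exactly that many bytes:

```python
import os
import tempfile
import zlib


def write_store(data_path, bodies):
    """Compress each message on its own and append it to the data file.
    Returns (offset, compressed_length) pairs for the uncompressed index."""
    index = []
    with open(data_path, "wb") as data:
        for body in bodies:
            blob = zlib.compress(body)
            index.append((data.tell(), len(blob)))
            data.write(blob)
    return index


def read_message(data_path, entry):
    """Seek to one message and inflate just that message's stream."""
    offset, length = entry
    with open(data_path, "rb") as data:
        data.seek(offset)
        return zlib.decompress(data.read(length))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "group.dat")
    idx = write_store(path, [b"first message " * 40, b"second", b"third " * 10])
    print(idx)
    print(read_message(path, idx[1]))
```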

Nov 18 '05 #4
Why not try the SQLite database engine? It's a single small DLL, requires no
installation, has an ADO.NET provider, and is extremely fast. There's now a
2.0 version as well. Check it out at SourceForge.net.
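
Peter's suggestion is aimed at the ADO.NET provider; Python's built-in `sqlite3` module talks to the same engine, so it serves for a minimal sketch. The table layout (a `messages` table with a `grp` column and a `(grp, date)` index) is an illustrative assumption:

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a single-file message store."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id         INTEGER PRIMARY KEY,
               message_id TEXT UNIQUE,
               grp        TEXT NOT NULL,
               sender     TEXT,
               subject    TEXT,
               date       TEXT,
               body       BLOB
           )"""
    )
    # Lets the header list for one group come back in date order cheaply.
    conn.execute("CREATE INDEX IF NOT EXISTS ix_grp_date ON messages (grp, date)")
    return conn


def add_message(conn, message_id, grp, sender, subject, date, body):
    with conn:  # commits the insert, or rolls back if it raises
        conn.execute(
            "INSERT INTO messages (message_id, grp, sender, subject, date, body) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (message_id, grp, sender, subject, date, body),
        )


def list_headers(conn, grp):
    """Header columns only (no bodies), newest first."""
    return conn.execute(
        "SELECT sender, subject, date FROM messages WHERE grp = ? ORDER BY date DESC",
        (grp,),
    ).fetchall()


if __name__ == "__main__":
    conn = open_store()
    add_message(conn, "<1@example.invalid>", "example.group", "Danny",
                "Data Storage for news client", "2005-11-17", b"Hi all, ...")
    add_message(conn, "<2@example.invalid>", "example.group", "peter",
                "Re: Data Storage for news client", "2005-11-18", b"Why not SQLite?")
    print(list_headers(conn, "example.group"))
```
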
peter

--
Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com


Nov 18 '05 #5
"Pete Davis" <pdavis68@[nospam]hotmail.com> wrote in message
news:M4******************************@giganews.com...
> I suspect it will load much faster than you think.
After Googling a little more last night, I think you're right! :)
I ran this:

http://www.codeproject.com/csharp/Fa...yFileInput.asp

It didn't take long to create 10,000,000 structs in a binary file (276 MB of
data) :)

> Well, if they're going to be able to sort them, then it makes sense to
> load it all into memory, assuming that's feasible. Given the figures
> above, that should be doable on most modern computers, assuming you're just
> loading messages from a single group at a time. Load the messages into
> memory and then sort them. Leave them sorted in the files however you
> want. It won't make much difference.
I was thinking about this: once everything is loaded into memory, I let the
user sort it (probably by clicking column headers), and when they move to
another folder (or close the app), I can write the index back in that order.
That persists their sort order and also means I never have to load it and
then immediately sort it :)
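
A sketch of persisting the sort order by rewriting the index, assuming (for illustration only) an index stored as JSON Lines, one header per line. Writing to a temporary file and then calling `os.replace` avoids leaving a half-written index behind:

```python
import json
import os
import tempfile


def save_index_in_order(index_path, headers):
    """Rewrite the index in the order the user last sorted it. Writing to a
    temporary file first means a crash can't leave a half-written index."""
    tmp = index_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        for h in headers:
            f.write(json.dumps(h) + "\n")  # one header per line
    os.replace(tmp, index_path)


def load_index(index_path):
    """Rows come back in the saved order, so no sort is needed after loading."""
    with open(index_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "group.idx")
    rows = [{"sender": "Pete", "offset": 120}, {"sender": "Danny", "offset": 0}]
    rows.sort(key=lambda r: r["sender"])  # the user clicked the Sender column
    save_index_in_order(path, rows)
    print(load_index(path))
```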

Do you think the index file would perform well as ordinary serialized
objects? The smaller things (folders, user settings, etc.) I was going to
just serialize as XML. Since the message indexes won't be huge, I'm
wondering whether they can be done the same way, or whether I'd need
something slightly different, as with the message data?

> Remember, 2 files: Index file and Data File. Leave the index file
> uncompressed. Don't compress the entire data file, just compress the
> individual messages. That way you have an offset to each compressed
> message and just begin decompression at the beginning of the message.
> Again, look at the message I posted earlier where I use a simple index
> file and store a bunch of thumbnails in a single file. It easily loads 500
> thumbnails (and that includes jpeg decoding of the data) in a matter of
> maybe 2 seconds. Without the jpeg decoding, it would be less than half a
> second, I'm sure.


I forgot to look! Just looked now, and it looks very helpful - thanks! :)

Danny
Nov 18 '05 #6

"Danny Tuppeny" <gr****@dannytuppeny.commmmmm> wrote in message
news:43***********************@ptn-nntp-reader03.plus.net...
> "Pete Davis" <pdavis68@[nospam]hotmail.com> wrote in message
> news:M4******************************@giganews.com...
>> I suspect it will load much faster than you think.

[snip]
> I was thinking about this - if once loaded into memory, I let the user
> sort (probably by clicking column headers), once they move to another
> folder (or close the app), I can write the index back in this order -
> which persists their sort order, but also means I don't ever have to load
> it and immediately sort afterwards :)
Yes, it will. But again, I don't think sorting is going to be slow at all.

> Do you think the index file would perform well as normal Serialized
> objects? The smaller things (Folders, user settings etc.) I was going to
> just serialize as XML. Since the messages (indexes) won't be huge, I'm
> wondering if they can be done the same way, or if I'd need to think about
> something slightly different, like the message data..?

XML serialization for the headers is probably fine.
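
Pete has .NET's XML serialization in mind; as a language-neutral illustration, here is the same round trip with Python's standard `xml.etree.ElementTree`. The element and field names are hypothetical:

```python
import xml.etree.ElementTree as ET

FIELDS = ("sender", "subject", "date", "offset")  # hypothetical header fields


def headers_to_xml(headers):
    """Serialize a list of header dicts to an XML string."""
    root = ET.Element("headers")
    for h in headers:
        item = ET.SubElement(root, "header")
        for name in FIELDS:
            ET.SubElement(item, name).text = str(h[name])
    return ET.tostring(root, encoding="unicode")


def headers_from_xml(text):
    """Parse the XML back into header dicts."""
    rows = []
    for item in ET.fromstring(text).findall("header"):
        row = {name: item.findtext(name, default="") for name in FIELDS}
        row["offset"] = int(row["offset"])  # XML holds text; restore the number
        rows.append(row)
    return rows


if __name__ == "__main__":
    xml_text = headers_to_xml(
        [{"sender": "Pete", "subject": "Re: Data Storage", "date": "2005-11-18", "offset": 120}]
    )
    print(xml_text)
    print(headers_from_xml(xml_text))
```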

Pete
Nov 18 '05 #7
