archival of newsfeeds?

cody

I want to write a feed reader that can archive all messages, so that
I can view messages from weeks or months ago.

The question is now *how* to archive them. Since the feeds can have
different formats, do I have to convert them into my own format? Is it better
to store them in a database, or is it better to use one large XML file?
Will I still get satisfactory performance if I search an XmlDocument for a
newsfeed containing specific words in the title, or a feed having a specific
category?

Nov 12 '05 #1


Pascal Schmitt

Hello!

> The question is now *how* to archive them. Since the feeds can have
> different formats, do I have to convert them into my own format?

Atom and RSS have roughly equivalent features. You could, for example, use Atom
internally, with your own additions to support RSS-specific features (or the other way round).
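
A minimal sketch of what such a common internal format could look like, assuming RSS 2.0 and Atom 1.0 input. The ArchivedEntry type, its three fields and the FeedNormalizer class are hypothetical names chosen for illustration; a real reader would keep more fields (dates, ids, content).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Hypothetical internal record; the field set is an assumption for this sketch.
public sealed class ArchivedEntry
{
    public string Title = "";
    public string Link = "";
    public string Category = "";
}

public static class FeedNormalizer
{
    // Namespace of Atom 1.0 (RFC 4287). Older Atom drafts used a different one.
    const string AtomNs = "http://www.w3.org/2005/Atom";

    public static List<ArchivedEntry> Parse(string xml)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        var ns = new XmlNamespaceManager(nav.NameTable);
        ns.AddNamespace("a", AtomNs);

        var entries = new List<ArchivedEntry>();

        // RSS 2.0: items live under /rss/channel and use no namespace.
        foreach (XPathNavigator item in nav.Select("/rss/channel/item"))
        {
            entries.Add(new ArchivedEntry
            {
                Title = Text(item, "title", ns),
                Link = Text(item, "link", ns),
                Category = Text(item, "category", ns)
            });
        }

        // Atom 1.0: link and category carry their values in attributes.
        foreach (XPathNavigator entry in nav.Select("/a:feed/a:entry", ns))
        {
            entries.Add(new ArchivedEntry
            {
                Title = Text(entry, "a:title", ns),
                Link = Text(entry, "a:link/@href", ns),
                Category = Text(entry, "a:category/@term", ns)
            });
        }
        return entries;
    }

    // Returns the string value of the first match, or "" if nothing matches.
    static string Text(XPathNavigator node, string xpath, IXmlNamespaceResolver ns)
    {
        XPathNavigator hit = node.SelectSingleNode(xpath, ns);
        return hit == null ? "" : hit.Value;
    }
}
```

Calling FeedNormalizer.Parse(feedXml) on either kind of feed gives the same list of ArchivedEntry objects, so the archive only ever has to deal with one shape.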

> Is it better
> to store them in a database, or is it better to use one large XML file?
> Will I still get satisfactory performance if I search an XmlDocument for a
> newsfeed containing specific words in the title, or a feed having a specific
> category?

For optimal performance, use a database. But this is not as flexible as
XML files, which you could search using XPath.

Always try to avoid XmlDocument, especially for large files.
It reads the whole file into memory, which is a waste of
resources (why parse and build a DOM tree from 10 MB of XML when you just want
to read the text value of the first element?).
For the most convenience, use XPathDocument, which lets you run XPath queries on
a read-only in-memory store that is lighter than the DOM (it still loads the
whole document, though). For even more performance, at the cost of more specific
and schema-centric code, use Xml(Text)Reader, which streams the file.
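
To illustrate the "first element's text value" case, here is a small sketch with a streaming reader. It assumes .NET 2.0 or later, where XmlReader.Create is the usual way to get a reader (XmlTextReader works along the same lines). The FirstTitle class name and the choice of the <title> element are made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class FirstTitle
{
    // Returns the text of the first <title> element, or null if there is none.
    // The reader stops as soon as the element is found; the rest of the
    // input is never parsed and no tree is built.
    public static string Read(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            if (reader.ReadToFollowing("title"))
                return reader.ReadElementContentAsString();
            return null;
        }
    }
}
```

For a file on disk you would pass something like File.OpenText(path) as the input.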
--
Pascal Schmitt

Nov 12 '05 #2

cody

> Always try to avoid using the XmlDocument, especially for large files.
> [...]
> For best comfort, use the XPathDocument, which allows you to use XPath [...];
> for even more performance, but more specific and schema-centric code, use
> Xml(Text)Reader.

But if I want to search within my feed, is XmlDocument the right solution, or
is there a better way?
How fast is XPath? Does it simply walk through all nodes, or does it use
optimized algorithms, for example hashing?

Nov 12 '05 #3

Pascal Schmitt

Hello!

> But if I want to search within my feed, is XmlDocument the right solution, or
> is there a better way?

XPathDocument. If there is no need to modify anything, use it!
(And IF you need to modify a big XML file, consider using XmlTextReader and
XmlTextWriter simultaneously: read data, modify it, and write it out in one pass.
Not as nice to look at as DOM operations, but really fast.)
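
A rough sketch of the read-modify-write idea, assuming .NET 2.0 or later (XmlReader.Create / XmlWriter.Create instead of the XmlTextReader / XmlTextWriter constructors). The archive layout, with <item read="true"> marking entries to drop, and the ArchiveFilter class are invented for the example. It works on strings to stay short; with files you would read from one file and write to a new one.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ArchiveFilter
{
    // Copies the input node by node and leaves out every <item read="true">.
    public static string DropReadItems(string xml)
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            bool more = reader.Read();
            while (more)
            {
                bool drop = reader.NodeType == XmlNodeType.Element
                            && reader.LocalName == "item"
                            && reader.GetAttribute("read") == "true";
                if (drop)
                {
                    // Skip() jumps past the element's end tag and already
                    // positions the reader on the next node.
                    reader.Skip();
                    more = !reader.EOF;
                }
                else
                {
                    CopyCurrentNode(reader, writer);
                    more = reader.Read();
                }
            }
        }
        return output.ToString();
    }

    // Writes only the node the reader is positioned on (not its children).
    // Node types not listed here (declaration, processing instructions, ...)
    // are ignored in this sketch.
    static void CopyCurrentNode(XmlReader reader, XmlWriter writer)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, true);
                if (reader.IsEmptyElement) writer.WriteEndElement();
                break;
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.CDATA:
                writer.WriteCData(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.EndElement:
                writer.WriteFullEndElement();
                break;
        }
    }
}
```

Memory use stays flat no matter how big the archive is, but the whole file still gets rewritten on every change.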

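    // requires: using System.Xml.XPath;
    XPathDocument x = new XPathDocument("file.xml");
    // Evaluate() returns a double for numeric XPath expressions such as count()
    int f = (int)(double)x.CreateNavigator().Evaluate("count(//foo)");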
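
Along the same lines, a sketch of the two searches asked about at the start of the thread (words in the title, a given category). The archive layout /archive/item with <title> and <category> children is an assumption, and so is the ArchiveSearch class.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class ArchiveSearch
{
    // Loads the XML into an XPathDocument and returns a navigator on it.
    public static XPathNavigator Load(string xml)
    {
        return new XPathDocument(new StringReader(xml)).CreateNavigator();
    }

    // Runs an XPath query and collects the string value of every hit.
    public static List<string> Values(XPathNavigator nav, string xpath)
    {
        var values = new List<string>();
        XPathNodeIterator hits = nav.Select(xpath);
        while (hits.MoveNext())
            values.Add(hits.Current.Value);
        return values;
    }
}
```

With that, ArchiveSearch.Values(nav, "/archive/item[contains(title, 'XML')]/title") returns the titles containing "XML", and "/archive/item[category = 'dotnet']/title" returns the titles in one category. Note that contains() in XPath 1.0 is case-sensitive, and that search words coming from the user would need their quotes handled before being pasted into the expression.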

> How fast is XPath? Does it simply walk through all nodes, or does it use
> optimized algorithms, for example hashing?

Afaik there is no special optimisation such as hashing: XPath just walks the
document using an XPathNavigator (which both XPathDocument and
XmlDocument provide through CreateNavigator(); XPathDocument is faster, but it
is read-only, so it does not allow editing data).
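
If repeated lookups turn out to be too slow, one option is to do the hashing yourself: walk the archive once and keep a dictionary. This is only a sketch under the same assumed /archive/item layout as before; CategoryIndex is a hypothetical helper, not something the framework provides.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class CategoryIndex
{
    // Walks the archive once and maps each category to the titles filed under it.
    public static Dictionary<string, List<string>> Build(string xml)
    {
        var index = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

        foreach (XPathNavigator item in nav.Select("/archive/item"))
        {
            XPathNavigator titleNode = item.SelectSingleNode("title");
            string title = titleNode == null ? "" : titleNode.Value;

            // An item may carry several <category> elements.
            foreach (XPathNavigator category in item.Select("category"))
            {
                List<string> titles;
                if (!index.TryGetValue(category.Value, out titles))
                {
                    titles = new List<string>();
                    index[category.Value] = titles;
                }
                titles.Add(title);
            }
        }
        return index;
    }
}
```

After the one-time walk, a category lookup is a plain dictionary access instead of another pass over all nodes. The price is that the index has to be rebuilt or updated whenever the archive changes.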
--
Pascal Schmitt

Nov 12 '05 #4

cody

But the problem when I use XML files is that if I want to modify them, I
have to rewrite the entire file, right?

Nov 12 '05 #5
