[RSS] Multiple <item>s

Ian Rastall

I'm doing a very simple RSS 2.0 feed by hand
<http://www.bookstacks.org/rss.xml> and changing the sole
<item></item> section for each day. Is this the general way in which
RSS feeds are done? I'm intentionally trying to avoid using software
to generate / manage the feed.

Ian
--
http://sundry.ws/

Mar 15 '06 #1


Andy Dingley

On Wed, 15 Mar 2006 00:10:05 GMT, Ian Rastall <ia*@bookstacks.org>
wrote:

Is this the general way in which RSS feeds are done?
Why are you asking? Does it _feel_ right ?

OK, I admit I do it. I have a couple of small sites where I do literally
hand-code the catalogue as RSS. One XSLT transform makes the pages, one
makes the indexes and another puts the most recent articles into an RSS
feed. But then I'm a geek and it's perfectly natural for me to hand-code
RSS - I'd never ask a user to do this.
I'm intentionally trying to avoid using software
to generate / manage the feed.

Why? Software is your friend!

In general RSS is spewed out of the same database-backed CMS that serves
up the pages. Sometimes you write static pages first and then scrape
them into RSS. You really need to look at the site as an overall task,
not as the RSS being something isolated.

Mar 15 '06 #2

Ian Rastall

On Wed, 15 Mar 2006 01:15:05 +0000, Andy Dingley
<di*****@codesmiths.com> wrote:

Why are you asking? Does it _feel_ right ?

Hi Andy. There isn't much to this feed at all, at least at the moment.
I put a link to it so people could see what it was, but it's easy to
just paste it here:

<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>Bookstacks</title>
<ttl>15</ttl>
<description>Daily bits of useful knowledge</description>
<link>http://www.bookstacks.org/</link>
<copyright>No rights reserved.</copyright>
<language>en</language>
<item>
<guid>http://www.bookstacks.org/rss/past/0001.html</guid>
<title>PG Encyclopedia - "A" (part 1)</title>
<description>A. This letter of ours corresponds [...snip...] in
Athenaeus x. 453 d.</description>
<pubDate>Tue, 14 Mar 2006 15:48:00 -0500</pubDate>
<link>http://www.bookstacks.org/rss/rss.xml</link>
</item>
</channel>
</rss>

All I have to do is replace the <description> section with a new
paragraph, and change the <title> and <guid>. This is pretty easy to
do by hand.
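The three hand edits (new <title>, new <guid>, new <description>) are easy to automate without any feed-management package. Below is a minimal sketch using only the Python standard library. build_feed, its parameters and the EST constant are hypothetical names for illustration; the channel values are copied from the feed above, and <ttl> and <copyright> are left out for brevity.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET

# Hypothetical example offset matching the -0500 pubDate in the feed above.
EST = timezone(timedelta(hours=-5))

def build_feed(title: str, guid: str, description: str, link: str,
               published: datetime) -> str:
    """Return a one-item RSS 2.0 document shaped like the hand-written feed."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", "Bookstacks"),
                      ("description", "Daily bits of useful knowledge"),
                      ("link", "http://www.bookstacks.org/"),
                      ("language", "en")):
        ET.SubElement(channel, tag).text = text
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "guid").text = guid          # must differ for each new entry
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "description").text = description  # special chars are escaped
    # format_datetime produces the RFC 822 style date that RSS 2.0 uses.
    ET.SubElement(item, "pubDate").text = format_datetime(published)
    ET.SubElement(item, "link").text = link          # the page the item is about
    return '<?xml version="1.0"?>\n' + ET.tostring(rss, encoding="unicode")
```

For example, build_feed('PG Encyclopedia - "A" (part 1)', ".../0001.html", "A. This letter...", "http://www.bookstacks.org/", datetime(2006, 3, 14, 15, 48, tzinfo=EST)) yields an item whose pubDate is "Tue, 14 Mar 2006 15:48:00 -0500".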

I guess what I'm worried about is that if I change the <item> section
every day, instead of adding a new <item> section underneath it, some
RSS readers (the software, mind you, not the people) might choke, or
think it's a new feed every time it opens it. Is it standard practice,
IOW, to have just one item and change it every day, or to keep adding
in <item> sections, much like a blog? Will an RSS reader download
every item every time it checks the page?

This is my fault, because I made the page, did the validation, and
just went ahead and put it up live, without seeing what would happen
if I added in an <item>, or changed the <item> ... whichever. Now I'm
trying to do the right thing pro-actively without some of my readers
getting a weird feed back from me while I work out the kinks.

TIA,

Ian
--
http://sundry.ws/

Mar 15 '06 #3

Ian Rastall

On Wed, 15 Mar 2006 01:15:05 +0000, Andy Dingley
<di*****@codesmiths.com> wrote:

You really need to look at the site as an overall task,
not as the RSS being something isolated.

This site has nothing to make a feed out of. I've tried before. If
someone wants to read one of the books, they'll get the whole thing.
There's no news, except newest books, and that's not something I would
open my aggregator to see. But I was looking at the Project Gutenberg
Encyclopedia today, thinking it wasn't a good idea to put it on the
site, but thinking it would make for an excellent RSS feed. It's not
that the feed delivers the content of the site, it's that it
*reflects* the content, or is rather more of it.

Ian
--
http://sundry.ws/

Mar 15 '06 #4

Andy Dingley

Ian Rastall wrote:

I guess what I'm worried about is that if I change the <item> section
every day, instead of adding a new <item> section underneath it,
Ah! Now I see your question.

Yes, this is unusual. Usually it's a FIFO list, and things are left on
it "forever", or until they're old and obsolete, or until the list is
too long, whichever comes first.

Of course this depends on the "meaning" of the feed. If it's phrased as
"Our latest publications" then it's sensible for it to grow. If it's
"Publication of the week" then it's reasonable to always keep
MAX_LIST_LENGTH =1. There's nothing about RSS that says you can't use
it as a single-item publishing channel, almost as an ad banner. You may
even have an associated feed that is the historical archive of these
one-off feed items.
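The FIFO behaviour Andy describes can be sketched as "put the new item first, then drop everything past MAX_LIST_LENGTH"; setting the limit to 1 gives the single-item "publication of the week" channel. This is an illustrative Python sketch, not a standard tool: add_item and make_item are hypothetical helpers, and listing the newest item first is a common convention that RSS 2.0 itself does not mandate.

```python
import xml.etree.ElementTree as ET

MAX_LIST_LENGTH = 10  # assumption; 1 reproduces the single-item channel

def make_item(guid: str, title: str) -> ET.Element:
    """Build a bare <item> with just a guid and a title (hypothetical helper)."""
    item = ET.Element("item")
    ET.SubElement(item, "guid").text = guid
    ET.SubElement(item, "title").text = title
    return item

def add_item(rss_text: str, new_item: ET.Element,
             max_len: int = MAX_LIST_LENGTH) -> str:
    """Insert new_item ahead of the existing items and trim the oldest ones."""
    rss = ET.fromstring(rss_text)
    channel = rss.find("channel")
    items = channel.findall("item")
    # New item goes where the first existing item is, or after the channel metadata.
    position = list(channel).index(items[0]) if items else len(channel)
    channel.insert(position, new_item)
    # Items beyond max_len are the oldest ones at the end of the list; drop them.
    for old in channel.findall("item")[max_len:]:
        channel.remove(old)
    return ET.tostring(rss, encoding="unicode")
```

Run once per day, each call keeps the feed at a bounded length while readers still see every new entry at least once.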

some RSS readers (the software, mind you, not the people) might choke, or
think it's a new feed every time it opens

They should be wise to this. Set the metadata appropriately, don't
change item URLs that ought to be stable and the aggregators will work
fine.
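On Ian's question of whether a reader downloads every item every time: an RSS feed is one file, so the whole file is fetched on each check, and the reader then has to decide which items it has already shown. Many aggregators key that decision on <guid>, which is why a stable, unique guid per entry matters. The sketch below models that bookkeeping in Python; new_items is hypothetical, and falling back to <link> when there is no guid is an assumption of this sketch, not a rule every reader follows.

```python
import xml.etree.ElementTree as ET

def new_items(rss_text: str, seen: set) -> list:
    """Return (key, title) for items not seen before, and remember their keys."""
    fresh = []
    for item in ET.fromstring(rss_text).iter("item"):
        # Assumption: identify items by <guid>, falling back to <link>.
        key = item.findtext("guid") or item.findtext("link")
        if key not in seen:
            seen.add(key)
            fresh.append((key, item.findtext("title")))
    return fresh
```

Under this model, replacing the sole item with one that has a new guid shows up as one new entry in the same feed; it is not mistaken for a new feed. Reusing the old guid with new text would make the change easy to miss.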

Mar 15 '06 #5

Simon Brooke

in message <jj********************************@4ax.com>, Ian Rastall
('i**@bookstacks.org') wrote:

I'm doing a very simple RSS 2.0 feed by hand
<http://www.bookstacks.org/rss.xml> and changing the sole
<item></item> section for each day. Is this the general way in which
RSS feeds are done? I'm intentionally trying to avoid using software
to generate / manage the feed.

Use software to generate/manage the feed. Automatically, as a cron job or
similar. XSL is a very good place to start.

If you're doing it manually every twenty-four hours, advertising 'time to
live' of 15 minutes is just wrong, and places ridiculous loads on your
server.
See
<URL:http://www.neilturner.me.uk/2003/Aug/10/lowering_bandwidth_usage_with_ttl_in_rss.html>
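The arithmetic behind the ttl complaint: RSS 2.0's <ttl> is a number of minutes a reader may cache the feed before refreshing. A client that honours ttl=15 may fetch up to 96 times a day for a file that changes once a day, per subscriber. A minimal Python sketch; both function names are hypothetical.

```python
from datetime import timedelta

def ttl_minutes(update_interval: timedelta) -> int:
    """<ttl> is in minutes; derive it from how often the feed really changes."""
    return int(update_interval.total_seconds() // 60)

def polls_per_day(ttl: int) -> int:
    """Upper bound on daily fetches by one client that refreshes only when ttl expires."""
    return (24 * 60) // ttl
```

For a daily feed, ttl_minutes(timedelta(days=1)) gives 1440, and polls_per_day(1440) is 1, compared with polls_per_day(15) at 96.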

The 'link' sub-element of the 'item' element should be a link to the
resource described (i.e. the encyclopedia), not to the RSS feed itself.

Although seeing you have the text of The Man who was Thursday, I shall
forgive you.

--
si***@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/

I'm fed up with Life 1.0. I never liked it much and now it's getting
me down. I think I'll upgrade to MSLife 97 -- you know, the one that
comes in a flash new box and within weeks you're crawling with bugs.

Mar 15 '06 #6

Simon Brooke

in message <13********************************@4ax.com>, Ian Rastall
('i**@bookstacks.org') wrote:

On Wed, 15 Mar 2006 01:15:05 +0000, Andy Dingley
<di*****@codesmiths.com> wrote:
You really need to look at the site as an overall task,
not as the RSS being something isolated.

This site has nothing to make a feed out of. I've tried before. If
someone wants to read one of the books, they'll get the whole thing.
There's no news, except newest books, and that's not something I would
open my aggregator to see. But I was looking at the Project Gutenberg
Encyclopedia today, thinking it wasn't a good idea to put it on the
site, but thinking it would make for an excellent RSS feed. It's not
that the feed delivers the content of the site, it's that it
*reflects* the content, or is rather more of it.

There's nothing which prevents you writing a little daemon which
periodically selects excerpts at random, and wraps RSS round them.
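The selection half of such a daemon (or a once-a-day cron job, as suggested earlier in the thread) is small. The Python sketch below picks one excerpt per day, either stepping through the list in order or choosing at random. pick_excerpt is a hypothetical helper; seeding the random choice with the date is a design choice of this sketch, so that re-running the job on the same day republishes the same excerpt instead of a different one.

```python
import random
from datetime import date

def pick_excerpt(excerpts: list, day: date, in_order: bool = True) -> str:
    """Choose the excerpt to publish on a given day (hypothetical helper)."""
    if in_order:
        # One entry per day, wrapping back to the start at the end of the list.
        return excerpts[day.toordinal() % len(excerpts)]
    # Date-seeded generator: same day, same "random" pick.
    return random.Random(day.toordinal()).choice(excerpts)
```

The chosen text would then be wrapped in an <item> with a fresh guid, as in the manual process described earlier.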

--
si***@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/

:: Wisdom is better than weapons of war ::
:: Ecclesiastes 9:18 ::

Mar 15 '06 #7

Ian Rastall

Simon Brooke wrote:

There's nothing which prevents you writing a little daemon which
periodically selects excerpts at random, and wraps RSS round them.

That would be nice to do, or even just to substitute in order. I'd
just have to learn how to do it first. :-)

Ian
--
http://sundry.ws

Mar 15 '06 #8

Simon Brooke

in message <JU*****************@fe07.news.easynews.com>, Ian Rastall
('i********@gmail.com') wrote:

Simon Brooke wrote:
There's nothing which prevents you writing a little daemon which
periodically selects excerpts at random, and wraps RSS round them.

That would be nice to do, or even just to substitute in order. I'd
just have to learn how to do it first. :-)

If you had an XML (ideally DocBook) marked up version of the text, it
would certainly be easy. If you have an HTML marked-up version of the
encyclopedia, what is the structure of an entry?

Is it (1)
<dl>
...
<dt>[entry heading]</dt>
<dd>[entry content marked up in whatever way suits]</dd>
...
</dl>

or (2)
...
<div>
<h2>[entry heading]</h2>
[entry content marked up in whatever way suits]
</div>
...

or (3)

...
<h2>[entry heading]</h2>
[entry content marked up in whatever way suits]
...

(1) or (2) are exceedingly easy. (3) is /hard/. This is yet another
reason why structural markup is a Good Thing.

If you're working from the Gutenberg plain text... [sigh]. It's /got/ to
be possible. It would be a real shame if all that work was wasted just
because they made a bad choice of format.

--
si***@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/

Error 1109: There is no message for this error

Mar 16 '06 #9

Andy Dingley

Simon Brooke wrote:

If you had an XML (ideally DocBook) marked up version of the text, it
would certainly be easy.

I'd agree with that much, except that I wouldn't recommend DocBook. I'd
go for HTML instead.

- One already knows (or needs to learn) HTML. DocBook is rather more
obscure.

- The final output is HTML, not DocBook. This can save you a step of
some transformation work. I suppose that it's reasonable to embed
DocBook in RSS (!), but not in a mainstream app.

- DocBook offers a better representation for the broad structure of a
document, but it doesn't do much better for character and para level
formatting than HTML does.

IMHE, DocBook isn't much use for this sort of task. I keep thinking of
using it, but ending up back with HTML. The text has to be a lot more
complex, but not _too_ complex, before it's worth using DocBook but not
worth heading right off into RDF.

Mar 16 '06 #10

Joe Kesselman

Andy Dingley <di*****@codesmiths.com> wrote:

I'd agree with that much, except that I wouldn't recommend DocBook. I'd
go for HTML instead.

Given that we're in the XML discussion, I'd suggest XHTML rather than
HTML. (XHTML is the XML-based rewrite of HTML; HTML was SGML-based.
XHTML is also somewhat more rigorous about what is and isn't correct;
most browsers will try to guess their way through pretty badly broken HTML.)

Though there are a few parsers out there which will read HTML and output
XML-compatible representations. NekoHTML, based on Xerces, will do so.
And the W3C's Tidy tool can be used to convert HTML into XHTML, applying
the same sort of guess-and-repair hacks that the browsers do; I seem to
remember that it's also possible to use Tidy directly as an HTML parser.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry

Mar 16 '06 #11

Simon Brooke

in message <11**********************@i40g2000cwc.googlegroups.com>, Andy
Dingley ('d******@codesmiths.com') wrote:

Simon Brooke wrote:
If you had an XML (ideally DocBook) marked up version of the text, it
would certainly be easy.
I'd agree with that much, except that I wouldn't recommend DocBook. I'd
go for HTML instead.

- One already knows (or needs to learn) HTML. DocBook is rather more
obscure.

If one is dealing with a complex text like an encyclopedia, HTML is
nothing like rich enough to catch that structure. The dl, dt, dd tags
may be fine for a quick glossary but they don't really meet the needs of
anything more complex. You can hack it up, of course, with lots of
special classes of div, but that becomes more problematic to deal with
than a purpose designed markup.
- The final output is HTML, not DocBook.
Who knows what the final output will be? Large and complex source
documents like this are expensive to produce, and will be produced
rarely. The etext document is now over ten years old, and is an OCR scan
of a document nearly 100 years old. The final output in ten years time
will not be HTML as you know it now, and probably won't be HTML at all.
Don't limit your options.
This can save you a step of
some transformation work. I suppose that it's reasonable to embed
DocBook in RSS (!), but not in a mainstream app.
You can't straightforwardly embed either HTML or DocBook in RSS, so it's
moot.
- DocBook offers a better representation for the broad structure of a
document, but it doesn't do much better for character and para level
formatting than HTML does.
There's an awful lot more to the structure of an encyclopedia than
character or para level formatting.
IMHE, DocBook isn't much use for this sort of task. I keep thinking of
using it, but ending up back with HTML. The text has to be a lot more
complex, but not _too_ complex, before it's worth using DocBook but not
worth heading right off into RDF.

There, I'd agree with you; RDF certainly needs to be considered as part
of the overall solution.

--
si***@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/

See one nuclear war, you've seen them all.

Mar 16 '06 #12

Ian Rastall

Simon Brooke wrote:

If you have an HTML marked-up version of the
encyclopedia, what is the structure of an entry?
In the <head>: <title>PG Encyclopedia - [entry title]</title>

In the <body>: <p></p>

That's how simple I've got it. I literally just copy and paste the
title, then copy and paste the paragraph.
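With pages that regular, the copy-and-paste step can be done by a script. A Python sketch using the standard html.parser module; EntryPageParser, entry_from_page and PREFIX are hypothetical names, and the sketch assumes each page has exactly the shape described (one <title> carrying the "PG Encyclopedia - " prefix, and the entry text in the first <p>, which is explicitly closed).

```python
from html.parser import HTMLParser

PREFIX = "PG Encyclopedia - "

class EntryPageParser(HTMLParser):
    """Collect the <title> text and the text of the first <p> on a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraph = ""
        self._inside = None     # "title" or "p" while collecting text
        self._seen_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" or (tag == "p" and not self._seen_p):
            self._inside = tag

    def handle_endtag(self, tag):
        if tag == self._inside:
            if tag == "p":
                self._seen_p = True
            self._inside = None

    def handle_data(self, data):
        if self._inside == "title":
            self.title += data
        elif self._inside == "p":
            self.paragraph += data

def entry_from_page(html: str) -> tuple:
    """Return (entry title, paragraph text) ready to drop into an <item>."""
    parser = EntryPageParser()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    if title.startswith(PREFIX):
        title = title[len(PREFIX):]
    return title, parser.paragraph.strip()
```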
If you're working from the Gutenberg plain text... [sigh]. It's /got/ to
be possible. It would be a real shame if all that work was wasted just
because they made a bad choice of format.

I don't know what they're doing with that encyclopedia. It's from
1911, so it's severely out-of-date, and they can't call it the
Encyclopedia Britannica, because of some legal thing. So it's the
"Project Gutenberg Encyclopedia", which is basically plain text broken
up into paragraphs. All the semantic markup and transformations
mentioned in this thread would have to be added in, but then this
whole RSS project would become unwieldy.
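Even without adding semantic markup, "plain text broken up into paragraphs" can be turned into a list of candidate items mechanically. A Python sketch, assuming paragraphs in the plain-text file are separated by one or more blank lines and that each paragraph is hard-wrapped across several lines; split_paragraphs is a hypothetical helper.

```python
import re

def split_paragraphs(text: str) -> list:
    """Split on blank lines and rejoin each paragraph's wrapped lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(line.strip() for line in block.splitlines())
            for block in blocks if block.strip()]
```

Each resulting string could then serve as the <description> of one day's item.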

Undoubtedly features will creep in as I learn more about hand-coding
RSS feeds.

Ian
--
http://sundry.ws

Mar 16 '06 #13

Ian Rastall

Joe Kesselman wrote:

Given that we're in the XML discussion, I'd suggest XHTML rather than
HTML.

I actually started out on XHTML, but have switched to HTML, since I
was just serving it as text/html anyway.

Ian
--
http://sundry.ws

Mar 16 '06 #14

Andy Dingley

Joe Kesselman wrote:

Given that we're in the XML discussion, I'd suggest XHTML rather than
HTML.
However we're talking about RSS output. HTML can be embedded in RSS
with relatively widespread techniques, XHTML can't. Putting XHTML into
RSS has the potential to cause trouble for your users (depending on its
resemblance to HTML, for that specific document). Putting XHTML into
RSS _as_XML_ (i.e. by namespacing) is just waving goodbye to those
users - almost none of them will be able to read it.
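One of the widespread techniques for carrying HTML in RSS is to put it inside <description> as escaped character data (a CDATA section is another). A Python sketch of the escaping approach with xml.etree.ElementTree; item_with_html is a hypothetical helper. Assigning markup to an element's text makes ElementTree escape it on output, so the HTML travels as text, not as child elements, and a reader unescapes it before rendering.

```python
import xml.etree.ElementTree as ET

def item_with_html(title: str, html_fragment: str) -> str:
    """Embed an HTML fragment in <description> as entity-escaped text."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # Setting .text (not appending parsed elements) means <, > and & are
    # written as &lt; &gt; &amp; in the serialized feed.
    ET.SubElement(item, "description").text = html_fragment
    return ET.tostring(item, encoding="unicode")
```

For instance, the fragment "<p>Athenaeus <i>x.</i></p>" is serialized as "&lt;p&gt;Athenaeus &lt;i&gt;x.&lt;/i&gt;&lt;/p&gt;" inside <description>.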

(XHTML is the XML-based rewrite of HTML; HTML was SGML-based.
XHTML is also somewhat more rigorous about what is and isn't correct;
No, this is wrong. XHTML is deliberately no more restrictive than HTML
4.01 _except_ in the implications of XML. It's a straightforward
transcoding, not any "tightening up" of the spec.
most browsers will try to guess their way through pretty badly broken HTML.)

They'll guess their way through XHTML in just the same way. (For
representative browsers) HTML is parsed according to SGML rules into a
DOM; XHTML is also parsed into an HTML DOM. A _lot_ of guessing goes on
about "broken" web pages between the parser output and the DOM, but
this is broadly the same process for either HTML or XHTML. Much of the
"variance" in HTML, as compared to XML, is actually just the SGML
simplifications allowed by omitting tags (both opening and closing) and
the ability to infer missing tags because the DTD is known. This takes
place before we reach the DOM.
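A small illustration of the tag-omission point: in HTML 4.01 the </li> end tag is optional, so "<ul><li>one<li>two</ul>" is legal HTML, but an XML parser rejects it outright. The Python sketch below shows the XML side with xml.etree.ElementTree. Note the hedge: Python's html.parser is only a lenient tokenizer, not an SGML parser or a browser, so it reports the start tags but does not infer the missing </li> tags or build a DOM. OMITTED_END_TAGS, is_well_formed_xml and StartTagLogger are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Legal HTML 4.01 (the </li> end tag may be omitted), but not well-formed XML.
OMITTED_END_TAGS = "<ul><li>one<li>two</ul>"
EXPLICIT_END_TAGS = "<ul><li>one</li><li>two</li></ul>"

def is_well_formed_xml(markup: str) -> bool:
    """True if an XML parser accepts the markup; XML permits no tag omission."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

class StartTagLogger(HTMLParser):
    """Records start tags; tokenizes leniently but does not infer omitted tags."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)
```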

There's an awful lot of SGML parsing allowed in "abbreviated" HTML that
is still deterministic, before you get to the bad-markup-fixup level.
Funny SGML rules about tags still give rise to a stable DOM full of
well-behaved elements. The "broken" inferencing really comes in at the
level after this, trying to make sense of mis-used elements to get a
renderable page.

And of course browsers don't do XHTML anyway. They just treat it as
funny looking HTML and parse it as SGML anyway (for practical
combinations of common browsers and XHTML 1.0 Appendix C).

Mar 17 '06 #15

Andy Dingley

Simon Brooke wrote:

in message <11**********************@i40g2000cwc.googlegroups.com>, Andy
Dingley ('d******@codesmiths.com') wrote:
- The final output is HTML, not DocBook.

Who knows what the final output will be?

It's RSS. So the embedded output content is pretty much implied to be
HTML.

The closeness between "output" and "master" depends on the OP's
large-scale architectural choices. Is this a quick hack to put some RSS
content out every week, or is it a Gutenberg-scale long-term project
with RSS output as a minor bolt-on ?
Large and complex source
documents like this are expensive to produce, and will be produced
rarely.
That's a good point, but not knowing the OP's situation it's impossible
to really judge what the "master" format ought to be. Is DocBook
adequate ? Does TEI have anything more to offer? (quite probably). It
certainly _could_ be done in HTML (we've all done it many times) and
the only cost there is a lot of classed <span>s and <div>s. That's not
a huge downside to things - I quite like my semantics to be imposed
through a meta-structure like this, with classes into a defined
taxonomy. It keeps the underlying HTML structure simple and easily
processed for output.

I just don't like DocBook. It sees the whole of publishing as being
about computer manuals. Far too many little tweaks bolted on for this,
not enough real flexible power for much else.

The sad part is that so many major projects (like MIT's horrendous
"bandsaw up the library project") don't even use a format with the
expressiveness of HTML.

You can't straightforwardly embed either HTML or DocBook in RSS, so it's
moot.

HTML is embeddable in RSS by common techniques (admittedly several of
them). I've never seen DocBook in RSS.

- DocBook offers a better representation for the broad structure of a
document, but it doesn't do much better for character and para level
formatting than HTML does.

There's an awful lot more to the structure of an encyclopedia than
character or para level formatting.

Yes, but not in the structure of snippets going out through public-use
RSS. By that point you've pretty much hammered it flat and you really
are serving "trivial" content. If they want the good stuff, they follow
the link on the item and go to the real version.

Mar 17 '06 #16

Ian Rastall

Andy Dingley <di*****@codesmiths.com> wrote:

Is this a quick hack to put some RSS
content out every week, or is it a Gutenberg-scale long-term project
with RSS output as a minor bolt-on ?

A little bit of both. It's a long-term project using a quick hack to
get RSS content up every day. The reason for this is that the
Gutenberg encyclopedia is outdated, and only usable as a curiosity.
To really do it right on the books site itself would be to create a
less-usable, outdated version of the Wikipedia, which is why (I think)
Gutenberg isn't doing much with this encyclopedia anyway. It works as
RSS content, to my mind, but wouldn't work on the site itself.

Now, to get back on topic, I would be very interested in software that
could manage my RSS feeds locally. I'll look about, but so far haven't
had much luck. Most of the stuff out there is blogging software.

Ian
--
http://sundry.ws

Mar 17 '06 #17
