Converting HTML elements into XML/RSS

Hi,

I'd like to include the whole web page content (as opposed to just the
headlines) into RSS/XML to enable people to read them via rss feed
readers.

Question: how to convert HTML elements such as href, img, b, p, etc
into XML?
I've seen someone use the following in their RSS feed but I don't like
it because <pre> doesn't produce a nice format:

<content:encoded><![CDATA[
<PRE>
blah blah blah..

Here is a sample HTML code. What would be the best way to put it into
XML, more specifically, convert those HTML elements.

----------------
<b>CAESAR</b> Et tu, Brute! Then fall,
<a
href=http://www.epilepsiemuseum.de/raum6/caesar.jpg>Caesar</a>.<br>
Dies
<p>
<b>CINNA</b> Liberty! Freedom! Tyranny is dead!
Run hence, proclaim, cry it about the streets.
<a href=http://www.shakespeare-online.com/>Read more</a>.
-----------------

Thanks for all the help!

Mick James

Jul 20 '05 #1
On 6 Jan 2005 13:43:19 -0800, mi*******@gmail.com wrote:
> I'd like to include the whole web page content (as opposed to just the
> headlines) into RSS/XML to enable people to read them via rss feed
> readers.


Read this
http://diveintomark.org/archives/200...compatible-rss

Ask again if anything is unclear.
Jul 20 '05 #2
Thanks.

So all the HTML needs to be enclosed in <description> and tags need to
be escaped with &amp;lt; and &amp;gt;?

Jul 20 '05 #3
In article <11**********************@f14g2000cwb.googlegroups.com>,
mi*******@gmail.com writes:
> I'd like to include the whole web page content (as opposed to just the
> headlines) into RSS/XML to enable people to read them via rss feed
> readers.

Uh, that's a lot of content for what users are expecting to be a summary.
Why use a feed if it doesn't save your users anything?

> Question: how to convert HTML elements such as href, img, b, p, etc
> into XML?

Bearing in mind the above, freely mix it, just using namespaces to
distinguish the elements. Since you're already breaking the purpose
of a feed, working normally with conventional client software presumably
isn't an issue.

> Here is a sample HTML code. What would be the best way to put it into

Looks more like tag-soup to me.

--
Nick Kew
Jul 20 '05 #4
Thanks for your reply. Yes, I understand that RSS is meant for summary,
not the whole content, but a lot of readers ask for the whole thing.
One imagines, they prefer to read using an rss feed reader instead of
using a web browser.

One question I didn't get the answer to in all my searching is: how to
code HTML tags such as href, img, p, b, etc when converting an HTML
page to .rss page?

Putting everything in CDATA or is there a better way?
A short example would be helpful.

Thanks a lot!

Jul 20 '05 #5
On 6 Jan 2005 15:15:54 -0800, mi*******@gmail.com wrote:
> So all the HTML needs to be enclosed in <description> and tags need to
> be escaped with &amp;lt; and &amp;gt;?


Yes. Ampersands might also cause problems and should already have been
escaped, but it's common in HTML that they aren't.

You should also "fix" any entity references that are in the HTML,
such as &eacute; or &nbsp;. This needs to be done whether there are
tags involved or not - they're one of the most common intermittent
reasons for an RSS feed to become invalid. Such entities are defined
in HTML, but aren't already defined in XML or RSS.

"Fixing" them can be either replacing the initial ampersand with &amp;
or replacing the "named" form of the entity reference with the
corresponding numeric form. The numeric form is probably best to use,
because that will render correctly even if the consumer doesn't
properly expand the encoded entities.
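
A minimal sketch of the numeric-form fix, assuming Python (the thread
names no language, only "script on Linux"). The function name
encode_for_description is made up for illustration. It rewrites
HTML-only named entities as numeric references and escapes everything
else, bare ampersands included, so the result can sit inside
<description>:

```python
import re
from html.entities import name2codepoint
from xml.sax.saxutils import escape

# Matches named entity references such as &eacute; or &nbsp;
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# The five entities XML itself predefines; these are escaped like any other text.
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def encode_for_description(html_text):
    """Escape an HTML fragment for <description>, using numeric
    references for entities that HTML defines but XML does not."""
    out = []
    pos = 0
    for match in _NAMED_ENTITY.finditer(html_text):
        # Escape the plain text and tags before this entity.
        out.append(escape(html_text[pos:match.start()]))
        name = match.group(1)
        if name in name2codepoint and name not in _XML_PREDEFINED:
            # HTML-only entity: emit the numeric form, which XML understands.
            out.append("&#%d;" % name2codepoint[name])
        else:
            # Predefined or unknown name: escape its ampersand like any other.
            out.append(escape(match.group(0)))
        pos = match.end()
    out.append(escape(html_text[pos:]))
    return "".join(out)


print(encode_for_description("caf&eacute;&nbsp;<b>x</b> & y"))
# prints: caf&#233;&#160;&lt;b&gt;x&lt;/b&gt; &amp; y
```

Because the numeric references are resolved by the XML parser itself,
the consumer receives the actual characters and does not need to know
HTML's entity names.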

--
Smert' spamionam
Jul 20 '05 #6
On Fri, 7 Jan 2005 01:25:36 +0000, ni**@hugin.webthing.com (Nick Kew)
wrote:
> Why use a feed if it doesn't save your users anything?

Why do you assume the function of my RSS feed? I've built many
feeds that are anything but "newsfeeds". I think my record was 20MB
content size in a <description> element, for a very
application-specific intranet task. However it's still perfectly
compliant RSS 1.0.

> > Question: how to convert HTML elements such as href, img, b, p, etc
> > into XML?
>
> Bearing in mind the above, freely mix it, just using namespaces to
> distinguish the elements.


You can't use namespacing, because the content is HTML rather than
XHTML. Apart from the standards-based argument and the fact that
namespacing just doesn't make sense for HTML, it's also impractical to
expect the incoming HTML content to be well-formed as an XML fragment
(or even valid HTML!).
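
The well-formedness point can be checked with a short Python sketch
(the language and the helper name is_well_formed_fragment are
assumptions for illustration). It wraps a fragment in a dummy root
element and asks the standard-library XML parser whether it parses:

```python
import xml.etree.ElementTree as ET


def is_well_formed_fragment(markup):
    """Return True if markup parses as XML once wrapped in a dummy root."""
    try:
        ET.fromstring("<root>" + markup + "</root>")
    except ET.ParseError:
        return False
    return True


# Typical incoming HTML: unquoted attribute value, unclosed <br>.
tag_soup = "<b>CAESAR</b> <a href=http://www.shakespeare-online.com/>Read more</a>.<br>"
# The same content rewritten by hand as an XHTML-style fragment.
xhtml = '<b>CAESAR</b> <a href="http://www.shakespeare-online.com/">Read more</a>.<br/>'

print(is_well_formed_fragment(tag_soup))  # False
print(is_well_formed_fragment(xhtml))     # True
```

Only a fragment that passes a check like this could be mixed into the
feed as namespaced elements; anything else has to be escaped as text.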

Remember that RSS is a _feed_, not a one-off document (I wish Winer
would recognise this). Like all layered protocols you have to be very
careful that your implementations are not only correct for one
demonstration example, they have to be demonstrably correct for all
possible inputs.

> Since you're already breaking the purpose of a feed,


Rubbish. RSS does _NOT_ define any notion of "purpose", or what's
"appropriate" to use it for. Besides which, the notion of content
encoding HTML fragments within the <description> element is very well
established.
--
Smert' spamionam
Jul 20 '05 #7
In article <11*********************@f14g2000cwb.googlegroups.com>,
mi*******@gmail.com writes:
> One imagines, they prefer to read using an rss feed reader instead of
> using a web browser.

Hmmm. I think it should be the job of the Client to present it
sensibly. An RSS feed is to the Web as a newsgroup or mail folder
listing (from, subject, date) is to Usenet or Email. IMHO.

(you've presumably seen how Opera presents RSS feeds?)

> One question I didn't get the answer to in all my searching is: how to
> code HTML tags such as href, img, p, b, etc when converting an HTML
> page to .rss page?

The core Site Valet tools offer options to present reports as RDF.
Since these are markup analysis tools, the more verbose options
embed the original markup, so all system messages can be properly
referenced to it. This uses a namespace to describe it, and
looks a little like XSLT with things like:
<ml:element name="a">
<ml:attribute name="href">foo</ml:attribute>

> Putting everything in CDATA or is there a better way?
> A short example would be helpful.


I don't think the above reply is really relevant to your question:
I was solving a different problem! But you already have Andy's reply.

--
Nick Kew
Jul 20 '05 #8
Hey,
> I'd like to include the whole web page content (as opposed to just the
> headlines) into RSS/XML to enable people to read them via rss feed
> readers.
>
> Question: how to convert HTML elements such as href, img, b, p, etc
> into XML?


Why don't you just use software to create the feed that will convert it for you,
so that you don't have to worry about it? There are a couple of options; I know
FeedForAll (http://www.feedforall.com) has a WYSIWYG editor that will do this.

Best,
Colin

Jul 20 '05 #9
WYSIWYG is not an option. I need to do it via script on Linux.

Would someone tell me how the following HTML snippet should be encoded
in an RSS file:

<b>This is a test.</b>
<a href=foo.html>Bar</a>.
<img src=baz.jpg>
<p>

I tried using &amp;lt; etc but RSS readers simply display the
equivalent HTML, rather than rendering it.
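
One possible cause, offered as a guess: if the feed file literally
contains &amp;lt;, the markup has been escaped twice. The Python sketch
below (language and helper name are assumptions, and the fragment is
only modelled on the snippet above) escapes a fragment once and twice,
then shows what an XML parser hands to the reader in each case. Whether
a given reader then renders that text as HTML is up to the reader.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The fragment need not be well-formed; it is carried as plain text.
html_fragment = "<b>This is a test.</b>\n<a href=foo.html>Bar</a>.\n<img src=baz.jpg>\n<p>"

escaped_once = escape(html_fragment)   # starts with &lt;b&gt;
escaped_twice = escape(escaped_once)   # starts with &amp;lt;b&amp;gt;


def description_text(escaped_body):
    """Parse a one-item snippet and return the description text a reader receives."""
    item = ET.fromstring("<item><description>%s</description></item>" % escaped_body)
    return item.find("description").text


# Escaped once: the reader gets the original markup back and can render it.
print(description_text(escaped_once) == html_fragment)   # True
# Escaped twice: the reader gets entity-escaped text, which displays as literal tags.
print(description_text(escaped_twice) == escaped_once)   # True
```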

Jul 20 '05 #10
