RFC: From XHTML to HTML via XSLT

It is common knowledge that XHTML is better HTML and you can serve XHTML content as HTML.
However, the second statement is incorrect, for various reasons;
it is enough to say that the HTML validator does not tolerate XML-style empty tags.
It seems serving XHTML to the browser is of no advantage and can cause serious problems if the browser does not understand the difference.
This raises the question of downgrading XHTML to HTML.
I could not find any relevant instruction at the WWW Corporation so I decided I have to roll my own with XSLT.
I attach the XSLT code and I kindly ask for comments (because I am a novice in this area).
Please note that all tags and attributes have to be copied stripping the namespace;
<xsl:copy> does not work as expected because I get <br></br> instead of <br> only.
I decided to copy the comments explicitly
in order to be able to embed Internet Explorer conditional inclusion comments into the output.
Chris

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output
  method="html" doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
  doctype-system="http://www.w3.org/TR/html4/loose.dtd"
/>

<xsl:template match="@*"><xsl:attribute name="{name()}"><xsl:value-of select="." /></xsl:attribute>
</xsl:template>

<xsl:template match="*">
<xsl:element name="{name()}"><xsl:apply-templates select="@* | node()" /></xsl:element>
</xsl:template>

<xsl:template match="comment()"><xsl:copy /></xsl:template>

</xsl:stylesheet>

Mar 22 '07 #1
In our last episode, <et***********@news2.ipartners.pl>, the lovely and
talented Křištof Želechovski broadcast on
comp.infosystems.www.authoring.html:
It is common knowledge that XHTML is better HTML
No, it isn't.

--
Lars Eighner <http://larseighner.com/> <http://myspace.com/larseighner>
Countdown: 670 days to go.
Mar 22 '07 #2

"Lars Eighner" <us****@larseighner.com> wrote in message news:sl*******************@goodwill.larseighner.com...
In our last episode, <et***********@news2.ipartners.pl>, the lovely and
talented Křištof Želechovski broadcast on
comp.infosystems.www.authoring.html:
>It is common knowledge that XHTML is better HTML
No, it isn't.
It is not better HTML but it is a common opinion.
Chris
Mar 22 '07 #3
Křištof Želechovski, comp.infosystems.www.authoring.html:
doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
doctype-system="http://www.w3.org/TR/html4/loose.dtd"
Ideally, you should keep the DTD kind (Transitional/Strict) from the
XHTML document, but there is unfortunately no way to do this in XSLT.
<xsl:template match="@*"><xsl:attribute name="{name()}">
<xsl:value-of select="." /></xsl:attribute></xsl:template>
I am not sure this is a good strategy: actually, you'd probably want to
keep only attributes in the default namespace (so as to remove
xml:lang/xml:space, for instance):

<xsl:template match="@*[namespace-uri()='']">
<xsl:copy />
</xsl:template>

<xsl:template match="@*" />
<xsl:template match="*">

<xsl:element name="{name()}"><xsl:apply-templates select="@* | node()" /></xsl:element>

</xsl:template>
Same here, you probably want to keep only elements of the xhtml
namespace; note also the use of local-name() instead of name(), in case
your original XHTML document uses namespace prefixes:

<xsl:template match="*[namespace-uri()='http://www.w3.org/1999/xhtml']">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*|node()" />
</xsl:element>
</xsl:template>

Untested.
Mar 22 '07 #4
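
Taken together, the suggestions in post #4 and the comment() template from the original post amount to roughly the stylesheet below. It is a sketch, untested like the fragments it is built from. It assumes an XSLT 1.0 processor, input whose elements are all in the XHTML namespace, and the Transitional doctype from the original post. Elements outside the XHTML namespace are not matched by any explicit template, so the built-in rules apply: their tags are dropped and their content is processed.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialize as HTML 4.01 Transitional, as in the original post. -->
  <xsl:output method="html"
      doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
      doctype-system="http://www.w3.org/TR/html4/loose.dtd"/>

  <!-- Re-create each XHTML element under its local name. The stylesheet
       declares no default namespace, so the new element is in no
       namespace and the html output method can treat it as HTML. -->
  <xsl:template match="*[namespace-uri()='http://www.w3.org/1999/xhtml']">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Keep attributes that are in no namespace. -->
  <xsl:template match="@*[namespace-uri()='']">
    <xsl:copy/>
  </xsl:template>

  <!-- Drop namespaced attributes such as xml:lang and xml:space. -->
  <xsl:template match="@*"/>

  <!-- Copy comments so that conditional comments survive. -->
  <xsl:template match="comment()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

The predicate-based templates have a higher default priority than the bare `@*` template, so no explicit priority attributes are needed. Note that dropping xml:lang discards language information unless the source also carries a plain lang attribute.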
Kristof Zelechovski wrote:
>>It is common knowledge that XHTML is better HTML

No, it isn't.

It is not better HTML but it is a common opinion.
If you don't know the difference between knowledge and opinion, I suggest
that you postpone further participation in public discussions until you do.
That's just my opinion; all people have the right to ridicule themselves in
public.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Mar 22 '07 #5

"Jukka K. Korpela" <jk******@cs.tut.fi> wrote in message news:9k*******************@reader1.news.saunalahti.fi...
Kristof Zelechovski wrote:
>>>It is common knowledge that XHTML is better HTML

No, it isn't.

It is not better HTML but it is a common opinion.
If you don't know the difference between knowledge and opinion, I suggest
that you postpone further participation in public discussions until you do.
That's just my opinion; all people have the right to ridicule themselves in
public.
I have not ridiculed myself. You are trying to ridicule me. Have fun.
Chris
Mar 22 '07 #6

"Pierre Senellart" <in*****@invalid.invalid> wrote in message news:et***********@nef.ens.fr...
Křištof Želechovski, comp.infosystems.www.authoring.html:
>doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
doctype-system="http://www.w3.org/TR/html4/loose.dtd"
Ideally, you should keep the DTD kind (Transitional/Strict) from the
XHTML document, but there is unfortunately no way to do this in XSLT.
Since all my pages are transitional, I do not have such a problem.
It seems uncommon to have some pages transitional and some pages strictly conformant;
you usually decide one way or the other.
><xsl:template match="@*"><xsl:attribute name="{name()}">
<xsl:value-of select="." /></xsl:attribute></xsl:template>
I am not sure this is a good strategy: actually, you'd probably want to
keep only attributes in the default namespace (so as to remove
xml:lang/xml:space, for instance):
Good point.
<xsl:template match="@*[namespace-uri()='']">
<xsl:copy />
As I have already noticed, copy does not work because it leaves the namespace qualification,
as in ‘xhtml:clear="none"’, and does not remove the default value, as in ‘xhtml:restricted="restricted"’.
</xsl:template>

<xsl:template match="@*" />
><xsl:template match="*">
<xsl:element name="{name()}"><xsl:apply-templates select="@* | node()" /></xsl:element>
</xsl:template>
Same here, you probably want to keep only elements of the xhtml
namespace; note also the use of local-name() instead of name(), in case
your original XHTML document uses namespace prefixes:
It does not, but using the local name does no harm either.
While custom elements should not make it to the output, you have to decide what to do with them if you use them.
Skipping them altogether is one possibility, but I can imagine it need not be the best solution.
On the other hand, if you let them pass through, the validation fails
and at least you know you are losing information.
<xsl:template match="*[namespace-uri()='http://www.w3.org/1999/xhtml']">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*|node()" />
</xsl:element>
</xsl:template>

Untested.
Thanks a lot.
Can anyone comment on why no public resource for this transformation is available at the WWW Corp.?
Chris
Mar 22 '07 #7
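
One way to handle the custom-element question raised in post #7 is a catch-all template that reports each non-XHTML element instead of silently skipping it. The stylesheet below is a sketch under assumptions: an XSLT 1.0 processor, the Transitional doctype from the original post, and a message text chosen purely for illustration. Where xsl:message output ends up (stderr, a log, nowhere) depends on the processor.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"
      doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
      doctype-system="http://www.w3.org/TR/html4/loose.dtd"/>

  <!-- XHTML elements: rebuild without a namespace, keep plain attributes. -->
  <xsl:template match="*[namespace-uri()='http://www.w3.org/1999/xhtml']">
    <xsl:element name="{local-name()}">
      <xsl:copy-of select="@*[namespace-uri()='']"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- Any other element: report it, drop its tags, keep its content.
       This template has a lower default priority than the one above. -->
  <xsl:template match="*">
    <xsl:message>
      <xsl:text>Dropping non-XHTML element: </xsl:text>
      <xsl:value-of select="name()"/>
    </xsl:message>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Copy comments so that conditional comments survive. -->
  <xsl:template match="comment()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

To let custom elements pass through so that validation fails visibly, the body of the catch-all template could instead wrap `<xsl:apply-templates select="@*|node()"/>` in `<xsl:copy>`. To discard their content as well, remove the `<xsl:apply-templates/>` call from it.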
On Thu, 22 Mar 2007 07:52:23 +0100
Křištof Želechovski <gi******@stegny.2a.pl> wrote:
It is common knowledge that XHTML is better HTML
Since we don't know you here, it's not so easy to judge whether
that's intended to be ironic. But you've already been bitten
by those who take what you said literally.
> This raises the question
of downgrading XHTML to HTML.
If you so wish.
> I could not find any relevant
instruction at the WWW Corporation so I decided I have to roll my own
with XSLT.
XSLT is an extremely inefficient way to do this (parsing an entire
document to an in-memory tree is inherently very expensive).
Far better to use SAX. There are modules for Apache that will let
you transform between HTML and XHTML on the fly, going in
whichever direction you please. Not that I'd recommend using them
unless you have an existing need to parse the markup.

--
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Mar 22 '07 #8
Nick Kew <ni**@grimnir.webthing.com> writes:
On Thu, 22 Mar 2007 07:52:23 +0100
Křištof Želechovski <gi******@stegny.2a.pl> wrote:
>> This raises the question
of downgrading XHTML to HTML.

If you so wish.
>> I could not find any relevant
instruction at the WWW Corporation so I decided I have to roll my own
with XSLT.

XSLT is an extremely inefficient way to do this (parsing an entire
document to an in-memory tree is inherently very expensive).
Far better to use SAX.
Even SAX seems an awfully convoluted way to do this... Why not just use tidy?

tidy -ashtml -o outfile.html infile.xhtml

sherm--

--
Web Hosting by West Virginians, for West Virginians: http://wv-www.net
Cocoa programming in Perl: http://camelbones.sourceforge.net
Mar 22 '07 #9
Rob
Jukka K. Korpela wrote:
Kristof Zelechovski wrote:
>>>It is common knowledge that XHTML is better HTML

No, it isn't.

It is not better HTML but it is a common opinion.

If you don't know the difference between knowledge and opinion, I
suggest that you postpone further participation in public discussions
until you do. That's just my opinion; all people have the right to
ridicule themselves in public.
I think Kristof was very subtly telling the OP that it is more a matter
of opinion than of knowledge.

Rob
Mar 22 '07 #10
On Thu, 22 Mar 2007 08:43:55 -0400
Sherm Pendley <sp******@dot-app.org> wrote:
XSLT is an extremely inefficient way to do this (parsing an entire
document to an in-memory tree is inherently very expensive).
Far better to use SAX.

Even SAX seems an awfully convoluted way to do this... Why not just
use tidy?

tidy -ashtml -o outfile.html infile.xhtml
Because tidy parses to an in-memory tree. Which is, as I said,
hugely expensive.

Yes of course, if all you need is a commandline tool for processing
static files, then that's fine: you have hundreds of trivial solutions
to choose from. But if you want to do anything more interesting
like process outgoing content in a server, you want something
more efficient. In the case of Apache, the lack of a parseChunk
API makes tidy even more expensive than XSLT for this.

--
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Mar 22 '07 #11
On Thu, 22 Mar 2007, Křištof Želechovski wrote:
It is common knowledge that XHTML is better HTML and you can serve XHTML content as HTML.
It is common knowledge that vodka is better whisky and you can serve
vodka content in whisky glasses.

--
In memoriam Alan J. Flavell
http://groups.google.com/groups/sear...Alan.J.Flavell
Mar 22 '07 #12

"Rob" <ro************@hotmail.com> wrote in message news:46*********************@news.xs4all.nl...
Jukka K. Korpela wrote:
>Kristof Zelechovski wrote:
>>>>It is common knowledge that XHTML is better HTML

No, it isn't.

It is not better HTML but it is a common opinion.

If you don't know the difference between knowledge and opinion, I
suggest that you postpone further participation in public discussions
until you do. That's just my opinion; all people have the right to
ridicule themselves in public.
I think Kristof was very subtly telling the OP that it is more a matter
of opinion than of knowledge.
That is, I was subtly telling it to myself, because I am the OP.
Chris
Mar 22 '07 #13
Kristof Zelechovski wrote:
"Rob" <ro************@hotmail.com> wrote in message news:46*********************@news.xs4all.nl...
>Jukka K. Korpela wrote:
>>Kristof Zelechovski wrote:

>It is common knowledge that XHTML is better HTML
No, it isn't.
It is not better HTML but it is a common opinion.
If you don't know the difference between knowledge and opinion, I
suggest that you postpone further participation in public discussions
until you do. That's just my opinion; all people have the right to
ridicule themselves in public.
I think Kristof was very subtly telling the OP that it is more a matter
of opinion than of knowledge.

That is, I was subtly telling it to myself, because I am the OP.
Chris
Oops

--
Rob Waaijenberg
Mar 22 '07 #14
in message <et***********@news2.ipartners.pl>, Křištof Želechovski
('g*******@stegny.2a.pl') wrote:
It is common knowledge that XHTML is better HTML and you can serve XHTML
content as HTML. However, the second statement is incorrect, for various
reasons; it is enough to say that the HTML validator does not tolerate
XML-style empty tags. It seems serving XHTML to the browser is of no
advantage and can cause serious problems if the browser does not
understand the difference. This raises the question of downgrading XHTML
to HTML. I could not find any relevant instruction at the WWW Corporation
so I decided I have to roll my own with XSLT. I attach the XSLT code and
I kindly ask for comments (because I am a novice in this area). Please
note that all tags and attributes have to be copied stripping the
namespace; <xsl:copy> does not work as expected because I get <br></br>
instead of <br> only. I decided to copy the comments explicitly in order
to be able to embed Internet Explorer conditional inclusion comments into
the output. Chris
My comments:

(1) there is no point in doing this, XHTML is not broken.
(2) if you did want to do this, the stylesheet you would need would be:

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml">

<xsl:output indent="yes" method="html"
doctype-public="-//W3C//DTD HTML 4.01 Strict//EN"
doctype-system="http://www.w3.org/TR/html4/strict.dtd"/>

<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
--
si***@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/

;; It's dangerous to be right when the government is wrong.
;; Voltaire RIP Dr David Kelly 1945-2004
Mar 23 '07 #15

"Simon Brooke" <si***@jasmine.org.uk> wrote in message news:19************@gododdin.internal.jasmine.org.uk...
in message <et***********@news2.ipartners.pl>, Křištof Želechovski
('g*******@stegny.2a.pl') wrote:
>It is common knowledge that XHTML is better HTML and you can serve XHTML
content as HTML. However, the second statement is incorrect, for various
reasons; it is enough to say that the HTML validator does not tolerate
XML-style empty tags. It seems serving XHTML to the browser is of no
advantage and can cause serious problems if the browser does not
understand the difference. This raises the question of downgrading XHTML
to HTML. I could not find any relevant instruction at the WWW Corporation
so I decided I have to roll my own with XSLT. I attach the XSLT code and
I kindly ask for comments (because I am a novice in this area). Please
note that all tags and attributes have to be copied stripping the
namespace; <xsl:copy> does not work as expected because I get <br></br>
instead of <br> only. I decided to copy the comments explicitly in order
to be able to embed Internet Explorer conditional inclusion comments into
the output. Chris
My comments:

(1) there is no point in doing this, XHTML is not broken.
(2) if you did want to do this, the stylesheet you would need would be:

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml">

<xsl:output indent="yes" method="html"
doctype-public="-//W3C//DTD HTML 4.01 Strict//EN"
doctype-system="http://www.w3.org/TR/html4/strict.dtd"/>

<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Have you read my post? Your script was my first attempt but it does not work: it produces <br></br>.
Chris
Mar 23 '07 #16
On 23 Mar, 20:13, Křištof Želechovski <giecr...@stegny.2a.pl> wrote:
Your script was my first attempt but it does not work: it produces <br></br>.
It doesn't (although it's not my script BTW). The script produces a
"br element" in the result tree. _Your_ XSLT transform engine and
serialiser turns that into either <br></br>, <br /> or <br> in the
output document, according to its own behaviour and the output method
you specified.

If the script uses this output method
<xsl:output method="html" />
and the transform returns either XML version, then your local
transform engine is behaving. The script is correct.

Mar 26 '07 #17
Sorry for top-posting, my Outlook Express failed to quote.

What do you mean by ‘my local transform engine is behaving’?
Upon assumption that you meant ‘misbehaving’, I do not think it is.
The output method "html" knows about specific HTML tags only;
the "br" you get from "xsl:copy" is not a HTML tag and the engine has no clue
whether it should have a closing tag or not so it applies the default.

Chris

"Andy Dingley" <di*****@codesmiths.com> wrote in message news:11**********************@o5g2000hsb.googlegroups.com...
On 23 Mar, 20:13, Křištof Želechovski <giecr...@stegny.2a.pl> wrote:
Your script was my first attempt but it does not work: it produces <br></br>.
It doesn't (although it's not my script BTW). The script produces a
"br element" in the result tree. _Your_ XSLT transform engine and
serialiser turns that into either <br></br>, <br /> or <br> in the
output document, according to its own behaviour and the output method
you specified.

If the script uses this output method
<xsl:output method="html" />
and the transform returns either XML version, then your local
transform engine is behaving. The script is correct.

Mar 26 '07 #18
On 26 Mar, 12:24, Křištof Želechovski <giecr...@stegny.2a.pl> wrote:
What do you mean by ‘my local transform engine is behaving’?
I mean that either the library you use to do XSLT transformations is
in error, or possibly that your input document has an error. I can
think of no script error where a script as we describe here (using
HTML output method) will generate <br></br> rather than <br>

The output method "html" knows about specific HTML tags only;
the "br" you get from "xsl:copy" is not a HTML tag
Then it ought to be a HTML element. What namespace is it from? For a
simple copy operation, what's its namespace in the source document?
Mar 26 '07 #19
The original document is in XHTML. xsl:copy creates an XHTML element, as instructed.
The XHTML element needs a closing tag, as specified in the DTD.
Since the output language is HTML, the transformation creates an empty element with a stand-alone closing tag.
Everything is regular and conformant to me.
SorForTopPostButMyOutlExprFailToQuotAgain.
Chris

"Andy Dingley" <di*****@codesmiths.com> wrote in message news:11*********************@b75g2000hsg.googlegroups.com...
On 26 Mar, 12:24, Křištof Želechovski <giecr...@stegny.2a.pl> wrote:
What do you mean by ‘my local transform engine is behaving’?
I mean that either the library you use to do XSLT transformations is
in error, or possibly that your input document has an error. I can
think of no script error where a script as we describe here (using
HTML output method) will generate <br></br> rather than <br>

The output method "html" knows about specific HTML tags only;
the "br" you get from "xsl:copy" is not a HTML tag
Then it ought to be a HTML element. What namespace is it from? For a
simple copy operation, what's its namespace in the source document?
Mar 27 '07 #20
On 27 Mar, 10:25, Křištof Želechovski <giecr...@stegny.2a.pl> wrote:
The original document is in XHTML. xsl:copy creates an XHTML element, as instructed.
Interesting explanation of this was just posted to comp.text.xml
(thanks Bjoern)
<http://groups.google.co.uk/group/comp.text.xml/msg/117f441f0b6d5f4c>

An example of an "XHTML to HTML copy" stylesheet was also posted
<http://www.bjoernsworld.de/temp/xhtml-to-html4.xslt>
It seems that the behaviour I'd been expecting _only_ occurs if the
input XHTML is the default namespace but _isn't_ bound to a namespace
URI. If it is namespaced, or if the default namespace URI is
declared, then XSLT should treat this as XML output, no matter what
the output method.

If you have bound the input namespace, then you can't use <xsl:copy>.
You need to use <xsl:element name="{local-name()}"> instead.

OK, so today I learned something. 8-)

Mar 27 '07 #21
You do not understand. The output format *is* HTML, only the elements are not.
You would get <br /> under XML.
Chris

"Andy Dingley" <di*****@codesmiths.com> wrote in message news:11********************@n76g2000hsh.googlegroups.com...
On 27 Mar, 10:25, Křištof Želechovski <giecr...@stegny.2a.pl> wrote:
The original document is in XHTML. xsl:copy creates an XHTML element, as instructed.
Interesting explanation of this was just posted to comp.text.xml
(thanks Bjoern)
<http://groups.google.co.uk/group/comp.text.xml/msg/117f441f0b6d5f4c>

An example of an "XHTML to HTML copy" stylesheet was also posted
<http://www.bjoernsworld.de/temp/xhtml-to-html4.xslt>
It seems that the behaviour I'd been expecting _only_ occurs if the
input XHTML is the default namespace but _isn't_ bound to a namespace
URI. If it is namespaced, or if the default namespace URI is
declared, then XSLT should treat this as XML output, no matter what
the output method.

If you have bound the input namespace, then you can't use <xsl:copy>.
You need to use <xsl:element name="{local-name()}"> instead.

OK, so today I learned something. 8-)

Mar 28 '07 #22
