RFC: From XHTML to HTML via XSLT

It is common knowledge that XHTML is better HTML and you can serve XHTML content as HTML.
However, the second statement is incorrect, for various reasons;
it is enough to say that the HTML validator does not tolerate XML-style empty tags.
It seems serving XHTML to the browser is of no advantage and can cause serious problems if the browser does not understand the difference.
This raises the question of downgrading XHTML to HTML.
I could not find any relevant instruction at the WWW Corporation so I decided I have to roll my own with XSLT.
I attach the XSLT code and I kindly ask for comments (because I am a novice in this area).
Please note that all tags and attributes have to be copied stripping the namespace;
<xsl:copy> does not work as expected because I get <br></br> instead of <br> only.
I decided to copy the comments explicitly
in order to be able to embed Internet Explorer conditional inclusion comments into the output.
Chris

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"
      doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
      doctype-system="http://www.w3.org/TR/html4/loose.dtd" />

  <!-- Re-create every attribute under its own name -->
  <xsl:template match="@*">
    <xsl:attribute name="{name()}">
      <xsl:value-of select="." />
    </xsl:attribute>
  </xsl:template>

  <!-- Re-create every element by name, which strips the XHTML namespace -->
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@* | node()" />
    </xsl:element>
  </xsl:template>

  <!-- Copy comments so that conditional comments reach the output -->
  <xsl:template match="comment()">
    <xsl:copy />
  </xsl:template>

</xsl:stylesheet>

Mar 22 '07 #1
21 Replies


In our last episode, <et***********@news2.ipartners.pl>, the lovely and
talented Křištof Želechovski broadcast on
comp.infosystems.www.authoring.html:
> It is common knowledge that XHTML is better HTML
No, it isn't.

--
Lars Eighner <http://larseighner.com/> <http://myspace.com/larseighner>
Countdown: 670 days to go.
Mar 22 '07 #2

User "Lars Eighner" <us****@larseighner.com> wrote in message news:sl*******************@goodwill.larseighner.com...
> In our last episode, <et***********@news2.ipartners.pl>, the lovely and
> talented Křištof Želechovski broadcast on
> comp.infosystems.www.authoring.html:
>> It is common knowledge that XHTML is better HTML
> No, it isn't.
It is not better HTML but it is a common opinion.
Chris
Mar 22 '07 #3

Křištof Želechovski, comp.infosystems.www.authoring.html:
> doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
> doctype-system="http://www.w3.org/TR/html4/loose.dtd"
Ideally, you should keep the DTD kind (Transitional/Strict) from the
XHTML document, but there is unfortunately no way to do this in XSLT.
> <xsl:template match="@*"><xsl:attribute name="{name()}">
> <xsl:value-of select="." /></xsl:attribute></xsl:template>
I am not sure this is a good strategy: actually, you'd probably want to
keep only attributes in no namespace (so as to remove
xml:lang/xml:space, for instance):

  <xsl:template match="@*[namespace-uri()='']">
    <xsl:copy />
  </xsl:template>

  <xsl:template match="@*" />

> <xsl:template match="*">
> <xsl:element name="{name()}"><xsl:apply-templates select="@* | node()" /></xsl:element>
> </xsl:template>
Same here, you probably want to keep only elements of the XHTML
namespace; note also the use of local-name() instead of name(), in case
your original XHTML document uses namespace prefixes:

  <xsl:template match="*[namespace-uri()='http://www.w3.org/1999/xhtml']">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()" />
    </xsl:element>
  </xsl:template>

Untested.
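
For reference, the suggestions in this post could be assembled into one stylesheet along the following lines. This is only a sketch and just as untested: the `xhtml` prefix, the decision to drop the wrapper of any non-XHTML element while keeping its content, and the comment rule carried over from the original post are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">

  <xsl:output method="html"
      doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
      doctype-system="http://www.w3.org/TR/html4/loose.dtd" />

  <!-- XHTML elements: rebuild under the local name, in no namespace,
       so the html output method can treat them as HTML elements. -->
  <xsl:template match="xhtml:*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@* | node()" />
    </xsl:element>
  </xsl:template>

  <!-- Assumption: elements from any other namespace lose their tags
       but their content is still processed. -->
  <xsl:template match="*">
    <xsl:apply-templates />
  </xsl:template>

  <!-- Attributes in no namespace are copied unchanged. -->
  <xsl:template match="@*[namespace-uri()='']">
    <xsl:copy />
  </xsl:template>

  <!-- Namespaced attributes such as xml:lang and xml:space are dropped. -->
  <xsl:template match="@*" />

  <!-- Comments are copied so that conditional comments survive. -->
  <xsl:template match="comment()">
    <xsl:copy />
  </xsl:template>

</xsl:stylesheet>
```

The predicate on the first attribute template gives it a higher default priority than the bare `@*` rule, and `xhtml:*` likewise outranks `*`, so no explicit priority attributes should be needed.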
Mar 22 '07 #4

Kristof Zelechovski wrote:
>>> It is common knowledge that XHTML is better HTML
>> No, it isn't.
> It is not better HTML but it is a common opinion.
If you don't know the difference between knowledge and opinion, I suggest
that you postpone further participation in public discussions until you do.
That's just my opinion; all people have the right to ridicule themselves in
public.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Mar 22 '07 #5

User "Jukka K. Korpela" <jk******@cs.tut.fi> wrote in message news:9k*******************@reader1.news.saunalahti.fi...
> Kristof Zelechovski wrote:
>>>> It is common knowledge that XHTML is better HTML
>>> No, it isn't.
>> It is not better HTML but it is a common opinion.
> If you don't know the difference between knowledge and opinion, I suggest
> that you postpone further participation in public discussions until you do.
> That's just my opinion; all people have the right to ridicule themselves in
> public.
I have not ridiculed myself. You are trying to ridicule me. Have fun.
Chris
Mar 22 '07 #6

User "Pierre Senellart" <in*****@invalid.invalid> wrote in message news:et***********@nef.ens.fr...
> Křištof Želechovski, comp.infosystems.www.authoring.html:
>> doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
>> doctype-system="http://www.w3.org/TR/html4/loose.dtd"
> Ideally, you should keep the DTD kind (Transitional/Strict) from the
> XHTML document, but there is unfortunately no way to do this in XSLT.
Since all my pages are transitional, I do not have such a problem.
It seems uncommon to have some pages transitional and some pages strictly conformant;
you usually decide one way or the other.
>> <xsl:template match="@*"><xsl:attribute name="{name()}">
>> <xsl:value-of select="." /></xsl:attribute></xsl:template>
> I am not sure this is a good strategy: actually, you'd probably want to
> keep only attributes in no namespace (so as to remove
> xml:lang/xml:space, for instance):
Good point.
> <xsl:template match="@*[namespace-uri()='']">
> <xsl:copy />
As I have already noticed, copy does not work because it leaves the namespace qualification,
as in ‘xhtml:clear="none"’, and does not remove the default value, as in ‘xhtml:restricted="restricted"’.
> </xsl:template>
>
> <xsl:template match="@*" />
>> <xsl:template match="*">
>> <xsl:element name="{name()}"><xsl:apply-templates select="@* | node()" /></xsl:element>
>> </xsl:template>
> Same here, you probably want to keep only elements of the XHTML
> namespace; note also the use of local-name() instead of name(), in case
> your original XHTML document uses namespace prefixes:
It does not, but using the local name does no harm either.
While custom elements should not make it to the output, you have to decide what to do with them if you use them.
Skipping them altogether is one possibility, but I can imagine it need not be the best solution.
On the other hand, if you let them pass through, the validation fails
and at least you know you are losing information.
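
A possible middle ground, sketched here under assumptions that are not from the thread: let the transformation report every element outside the XHTML namespace through xsl:message and keep only its content. The output then still validates, but the loss does not go unnoticed. Where the message ends up (standard error, a log) depends on the XSLT processor, and the wording of the message is invented for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">

  <xsl:output method="html" />

  <!-- XHTML elements are rebuilt in no namespace. -->
  <xsl:template match="xhtml:*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@* | node()" />
    </xsl:element>
  </xsl:template>

  <!-- Any other element: warn, then keep only its content. -->
  <xsl:template match="*">
    <xsl:message>
      <xsl:text>Dropping non-XHTML element: </xsl:text>
      <xsl:value-of select="name()" />
    </xsl:message>
    <xsl:apply-templates />
  </xsl:template>

  <!-- Keep attributes in no namespace, drop the rest. -->
  <xsl:template match="@*[namespace-uri()='']">
    <xsl:copy />
  </xsl:template>
  <xsl:template match="@*" />

</xsl:stylesheet>
```

Adding terminate="yes" to the xsl:message element would instead abort the transformation at the first foreign element, which is closer to the "validation fails" behaviour described above.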
> <xsl:template match="*[namespace-uri()='http://www.w3.org/1999/xhtml']">
> <xsl:element name="{local-name()}">
> <xsl:apply-templates select="@*|node()" />
> </xsl:element>
> </xsl:template>
>
> Untested.
Thanks a lot.
Can anyone comment on why no public resource for this transformation is available at the WWW Corp.?
Chris
Mar 22 '07 #7

On Thu, 22 Mar 2007 07:52:23 +0100
Křištof Želechovski <gi******@stegny.2a.pl> wrote:
> It is common knowledge that XHTML is better HTML
Since we don't know you here, it's not so easy to judge whether
that's intended to be ironic. But you've already been bitten
by those who take what you said literally.
> This raises the question
> of downgrading XHTML to HTML.
If you so wish.
> I could not find any relevant
> instruction at the WWW Corporation so I decided I have to roll my own
> with XSLT.
XSLT is an extremely inefficient way to do this (parsing an entire
document to an in-memory tree is inherently very expensive).
Far better to use SAX. There are modules for Apache that will let
you transform between HTML and XHTML on the fly, going in
whichever direction you please. Not that I'd recommend using them
unless you have an existing need to parse the markup.

--
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Mar 22 '07 #8

Nick Kew <ni**@grimnir.webthing.com> writes:
> On Thu, 22 Mar 2007 07:52:23 +0100
> Křištof Želechovski <gi******@stegny.2a.pl> wrote:
>> This raises the question
>> of downgrading XHTML to HTML.
>
> If you so wish.
>> I could not find any relevant
>> instruction at the WWW Corporation so I decided I have to roll my own
>> with XSLT.
>
> XSLT is an extremely inefficient way to do this (parsing an entire
> document to an in-memory tree is inherently very expensive).
> Far better to use SAX.
Even SAX seems an awfully convoluted way to do this... Why not just use tidy?

tidy -ashtml -o outfile.html infile.xhtml

sherm--

--
Web Hosting by West Virginians, for West Virginians: http://wv-www.net
Cocoa programming in Perl: http://camelbones.sourceforge.net
Mar 22 '07 #9

Jukka K. Korpela wrote:
> Kristof Zelechovski wrote:
>>>> It is common knowledge that XHTML is better HTML
>>> No, it isn't.
>> It is not better HTML but it is a common opinion.
>
> If you don't know the difference between knowledge and opinion, I
> suggest that you postpone further participation in public discussions
> until you do. That's just my opinion; all people have the right to
> ridicule themselves in public.
I think Kristof was very subtly telling the OP that it is more a matter
of opinion than of knowledge.

Rob
Mar 22 '07 #10

On Thu, 22 Mar 2007 08:43:55 -0400
Sherm Pendley <sp******@dot-app.org> wrote:
>> XSLT is an extremely inefficient way to do this (parsing an entire
>> document to an in-memory tree is inherently very expensive).
>> Far better to use SAX.
>
> Even SAX seems an awfully convoluted way to do this... Why not just
> use tidy?
>
> tidy -ashtml -o outfile.html infile.xhtml
Because tidy parses to an in-memory tree. Which is, as I said,
hugely expensive.

Yes of course, if all you need is a commandline tool for processing
static files, then that's fine: you have hundreds of trivial solutions
to choose from. But if you want to do anything more interesting
like process outgoing content in a server, you want something
more efficient. In the case of Apache, the lack of a parseChunk
API makes tidy even more expensive than XSLT for this.

--
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Mar 22 '07 #11

On Thu, 22 Mar 2007, Křištof Želechovski wrote:
> It is common knowledge that XHTML is better HTML and you can serve XHTML content as HTML.
It is common knowledge that vodka is better whisky and you can serve
vodka content in whisky glasses.

--
In memoriam Alan J. Flavell
http://groups.google.com/groups/sear...Alan.J.Flavell
Mar 22 '07 #12

User "Rob" <ro************@hotmail.com> wrote in message news:46*********************@news.xs4all.nl...
> Jukka K. Korpela wrote:
>> Kristof Zelechovski wrote:
>>>>> It is common knowledge that XHTML is better HTML
>>>> No, it isn't.
>>> It is not better HTML but it is a common opinion.
>>
>> If you don't know the difference between knowledge and opinion, I
>> suggest that you postpone further participation in public discussions
>> until you do. That's just my opinion; all people have the right to
>> ridicule themselves in public.
> I think Kristof was very subtly telling the OP that it is more a matter
> of opinion than of knowledge.
That is, I was subtly telling it to myself, because I am the OP.
Chris
Mar 22 '07 #13

Kristof Zelechovski wrote:
> User "Rob" <ro************@hotmail.com> wrote in message news:46*********************@news.xs4all.nl...
>> Jukka K. Korpela wrote:
>>> Kristof Zelechovski wrote:
>>>>>> It is common knowledge that XHTML is better HTML
>>>>> No, it isn't.
>>>> It is not better HTML but it is a common opinion.
>>> If you don't know the difference between knowledge and opinion, I
>>> suggest that you postpone further participation in public discussions
>>> until you do. That's just my opinion; all people have the right to
>>> ridicule themselves in public.
>> I think Kristof was very subtly telling the OP that it is more a matter
>> of opinion than of knowledge.
>
> That is, I was subtly telling it to myself, because I am the OP.
> Chris
Oops

--
Rob Waaijenberg
Mar 22 '07 #14

in message <et***********@news2.ipartners.pl>, Křištof Želechovski
('g*******@stegny.2a.pl') wrote:
> It is common knowledge that XHTML is better HTML and you can serve XHTML
> content as HTML. However, the second statement is incorrect, for various
> reasons; it is enough to say that the HTML validator does not tolerate
> XML-style empty tags. It seems serving XHTML to the browser is of no
> advantage and can cause serious problems if the browser does not
> understand the difference. This raises the question of downgrading XHTML
> to HTML. I could not find any relevant instruction at the WWW Corporation
> so I decided I have to roll my own with XSLT. I attach the XSLT code and
> I kindly ask for comments (because I am a novice in this area). Please
> note that all tags and attributes have to be copied stripping the
> namespace; <xsl:copy> does not work as expected because I get <br></br>
> instead of <br> only. I decided to copy the comments explicitly in order
> to be able to embed Internet Explorer conditional inclusion comments into
> the output. Chris
My comments:

(1) there is no point in doing this, XHTML is not broken.
(2) if you did want to do this, the stylesheet you would need would be:

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml">

<xsl:output indent="yes" method="html"
doctype-public="-//W3C//DTD HTML 4.01 Strict//EN"
doctype-system="http://www.w3.org/TR/html4/strict.dtd"/>

<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
--
si***@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/

;; It's dangerous to be right when the government is wrong.
;; Voltaire RIP Dr David Kelly 1945-2004
Mar 23 '07 #15

User "Simon Brooke" <si***@jasmine.org.uk> wrote in message news:19************@gododdin.internal.jasmine.org.uk...
> in message <et***********@news2.ipartners.pl>, Křištof Želechovski
> ('g*******@stegny.2a.pl') wrote:
>> It is common knowledge that XHTML is better HTML and you can serve XHTML
>> content as HTML. However, the second statement is incorrect, for various
>> reasons; it is enough to say that the HTML validator does not tolerate
>> XML-style empty tags. It seems serving XHTML to the browser is of no
>> advantage and can cause serious problems if the browser does not
>> understand the difference. This raises the question of downgrading XHTML
>> to HTML. I could not find any relevant instruction at the WWW Corporation
>> so I decided I have to roll my own with XSLT. I attach the XSLT code and
>> I kindly ask for comments (because I am a novice in this area). Please
>> note that all tags and attributes have to be copied stripping the
>> namespace; <xsl:copy> does not work as expected because I get <br></br>
>> instead of <br> only. I decided to copy the comments explicitly in order
>> to be able to embed Internet Explorer conditional inclusion comments into
>> the output. Chris
> My comments:
>
> (1) there is no point in doing this, XHTML is not broken.
> (2) if you did want to do this, the stylesheet you would need would be:
>
> <?xml version="1.0" encoding="utf-8"?>
>
> <xsl:stylesheet version="1.0"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> xmlns:xhtml="http://www.w3.org/1999/xhtml">
>
> <xsl:output indent="yes" method="html"
> doctype-public="-//W3C//DTD HTML 4.01 Strict//EN"
> doctype-system="http://www.w3.org/TR/html4/strict.dtd"/>
>
> <xsl:template match="@* | node()">
> <xsl:copy>
> <xsl:apply-templates select="@* | node()"/>
> </xsl:copy>
> </xsl:template>
> </xsl:stylesheet>
Have you read my post? Your script was my first attempt but it does not work: it produces <br></br>.
Chris
Mar 23 '07 #16

On 23 Mar, 20:13, Křištof Želechovski <giecr...@stegny.2a.pl> wrote:
> Your script was my first attempt but it does not work: it produces <br></br>.
It doesn't (although it's not my script BTW). The script produces a
"br element" in the result tree. _Your_ XSLT transform engine and
serialiser turn that into either <br></br>, <br /> or <br> in the
output document, according to their own behaviour and the output method
you specified.

If the script uses this output method
<xsl:output method="html" />
and the transform returns either XML version, then your local
transform engine is behaving. The script is correct.

Mar 26 '07 #17

Sorry for top-posting, my Outlook Express failed to quote.

What do you mean by ‘my local transform engine is behaving’?
Assuming that you meant ‘misbehaving’, I do not think it is.
The output method "html" knows about specific HTML tags only;
the "br" you get from "xsl:copy" is not an HTML tag and the engine has no clue
whether it should have a closing tag or not, so it applies the default.

Chris

User "Andy Dingley" <di*****@codesmiths.com> wrote in message news:11**********************@o5g2000hsb.googlegroups.com...
> On 23 Mar, 20:13, Křištof Želechovski <giecr...@stegny.2a.pl> wrote:
>> Your script was my first attempt but it does not work: it produces <br></br>.
> It doesn't (although it's not my script BTW). The script produces a
> "br element" in the result tree. _Your_ XSLT transform engine and
> serialiser turn that into either <br></br>, <br /> or <br> in the
> output document, according to their own behaviour and the output method
> you specified.
>
> If the script uses this output method
> <xsl:output method="html" />
> and the transform returns either XML version, then your local
> transform engine is behaving. The script is correct.

Mar 26 '07 #18

On 26 Mar, 12:24, Křištof Želechovski <giecr...@stegny.2a.pl> wrote:
> What do you mean by ‘my local transform engine is behaving’?
I mean that either the library you use to do XSLT transformations is
in error, or possibly that your input document has an error. I can
think of no script error where a script as we describe here (using
HTML output method) will generate <br></br> rather than <br>.

> The output method "html" knows about specific HTML tags only;
> the "br" you get from "xsl:copy" is not an HTML tag
Then it ought to be an HTML element. What namespace is it from? For a
simple copy operation, what's its namespace in the source document?
Mar 26 '07 #19

The original document is in XHTML. xsl:copy creates an XHTML element, as instructed.
The XHTML element needs a closing tag, as specified in the DTD.
Since the output language is HTML, the transformation creates an empty element with a stand-alone closing tag.
Everything is regular and conformant to me.
SorForTopPostButMyOutlExprFailToQuotAgain.
Chris

User "Andy Dingley" <di*****@codesmiths.com> wrote in message news:11*********************@b75g2000hsg.googlegroups.com...
> On 26 Mar, 12:24, Křištof Želechovski <giecr...@stegny.2a.pl> wrote:
>> What do you mean by ‘my local transform engine is behaving’?
> I mean that either the library you use to do XSLT transformations is
> in error, or possibly that your input document has an error. I can
> think of no script error where a script as we describe here (using
> HTML output method) will generate <br></br> rather than <br>.
>
>> The output method "html" knows about specific HTML tags only;
>> the "br" you get from "xsl:copy" is not an HTML tag
> Then it ought to be an HTML element. What namespace is it from? For a
> simple copy operation, what's its namespace in the source document?
Mar 27 '07 #20

On 27 Mar, 10:25, Křištof Želechovski <giecr...@stegny.2a.pl> wrote:
> The original document is in XHTML. xsl:copy creates an XHTML element, as instructed.
Interesting explanation of this was just posted to comp.text.xml
(thanks Bjoern)
<http://groups.google.co.uk/group/comp.text.xml/msg/117f441f0b6d5f4c>

An example of an "XHTML to HTML copy" stylesheet was also posted
<http://www.bjoernsworld.de/temp/xhtml-to-html4.xslt>
It seems that the behaviour I'd been expecting _only_ occurs if the
input XHTML is the default namespace but _isn't_ bound to a namespace
URI. If it is namespaced, or if the default namespace URI is
declared, then XSLT should treat this as XML output, no matter what
the output method.

If you have bound the input namespace, then you can't use <xsl:copy>.
You need to use <xsl:element name="{local-name()}" /> instead.
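
To make the distinction concrete, here is a hypothetical input document, invented for illustration. Because the root element declares the XHTML namespace as the default, every element in it, br included, has a non-null namespace URI. Section 16.2 of XSLT 1.0 says the html output method should not treat an element differently from the xml output method unless its namespace URI is null, so a br copied with its namespace intact is written out the XML way.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical sample input, not taken from the thread -->
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample</title></head>
  <body>
    <!-- This br belongs to the XHTML namespace, not to "no namespace" -->
    <p>First line<br />second line</p>
  </body>
</html>
```

Rebuilding each element with xsl:element and the braces around local-name() puts the result element in no namespace, which is the case the html output method serialises as a bare <br>.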

OK, so today I learned something. 8-)

Mar 27 '07 #21

You do not understand. The output format *is* HTML, only the elements are not.
You would get <br /> under XML.
Chris

User "Andy Dingley" <di*****@codesmiths.com> wrote in message news:11********************@n76g2000hsh.googlegroups.com...
> On 27 Mar, 10:25, Křištof Želechovski <giecr...@stegny.2a.pl> wrote:
>> The original document is in XHTML. xsl:copy creates an XHTML element, as instructed.
> Interesting explanation of this was just posted to comp.text.xml
> (thanks Bjoern)
> <http://groups.google.co.uk/group/comp.text.xml/msg/117f441f0b6d5f4c>
>
> An example of an "XHTML to HTML copy" stylesheet was also posted
> <http://www.bjoernsworld.de/temp/xhtml-to-html4.xslt>
> It seems that the behaviour I'd been expecting _only_ occurs if the
> input XHTML is the default namespace but _isn't_ bound to a namespace
> URI. If it is namespaced, or if the default namespace URI is
> declared, then XSLT should treat this as XML output, no matter what
> the output method.
>
> If you have bound the input namespace, then you can't use <xsl:copy>.
> You need to use <xsl:element name="{local-name()}" /> instead.
>
> OK, so today I learned something. 8-)

Mar 28 '07 #22
