Opera guesses encoding for "application/xml"

Christoph Schneegans

Hi!

Okay, so positions on "text/html" XHTML are totally contradictory. Anyway!
I hope there's more consensus about "application/xml" XHTML.

I've recently learned that Opera 9.0b2 evaluates not only the HTTP header,
BOM and XML declaration to determine the character encoding of an XHTML
document sent as "application/xml", but also the "meta" element. For
example, <http://schneegans.de/sv/test-cases/?case=meta-only-encoding> is
rendered as "Česká republika". In contrast, Firefox displays "?esk?
republika", and IE even aborts parsing.
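To make the expected precedence concrete, here is a minimal Python sketch of encoding detection for an "application/xml" entity: HTTP charset first, then BOM, then XML declaration, then the UTF-8 default, with the "meta" element never consulted. The function name, the simplified BOM table and the sample document are assumptions made for illustration; the sample is not the actual test case.

```python
import re
from typing import Optional

# Simplified BOM table (an assumption: UTF-32 BOMs are left out for brevity).
_BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]

# Matches an encoding pseudo-attribute in an XML declaration at the very start.
_XML_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def detect_xml_encoding(http_charset: Optional[str], body: bytes) -> str:
    """Return the encoding an XML processor would use; 'meta' is ignored."""
    if http_charset:                      # 1. charset parameter of the HTTP header
        return http_charset.lower()
    for bom, name in _BOMS:               # 2. byte order mark
        if body.startswith(bom):
            return name
    match = _XML_DECL.match(body)         # 3. XML declaration
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"                        # 4. default when nothing is declared


# Hypothetical document: ISO-8859-2 bytes, encoding announced only via "meta".
body = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2"/>'
    '<title>t</title></head><body><p>Česká republika</p></body></html>'
).encode("iso-8859-2")

encoding = detect_xml_encoding(None, body)
text = body.decode(encoding, errors="replace")
```

Under these assumptions the bytes are decoded as UTF-8, so the two non-ASCII letters come out as replacement characters, which is roughly what the "?esk? republika" rendering shows.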

If you agree with me and think that this behavior is wrong, you might
want to post a follow-up to <news:op******************@news.opera.com>.

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 8 '06 #1

Spartanicus

Christoph Schneegans <Ch*******@Schneegans.de> wrote:

Okay, so positions on "text/html" XHTML are totally contradictory. Anyway!

I've yet to see anyone successfully defend a single argument for sending
XHTML as text/html. The consensus is that Appendix C was a mistake.

I hope there's more consensus about "application/xml" XHTML.

XHTML should be served as application/xhtml+xml.

--
Spartanicus

Jun 9 '06 #2

Garmt de Vries

On Fri, 09 Jun 2006 00:24:19 +0200, Christoph Schneegans
<Ch*******@Schneegans.de> wrote:

I've recently learned that Opera 9.0b2 does not only evaluate HTTP header,
BOM and XML declaration to determine the character encoding of an XHTML
document sent as "application/xml", but also the "meta" element. For
example, <http://schneegans.de/sv/test-cases/?case=meta-only-encoding> is
rendered as "Česká republika". In contrast, Firefox displays "?esk?
republika", and IE even aborts parsing.

When I go to that page, I see:

Encoding from server (used by Opera):
iso-8859-2 (iso-8859-2)

Doesn't really look like "meta only"...

--
Garmt de Vries

Jun 9 '06 #3

Christoph Schneegans

"Spartanicus" wrote:

I've yet to see anyone successfully defend a single argument for
sending XHTML as text/html.

For XHTML? More powerful means for validation, simpler syntax. For
text/html? IE wouldn't support it otherwise.

The consensus is that Appendix C was a mistake.

Your consensus.

I hope there's more consensus about "application/xml" XHTML.

XHTML should be served as application/xhtml+xml.

That wouldn't change anything, and I guess you know that.

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 9 '06 #4

Christoph Schneegans

Garmt de Vries wrote:

<http://schneegans.de/sv/test-cases/?case=meta-only-encoding>

When I go to that page, I see:

Encoding from server (used by Opera):
iso-8859-2 (iso-8859-2)

Doesn't really look like "meta only"...

<http://web-sniffer.net/?url=http://schneegans.de/sv/test-cases/%3Fcase=meta-only-encoding>

When still in doubt, use Telnet.
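For readers who would rather script the check than type it into Telnet, the following Python sketch extracts the charset parameter from a raw HTTP response. The function name is hypothetical, and the sample response is made up for illustration; it is not a capture from the server discussed here.

```python
from email.message import Message
from typing import Optional


def charset_from_raw_response(raw: bytes) -> Optional[str]:
    """Return the charset declared in the Content-Type header, or None."""
    # Headers end at the first empty line; they are safe to read as Latin-1.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:          # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":
            msg = Message()
            msg["Content-Type"] = value.strip()
            return msg.get_content_charset()     # None if no charset parameter
    return None


# Made-up response with no charset parameter, i.e. a true "meta only" case.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/xml\r\n"
    b"\r\n"
    b"<html/>"
)
declared = charset_from_raw_response(sample)
```

If `declared` is None, the server itself announces no encoding, and anything a browser reports as "encoding from server" must have been derived from the document.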

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 9 '06 #5

Spartanicus

Christoph Schneegans <Ch*******@Schneegans.de> wrote:

I've yet to see anyone successfully defend a single argument for
sending XHTML as text/html.

For XHTML? More powerful means for validation,

Yawn, we've disproved that plenty of times
http://www.spartanicus.utvinternet.ie/no-xhtml.htm

simpler syntax.

Nonsense.

For text/html? IE wouldn't support it otherwise.

IE doesn't support XHTML period, falsely labeling it as text/html merely
prevents that from being demonstrated more clearly.

The consensus is that Appendix C was a mistake.

Your consensus.

The consensus amongst the members of this group. To date no-one has
managed to uphold an argument for serving XHTML as text/html, feel free
to confirm this via the archives.
http://groups.google.com/group/comp....ml+text%2Fhtml

I hope there's more consensus about "application/xml" XHTML.

XHTML should be served as application/xhtml+xml.

That wouldn't change anything, and I guess you know that.

You professed a hope for "consensus" about serving XHTML as
application/xml. There is broad consensus, but it is to serve XHTML
as application/xhtml+xml as per the W3C's recommendation:
http://www.w3.org/TR/xhtml-media-types/

Specifically:

3.3. 'application/xml'

The 'application/xml' media type [RFC3023] is a generic media type for
XML documents, and the definition of 'application/xml' does not preclude
serving XHTML documents as that media type. Any XHTML Family document
MAY be served as 'application/xml'.

However, authors should be aware that such a document may not always be
processed as XHTML (e.g. hyperlinks may not be recognized), depending on
user agents. Generic XML processors might recognize it as just an XML
document which includes elements and attributes from the XHTML namespace
(and others), and may not have a priori knowledge what to do with such a
document beyond they can do for generic XML documents.
http://www.w3.org/TR/xhtml-media-typ...pplication-xml
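The distinction drawn in that section can be sketched in Python as follows. This is only an illustration of how a user agent might decide, not a description of any real browser: the function name and return labels are invented, and the idea that a generic XML processor falls back on the root element's namespace is an assumption.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def choose_handling(media_type: str, body: bytes) -> str:
    """Pick a (hypothetical) processing mode from media type and content."""
    base = media_type.split(";")[0].strip().lower()
    if base == "text/html":
        return "html"                 # parsed by the HTML / tag soup parser
    if base == "application/xhtml+xml":
        return "xhtml"                # the media type alone says it is XHTML
    if base in ("application/xml", "text/xml"):
        # Generic XML: the agent has to inspect the document itself.
        root = ET.fromstring(body)
        if root.tag == "{%s}html" % XHTML_NS:
            return "xhtml"            # a namespace-aware agent recognizes it
        return "generic-xml"          # otherwise just a tree of elements
    return "unknown"


doc = b'<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body/></html>'
mode_xml = choose_handling("application/xml", doc)
mode_xhtml = choose_handling("application/xhtml+xml; charset=utf-8", doc)
```

An agent that skips the namespace check in the "application/xml" branch would be the case the note warns about: the document is well-formed XML, but hyperlinks and other XHTML semantics go unrecognized.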

--
Spartanicus

Jun 9 '06 #6

Andreas Prilop

On Fri, 9 Jun 2006, Spartanicus wrote:

IE doesn't support XHTML period, falsely labeling it as text/html merely
prevents that from being demonstrated more clearly.

What does "support" or "doesn't support" mean?
http://www.unics.uni-hannover.de/nht...otation.x.html
http://www.unics.uni-hannover.de/nht...notation.xhtml

are identical resources - only the URL is different. Yet IE 6
behaves differently because one of the URLs ends in ".html".
*.x.html is displayed, *.xhtml is offered for download.
Silly IE!

Jun 9 '06 #7

Spartanicus

Andreas Prilop <nh******@rrzn-user.uni-hannover.de> wrote:

IE doesn't support XHTML period, falsely labeling it as text/html merely
prevents that from being demonstrated more clearly.

What means "support" or "doesn't support"?

Parse it as XHTML, not HTML.

--
Spartanicus

Jun 9 '06 #8

Christoph Schneegans

"Spartanicus" wrote:

For XHTML? More powerful means for validation,
Yawn, we've disproved that plenty of times
http://www.spartanicus.utvinternet.ie/no-xhtml.htm

You surely intended to point to
<http://www.spartanicus.utvinternet.ie/custom_dtd.htm>. I'd like to know
how you want to spot improper or error-prone markup such as



<hr>

...

with this custom DTD.

simpler syntax.

Nonsense.

Yeah,

</span

is obviously simpler syntax than



because it's shorter.

IE doesn't support XHTML period,

It doesn't support HTML either, see e.g.
<http://schneegans.de/web/xhtml/shorttag/>. Now I want you to present an
XHTML 1.0 document that conforms to Appendix C and is not supported by IE.

You professed a hope for "consensus" about serving XHTML as
application/xml. There is broad consensus, but it is to serve
XHTML as application/xhtml+xml as per w3c's recommendation:

Just show me a user agent that supports "application/xhtml+xml" but does
not support "application/xml".

http://www.w3.org/TR/xhtml-media-types/

Did you really overlook
<http://www.w3.org/TR/xhtml-media-types/#text-html>?

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 9 '06 #9

Spartanicus

Christoph Schneegans <Ch*******@Schneegans.de> wrote:

For XHTML? More powerful means for validation,
Yawn, we've disproved that plenty of times
http://www.spartanicus.utvinternet.ie/no-xhtml.htm

You surely intended to point to
<http://www.spartanicus.utvinternet.ie/custom_dtd.htm>.

A misconception like "XHTML is stricter" is common. If you are not
specific and only say "More powerful means for validation", then all I
can do is point to the main page that includes a refutation of this most
common misconception.

I'd like to know
how you want to spot improper or error-prone markup such as



<hr>

...

with this custom DTD.

All additional XHTML constraints can be emulated for HTML validation.
The resource you quoted makes no claim that it emulates all these
additional constraints, it merely demonstrates how it can be done using
a common subset.

Not all DTD validator checkable constraints are governed by the DTD.
This is noted on the quoted resource, and it links to an explanation of
how to change non DTD constraints, again using a common example.

That said, the practical value of being able to machine check even for
all additional constraints that are part of XHTML 1.x (some of which are
only part of XHTML 1.1 such as your "..."
example) is nil as long as the result is parsed by HTML clients.
Ultimately this invalidates all potential arguments in favour of XHTML.

simpler syntax.

Nonsense.

Yeah,

Again, if you want to make a point you need to be specific. Loose
unqualified remarks such as "simpler syntax" don't allow for a proper
response.
</span

is obviously simpler syntax than



because it's shorter.

You've lost me. Are you suggesting that "</span" is
proper syntax and/or valid under XHTML?

IE doesn't support XHTML period,

It doesn't support HTML either, see e.g.
<http://schneegans.de/web/xhtml/shorttag/>.

Other flaws do not form an argument for a claim that IE supports XHTML.

Now I want you to present an
XHTML 1.0 document that conforms to Appendix C and is not supported by IE.

You professed a hope for "consensus" about serving XHTML as
application/xml. There is broad consensus, but it is to serve
XHTML as application/xhtml+xml as per w3c's recommendation:

Just show me a user agent that supports "application/xhtml+xml" but does
not support "application/xml".

You are avoiding the point that, contrary to your claim that the media
type makes no difference, a document served as application/xml may not
be recognized as XHTML.

http://www.w3.org/TR/xhtml-media-types/

Did you really overlook
<http://www.w3.org/TR/xhtml-media-types/#text-html>?

The consensus I referred to pertained to this group; it has rejected
serving XHTML as text/html in favour of the view that if XHTML is to be
used at all then it should be served as application/xhtml+xml.

--
Spartanicus

Jun 9 '06 #10

Christoph Schneegans

"Spartanicus" wrote:

If you are not specific and only say "More powerful means for
validation", then all I can do is point to the main page that
includes a refutation of this most common misconception.

Do you dispute the fact that XML Schema validation is more powerful than
DTD validation?

All additional XHTML constraints can be emulated for HTML validation.

Using freaky regular expressions?

This is noted on the quoted resource, and it links to an explanation of
how to change non DTD constraints, again using a common example.

Most authors don't want to create a custom DTD or a custom SGML
declaration. As long as there's no "strict" HTML validation service
available on the web, your remarks have no practical implications.

That said, the practical value of being able to machine check even for
all additional constraints that are part of XHTML 1.x (some of which are
only part of XHTML 1.1 such as your "..."
example) is nil as long as the result is parsed by HTML clients.

Thank you. Appendix C documents are parsed by HTML clients as well.

Loose unqualified remarks such as "simpler syntax" don't allow for a
proper response.

Do you dispute the fact that XML syntax is simpler than SGML syntax?

You've lost me, are you suggesting that "</span" is
proper syntax and/or valid under XHTML?

'</span' is valid HTML, ''
is the corresponding XHTML syntax. So which one is simpler IYO?

Other flaws do not form an argument for a claim that IE supports XHTML.

Nobody in this thread claims that IE supports XHTML.

Do you dispute the fact that IE neither supports XHTML nor HTML?

Now I want you to present an XHTML 1.0 document that conforms to
Appendix C and is not supported by IE.

You forgot to answer this one.

You are avoiding the point made that contrary to your claim that the
media type used made no difference, that a document served as
application/xml may not be recognized as XHTML.

That's what the W3C says. It does not happen nevertheless.

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 9 '06 #11

Eric B. Bednarz

Christoph Schneegans <Ch*******@Schneegans.de> writes:

Most authors don't want to create a custom DTD or a custom SGML
declaration. As long as there's no "strict" HTML validation service
available on the web, your remarks have no practical implications.

Authors don't necessarily have to *create* them themselves. For the
creation of static files the idea of online validation is totally lost
on me anyway. That should and easily can be part of the local authoring
process, whatever way you like it.

I know that you are also aware of the inherent SGML syntax features that
cannot be made XML(or even HTML UA)-compatible with any custom SGML
declaration, and I'm surprised that you didn't mention them :)

(For dynamic content -- any CMS of sorts -- I'd actually prefer XHTML
syntax, since even lowest-end Linux hosting usually comes with PHP
bundled with the expat library, and _trivially_ allows for verifying
fully-tagged input and some kind of homegrown content model restraints.)
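The kind of check described in the parenthesis above can be sketched with Python's binding to the same expat library (the post itself refers to PHP). The function name is hypothetical, and the sketch covers only well-formedness, not the homegrown content model restraints.

```python
import xml.parsers.expat


def is_well_formed(markup: str) -> bool:
    """Return True if expat accepts the fragment as well-formed XML."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(markup, True)   # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


fully_tagged = is_well_formed("<p>foo<br/>bar</p>")
html_style = is_well_formed("<p>foo<br>bar</p>")           # unclosed <br>
glued_attrs = is_well_formed('<a href="foo"class="bar"/>')  # no space between attributes
```

Only the first fragment passes. The other two are rejected because XML requires every element to be closed and requires whitespace between attribute specifications.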
As a personal note, I am a bit flabbergasted by the origin of discussion
at large. Any service that doesn't advertise itself inappropriately is
useful within the bounds of its documentation. While I actually do
believe that Christoph's service is not about to hit (and educate) its
target audience at all, the W3C markup validation service (in the SGML
sense, I won't even mention X(HT)ML) is deliberately aiming at the
clue-underprivileged and seldom criticised as such over here.
--
||| hexadecimal EBB
o-o decimal 3771
--oOo--( )--oOo-- octal 7273
205 goodbye binary 111010111011

Jun 9 '06 #12

Jim Moe

Christoph Schneegans wrote:

'</span' is valid HTML, [...]

HTML, yes. XHTML, no.
It is, however, meaningless. A parser ignores all the mystery components
and it reduces to ''.

--
jmm (hyphen) list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)

Jun 9 '06 #13

Christoph Schneegans

Jim Moe wrote:

'</span' is valid HTML, [...]
It is, however, meaningless.

It is exactly equivalent to '' in HTML. If
you don't believe it, feed <http://validator.w3.org/fragment-upload.html>
with

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<title></title>
</span

and

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<title></title>


and enable the "Show Parse Tree" option.
A parser ignores all the mystery components and it reduces to ''.

That depends on the parser.

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 9 '06 #14

Spartanicus

Christoph Schneegans <Ch*******@Schneegans.de> wrote:

If you are not specific and only say "More powerful means for
validation", then all I can do is point to the main page that
includes a refute of this most common misconception.

Do you dispute the fact that XML Schema validation is more powerful than
DTD validation?

Before changing to another topic, first let's finish off your original
claim; do you agree that XHTML DTD based validation is not "more
powerful" than HTML DTD validation?

All additional XHTML constraints can be emulated for HTML validation.

Using freaky regular expressions?

By editing the files used by the validator, this doesn't involve regular
expressions.

This is noted on the quoted resource, and it links to an explanation of
how to change non DTD constraints, again using a common example.

Most authors don't want to create a custom DTD or a custom SGML
declaration.

No-one has anything to gain from checking text/html documents for the
extra constraints that XHTML requires, by definition this includes "most
authors".

As long as there's no "strict" HTML validation service
available on the web, your remarks have no practical implications.

If you want to claim practical relevance then you need to acknowledge
the at best very limited value of validation as a whole, and the fact
that checking for the extra constraints has no benefit at all for
content served as text/html.

That said, I'm aware of at least one online HTML validator that has an
option to impose additional constraints (Nick Kew's Page Valet).

That said, the practical value of being able to machine check even for
all additional constraints that are part of XHTML 1.x (some of which are
only part of XHTML 1.1 such as your "..."
example) is nil as long as the result is parsed by HTML clients.

Thank you. Appendix C documents are parsed by HTML clients as well.

So do you now concede that there is no basis whatsoever for your
suggestion that checking text/html documents for the additional XHTML
constraints has any practical relevance or value?

Loose unqualified remarks such as "simpler syntax" don't allow for a
proper response.

Do you dispute the fact that XML syntax is simpler than SGML syntax?

Again before having another change of subject, I'm still trying to find
out what you meant by your original claim that XHTML syntax is simpler
than HTML syntax.

You've lost me, are you suggesting that "</span" is
proper syntax and/or valid under XHTML?

'</span' is valid HTML, ''
is the corresponding XHTML syntax. So which one is simpler IYO?

What you refer to as "valid HTML" is an error that isn't picked up by a
validator using the public DTD. This does not form an argument to
declare XHTML "simpler" (strange way to raise a point about a certain
HTML error being missed). If the fact that this error isn't flagged
under the public DTD bothers you, it is easily fixed: "</span" doesn't
validate in my HTML validation process.

Other flaws do not form an argument for a claim that IE supports XHTML.

Nobody in this thread claims that IE supports XHTML.

You have a short or selective memory:

For XHTML? More powerful means for validation, simpler syntax. For
text/html? IE wouldn't support it otherwise.

Do you dispute the fact that IE neither supports XHTML nor HTML?

Again: other flaws such as possibly not parsing a certain HTML construct
correctly do not form an argument for your claim that IE supports XHTML.

Now I want you to present an XHTML 1.0 document that conforms to
Appendix C and is not supported by IE.

You forgot to answer this one.

Again: IE does not support XHTML at all. Like almost all other HTML
parsers IE's error recovery mechanism allows XHTML served as text/html
to be parsed as pseudo HTML without necessarily causing problems. This
does not demonstrate "support", it merely demonstrates error recovery at
work when parsing tag soup.

You are avoiding the point made that contrary to your claim that the
media type used made no difference, that a document served as
application/xml may not be recognized as XHTML.

That's what the W3C says. It does not happen nevertheless.

You mean that you haven't seen it happen, excuse me for attaching little
to no value to that.

--
Spartanicus

Jun 9 '06 #15

Christoph Schneegans

"Spartanicus" wrote:

Do you dispute the fact that XML Schema validation is more powerful than
DTD validation?

Before changing to another topic, first let's finish off your original
claim;

My original claim was that XHTML allows more powerful validation. I really
don't know why you assume I was referring to XML DTD based validation.

do you agree that XHTML DTD based validation is not "more powerful"
than HTML DTD validation?

Sure. This has no relevance at all. If you want meaningful validation
results, use an XML Schema validator such as <http://schneegans.de/sv/>.

All additional XHTML constraints can be emulated for HTML validation.

Using freaky regular expressions?

By editing the files used by the validator, this doesn't involve regular
expressions.

If I wanted to disallow



in favor of



in my HTML documents, how would I do that?

This is noted on the quoted resource, and it links to an explanation of
how to change non DTD constraints, again using a common example.

Most authors don't want to create a custom DTD or a custom SGML
declaration.

No-one has anything to gain from checking text/html documents for the
extra constraints that XHTML requires, by definition this includes "most
authors".

Far from it! Many authors wonder why e.g.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head><title></title></head>
<body>
foo

bar

</body>
</html>

is not displayed in Firefox as intended, although this document is valid
according to the W3C Validator. <http://valet.webthing.com/page/> doesn't
output a warning either, BTW.

Again before having another change of subject, I'm still trying to find
out what you meant by your original claim that XHTML syntax is simpler
than HTML syntax.

'</span' is valid HTML syntax. It's not simple at all.

What you refer to as "valid HTML" is an error that isn't picked up by a
validator using the public DTD.

Which specification says that '</span' is not valid HTML?

Now I want you to present an XHTML 1.0 document that conforms to
Appendix C and is not supported by IE.
Again: IE does not support XHTML at all.

Okay, then your task should be quite easy.

Like almost all other HTML parsers IE's error recovery mechanism
allows XHTML served as text/html to be parsed as pseudo HTML without
necessarily causing problems.

IE supports anything that is labelled "text/html" and has some angle
brackets in it. In this sense, it also supports HTML or XHTML. IE doesn't
parse "text/html" XHTML using an XML parser, and it doesn't parse
"text/html" HTML using an HTML parser either. The difference is that I
don't suggest the former, while you do suggest the latter.

You are avoiding the point made that contrary to your claim that the
media type used made no difference, that a document served as
application/xml may not be recognized as XHTML.

That's what the W3C says. It does not happen nevertheless.

You mean that you haven't seen it happen, excuse me for attaching little
to no value to that.

You haven't seen it happen either. Otherwise you would happily disclose the
name of that user agent.

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 10 '06 #16

Henri Sivonen

In article <ef********************************@4ax.com>,
Spartanicus <in*****@invalid.invalid> wrote:

That said, I'm aware of at least one online HTML validator that has an
option to impose additional constraints (Nick Kew's Page Valet).

You may also be interested in
http://hsivonen.iki.fi/validator/
which does not play along with the HTML-as-SGML fiction and allows RELAX
NG and Schematron 1.5 to be used with HTML.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Jun 10 '06 #17

Henri Sivonen

In article <e6**********@news.christoph.schneegans.de>,
Christoph Schneegans <Ch*******@Schneegans.de> wrote:

My original claim was that XHTML allows more powerful validation. I really
don't know why you assume I was referring to XML DTD based validation.

:-)

If I wanted to disallow



in favor of



in my HTML documents, how would I do that?

But is that a reasonable thing to want?

Isn't that kind of like wanting an app that processes XML to make a
difference between
&lt;
and
<![CDATA[<]]>
?

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Validation Service for RELAX NG: http://hsivonen.iki.fi/validator/

Jun 10 '06 #18

Christoph Schneegans

Henri Sivonen wrote:

If I wanted to disallow



in favor of



in my HTML documents, how would I do that?

But is that a reasonable thing to want?

Omitting whitespace between attribute specifications is almost always
a typo, not a deliberate decision.

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 10 '06 #19

Henri Sivonen

In article <e6**********@news.christoph.schneegans.de>,
Christoph Schneegans <Ch*******@Schneegans.de> wrote:

Henri Sivonen wrote:
If I wanted to disallow



in favor of



in my HTML documents, how would I do that?

But is that a reasonable thing to want?

Omitting whitespace between attribute specifications is almost always
a typo, not a deliberate decision.

But with HTML it is harmless--like making the typo of putting two spaces
instead of one between attributes.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Jun 10 '06 #20

Christoph Schneegans

Henri Sivonen wrote:

Omitting whitespace between attribute specifications is almost always
a typo, not a deliberate decision.

But with HTML it is harmless--like making the typo of putting two spaces
instead of one between attributes.

It's definitely harmless for HTML parsers. But what about tag soup
parsers? I will put a document containing

<a href="foo"class="bar">

somewhere and check if spiders request 'foo"class="bar' or similar.
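The failure mode being probed for can be sketched in Python. Both extractors below are invented for illustration and do not describe any real spider: the first splits the tag on whitespace, the second pairs each attribute name with its own quoted value.

```python
import re

TAG = '<a href="foo"class="bar">'


def naive_attrs(tag: str) -> dict:
    """Hypothetical sloppy spider: assumes attributes are whitespace-separated."""
    inner = tag.strip("<>")
    attrs = {}
    for token in inner.split()[1:]:          # drop the element name
        name, _, value = token.partition("=")
        attrs[name] = value.strip('"')       # only trims the outermost quotes
    return attrs


# Attribute name, '=', then a double-quoted value; no whitespace needed between pairs.
_ATTR = re.compile(r'([^\s"\'=<>/]+)\s*=\s*"([^"]*)"')


def quote_aware_attrs(tag: str) -> dict:
    """Hypothetical tolerant extractor: ends each value at its closing quote."""
    return dict(_ATTR.findall(tag))


naive = naive_attrs(TAG)
tolerant = quote_aware_attrs(TAG)
```

Here `naive["href"]` is the string foo"class="bar, which is the request the experiment would look for in the server log, while `tolerant` yields href and class as separate attributes.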

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 10 '06 #21

Spartanicus

Christoph Schneegans <Ch*******@Schneegans.de> wrote:

Do you dispute the fact that XML Schema validation is more powerful than
DTD validation?
Before changing to another topic, first let's finish off your original
claim;

My original claim was that XHTML allows more powerful validation. I really
don't know why you assume I was referring to XML DTD based validation.

I assumed no such thing. You attempted to change the subject from DTD
based validation to Schema or RELAX NG validation.

do you agree that XHTML DTD based validation is not "more powerful"
than HTML DTD validation?

Sure. This has no relevance at all.

I seem to remember that you claimed this as an advantage of XHTML.

If you want meaningful validation
results, use an XML Schema validator such as <http://schneegans.de/sv/>.

A different subject not related to the alleged advantages of XHTML.

FYI Schema and/or RELAX NG validation is also available for HTML.

All additional XHTML constraints can be emulated for HTML validation.

Using freaky regular expressions?

By editing the files used by the validator, this doesn't involve regular
expressions.

If I wanted to disallow



in favor of



in my HTML documents, how would I do that?

Ask someone who knows SGML. As referenced in my resource on using a
custom DTD, Eric B. Bednarz once helped me to emulate some extra
constraints. He has posted in this thread, maybe he is willing to help
with that if you are serious about using it.

This is noted on the quoted resource, and it links to an explanation of
how to change non DTD constraints, again using a common example.

Most authors don't want to create a custom DTD or a custom SGML
declaration.

No-one has anything to gain from checking text/html documents for the
extra constraints that XHTML requires, by definition this includes "most
authors".

Far from it! Many authors wonder why e.g.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head><title></title></head>
<body>
foo

bar

</body>
</html>

is not displayed in Firefox as intended, although this document is valid
according to the W3C Validator. <http://valet.webthing.com/page/> doesn't
output a warning either, BTW.

The above code also validates (using a DTD validator) if I change it to
XHTML.

Again before having another change of subject, I'm still trying to find
out what you meant by your original claim that XHTML syntax is simpler
than HTML syntax.

'</span' is valid HTML syntax. It's not simple at all.

What you refer to as "valid HTML" is an error that isn't picked up by a
validator using the public DTD.

Which specification says that '</span' is not valid HTML?

As I said it's an error, but valid (assuming DTD validation using the
public validation profile). I've pointed you to a resource that allows
this error to be picked up by a DTD validator.

Like almost all other HTML parsers IE's error recovery mechanism
allows XHTML served as text/html to be parsed as pseudo HTML without
necessarily causing problems.

IE supports anything that is labelled "text/html" and has some angle
brackets in it. In this sense, it also supports HTML or XHTML. IE doesn't
parse "text/html" XHTML using an XML parser, and it doesn't parse
"text/html" HTML using an HTML parser either. The difference is that I
don't suggest the former, while you do suggest the latter.

By that measure there are no CSS or DOM capable browsers because they
all have bugs. A potential obscure issue does not compare with a
fundamental parsing model.

You are avoiding the point made that contrary to your claim that the
media type used made no difference, that a document served as
application/xml may not be recognized as XHTML.

That's what the W3C says. It does not happen nevertheless.

You mean that you haven't seen it happen, excuse me for attaching little
to no value to that.

You haven't seen it happen either. Otherwise you would happily disclose the
name of that user agent.

I haven't tested for it, nor am I about to for your education
considering that application/xml should not be used to serve XHTML,
which itself has virtually no value at this point in time.

--
Spartanicus

Jun 10 '06 #22

Christoph Schneegans

"Spartanicus" wrote:

Do you dispute the fact that XML Schema validation is more powerful than
DTD validation?

Before changing to another topic, first let's finish off your original
claim;

My original claim was that XHTML allows more powerful validation. I really
don't know why you assume I was referring to XML DTD based validation.

I assumed no such thing. You attempted to change the subject from DTD
based validation to Schema or RELAX NG validation.

The subject never was "DTD based validation", it was "validation". XML
validation includes XML Schema validation and RELAX NG validation, of
course.

FYI Schema and/or RELAX NG validation is also available for HTML.

I doubt it.

All additional XHTML constraints can be emulated for HTML validation.

Using freaky regular expressions?

By editing the files used by the validator, this doesn't involve regular
expressions.

If I wanted to disallow



in favor of



in my HTML documents, how would I do that?

Ask someone who knows SGML.

How can you claim that "all additional XHTML constraints can be emulated"
with HTML if you don't know how to do that?

Many authors wonder why e.g.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head><title></title></head>
<body>
foo

bar

</body>
</html>

is not displayed in Firefox as intended, although this document is valid
according to the W3C Validator.

The above code also validates (using a DTD validator) if I change it to
XHTML.

'' is not even well-formed XML, and all XML validators know that.
Of course, SGML validators such as OpenSP that try to act like XML
validators don't, see <http://esw.w3.org/topic/MarkupValidator/XML_Limitations>.

Which specification says that '</span' is not valid HTML?

As I said it's an error, but valid (assuming DTD validation using the
public validation profile).

So which specification says that '</span' is erroneous?

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 10 '06 #23

Henri Sivonen

In article <e6**********@news.christoph.schneegans.de>,
Christoph Schneegans <Ch*******@Schneegans.de> wrote:

Henri Sivonen wrote:

Omitting whitespace between attribute specifications is almost always
a typo, not a deliberate decision.

But with HTML it is harmless--like making the typo of putting two spaces
instead of one between attributes.

It's definitely harmless for HTML parsers. But what about tag soup
parsers?

What kind of difference is there, according to your definitions, between
an HTML parser and a tag soup parser?

Anyway, AFAIK, it is harmless for the text/html parsers deployed in
notable browsers.

I will put a document containing

<a href="foo"class="bar">

somewhere and check if spiders request 'foo"class="bar' or similar.

Well, I can't know if there's a broken spider somewhere out there that
does not support syntax that is OK according to browsers, according to
HTML5 and according to SGML.
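
One way to probe a single non-browser parser is sketched below. It feeds the start tag quoted above to Python's standard-library html.parser and collects the attributes the parser reports. The names AttributeCollector and parse_start_tags are made up for this illustration, and the result says nothing about how any particular spider behaves.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collects (tag, attributes) pairs for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples as seen by the parser.
        self.start_tags.append((tag, attrs))


def parse_start_tags(markup):
    """Return the start tags and attributes html.parser finds in markup."""
    collector = AttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


if __name__ == "__main__":
    # No whitespace between the two attribute specifications.
    print(parse_start_tags('<a href="foo"class="bar">'))
```

A parser that tolerates the missing space reports href and class as two separate attributes; a broken one would show a single mangled value along the lines of foo"class="bar.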

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Jun 10 '06 #24

Henri Sivonen

In article <e6**********@news.christoph.schneegans.de>,
Christoph Schneegans <Ch*******@Schneegans.de> wrote:

FYI Schema and/or RELAX NG validation is also available for HTML.

I doubt it.

http://hsivonen.iki.fi/validator/
provides RELAX NG validation for HTML.

There's no reason why the same approach couldn't be taken with XSD. The
reason why XSD does not work with my validation service is that the
Xerces XSD validator has a crasher bug (null dereference) and I don't
care enough about XSD to fix the bug myself.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Jun 10 '06 #25

Spartanicus

Christoph Schneegans <Ch*******@Schneegans.de> wrote:

My original claim was that XHTML allows more powerful validation. I really
don't know why you assume I was referring to XML DTD based validation.

I assumed no such thing. You attempted to change the subject from DTD
based validation to Schema or RELAX NG validation.

The subject never was "DTD based validation", it was "validation". XML
validation includes XML Schema validation and RELAX NG validation, of
course.

You should have made that clear from the start. There are different
definitions of validation. Afaik in SGML it is defined as "checking a
document against a DTD", nothing more. Given the topic of this group
this is the definition I apply unless the poster mentions that he is
referring to XML validation.

FYI Schema and/or RELAX NG validation is also available for HTML.

I doubt it.

One was announced in this group: http://badame.vse.cz/validator/ and
there are likely to be others.

If I wanted to disallow



in favor of



in my HTML documents, how would I do that?

Ask someone who knows SGML.

How can you claim that "all additional XHTML constraints can be emulated"
with HTML if you don't know how to do that?

Due to your failure to note that you were referring to XML validation my
responses until now assumed DTD validation.

So now I have to ask why you'd want to disallow "" in HTML in the first place, given that it's valid
and a brief test doesn't demonstrate it causing a problem.

The above code also validates (using a DTD validator) if I change it to
XHTML.

'' is not even well-formed XML, and all XML validators know that.
Of course, SGML validators such as OpenSP that try to act like XML
validators don't, see <http://esw.w3.org/topic/MarkupValidator/XML_Limitations>.

Checking for XML constraints that are missed by SGML validators can be
achieved by using an XML validator and internally transcoding HTML to
XHTML (http://badame.vse.cz/validator/ does this). No need to author
XHTML because of "more powerful means for validation".
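
As a rough illustration of the kind of check an XML parser performs, the sketch below uses Python's standard-library xml.etree.ElementTree to test well-formedness only. It does not validate against a DTD or schema, and it does not do the HTML-to-XHTML transcoding described above. The function name is_well_formed and the sample strings are assumptions chosen for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # The underlying parser rejected the input as not well-formed.
        return False
    return True


if __name__ == "__main__":
    samples = [
        '<a href="foo" class="bar"/>',   # whitespace between attributes
        '<a href="foo"class="bar"/>',    # whitespace omitted
        '<p><span>x</span</p>',          # unclosed end tag
    ]
    for sample in samples:
        print(sample, is_well_formed(sample))
```

XML requires whitespace between attribute specifications and a closing '>' on every end tag, so only the first sample is accepted.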

Which specification says that '</span' is not valid HTML?

As I said it's an error, but valid (assuming DTD validation using the
public validation profile).

So which specification says that '</span' is erroneous?

Presumably the SGML spec (to which I don't have access). I assume that a
left angle bracket is not allowed inside a tag unless it is surrounded
by quotes.

--
Spartanicus

Jun 10 '06 #26

Christoph Schneegans

Henri Sivonen wrote:

FYI Schema and/or RELAX NG validation is also available for HTML.

I doubt it.

http://hsivonen.iki.fi/validator/
provides RELAX NG validation for HTML.

How do you validate HTML 4.01 documents?

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 10 '06 #27

Christoph Schneegans

"Spartanicus" wrote:

There are different definitions of validation. Afaik in SGML it is
defined as "checking a document against a DTD", nothing more. Given
the topic of this group this is the definition I apply unless the
poster mentions that he is referring to XML validation.

It's not my problem when you jump to conclusions.

FYI Schema and/or RELAX NG validation is also available for HTML.

I doubt it.

One was announced in this group: http://badame.vse.cz/validator/ and
there are likely to be others.

<http://badame.vse.cz/validator/validate?uri=http://www.spartanicus.utvinternet.ie/no-xhtml.htm>
seems to complain about a missing system identifier, which is perfectly
legal in HTML...

So now I have to ask why you'd want to disallow "" in HTML in the first place, given that it's valid
and a brief test doesn't demonstrate it causing a problem.

Omitting whitespace between attribute specifications is almost always
a typo, not a deliberate decision. I'd like to spot typos.
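
A minimal sketch of such a typo check, independent of any validator, could look like the following. It is a naive regular-expression lint, not a parser: it assumes that attribute values contain no '<' or '>' and no nested quoting, and the names TAG, QUOTED_VALUE_THEN_NAME and find_adjacent_attributes are invented for this illustration.

```python
import re

# Naive start-tag matcher: assumes no '<' or '>' inside attribute values.
TAG = re.compile(r"<[A-Za-z][^<>]*>")

# A quoted attribute value immediately followed by something other than
# whitespace, '/' or '>' suggests a missing space before the next attribute.
QUOTED_VALUE_THEN_NAME = re.compile(r"""=\s*(?:"[^"]*"|'[^']*')(?=[^\s/>])""")


def find_adjacent_attributes(markup):
    """Return the start tags in markup that lack whitespace between attributes."""
    hits = []
    for match in TAG.finditer(markup):
        tag_text = match.group(0)
        if QUOTED_VALUE_THEN_NAME.search(tag_text):
            hits.append(tag_text)
    return hits


if __name__ == "__main__":
    document = '<p><a href="foo"class="bar">x</a> <a href="foo" class="bar">y</a></p>'
    for tag_text in find_adjacent_attributes(document):
        print("possible typo:", tag_text)
```

Values that themselves contain quoted text followed by a letter can produce false positives, which is why this is only a sketch.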

So which specification says that '</span' is erroneous?

Presumably the SGML spec (to which I don't have access).

So public availability of all relevant specifications is obviously another
advantage of XHTML over HTML.

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 10 '06 #28

Christoph Schneegans

Henri Sivonen wrote:

What kind of difference is there, according to your definitions, between
an HTML parser and a tag soup parser?

'</span' and '' are
equivalent for an HTML parser, but probably not for a tag soup parser.

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 10 '06 #29

Henri Sivonen

In article <e6**********@news.christoph.schneegans.de>,
Christoph Schneegans <Ch*******@Schneegans.de> wrote:

Henri Sivonen wrote:
FYI Schema and/or RELAX NG validation is also available for HTML.

I doubt it.

http://hsivonen.iki.fi/validator/
provides RELAX NG validation for HTML.

How do you validate HTML 4.01 documents?

The parser works as in the HTML5 case except:
* An HTML 4.01 doctype is required instead of the HTML5 doctype.
* Extra entities from HTML5 are not allowed.

The XHTML 1.0 Strict and Transitional schemas are used for HTML 4.01
Strict and Transitional, respectively.
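
The pairing described above can be pictured as a lookup from the doctype's public identifier to a schema. The sketch below is not the validator's actual code: the schema labels are placeholders, and the names SCHEMA_FOR_PUBLIC_ID, DOCTYPE_PUBLIC_ID and schema_for are assumptions made for this example.

```python
import re

# Placeholder labels; the post only states which XHTML 1.0 schema is
# paired with which HTML 4.01 flavour.
SCHEMA_FOR_PUBLIC_ID = {
    "-//W3C//DTD HTML 4.01//EN": "XHTML 1.0 Strict schema",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "XHTML 1.0 Transitional schema",
}

# Extracts the public identifier from an HTML doctype declaration.
DOCTYPE_PUBLIC_ID = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE
)


def schema_for(document):
    """Return the schema label for the document's doctype, or None."""
    match = DOCTYPE_PUBLIC_ID.search(document)
    if match is None:
        return None
    return SCHEMA_FOR_PUBLIC_ID.get(match.group(1))


if __name__ == "__main__":
    print(schema_for('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">'))
```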

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Jun 10 '06 #30

Henri Sivonen

In article <e6**********@news.christoph.schneegans.de>,
Christoph Schneegans <Ch*******@Schneegans.de> wrote:

<http://badame.vse.cz/validator/valid...cus.utvinternet.ie/no-xhtml.htm>
seems to complain about a missing system identifier, which is perfectly
legal in HTML...

Is it? The HTML 4.01 spec says "HTML 4.01 specifies three DTDs, so authors
must include one of the following document type declarations in their
documents." and lists three doctypes with system IDs.

Of course, SGML would allow the system ID to be omitted, but then what
stance do you take on other SGMLisms?
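
To make the distinction concrete, here is a small sketch that pulls the public and system identifiers out of a doctype declaration, so that a tool could report a missing system identifier as a warning rather than an error. The regular expression handles only the simple double-quoted PUBLIC form, and the names DOCTYPE and doctype_ids are made up for this example.

```python
import re

# Matches <!DOCTYPE html PUBLIC "public-id" ["system-id"]>.
DOCTYPE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"(?P<public>[^"]*)"'
    r'(?:\s+"(?P<system>[^"]*)")?\s*>',
    re.IGNORECASE,
)


def doctype_ids(document):
    """Return (public_id, system_id); system_id is None when omitted."""
    match = DOCTYPE.search(document)
    if match is None:
        return None
    return match.group("public"), match.group("system")


if __name__ == "__main__":
    with_system = (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">'
    )
    without_system = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">'
    print(doctype_ids(with_system))
    print(doctype_ids(without_system))
```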

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Jun 10 '06 #31

Henri Sivonen

In article <e6**********@news.christoph.schneegans.de>,
Christoph Schneegans <Ch*******@Schneegans.de> wrote:

Henri Sivonen wrote:
What kind of difference is there, according to your definitions, between
an HTML parser and a tag soup parser?

'</span' and '' are
equivalent for an HTML parser, but probably not for a tag soup parser.

That's not really a definition. :-)

Do you mean something other than text/html when you say tag soup? Do you
mean something other than a text/html parser when you say HTML parser?

I'd say HTML parsers and tag soup parsers parse
</span
as equivalent to


SGML parsers, of course, would parse it as equivalent to


--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Jun 10 '06 #32

Christoph Schneegans

Henri Sivonen wrote:

<http://badame.vse.cz/validator/validate?uri=http://www.spartanicus.utvinternet.ie/no-xhtml.htm>
seems to complain about a missing system identifier, which is perfectly
legal in HTML...

Is it? The HTML 4.01 spec says "HTML 4.01 specifies three DTDs, so authors
must include one of the following document type declarations in
their documents." and lists three doctypes with system IDs.

<http://www.w3.org/TR/html401/struct/global.html> itself lacks the system
identifier. I think the intention is clear.

Of course, SGML would allow the system ID to be omitted, but then what
stance do you take on other SGMLisms?

Such as?

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 10 '06 #33

Christoph Schneegans

Henri Sivonen wrote:

FYI Schema and/or RELAX NG validation is also available for HTML.

I doubt it.

http://hsivonen.iki.fi/validator/
provides RELAX NG validation for HTML.

How do you validate HTML 4.01 documents?

The parser works as in the HTML5 case except:
* An HTML 4.01 doctype is required instead of the HTML5 doctype.
* Extra entities from HTML5 are not allowed.

I guess I should have asked "How do I validate HTML 4.01 documents?"
<http://hsivonen.iki.fi/validator/?doc=http://www.spartanicus.utvinternet.ie/no-xhtml.htm>
doesn't seem to work.

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 10 '06 #34

Henri Sivonen

In article <e6**********@news.christoph.schneegans.de>,
Christoph Schneegans <Ch*******@Schneegans.de> wrote:

Henri Sivonen wrote:
FYI Schema and/or RELAX NG validation is also available for HTML.

I doubt it.

http://hsivonen.iki.fi/validator/
provides RELAX NG validation for HTML.

How do you validate HTML 4.01 documents?

The parser works as in the HTML5 case except:
* An HTML 4.01 doctype is required instead of the HTML5 doctype.
* Extra entities from HTML5 are not allowed.

I guess I should have asked "How do I validate HTML 4.01 documents?"
<http://hsivonen.iki.fi/validator/?do...vinternet.ie/no-xhtml.htm>
doesn't seem to work.

It works as designed. HTML5 does not allow comments before the doctype
and Spartanicus has a comment there. And comments before the doctype
were not on my list of differences above. Moreover, the comment can
cause IE6 to fall into quirks mode (from memory; I don't have a Windows
box here to test with).
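
A check for this particular condition is easy to sketch. The function below reports whether one or more comments precede the doctype declaration; it is an illustration only, not how the validator detects it, and the names LEADING_COMMENT and has_comment_before_doctype are assumptions.

```python
import re

# A comment at the very start of the remaining text, optionally preceded
# by whitespace.
LEADING_COMMENT = re.compile(r"\s*<!--.*?-->", re.DOTALL)


def has_comment_before_doctype(document):
    """Return True if at least one comment precedes the doctype."""
    rest = document
    saw_comment = False
    while True:
        match = LEADING_COMMENT.match(rest)
        if match is None:
            break
        saw_comment = True
        rest = rest[match.end():]
    # Only report a hit when a doctype actually follows the comment(s).
    return saw_comment and rest.lstrip().upper().startswith("<!DOCTYPE")


if __name__ == "__main__":
    page = '<!-- note -->\n<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">'
    print(has_comment_before_doctype(page))
```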

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Jun 10 '06 #35

Henri Sivonen

In article <e6**********@news.christoph.schneegans.de>,
Christoph Schneegans <Ch*******@Schneegans.de> wrote:

Of course, SGML would allow the system ID to be omitted, but then what
stance do you take on other SGMLisms?

Such as?

http://hsivonen.iki.fi/test/minimization.html

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Jun 10 '06 #36

Spartanicus

Christoph Schneegans <Ch*******@Schneegans.de> wrote:

There are different definitions of validation. Afaik in SGML it is
defined as "checking a document against a DTD", nothing more. Given
the topic of this group this is the definition I apply unless the
poster mentions that he is referring to XML validation.

It's not my problem when you jump to conclusions.

An assumption is not a conclusion, and as explained it's a logical
assumption. Failing to provide sufficient information is your problem if
you want to be taken seriously in discussions here.

FYI Schema and/or RELAX NG validation is also available for HTML.

I doubt it.

One was announced in this group: http://badame.vse.cz/validator/ and
there are likely to be others.

<http://badame.vse.cz/validator/validate?uri=http://www.spartanicus.utvinternet.ie/no-xhtml.htm>
seems to complain about a missing system identifier, which is perfectly
legal in HTML...

I don't care about the judgement of such tools one way or the other; I
don't use them. As I've said before, validation has very limited value.
The fact that my documents typically validate against the DTD says very
little about the quality of the markup.

So now I have to ask why you'd want to disallow "" in HTML in the first place, given that it's valid
and a brief test doesn't demonstrate it causing a problem.

Omitting whitespace between attribute specifications is almost always
a typo, not a deliberate decision. I'd like to spot typos.

To borrow a phrase from Jukka: a pointless exercise in futility. Such
exercises do not form an argument in favour of XHTML.

So which specification says that '</span' is erroneous?

Presumably the SGML spec (to which I don't have access).

So public availability of all relevant specifications is obviously another
advantage of XHTML over HTML.

X(HT)ML is an SGML subset.

--
Spartanicus

Jun 10 '06 #37

Christoph Schneegans

"Spartanicus" wrote:

So which specification says that '</span' is erroneous?

Presumably the SGML spec (to which I don't have access).

So public availability of all relevant specifications is obviously another
advantage of XHTML over HTML.

X(HT)ML is an SGML subset.

SGML is not a relevant specification in order to create or process XML
documents.

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 10 '06 #38

Christoph Schneegans

Henri Sivonen wrote:

Of course, SGML would allow the system ID to be omitted, but then
what stance do you take on other SGMLisms?

Such as?

http://hsivonen.iki.fi/test/minimization.html

There can be no doubt that this is a valid HTML 4.01 Strict document.

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 10 '06 #39

Christoph Schneegans

Henri Sivonen wrote:

What kind of difference is there, according to your definitions, between
an HTML parser and a tag soup parser?

'</span' and '' are
equivalent for an HTML parser, but probably not for a tag soup parser.

That's not really a definition. :-)

You got the idea.

Do you mean something other than text/html when you say tag soup?

HTML is defined in the HTML specification. Tag soup is any text with some
angle brackets.

Do you mean something other than a text/html parser when you say HTML
parser?

Of course, an HTML parser must be able to process HTML documents; this
includes some special SGML constructs. User agents such as IE, Opera or
Firefox don't use HTML parsers. I don't know if there are any distinct
HTML parsers at all, but SGML parsers would be able to process HTML
documents. What parser is used by Emacs/W3, BTW?

I'd say HTML parsers and tag soup parsers parse
</span
as equivalent to

No, just take a look at <http://schneegans.de/temp/shorthand.html> with
different browsers. IE and Opera don't recognize an "em" tag.

SGML parsers, of course, would parse it as equivalent to

Yes, beyond dispute.

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 10 '06 #40

Henri Sivonen

In article <e6**********@news.christoph.schneegans.de>,
Christoph Schneegans <Ch*******@Schneegans.de> wrote:

Henri Sivonen wrote:
Of course, SGML would allow the system ID to be omitted, but then
what stance do you take on other SGMLisms?

Such as?

http://hsivonen.iki.fi/test/minimization.html

There can be no doubt that this is a valid HTML 4.01 Strict document.

Right.

But is it conforming?

And more importantly, is it useful? (No.)

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Jun 11 '06 #41

Christoph Schneegans

Henri Sivonen wrote:

http://hsivonen.iki.fi/test/minimization.html

There can be no doubt that this is a valid HTML 4.01 Strict document.

But is it conforming?

I'm not aware of any definition of the term "conforming HTML document".

And more importantly, is it useful? (No.)

Of course not. Authors don't use such constructs deliberately, but
accidentally. Take a look at
<http://schneegans.de/web/xhtml/flawed-html/>,
<http://schneegans.de/web/xhtml/ie.png>,
<http://schneegans.de/web/xhtml/firefox.png> and
<http://schneegans.de/web/xhtml/opera.png>. I didn't invent this example;
a very similar document was e-mailed to me by someone who noticed the
different rendering, but was unable to spot the problem with the help of
the W3C Validator. He should have used XHTML...

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 11 '06 #42

Henri Sivonen

In article <e6**********@news.christoph.schneegans.de>,
Christoph Schneegans <Ch*******@Schneegans.de> wrote:

Henri Sivonen wrote:
http://hsivonen.iki.fi/test/minimization.html

There can be no doubt that this is a valid HTML 4.01 Strict document.

But is it conforming?

I'm not aware of any definition of the term "conforming HTML document".

In the HTML 4.01 spec, there's no clear definition, but assuming a
connection between section headings and definitions, one gets this:
A conforming HTML document is an SGML document that meets the
constraints of the HTML 4.01 specification.

A conforming HTML5 document is a data object that meets all the
conformance criteria for documents as stated in the Web Apps 1.0 spec
(or normative references).

And more importantly, is it useful? (No.)

Of course not. Authors don't use such constructs deliberately, but
accidentally. Take a look at
<http://schneegans.de/web/xhtml/flawed-html/>,
<http://schneegans.de/web/xhtml/ie.png>,
<http://schneegans.de/web/xhtml/firefox.png> and
<http://schneegans.de/web/xhtml/opera.png>. I didn't invent this example;
a very similar document was e-mailed to me by someone who noticed the
different rendering, but was unable to spot the problem with the help of
the W3C Validator. He should have used XHTML...

Well, my parser would have flagged the problem without using XHTML:
http://hsivonen.iki.fi/validator/?do...ns.de%2Fweb%2Fxhtml%2Fflawed-html%2F

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Jun 11 '06 #43

Christoph Schneegans

Henri Sivonen wrote:

http://hsivonen.iki.fi/test/minimization.html

There can be no doubt that this is a valid HTML 4.01 Strict document.

But is it conforming?

I'm not aware of any definition of the term "conforming HTML document".

A conforming HTML document is an SGML document that meets the
constraints of the HTML 4.01 specification.

The HTML 4.01 specification does not prohibit the use of SGML shorthand
markup.

--
All free men, wherever they may live, are citizens of Denmark. And
therefore, as a free man, I take pride in the words "Jeg er dansker!"

Jun 11 '06 #44
