Bytes IT Community

FF and IE with span

I forgot to close a <span>. The html was:
<span class="something">aaaaaa bbbbbbbb ccccccccccccc
<p>
sssss
ttttttt
</p>
etc
</body>
I intended to put the </span> before the <p>. The "etc" portion
included <p>s, <table>s, <img>s, my grandmother's antique silver, and
my '57 Chevy up on blocks, but no more <span> tags. FF closed the
<span> before the <p> as though it were reading my mind. IE appears to
have included the remainder of the document in the <span>. I'm
guessing that was the case because some font characteristics in
class="something" were propagated throughout the rest of the document.

So what happened here? I'm guessing that <p>s are not allowed inside a
<span> and FF enforced that rule but IE didn't.
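[The recovery Firefox appears to apply can be modelled in a few lines of Python. This is a toy sketch using the stdlib html.parser, not any browser's actual algorithm, and the BLOCK/INLINE sets below are illustrative, not complete: when a block-level start tag arrives, any open inline elements are implicitly closed first.]

```python
# Toy model of the recovery Firefox appears to apply (a sketch, not any
# browser's real algorithm; the BLOCK and INLINE sets are illustrative):
# when a block-level start tag arrives, close any open inline elements.
from html.parser import HTMLParser

BLOCK = {"p", "div", "table", "ul", "ol"}
INLINE = {"span", "b", "i", "em", "strong"}

class RecoveringParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements
        self.events = []  # normalized (start/end, tag) events

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK:
            # implicitly close open inline elements first
            while self.stack and self.stack[-1] in INLINE:
                self.events.append(("end", self.stack.pop()))
        self.stack.append(tag)
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # pop (and implicitly close) up to and including the matching tag
        if tag in self.stack:
            while self.stack:
                top = self.stack.pop()
                self.events.append(("end", top))
                if top == tag:
                    break

p = RecoveringParser()
p.feed('<span class="something">aaaaaa <p>sssss</p> etc')
p.close()
print(p.events)  # the <span> is closed before the <p> opens
```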
Regards,
Kent Feiler
www.KentFeiler.com
Feb 6 '07 #1
33 Replies


Scripsit Kent Feiler:
I forgot to close a <span>.
That's a syntax error. Just fix it.

But if you need to speculate, too...
FF closed the
<span> before the <p> as though it were reading my mind.
That's reasonable error recovery, since <p> starts a block-level element and
<span> is text-level (inline).
IE appears to
have included the remainder of the document in the <span>.
That's less reasonable error recovery, but HTML specifications do not
mandate any particular error recovery. A browser could also show nothing or
it could start playing Towers of Hanoi, but that would be even less
reasonable.
I'm guessing that <p>s are not allowed inside a
<span>
Well, you should _know_ that to be the case.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Feb 6 '07 #2

On 2007-02-06, Jukka K. Korpela <jk******@cs.tut.fi> wrote:
[snip]
That's less reasonable error recovery, but HTML specifications do not
mandate any particular error recovery.
They might do in the future:

http://www.whatwg.org/specs/web-apps...-work/#parsing

"The error handling for parse errors is well-defined: user agents
must either act as described below when encountering such problems,
or must abort processing at the first error that they encounter for
which they do not wish to apply the rules described below."
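[The two behaviours the draft permits -- recover as specified, or abort at the first error -- can be sketched with a toy token-stream step. Purely illustrative Python; the real HTML5 tree-construction algorithm is far more elaborate.]

```python
# A toy token-stream-to-events step showing both permitted behaviours
# (purely illustrative; the real HTML5 algorithm is far more elaborate):
# recover from a mismatched end tag, or abort at the first parse error.
def parse(tokens, strict=False):
    stack, out = [], []
    for kind, tag in tokens:
        if kind == "start":
            stack.append(tag)
            out.append(("start", tag))
        else:  # end tag
            if not stack or stack[-1] != tag:
                if strict:
                    raise ValueError(f"parse error at </{tag}>")
                # recovery: implicitly close elements until we match
                while stack and stack[-1] != tag:
                    out.append(("end", stack.pop()))
            if stack:
                stack.pop()
                out.append(("end", tag))
    while stack:  # close anything still open at end of input
        out.append(("end", stack.pop()))
    return out

bad = [("start", "span"), ("start", "p"), ("end", "span")]
print(parse(bad))  # recovers: the <p> is implicitly closed first
```

With strict=True the same input raises instead of recovering, which is the "abort processing at the first error" option the draft allows.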
Feb 6 '07 #3

Scripsit Ben C:
On 2007-02-06, Jukka K. Korpela <jk******@cs.tut.fi> wrote:
[snip]
>That's less reasonable error recovery, but HTML specifications do not
mandate any particular error recovery.

They might do in the future
And cows and pigs might learn to fly.
http://www.whatwg.org/specs/web-apps...-work/#parsing
Anyone and his brother can write a document and call it a specification, or
draft for a specification. While the cited work might be useful in
documenting browser practice, it can hardly affect browser behavior in
issues like mandatory error recovery. The web already has billions of
malformed HTML documents, and any browser has to deal with more or less
wrongly marked-up documents and display them the best they can.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Feb 6 '07 #4

In article <uf*****************@reader1.news.saunalahti.fi>,
"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
http://www.whatwg.org/specs/web-apps...-work/#parsing

Anyone and his brother can write a document and call it a specification, or
draft for a specification.
Not just anyone and his brother can write a document that would have the
same impact expectations as the cited document.
While the cited work might be useful in
documenting browser practice, it can hardly affect browser behavior in
issues like mandatory error recovery. The web already has billions of
malformed HTML documents, and any browser has to deal with more or less
wrongly marked-up documents and display them the best they can.
The HTML5 parsing algorithm has been carefully designed to be compatible
with legacy content while still producing DOMs where the tree
assumptions hold. Of the current deployed browsers, the HTML5 parsing
algorithm is probably closest to what WebKit does. Gecko, WebKit and
Opera are expected to implement the algorithm. The word from Microsoft
suggests that they don't want to change their HTML 4 parsing but might
adopt a standard parsing algorithm for a future version of HTML.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Feb 6 '07 #5

"Jukka K. Korpela" <jk******@cs.tut.fi> writes:
That's less reasonable error recovery, but HTML specifications do not
mandate any particular error recovery. A browser could also show
nothing or it could start playing Towers of Hanoi, but that would be
even less reasonable.
If browsers did that, we'd have much better markup on the web today, and
far fewer problems. Sounds reasonable to me. :-)

sherm--

--
Web Hosting by West Virginians, for West Virginians: http://wv-www.net
Cocoa programming in Perl: http://camelbones.sourceforge.net
Feb 7 '07 #6

Scripsit Henri Sivonen:
The HTML5 parsing algorithm has been carefully designed to be
compatible with legacy content while still producing DOMs where the
tree assumptions hold.
That's one of the fundamental flaws in the approach.

When you define error processing rigorously enough, you effectively extend
the language by defining errors part of it. _If_ browser vendors actually
started to comply with error processing requirements, then those authors who
are foolish enough to rely on that (and to ignore old browser versions and
non-browser software used to process HTML documents) would happily use
"erroneous" constructs since they would think they _know_ how they will be
handled.

If and when browser vendors won't comply, nothing has been won but confusion
has been created.

If an author wants to play with things involving DOMs, he should simply
write his markup by the book. There's no point in making people think they
can keep sloppy code or sloppy coding and expect good results.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Feb 7 '07 #7

On 2007-02-07, Jukka K. Korpela <jk******@cs.tut.fi> wrote:
Scripsit Henri Sivonen:
>The HTML5 parsing algorithm has been carefully designed to be
compatible with legacy content while still producing DOMs where the
tree assumptions hold.

That's one of the fundamental flaws in the approach.

When you define error processing rigorously enough, you effectively extend
the language by defining errors part of it.
Indeed, cf. the part of the ECMAScript specification that deals with
missing semicolons.
_If_ browser vendors actually started to comply with error processing
requirements, then those authors who are foolish enough to rely on
that (and to ignore old browser versions and non-browser software used
to process HTML documents) would happily use "erroneous" constructs
since they would think they _know_ how they will be handled.

If and when browser vendors won't comply, nothing has been won but confusion
has been created.
And insanity quickly follows.
If an author wants to play with things involving DOMs, he should simply
write his markup by the book. There's no point in making people think they
can keep sloppy code or sloppy coding and expect good results.
The problem is that certain vendors will encourage sloppy coding that's
handled in a secret proprietary way, while selling authoring tools that
produce exactly complementary slop that "works" with their sloppy
browser. How else to stop these people than define a standard for them
to be clearly accused of breaking, in a court-room setting?

Ideally the standard would say the browser should display a helpful
error message instead of any part of the page at the first sign of
trouble. This would make everyone's life easier, particularly those of
the page authors. But it's too late for that now, so they define either
a particular course of error recovery, or the option of aborting. This
does seem like a reasonable move considering the circumstances.
Feb 7 '07 #8

In article <ma***************@reader1.news.saunalahti.fi>,
"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
Scripsit Henri Sivonen:
The HTML5 parsing algorithm has been carefully designed to be
compatible with legacy content while still producing DOMs where the
tree assumptions hold.

That's one of the fundamental flaws in the approach.
Do you also consider the CSS forward-compatible parsing rules a flawed
approach?
When you define error processing rigorously enough, you effectively extend
the language by defining errors part of it.
However, the errors should be considered to be reserved as future
extension points, so smart authors shouldn't rely on error recovery.
Non-smart authors are already relying on unstandardized error recovery.
_If_ browser vendors actually
started to comply with error processing requirements, then those authors who
are foolish enough to rely on that (and to ignore old browser versions and
non-browser software used to process HTML documents) would happily use
"erroneous" constructs since they would think they _know_ how they will be
handled.
The problem with your argumentation is that you seem to imply that if
error handling was undefined, authors wouldn't rely on error handling.
If you look at the real Web, this obviously isn't the case.

Also, you are assuming that authors were smart enough to read the spec
but still clueless enough to willfully produce erroneous documents.
If and when browser vendors won't comply, nothing has been won but confusion
has been created.
What makes you think that they won't comply?
If an author wants to play with things involving DOMs, he should simply
write his markup by the book. There's no point in making people think they
can keep sloppy code or sloppy coding and expect good results.
The part about producing DOM trees isn't about authors playing with
things that involve the DOM. Modern browsers are DOM-based regardless of
a content-side script mutating the DOM. Parsing HTML in a browser is
inherently a process that takes a stream of bytes as the input and
produces a DOM as the output.

WebKit and Gecko *already* have parsers that always produce DOMs where
the tree assumption holds. This means that requiring the tree assumption
to hold is feasible considering legacy content. On the other hand, since
the architecture of these engines requires a real tree, a standard
algorithm must produce a tree. That is, the exact behavior of IE (which
doesn't always produce a tree) can't become the standard.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Feb 7 '07 #9

On 2007-02-07, Henri Sivonen <hs******@iki.fi> wrote:
[snip]
WebKit and Gecko *already* have parsers that always produce DOMs where
the tree assumption holds. This means that requiring the tree assumption
to hold is feasible considering legacy content. On the other hand, since
the architecture of these engines requires a real tree, a standard
algorithm must produce a tree. That is, the exact behavior of IE (which
doesn't always produce a tree) can't become the standard.
What does it produce if not a tree?
Feb 7 '07 #10

On 7 Feb, 09:00, Ben C <spams...@spam.eggs> wrote:
Ideally the standard would say the browser should display a helpful
error message instead of any part of the page at the first sign of
trouble.
I disagree. This is the _web_ we're talking about, it needs to be
optimised for consumers (and consumers of tag soup) rather than
authors. There are far more readers than authors and the "available to
all authors" constraint also means that the majority of these authors
are always going to be less than skilled.

Error reporting like this would be a boon to authors, but a nuisance
to most readers of most sites. After all, what use is an error message
on something you really have no chance of fixing?
Feb 7 '07 #11

Scripsit Henri Sivonen:
Do you also consider the CSS forward-compatible parsing rules a flawed
approach?
I do, but that's a different issue. "Forward-compatible parsing" is what
browsers have done to HTML from the beginning: ignore a tag when you don't
grok the tag name, ignore an attribute when you don't grok the attribute
name or value. There's nothing brilliant with it, but it's practical up to a
point. As we know, it failed for some elements in the past since the tags
were not designed to be ignorable by browsers that didn't know them. Many
people still write the foolish "<!--" and "-->" inside <style> or <script>
elements due to this sad history.
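[That "ignore what you don't grok" rule can itself be sketched with the stdlib parser. The KNOWN set below is an illustrative stand-in for a browser's supported elements: unknown tags are dropped, but the text inside them still comes through -- which is exactly the failure mode described above for elements whose content was not meant to be rendered.]

```python
# Sketch of forward-compatible tag handling (the KNOWN set stands in
# for a browser's supported elements; illustrative only): unknown tags
# are ignored, but their character data is still processed.
from html.parser import HTMLParser

KNOWN = {"p", "span", "em", "strong"}

class ForwardCompatible(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN:
            self.out.append(f"<{tag}>")
        # unknown tag: the tag itself is dropped, its content is not

    def handle_endtag(self, tag):
        if tag in KNOWN:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

fc = ForwardCompatible()
fc.feed("<p><futuretag>hello</futuretag></p>")
print("".join(fc.out))  # <p>hello</p>
```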

HTML was never really designed to be systematically extensible (and don't
make me start bashing the "extensibility" of XHTML), and neither was CSS,
but there was a better effort with CSS.
>When you define error processing rigorously enough, you effectively
extend the language by defining errors part of it.

However, the errors should be considered to be reserved as future
extension points, so smart authors shouldn't rely on error recovery.
If you wish to have an extensible markup language, you should design one. It
could be XML based, just as virtually anything could, but that's irrelevant.
Non-smart authors are already relying on unstandardized error
recovery.
Standardized error recovery would convert the big crowd of authors (those
between idiots and geniuses) into non-smart in that sense.
The problem with your argumentation is that you seem to imply that if
error handling was undefined, authors wouldn't rely on error handling.
If you look at the real Web, this obviously isn't the case.
Authors aren't relying on error handling. They're just using what they
regard as HTML, not knowing what's vendor-specific, or they're just making
mistakes in coding. They don't regard their markup as erroneous.
Also, you are assuming that authors were smart enough to read the spec
but still clueless enough to willfully produce erroneous documents.
No, I'm not. Most people learn from other people's code or from
miscellaneous tutorials and articles. They'll read e.g. that you shouldn't
do something but if you do, here's what _shall_ happen, and they'll learn
the "if I ..., then ..." part.
>If and when browser vendors won't comply, nothing has been won but
confusion has been created.

What makes you think that they won't comply?
They'd win nothing but reputation in some small circles. If following some
error processing rules would break the way that existing pages now work on a
browser, they would just upset many people by doing so. So they'll follow
the rules only as far as they coincide what they're doing, i.e. they won't
change anything.
The part about producing DOM trees isn't about authors playing with
things that involve the DOM. Modern browsers are DOM-based regardless
of a content-side script mutating the DOM. Parsing HTML in a browser
is inherently a process that takes a stream of bytes as the input and
produces a DOM as the output.
So what? When the source is syntactically malformed, they'll do _something_.
If you want to compete with such a modern browser on the market, against the
market leader, then you'll simply have to simulate tag soup processing the
best you can, trying to construct a tree from something that ain't no tree
logically.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Feb 7 '07 #12

Ben C wrote:
On 2007-02-07, Henri Sivonen <hs******@iki.fi> wrote:
[snip]
>WebKit and Gecko *already* have parsers that always produce DOMs where
the tree assumption holds. This means that requiring the tree assumption
to hold is feasible considering legacy content. On the other hand, since
the architecture of these engines requires a real tree, a standard
algorithm must produce a tree. That is, the exact behavior of IE (which
doesn't always produce a tree) can't become the standard.

What does it produce if not a tree?
Something else, see e.g. http://ln.hixie.ch/?start=1037910467&count=1

--
David Håsäther
Feb 7 '07 #13

Jukka K. Korpela wrote:
Scripsit Kent Feiler:
>I'm guessing that <p>s are not allowed inside a
<span>

Well, you should _know_ that to be the case.
Yes, for the same reasons discussed here in December:
http://message-id.net/<op********************************@4ax.com>

--
John
Feb 7 '07 #14

In article <sl*********************@bowser.marioworld>,
Ben C <sp******@spam.eggs> wrote:
On 2007-02-07, Henri Sivonen <hs******@iki.fi> wrote:
[snip]
WebKit and Gecko *already* have parsers that always produce DOMs where
the tree assumption holds. This means that requiring the tree assumption
to hold is feasible considering legacy content. On the other hand, since
the architecture of these engines requires a real tree, a standard
algorithm must produce a tree. That is, the exact behavior of IE (which
doesn't always produce a tree) can't become the standard.

What does it produce if not a tree?
I haven't seen the source code of IE, but my best guess from black box
observations is that it maintains a linked list of element starts,
element ends, comment and character data and implements tree API
operations by walking this list. When the list is coherent, the view
experienced through the tree API indeed looks like a tree. In incoherent
cases, it can be shown that identities that should hold in a tree don't
hold.
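[A toy illustration of why such a flat event list can fail to be a tree -- this models the black-box guess above, not IE's actual code: for misnested markup like <b><i></b></i>, the two elements' start/end ranges overlap, and in any tree two element ranges must either nest or be disjoint.]

```python
# Toy model of the guess above (illustrative only; nobody here has seen
# IE's source): keep tag events as a flat list and inspect the start/end
# positions of each element for the misnested markup <b><i></b></i>.
events = ["<b>", "<i>", "</b>", "</i>"]

def span_of(tag):
    """(position of start tag, position of end tag) in the event list."""
    return (events.index(f"<{tag}>"), events.index(f"</{tag}>"))

b, i = span_of("b"), span_of("i")

# In any tree, two element ranges must either nest or be disjoint.
nested = b[0] < i[0] and i[1] < b[1]
disjoint = b[1] < i[0] or i[1] < b[0]
print(nested or disjoint)  # False: the ranges overlap, so no tree fits
```

This is exactly the "incoherent case" where identities that should hold in a tree don't: any tree you expose through an API has to put <i> either inside <b> or beside it, and neither choice matches the flat list.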

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Feb 7 '07 #15

In article <i9***************@reader1.news.saunalahti.fi>,
"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
Scripsit Henri Sivonen:
Do you also consider the CSS forward-compatible parsing rules a flawed
approach?

I do, but that's a different issue.
Interesting.
When you define error processing rigorously enough, you effectively
extend the language by defining errors part of it.
However, the errors should be considered to be reserved as future
extension points, so smart authors shouldn't rely on error recovery.

If you wish to have an extensible markup language, you should design one. It
could be XML based, just as virtually anything could, but that's irrelevant.
Draconian error recovery doesn't work with text/html. A smooth update to
HTML must target text/html.

So far, the CSS (and JavaScript) approaches to future versions seem to
be working better than the HTML and XML approaches. Therefore, it is
reasonable to try to follow the example of CSS.
Non-smart authors are already relying on unstandardized error
recovery.

Standardized error recovery would convert the big crowd of authors (those
between idiots and geniuses) into non-smart in that sense.
I think I don't exactly follow your reasoning there.
The problem with your argumentation is that you seem to imply that if
error handling was undefined, authors wouldn't rely on error handling.
If you look at the real Web, this obviously isn't the case.

Authors aren't relying on error handling. They're just using what they
regard as HTML, not knowing what's vendor-specific, or they're just making
mistakes in coding. They don't regard their markup as erroneous.
So why would making this stuff interoperable be worse than the status
quo? Which is more important, interoperability or making sure people
regard their markup as erroneous?
Also, you are assuming that authors were smart enough to read the spec
but still clueless enough to willfully produce erroneous documents.

No, I'm not. Most people learn from other people's code or from
miscellaneous tutorials and articles. They'll read e.g. that you shouldn't
do something but if you do, here's what _shall_ happen, and they'll learn
the "if I ..., then ..." part.
Isn't that still better than stuff breaking in incompatible ways?
If and when browser vendors won't comply, nothing has been won but
confusion has been created.
What makes you think that they won't comply?

They'd win nothing but reputation in some small circles.
Well, they'd win in interop. If Gecko, WebKit and Opera agree, two of
them win when a tag souper only tests with one.
If following some error processing rules would break the way that
existing pages now work on a browser, they would just upset many
people by doing so.
Like I said before, a key design goal for the HTML5 parsing algorithm is
not to break existing pages.
So they'll follow the rules only as far as they coincide with what
they're doing, i.e. they won't change anything.
There are indications that change is afoot in Gecko. The HTML5 parsing
algorithm is already pretty close to what WebKit does. The Opera people
usually don't talk until they have a release, but they have already
demonstrated a profound commitment to the WHATWG work.
The part about producing DOM trees isn't about authors playing with
things that involve the DOM. Modern browsers are DOM-based regardless
of a content-side script mutating the DOM. Parsing HTML in a browser
is inherently a process that takes a stream of bytes as the input and
produces a DOM as the output.

So what?
You seemed to be assuming that my mention of the DOM was primarily about
catering to content-side scripting.
When the source is syntactically malformed, they'll do _something_.
Of course. You seem to be arguing that browser vendors shouldn't agree
on what "something" is.
If you want to compete with such a modern browser on the market, against the
market leader, then you'll simply have to simulate tag soup processing the
best you can, trying to construct a tree from something that ain't no tree
logically.
This is exactly a key design goal in the development of the HTML5
parsing algorithm: being as close to IE as possible while still keeping
the resulting data structure as a tree without having to add private
annotations to the tree (beyond a CSS-mandated "this is HTML not XML"
flag) for CSS box construction to achieve legacy-compatible layout.

Gecko and WebKit both construct a content tree that is used as the basis
for generating the CSS box tree. There's no magic data that the HTML
parser could add to the content tree that you couldn't touch through the
DOM APIs. This is a particularly sane approach. The HTML5 approach assumes
a Gecko/WebKit-style architecture from the tree generation onwards.

Opera also guarantees that a tree is exposed to scripts, but the HTML
parser [it seems] can add magic annotations to the tree that make CSS
box generation work in a way that is inexplicable by considering the
content tree that is exposed through the tree APIs. Opera's tree
generally has fewer nodes than the tree generated by Gecko and WebKit. IE
doesn't guarantee that the APIs expose a tree. Logically, it would be
easier to change Opera and IE to the Gecko/WebKit direction than vice
versa.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Feb 7 '07 #16

On 2007-02-07, Andy Dingley <di*****@codesmiths.com> wrote:
On 7 Feb, 09:00, Ben C <spams...@spam.eggs> wrote:
>Ideally the standard would say the browser should display a helpful
error message instead of any part of the page at the first sign of
trouble.

I disagree. This is the _web_ we're talking about, it needs to be
optimised for consumers (and consumers of tag soup) rather than
authors. There are far more readers than authors and the "available to
all authors" constraint also means that the majority of these authors
are always going to be less than skilled.

Error reporting like this would be a boon to authors, but a nuisance
to most readers of most sites. After all, what use is an error message
on something you really have no chance of fixing?
I can see the argument for the browser displaying the best it can with
whatever input, and that stricter error checking should be something you
can turn on if you're authoring a page. But we already have that, and
the problem is people (and authoring tools, the worst offenders) don't
use it.

If errors always resulted in error messages it would be both easier and
more of a necessity for authors to fix the errors, and I do think we'd
all be happier. You can make the specification sensibly tolerant where
it doesn't add extra complexity, and most well-designed computer
languages are like this.
Feb 7 '07 #17

On 2007-02-07, Henri Sivonen <hs******@iki.fi> wrote:
In article <sl*********************@bowser.marioworld>,
Ben C <sp******@spam.eggs> wrote:
>On 2007-02-07, Henri Sivonen <hs******@iki.fi> wrote:
[snip]
WebKit and Gecko *already* have parsers that always produce DOMs where
the tree assumption holds. This means that requiring the tree assumption
to hold is feasible considering legacy content. On the other hand, since
the architecture of these engines requires a real tree, a standard
algorithm must produce a tree. That is, the exact behavior of IE (which
doesn't always produce a tree) can't become the standard.

What does it produce if not a tree?

I haven't seen the source code of IE, but my best guess from black box
observations is that it maintains a linked list of element starts,
element ends, comment and character data and implements tree API
operations by walking this list. When the list is coherent, the view
experienced through the tree API indeed looks like a tree. In incoherent
cases, it can be shown that identities that should hold in a tree don't
hold.
I see. Interesting. This is a strange approach since in programming I
think it is usually best to clean up the input sooner rather than later
in the processing pipeline.

You would think the tree inconsistencies would propagate into the part
of the program that's implementing the box model causing all kinds of
problems, infinite loops, etc.
Feb 7 '07 #18

On 7 Feb, 09:00, Ben C <spams...@spam.eggs> wrote:
Ideally the standard would say the browser should display a helpful
error message instead of any part of the page at the first sign of
trouble.
--------------------------------------------------------------------

Post it where? Are we saying that a guy from New York writes some bad
html that causes another guy from Auckland, NZ to get a screenfull of
"helpful" error messages? What nonsense! Who would they help?

Here's a better idea. Every time any browser anywhere in the world
detects an html error it sends a "helpful" error message to W3C! How's
that for a "denial of service" attack!

Regards,
Kent Feiler
www.KentFeiler.com
Feb 8 '07 #19

On 2007-02-08, Kent Feiler <zz**@zzzz.com> wrote:
On 7 Feb, 09:00, Ben C <spams...@spam.eggs> wrote:
>Ideally the standard would say the browser should display a helpful
error message instead of any part of the page at the first sign of
trouble.
--------------------------------------------------------------------

Post it where? Are we saying that a guy from New York writes some bad
html that causes another guy from Auckland, NZ to get a screenfull of
"helpful" error messages?
That is exactly what I'm suggesting.

The situation in reality is not so far different. Today the guy from New
York writes some bad HTML and the guy from Auckland gets a screenful of
boxes in the wrong place, buttons he can't navigate to, forms that don't
work and all the rest of it.
What nonsense! Who would they help?
HTML is well-specified and easy to validate. If the guy from Auckland
gets an error message, the guy from New York can be certain to get one
too, even if he is using a different browser. The only people it
wouldn't help are those who don't even test their content once in one
browser!
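[The "easy to validate" half is at least true for pure nesting errors. A minimal checker with the stdlib parser -- the VOID set is an illustrative subset, and this checks tag balance only, nothing like full DTD validation -- already catches the unclosed <span> that started this thread.]

```python
# A sketch of a nesting checker using the stdlib parser (the VOID set is
# an illustrative subset; this checks tag balance only, not a DTD's
# content models).
from html.parser import HTMLParser

VOID = {"br", "img", "hr", "meta", "link", "input"}  # no end tag expected

class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")

    def close(self):
        super().close()
        self.errors.extend(f"unclosed <{t}>" for t in self.stack)

c = NestingChecker()
c.feed("<span><p>text</p>")  # the original poster's mistake
c.close()
print(c.errors)  # ['unclosed <span>']
```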
Feb 8 '07 #20

Scripsit Ben C:
>Post it where? Are we saying that a guy from New York writes some bad
html that causes another guy from Auckland, NZ to get a screenfull of
"helpful" error messages?

That is exactly what I'm suggesting.

The situation in reality is not so far different. Today the guy from
New York writes some bad HTML and the guy from Auckland gets a
screenful of boxes in the wrong place, buttons he can't navigate to,
forms that don't work and all the rest of it.
The Auckland guy wouldn't understand much of the messages. No offence to
Aucklanders, really; people elsewhere would be even more confused, on the
average.

What a browser _could_ meaningfully do is to _signal_ that there are errors
(as Lynx sometimes does: "Malformed HTML!"), perhaps with an option of
looking at more detailed error reporting if the user really wants that.

Would it help? Browsers would probably show the signal for 99 % of web pages
if they are liberal and for 99.9 % if they check more (e.g., against the
automatically checkable rules in WAI recommendations, which might be
regarded as part of "correct HTML" by some).

Users would soon learn to ignore the signals, just as they currently ignore
signals like "Error on page" on the status line (unless they switched off
indication of scripting errors, so that they don't even see those signals).

Error signalling could be more useful if it were divided into severity
levels and less severe errors were not signalled at all by default. But how
could you do that? In pure, DTD-expressible syntax for example, the
situation is dichotomic: either the document complies with the DTD (and
general markup syntax rules) or it does not. The syntax specification gives
no guidance for classifying the errors into severe and less severe. Such
classifications could be developed, but that's far from easy.
HTML is well-specified and easy to validate.
HTML is _relatively_ well-specified syntactically, but is <p></p> _correct_
(it's explicitly frowned upon in HTML specs), and what about the common
<p>&nbsp;</p>?

Even the formalized syntax is more difficult than one might think. Should
HTML be checked against the HTML 4.01 specification, thereby accepting
things like <em/foo/ that aren't supported by any (?) browsers? Besides, if
HTML is so easy to validate, how come even the W3C offers a "markup
validator" that suffers from several errors and obscurities, despite
continued error reports through years? (I know that markup validation is not
rocket science. It just hasn't been regarded as important enough by the
W3C.)
If the guy from Auckland
gets an error message, the guy from New York can be certain to get one
too, even if he is using a different browser. The only people it
wouldn't help are those who don't even test their content once in one
browser!
There is a large number of such people. We... ehem, they... may test their
pages even on different browsers with different settings and against fancy
user style sheets, but later on, when some error on a page needs to be fixed
quickly, or some small piece of information needs to be added, they may well
fail to test the modified page at all. Korpela's law on errors: Most errors
result from fixing errors.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Feb 8 '07 #21

On 2007-02-08, Jukka K. Korpela <jk******@cs.tut.fi> wrote:
Scripsit Ben C:
>>Post it where? Are we saying that a guy from New York writes some bad
html that causes another guy from Auckland, NZ to get a screenfull of
"helpful" error messages?

That is exactly what I'm suggesting.

The situation in reality is not so far different. Today the guy from
New York writes some bad HTML and the guy from Auckland gets a
screenful of boxes in the wrong place, buttons he can't navigate to,
forms that don't work and all the rest of it.

The Auckland guy wouldn't understand much of the messages. No offence to
Aucklanders, really; people elsewhere would be even more confused, on the
average.
Yes, but the idea is obviously that the Auckland guy wouldn't see the
messages, because the New York guy would fix the errors before
publishing the page.
What a browser _could_ meaningfully do is to _signal_ that there are errors
(as Lynx sometimes does: "Malformed HTML!"), perhaps with an option of
looking at more detailed error reporting if the user really wants that.
Konqueror does that. It displays a little icon of a moth in the bottom
right, sometimes two of them.
Would it help? Browsers would probably show the signal for 99 % of web pages
if they are liberal and for 99.9 % if they check more (e.g., against the
automatically checkable rules in WAI recommendations, which might be
regarded as part of "correct HTML" by some).
Users would soon learn to ignore the signals, just as they currently ignore
signals like "Error on page" on the status line (unless they switched off
indication of scripting errors, so that they don't even see those signals).
I usually ignore the moths myself.
Error signalling could be more useful if it were divided into severity
levels and less severe errors were not signalled at all by default. But how
could you do that? In pure, DTD-expressible syntax for example, the
situation is dichotomic: either the document complies with the DTD (and
general markup syntax rules) or it does not. The syntax specification gives
no guidance for classifying the errors into severe and less severe. Such
classifications could be developed, but that's far from easy.
It is important that the errors aren't too annoying. I remember once
spending at least a day sorting out an XML entity called something like
&InvisibleTimes; to write F = ma in a Docbook document.

The entity was to go between m and a, denoting a product. Of course it
is important that the entity is there although it is completely
invisible-- for the sake of aural rendering, converting the document to
a telepathic stream for communicating with aliens in the future, etc.
etc..

But I can understand not everyone would have had as much patience as I
did.
>HTML is well-specified and easy to validate.

HTML is _relatively_ well-specified syntactically, but is <p></p> _correct_
(it's explicitly frowned upon in HTML specs), and what about the common
<p>&nbsp;</p>?
The main thing to enforce is proper tree structure. As for DTD validity,
it seems to have become shades of grey. Browser internals implementing
CSS deal with a tree of nodes whose element types are nothing more than
indices into stylesheets. CSS specifies how to generate anonymous boxes
to deal with block boxes inside inline boxes, even with table-cell and
table boxes where you aren't expecting them.

Nevertheless, some browsers seem to enforce some of the DTD validity.
For example a stray <div> inside a <tr> gets relocated by Firefox and
this is reflected in the DOM tree. But it wouldn't do anything like that
for other "less invalid" combinations like a <div> in a <span>.
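The nesting check being described can be prototyped in a few lines of Python with the standard library's html.parser. This is a toy sketch, nothing like a real browser's parser, and the BLOCK/INLINE sets below are illustrative, not exhaustive:

```python
# Toy nesting checker: flag a block-level start tag that appears while
# an inline element is still open. Illustrative only.
from html.parser import HTMLParser

BLOCK = {"p", "div", "table", "ul", "ol"}
INLINE = {"span", "em", "strong", "i", "b"}

class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_inline = []  # stack of inline elements still open
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK and self.open_inline:
            self.errors.append("<%s> inside <%s>" % (tag, self.open_inline[-1]))
        elif tag in INLINE:
            self.open_inline.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_inline:
            self.open_inline.remove(tag)

checker = NestingChecker()
checker.feed('<span class="something">aaa <p>sss</p> etc</span>')
print(checker.errors)  # ['<p> inside <span>']
```

A validator does essentially this (against the full DTD content models rather than two hard-coded sets), which is why the original poster's mistake is caught instantly by a validator while browsers silently guess.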
Even the formalized syntax is more difficult than one might think. Should
HTML be checked against the HTML 4.01 specification, thereby accepting
things like <em/foo/ that aren't supported by any (?) browsers? Besides, if
HTML is so easy to validate, how come even the W3C offers a "markup
validator" that suffers from several errors and obscurities, despite
continued error reports through the years? (I know that markup validation is not
rocket science. It just hasn't been regarded as important enough by the
W3C.)
Well, if it is hard to validate, the solution is not to not validate it!
Feb 8 '07 #22

On Thu, 08 Feb 2007 01:54:08 -0600, Ben C <sp******@spam.eggs> wrote:

On 2007-02-08, Kent Feiler <zz**@zzzz.com> wrote:
On 7 Feb, 09:00, Ben C <spams...@spam.eggs> wrote:
>Ideally the standard would say the browser should display a helpful
error message instead of any part of the page at the first sign of
trouble.
--------------------------------------------------------------------

Post it where? Are we saying that a guy from New York writes some
bad html that causes another guy from Auckland, NZ to get a
screenful of "helpful" error messages?
That is exactly what I'm suggesting.

The situation in reality is not so far different. Today the guy from
New York writes some bad HTML and the guy from Auckland gets a
screenful of boxes in the wrong place, buttons he can't navigate to,
forms that don't work and all the rest of it.
What nonsense! Who would they help?
HTML is well-specified and easy to validate. If the guy from Auckland
gets an error message, the guy from New York can be certain to get one
too, even if he is using a different browser. The only people it
wouldn't help are those who don't even test their content once in one
browser!

--------------------------------------------------------------------------

Let's flesh out these two guys. The guy from New York couldn't spell
html more than once in every three tries, but he is an expert on Big
Band music from the 1940s. He borrows an html file from a friend,
takes out the text, replaces it as best he can with Benny Goodman,
Duke Ellington, Tommy Dorsey, etc, and has the friend upload it to his
site.

The guy from Auckland has never heard of html. He doesn't want to ever
hear of it. But he is interested in Big Band music from the 1940s. He
accesses the file, but instead of information on Mood Indigo, he gets a
bunch of "helpful" error messages that may as well be written in Urdu
for all the help they give him.

So...what are we saying here? The guy from New York shouldn't be
allowed to write html? Or that he should have to pass a series of
rigorous W3C tests before he's allowed to upload it to an internet
site? Or that the guy from Auckland now is required to learn html,
figure out the problems, and then...what?

This reminds me of Wikipedia. Their attitude is something like, we
can't stop crap from getting into the dictionary, we just hope it gets
improved and/or eliminated over time.

Regards,
Kent Feiler
www.KentFeiler.com
Feb 8 '07 #23

On 2007-02-08, Kent Feiler <zz**@zzzz.com> wrote:
[...]
Let's flesh out these two guys. The guy from New York couldn't spell
html more than once in every three tries, but he is an expert on Big
Band music from the 1940s. He borrows an html file from a friend,
takes out the text, replaces it as best he can with Benny Goodman,
Duke Ellington, Tommy Dorsey, etc, and has the friend upload it to his
site.

The guy from Auckland has never heard of html. He doesn't want to ever
hear of it. But he is interested in Big Band music from the 1940s. He
accesses the file, but instead of information on Mood Indigo, he gets a
bunch of "helpful" error messages that may as well be written in Urdu
for all the help they give him.

So...what are we saying here? The guy from New York shouldn't be
allowed to write html? Or that he should have to pass a series of
rigorous W3C tests before he's allowed to upload it to an internet
site? Or that the guy from Auckland now is required to learn html,
figure out the problems, and then...what?
Not saying any of those things.

The guy from New York would either use a well-designed easy-to-use
authoring tool that generates valid HTML, or his friend would fix it for
him before uploading it.

As I mentioned somewhere else, it's authoring tools which seem to
produce the worst HTML of all (and have the least excuse).

I'm not requiring everyone to read the specs or type angle brackets in a
text editor. Just making the point that "error tolerance" usually leads
to more confusion, and that in an ideal world we would not have got into
the state we are in.

Actually my original point was that it is better, given where we are,
that HTML5 specify error handling than not.
Feb 8 '07 #24

Scripsit Henri Sivonen:
>If you wish to have an extensible markup language, you should design
one. It could be XML based, just as virtually anything could, but
that's irrelevant.

Draconian error recovery doesn't work with text/html. A smooth update
to HTML must target text/html.
How is that related to what I wrote? I wrote nothing about Draconian
anything. Au contraire, I explained why error processing requirements are a
bad idea for HTML. They might be a better idea for something completely new.
So far, the CSS (and JavaScript) approaches to future versions seem to
be working better than the HTML and XML approaches. Therefore, it is
reasonable to try to follow the example of CSS.
Not at all. CSS 1, CSS 2, and the CSS 2.1 drafts are all mutually
incompatible. The CSS 3 novelties have been implemented in varying ways that
ensure that once they eventually become official, there will be a transition
period from -moz-foo to foo, if you get the idea.
>Standardized error recovery would convert the big crowd of authors
(those between idiots and geniuses) into non-smart in that sense.

I think I don't exactly follow your reasoning there.
When people read that a browser _must_ behave in a certain way and observe
that browsers actually do so, they will treat the features as language
features, not as errors.
So why would making this stuff interoperable be worse than the status
quo? Which is more important, interoperability or making sure people
regard their markup as erroneous?
If interoperability for some features is what you regard as important, then
you should define those features as language rules, as opposite to errors
with prescribed error processing.
>When the source is syntactically malformed, they'll do _something_.

Of course. You seem to be arguing that browser vendors shouldn't agree
on what "something" is.
Right. And international consortia pretending to be collaborative and open
shouldn't get involved in such issues.

For existing malformed documents, browsers behave in different ways, and
they will probably keep doing so indefinitely to avoid breaking existing
pages. There's no need to change this situation in any direction. Defining
error processing rules would just strengthen people's tendencies to sloppy
coding. There is no guarantee that the defined error processing coincides
with what the author _means_.
>If you want to compete with such a modern browser on the market,
against the market leader, then you'll simply have to simulate tag
soup processing the best you can, trying to construct a tree from
something that ain't no tree logically.

This is exactly a key design goal in the development of the HTML5
parsing algorithm: being as close to IE as possible while still
keeping the resulting data structure as a tree - -
But that's reinventing the wheel. We already have tag soup processors.
Anyone can write a new one if he so wishes. Trying to make the situation
more predictable to authors who create malformed markup serves no useful
purpose.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Feb 9 '07 #25

In article <kX*****************@reader1.news.saunalahti.fi>,
"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
Scripsit Henri Sivonen:
If you wish to have an extensible markup language, you should design
one. It could be XML based, just as virtually anything could, but
that's irrelevant.
Draconian error recovery doesn't work with text/html. A smooth update
to HTML must target text/html.

How is that related to what I wrote?
You said: "It could be XML based".
So far, the CSS (and JavaScript) approaches to future versions seem to
be working better than the HTML and XML approaches. Therefore, it is
reasonable to try to follow the example of CSS.

Not at all. CSS 1, CSS 2, and the CSS 2.1 drafts are all mutually
incompatible.
Though not in a severe, real-world-breaking way. When a later spec is
incompatible in the letter with a previous spec, chances are that the later
spec better reflects the real-world implementations.
So why would making this stuff interoperable be worse than the status
quo? Which is more important, interoperability or making sure people
regard their markup as erroneous?

If interoperability for some features is what you regard as important, then
you should define those features as language rules, as opposite to errors
with prescribed error processing.
There are two differences:
1) Non-browser markup consumers are allowed to implement Draconian
behavior on "language features" that are errors.
2) The well-defined processing models may still produce results that
the author didn't want when given non-conforming input.
When the source is syntactically malformed, they'll do _something_.
Of course. You seem to be arguing that browser vendors shouldn't agree
on what "something" is.

Right. And international consortia pretending to be collaborative and open
shouldn't get involved in such issues.
International consortia have indeed avoided facing this issue, which is
why it is dealt with by the WHATWG (which *is* collaborative and open).
For existing malformed documents, browsers behave in different ways, and
they will probably keep doing so indefinitely to avoid breaking existing
pages.
If we accept that Safari works with the real Web well enough, surely
Firefox can switch to Safari's parsing algorithm without severe breakage?
There's no need to change this situation in any direction.
Parties close to the WHATWG seem to disagree with you.
Defining error processing rules would just strengthen people's
tendencies to sloppy coding.
Well, not defining error processing does not stop sloppy coding, as
demonstrated by a large world-wide experiment.
There is no guarantee that the defined error processing coincides
with what the author _means_.
Of course browsers won't have psychic powers, which is why there will
still be value in authors communicating their intent with conforming
markup.
If you want to compete with such a modern browser on the market,
against the market leader, then you'll simply have to simulate tag
soup processing the best you can, trying to construct a tree from
something that ain't no tree logically.
This is exactly a key design goal in the development of the HTML5
parsing algorithm: being as close to IE as possible while still
keeping the resulting data structure as a tree - -

But that's reinventing the wheel. We already have tag soup processors.
Anyone can write a new one if he so wishes.
But you seem to be objecting to documenting how to do it.
Trying to make the situation
more predictable to authors who create malformed markup serves no useful
purpose.
It isn't about making it more predictable for them. It is about making
browsers interoperate when the authoring failure has already happened.
Browsers should compete on UI features instead of competing with error
recovery. Commodifying HTML parsing allows the differentiation effort to
be put into something more productive.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Feb 10 '07 #26

On 2007-02-09, Jukka K. Korpela <jk******@cs.tut.fi> wrote:
[...]
For existing malformed documents, browsers behave in different ways, and
they will probably keep doing so indefinitely to avoid breaking existing
pages. There's no need to change this situation in any direction. Defining
error processing rules would just strengthen people's tendencies to sloppy
coding.
Providing a specification for an application that will work to browse
the existing www is a useful activity. At the moment there's a lot of
guesswork and trial-and-error to work out what browsers are doing and to
distinguish bugs from quirks. If WHATWG do some of the analysis and
refinement of this it gives third parties developing working www
browsers a head start.

If you don't change the situation in any direction you leave a lot of
people having to use IE6 and we don't make progress.
>>If you want to compete with such a modern browser on the market,
against the market leader, then you'll simply have to simulate tag
soup processing the best you can, trying to construct a tree from
something that ain't no tree logically.

This is exactly a key design goal in the development of the HTML5
parsing algorithm: being as close to IE as possible while still
keeping the resulting data structure as a tree - -

But that's reinventing the wheel. We already have tag soup processors.
Anyone can write a new one if he so wishes.
Writing a tag soup processor that emulates IE6 is not easy.

Fortunately it is becoming less necessary to emulate IE6 as more
webpages are also tested in Firefox. But this would not have happened if
Firefox had not caught on, and it would not have caught on if it hadn't
supported users' favourite pages, including the malformed ones.

"Reinventing the wheel" is exactly what's happening. The "wheel" is IE6:
it is out of true, has several broken spokes, and people are used to it
and fed up with it in equal measure. You have to make a rolling
replacement.
Feb 10 '07 #27

Henri Sivonen wrote:
>If we accept that Safari works with the real Web well enough, surely
Firefox can switch to Safari's parsing algorithm without severe breakage?
Eh, what?? This thread started with a message that contains the
sentence:
>>So what happened here? I'm guessing that <p>s are not allowed inside a
<span> and FF enforced that rule but IE didn't.
and I expect nonblock elements to be terminated when the surrounding
block ends. So, at least for this one aspect, it's Firefox that deals
with the web in the proper way, if you ask me. It's the others that
should adopt this rule.

--
Bart.
Feb 10 '07 #28

Scripsit Bart Lateur:
This thread started with a message that contains the
sentence:
>>So what happened here? I'm guessing that <p>s are not allowed
inside a <span> and FF enforced that rule but IE didn't.

and I expect nonblock elements to be terminated when the surrounding
block ends. So, at least for this one aspect, it's Firefox that deals
with the web in the proper way, if you ask me. It's the others that
should adopt this rule.
That's one policy, with good arguments behind it. In relatively disciplined
authoring, <span>text<p> (where "text" might be long) typically results from
accidental omission of </span> before <p> (and not using a validator this
time).

In less disciplined authoring, people often don't think in terms of inline
and block elements, or in terms of elements (vs. just tags) at all, so they
might have <span>text<p>text too</p>more text</span> and they really want
the <span> to span all the text. There are strong practical arguments in
favor of applying error handling that is based on this: it's how most
browsers have behaved and still do. Of course, it often results in foolish
appearance, e.g. when emphasis intended for a short phrase applies to all
text down to the end of the document, but at least it would be the same
foolish appearance as in most browsing situations now.

A clever (?) browser could even apply real error recovery, trying to convert
the document into a valid document with minimal (according to some measure)
changes. This could mean that the latter interpretation is applied if there
is an otherwise unmatched </span> later in the document, whereas your
interpretation is applied if there is no such </span>.
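The first interpretation reduces to a simple rule over the stream of tags. A Python sketch, purely illustrative and not any browser's actual recovery code:

```python
# Illustrative recovery policy: imply </span> for every still-open
# <span> as soon as a block-level start tag is seen.
# Tokens are ("start"|"end", tag-name) pairs.
BLOCK = {"p", "div", "table"}

def recover(tokens):
    out, open_spans = [], 0
    for kind, name in tokens:
        if kind == "start" and name in BLOCK:
            # close the unmatched <span>s before opening the block
            out.extend([("end", "span")] * open_spans)
            open_spans = 0
        if kind == "start" and name == "span":
            open_spans += 1
        elif kind == "end" and name == "span" and open_spans:
            open_spans -= 1
        out.append((kind, name))
    return out

# <span>text<p>... comes out as <span>text</span><p>...
print(recover([("start", "span"), ("start", "p"), ("end", "p")]))
```

The other interpretation, spanning the rest of the document, would simply skip the first `if` branch, which is exactly why the two policies can only be told apart by looking ahead for a later unmatched </span>.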

Making your error processing method mandatory - if it were approved by major
browser vendors - would break millions of web pages (and that's an
understatement) that are now displayed roughly as intended, and it would win
nothing. Actually, if an author _knew_ that a browser implies </span> when
it encounters <p> when a <span> element is open, he could easily leave out
the redundant end tags. Why wouldn't he? The mandatory error processing
would in fact have changed the language rules by making </span> optional
under certain conditions.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Feb 11 '07 #29

In article <Ke******************@reader1.news.saunalahti.fi>,
"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
Scripsit Bart Lateur:
This thread started with a message that contains the
sentence:
>So what happened here? I'm guessing that <p>s are not allowed
inside a <span> and FF enforced that rule but IE didn't.
and I expect nonblock elements to be terminated when the surrounding
block ends.
....
Making your error processing method mandatory - if it were approved by major
browser vendors - would break millions of web pages (and that's an
understatement) that are now displayed roughly as intended, and it would win
nothing.
If that were the case, Firefox and Safari would break the Web horribly.
<span> isn't one of the reopening elements--not in Gecko, not in WebKit
and not in the HTML5 parsing algorithm.

Test case:
http://software.hixie.ch/utilities/j...21DOCTYPE%20ht
ml%3E%0A%3Cbody%3E%3Cp%3E%3Cspan%3Efoo%3C/p%3E%3Cp%3Ebar%3C/span%3E%3C/p%
3E%3C/body%3E

Compare also with:
http://software.hixie.ch/utilities/j...21DOCTYPE%20ht
ml%3E%0A%3Cbody%3E%3Cspan%3E%3Cp%3Efoo%3C/p%3E%3Cp%3Ebar%3C/p%3E%3C/span%
3E%3C/body%3E

You are probably confusing the tag soup compatibility requirements of
<span> with those of, say, <i>:
http://software.hixie.ch/utilities/j...21DOCTYPE%20ht
ml%3E%0A%3Cbody%3E%3Cp%3E%3Ci%3Efoo%3C/p%3E%3Cp%3Ebar%3C/i%3E%3C/p%3E%3C/
body%3E
Actually, if an author _knew_ that a browser implies </span> when
it encounters <p> when a <span> element is open, he could easily leave out
the redundant end tags. Why wouldn't he?
For example because not all legacy browsers agree, as demonstrated by
the message that started this thread.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Feb 11 '07 #30

In article <g6********************************@4ax.com>,
Bart Lateur <ba*********@pandora.be> wrote:
Henri Sivonen wrote:
If we accept that Safari works with the real Web well enough, surely
Firefox can switch to Safari's parsing algorithm without severe breakage?

Eh, what??
Well, it was suggested that browsers cannot change their error handling.
There are four successful browser engines that do different things. If
we accept that none of them cause severe breakage, surely any one of
them could switch to the behavior of another one without causing severe
breakage.

Therefore, there is some room for change in HTML error recovery. Not much,
though. The HTML5 parsing algorithm aims to stay within the room for
change by codifying the best practice as a fusion of the behaviors of
the top four browser engines while sticking to the DOM axioms required
by the strictest of the four from the DOM point of view.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Feb 11 '07 #31

Scripsit Henri Sivonen:
>Making your error processing method mandatory - if it were approved
by major browser vendors - would break millions of web pages (and
that's an understatement) that are now displayed roughly as
intended, and it would win nothing.

If that were the case, Firefox and Safari would break the Web
horribly.
Well, they _do_ break millions of pages. ("Horribly" was your word.) Through
the years, we've seen complaints about "IE only" pages, and most users of
Firefox or Safari already know that quite a few pages need to be viewed on
IE (or something sufficiently compatible with it) in order to see them
roughly as intended.
<span> isn't one of the reopening elements--not in Gecko,
not in WebKit and not in the HTML5 parsing algorithm.
I won't try to scrutinize your parsing jargon like "reopening", because it's
irrelevant here. But I'll take your word for the HTML5 algorithm being
different from that of the most successful (so far) browser, hence broken by
design if intended to be practical.
>Actually, if an author _knew_ that a browser implies </span> when
it encounters <p> when a <span> element is open, he could easily
leave out the redundant end tags. Why wouldn't he?

For example because not all legacy browsers agree, as demonstrated by
the message that started this thread.
Which part of "if an author _knew_ - -" did you miss?

At present, authors can know that some browsers allow omission of </span>
(possibly doing what the author wants, or something else), some don't. The
proposed mandatory error processing, _if_ approved and widely implemented,
would imply that authors know what </span> _can_ be omitted by certain
rules.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Feb 11 '07 #32

In article <14*****************@reader1.news.saunalahti.fi>,
"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
Scripsit Henri Sivonen:
Making your error processing method mandatory - if it were approved
by major browser vendors - would break millions of web pages (and
that's an understatement) that are now displayed roughly as
intended, and it would win nothing.
If that were the case, Firefox and Safari would break the Web
horribly.

Well, they _do_ break millions of pages. ("Horribly" was your word.) Through
the years, we've seen complaints about "IE only" pages, and most users of
Firefox or Safari already know that quite a few pages need to be viewed on
IE (or something sufficiently compatible with it) in order to see them
roughly as intended.
Actually, "IE only" pages tend to be that way due to incompatible
scripting--not because of HTML parsing.
But I'll take your word for the HTML5 algorithm being
different from that of the most successful (so far) browser, hence broken by
design if intended to be practical.
Do I understand correctly that to be "practical" in your opinion an HTML
parsing algorithm must prescribe the output of the algorithm not to be a
tree under some circumstances?
Actually, if an author _knew_ that a browser implies </span> when
it encounters <p> when a <span> element is open, he could easily
leave out the redundant end tags. Why wouldn't he?
For example because not all legacy browsers agree, as demonstrated by
the message that started this thread.

Which part of "if an author _knew_ - -" did you miss?
I missed that the condition was meant to be always unsatisfiable. I
thought you meant the condition would be satisfied if the HTML5 parsing
algorithm was adopted in future browsers. Compare with your own text
below:
At present, authors can know that some browsers allow omission of </span>
(possibly doing what the author wants, or something else), some don't. The
proposed mandatory error processing, _if_ approved and widely implemented,
would imply that authors know what </span> _can_ be omitted by certain
rules.
--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Feb 11 '07 #33

Scripsit Henri Sivonen:
Actually, "IE only" pages tend to be that way due to incompatible
scripting--not because of HTML parsing.
There are myriads of ways of making pages "IE only".
>But I'll take your word for the HTML5 algorithm being
different from that of the most successful (so far) browser, hence
broken by design if intended to be practical.

Do I understand correctly that to be "practical" in your opinion an
HTML parsing algorithm must prescribe the output of the algorithm not
to be a tree under some circumstances?
I wrote nothing of the kind. You can parse malformed HTML any way you like
and make up some tree. The way you construct the tree reflects your error
handling, which could be just anything, like implying a missing </span> at
the first block-starting tag _or_ at the end of the document.
>>>Actually, if an author _knew_ that a browser implies </span> when
it encounters <p> when a <span> element is open, he could easily
leave out the redundant end tags. Why wouldn't he?

For example because all legacy browsers don't agree as demonstrated
by the message that started this thread.

Which part of "if an author _knew_ - -" did you miss?

I missed that the condition was meant to be always unsatisfiable.
No, I'm afraid you missed logic this time. Under the assumed condition
above, there would be no (significant) browsers that don't agree. As long as
there _are_ "legacy browsers" that play by different rules, the condition
won't be satisfied and the idea of enforcing particular error processing
hasn't been implemented. So where would _that_ take us? Nowhere. Except
perhaps in a situation where some authors _mistakenly_ think that the goal
has been reached and they can rely on the new rules.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Feb 11 '07 #34
