HTML Tidy vs. HTML Validator

In our last episode,
<11*********************@p10g2000cwp.googlegroups. com>,
the lovely and talented VK
broadcast on comp.infosystems.www.authoring.html:

Hi, After the response on my request from W3C I'm still unclear about Tidy
vs. Validator discrepancies.

Tidy is a lint and a prettyprinter. It doesn't parse.

That started with the <IFRAME> issue, but
there is more as far as I know. Anyway, this very basic HTML page:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Strict//EN"
"http://www.w3.org/TR/html401/strict.dtd">
<html>
<head>
<title>Demo</title>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
</head>
<body>
<iframe src="http://www.w3.org"></iframe>
</body>
</html>

gives 0 errors / 0 warnings in Tidy. At the same time it tells me "This
page is not Valid -//W3C//DTD HTML 4.01 Strict//EN!" in W3C HTML
Validator which is totally correct as there is no IFRAME in HTML
Strict. At the same time I understand Tidy's behavior too, because no
one needs a validator choking on IFRAME - no one would use it then.

Why don't you use loose then? If you have to have IFRAME there are DTDs
that have it, and 4.01 loose is one of them.

Nevertheless Tidy is linked on the w3.org front page and on
<http://www.w3.org/People/Raggett/tidy/> which seems like a direct
endorsement to me.

It is a good lint and prettyprinter. But it doesn't parse. You can even
add tags to Tidy.

In my request to W3C I asked to add IFRAME to HTML Strict, but the
response was that HTML DTDs are frozen so everything will stay as it
is.

The response should have been: if you add elements like IFRAME to
strict, why not call it loose? There is a DTD with IFRAME. It is called
loose.

My question is then: is this Tidy behavior an illegal adjustment made
by its creators, or is it a W3C informally blessed "looseness"? If Mr.
Raggett himself could elaborate on this issue it would be great.

--
Lars Eighner us****@larseighner.com http://www.larseighner.com/
"Fascism should more properly be called corporatism, since it is the
merger of state and corporate power."-Benito Mussolini * When you write the
check to pay your taxes, remember there are two l's in "Halliburton."

Mar 4 '06 #2

Ian Rastall

On 4 Mar 2006 05:02:27 -0800, "VK" <sc**********@yahoo.com> wrote:

After the response on my request from W3C I'm still unclear about Tidy
vs. Validator discrepancies.

HTML Tidy is a great utility for pretty-printing your HTML, and it may
clean it up a bit, though I wouldn't trust it to do too much of that.
It's not a validator, though. It's a linter. There is, orbiting
somewhere inside the W3C solar system, some validator software written
by one of the W3C guys. It's shareware, but works great. Sorry I can't
remember the name. One of its benefits is that it validates your
entire site locally, which you can't do with the W3C validator (or the
WDG validator, if your site is large enough.)

Ian
--
http://sundry.ws/

Mar 4 '06 #3

Lars Eighner wrote:

Tidy is a lint and a prettyprinter. It doesn't parse.

It does: change from Strict to Frameset or Transitional and different
tags will give you warnings or not. Also the explanation section in
Tidy window is called "HTML Validator". Try to use say "wrap" attribute
in form textarea and you'll get an error - in "HTML Validator" section.
Sorry, but it is much more than a "prettyprinter" may do. It is a
validator - or a program pretending to be such.

Why don't you use loose then? If you have to have IFRAME there are DTDs
that have it, and 4.01 loose is one of them.

My question was not about what to use. My question was about two
different outcomes (valid / invalid) for the very same page using HTML
Tidy and HTML Validator.

If HTML Tidy is not a validator, then it should say nothing about
<textarea wrap="soft"...> as it is not his business.
If it is (besides anything else) a validator, then it should scream
both about wrap and iframe - in HTML Strict.

If it's "mostly validator but just a prettyprinter in some selected
cases" then it should be spelled somewhere - with the list of
exceptions. Does it have sense?

Mar 4 '06 #4

David Dorward

VK wrote:

Lars Eighner wrote:

Tidy is a lint and a prettyprinter. It doesn't parse.

It does: change from Strict to Frameset or Transitional and different
tags will give you warnings or not. Also the explanation section in
Tidy window is called "HTML Validator".

That is what it says - however it doesn't compare the markup to a DTD,
everything is (as far as I know) internalised, and it makes many errors. It
cannot be trusted as a validator.

Try to use say "wrap" attribute in form textarea and you'll get an error -
in "HTML Validator" section. Sorry, but it is much more than a
"prettyprinter" may do. It is a validator - or a program pretending to be
such.

As Lars said - it is a lint.

If HTML Tidy is not a validator, then it should say nothing about
<textarea wrap="soft"...> as it is not his business.

Not being a validator doesn't prevent it from performing error checking that
a validator would also do.

If it's "mostly validator but just a prettyprinter in some selected
cases" then it should be spelled somewhere - with the list of
exceptions. Does it have sense?

The documentation for tidy is misleading. It shouldn't be marked as a
validator, but as an error checking tool (which doesn't cover everything
that a validator would cover).

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is

Mar 4 '06 #5

In our last episode,
<11**********************@j33g2000cwa.googlegroups .com>,
the lovely and talented VK
broadcast on comp.infosystems.www.authoring.html:

Lars Eighner wrote:

Tidy is a lint and a prettyprinter. It doesn't parse.

It does: change from Strict to Frameset or Transitional and different
tags will give you warnings or not. Also the explanation section in
Tidy window is called "HTML Validator".

I don't know what "in Tidy window" is. I have been using Tidy for a long
time and have never seen a Tidy window. Tidy is not an HTML validator and
the documentation is very clear on that point:

<http://www.w3.org/People/Raggett/tidy/>

Try to use say "wrap" attribute in form textarea and you'll get an error -
in "HTML Validator" section. Sorry, but it is much more than a
"prettyprinter" may do. It is a validator - or a program pretending to be
such.

What HTML Validator section? Tidy is a command-line program. You can run
it in a terminal window. Since you post to USENET with a web browser, perhaps
you are using some web interface to Tidy. That isn't Tidy. Try learning to
use a real computer.

Evidently you think programs have minds of their own. The author says in
the documentation that Tidy does not parse, he doesn't pretend otherwise,
but somehow you just know the program has pretenses of its own.

Why don't you use loose then? If you have to have IFRAME there are DTDs
that have it, and 4.01 loose is one of them.

My question was not about what to use. My question was about two
different outcomes (valid / invalid) for the very same page using HTML
Tidy and HTML Validator.

And the answer was given to you here: Tidy is not a parser. Tidy cannot
read a DTD, and it would be totally perplexed by an arbitrary SGML document
with an arbitrary DTD.

If HTML Tidy is not a validator, then it should say nothing about
<textarea wrap="soft"...> as it is not his business.

Wrap is not an attribute for any element in strict, loose, or frameset 4.01.
Tidy does not have to parse the document to know that.

A spelling checker does not have to know English syntax to flag "wraarkx."
There simply is no such word in the language. But that does not mean that a
document which passes a spelling checker is English.

If it is (besides anything else) a validator, then it should scream
both about wrap and iframe - in HTML Strict.

Iframe is an element in some 4.01 DTDs; wrap is not an attribute in any.
The spelling checker that chokes on "wraarkx" will pass "principle."
That does not mean that "principle" is used correctly.

If it's "mostly validator but just a prettyprinter in some selected
cases" then it should be spelled somewhere - with the list of
exceptions. Does it have sense?

It has documentation, which you obviously have not chosen to read.
I suspect you have not installed Tidy at all but are using a web
browser to run programs via web interfaces on other people's sites.
--
Lars Eighner us****@larseighner.com http://www.larseighner.com/
College: The fountains of knowledge, where everyone goes to drink.

Mar 4 '06 #6
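
The spelling-checker analogy in the post above can be sketched in a few lines of Python. This is only an illustration of the argument, not a description of how Tidy is implemented: the element and attribute sets are hypothetical, heavily abbreviated stand-ins for the real HTML 4.01 DTDs, and the function names are made up for this example.

```python
# Hypothetical, abbreviated vocabularies; the real HTML 4.01 DTDs
# declare far more elements and attributes than this.
STRICT_ELEMENTS = frozenset({"html", "head", "title", "meta", "body", "p", "textarea"})
LOOSE_ELEMENTS = STRICT_ELEMENTS | {"iframe", "center", "font"}

# Abbreviated list of textarea attributes; "wrap" is in no 4.01 DTD.
TEXTAREA_ATTRS = frozenset({"name", "rows", "cols", "disabled", "readonly"})

# The "spelling checker" view: is the word known to *any* HTML 4.01 DTD?
KNOWN_TO_SOME_DTD = STRICT_ELEMENTS | LOOSE_ELEMENTS


def lint_accepts_element(element):
    """Lint-style check: accept anything that exists in some DTD."""
    return element.lower() in KNOWN_TO_SOME_DTD


def valid_element(element, dtd_elements):
    """Validator-style check: accept only what the declared DTD allows."""
    return element.lower() in dtd_elements


def lint_accepts_textarea_attr(attr):
    """Lint-style check for a textarea attribute name."""
    return attr.lower() in TEXTAREA_ATTRS


if __name__ == "__main__":
    print(lint_accepts_element("iframe"))            # known to some DTD
    print(valid_element("iframe", STRICT_ELEMENTS))  # not declared in strict
    print(lint_accepts_textarea_attr("wrap"))        # not declared anywhere
```

Under these assumptions the word-list check passes iframe regardless of the doctype, while the DTD-specific check rejects it for strict; wrap fails both, which matches the two different outcomes reported at the start of the thread.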

Toby Inkster

VK wrote:

Lars Eighner wrote:
Tidy is a lint and a prettyprinter. It doesn't parse.

Sorry, but it is much more than a "prettyprinter" may do. It is a
validator - or a program pretending to be such.

Lars didn't say it was just a prettyprinter. He said it was a *linter* and
prettyprinter.

A linter does many of the same things that a validator does, though it
will allow some technically invalid things, which the author of the linter
decided were allowable, and may object to some technically valid things
that the author of the linter decided were objectionable.

Another difference between a linter and a validator is that the linter
will attempt to fix the errors it finds, whereas a validator will just
tell you about them.

Tidy is not a validator.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Now Playing ~ ./police/every_breath_you_take.ogg

Mar 4 '06 #7

Toby Inkster wrote:

VK wrote:
Lars Eighner wrote:
Tidy is a lint and a prettyprinter. It doesn't parse.

Sorry, but it is much more than a "prettyprinter" may do. It is a
validator - or a program pretending to be such.

Lars didn't say it was just a prettyprinter. He said it was a *linter* and
prettyprinter.

<http://tidy.sourceforge.net/docs/tidy_man.html>
tidy - validate, correct, and pretty-print HTML files
(version: 14 February 2006)

<http://tidy.sourceforge.net/license.html>
HTML Tidy
HTML parser and pretty printer
Copyright (c) 1998-2003 World Wide Web Consortium

.... so much of semantic games - a lawyer could be proud. One thing was
right though: I am using "Html Validator" extension for Firefox:
<https://addons.mozilla.org/extensions/moreinfo.php?id=249&application=firefox>
which is <q>based on Tidy</q>

I am not using Tidy itself as the only stand-alone Windows version is
outdated and not supported.

So let us see: Html Validator is based on Tidy; Tidy validates,
corrects, and pretty-prints HTML files and is under W3C license.

Nevertheless the wording everywhere is chosen so carefully that no one
can officially require a Tidy-validated HTML to be W3C valid HTML -
while the wording is targeted to produce exactly that *impression*.

I do not care too much about validation itself: I have enough of
experience to produce correct HTML w/o any helpers.
I'd just like to see one straight statement of a kind "Yes, iframe is
not valid in HTML Strict but we prefer to not make big deal of it to
not look stupid". Maybe not this exact statement :-) but not these
lawyer games as above.

Mar 4 '06 #8

Benjamin Niemann

VK wrote:

Toby Inkster wrote:
VK wrote:
> Lars Eighner wrote:
>
>> Tidy is a lint and a prettyprinter. It doesn't parse.
>
> Sorry, but it is much more than a "prettyprinter" may do. It is a
> validator - or a program pretending to be such.
Lars didn't say it was just a prettyprinter. He said it was a *linter*
and prettyprinter.

<http://tidy.sourceforge.net/docs/tidy_man.html>
tidy - validate, correct, and pretty-print HTML files
(version: 14 February 2006)

<http://tidy.sourceforge.net/license.html>
HTML Tidy
HTML parser and pretty printer
Copyright (c) 1998-2003 World Wide Web Consortium

... so much of semantic games - a lawyer could be proud. One thing was
right though: I am using "Html Validator" extension for Firefox:

<https://addons.mozilla.org/extensions/moreinfo.php?id=249&application=firefox> which is <q>based on Tidy</q>

I am not using Tidy itself as the only stand-alone Windows version is
outdated and not supported.

So let us see: Html Validator is based on Tidy; Tidy validates,
corrects, and pretty-prints HTML files and is under W3C license.

Nevertheless the wording everywhere is chosen so carefully that no one
can officially require a Tidy-validated HTML to be W3C valid HTML -
while the wording is targeted to produce exactly that *impression*.

I do not care too much about validation itself: I have enough of
experience to produce correct HTML w/o any helpers.
I'd just like to see one straight statement of a kind "Yes, iframe is
not valid in HTML Strict but we prefer to not make big deal of it to
not look stupid". Maybe not this exact statement :-) but not these
lawyer games as above.

'Validation' (in the context of HTML and other SGML applications) has a very
specific and well-defined meaning. It means basically that an HTML document
is syntactically correct and satisfies the rules as declared in the DTD (and the
absence of IFRAME in strict.dtd is one of these rules). The detailed
definition of validation is buried in the depths of the SGML specification.

'Valid' does not imply that the document is correct (conforming) HTML,
because there are more rules in the HTML spec that cannot be expressed in a
DTD. But on the other hand validation is required for conforming HTML
documents.

If you want to create correct HTML documents, using a validator like
validator.w3.org is essential. Once your document passes the validation,
you can use tidy, which may find more errors or things that are not errors
but just bad practice.

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/

Mar 4 '06 #9

Lars Eighner wrote :

In our last episode,
<11**********************@j33g2000cwa.googlegroups .com>,
the lovely and talented VK
broadcast on comp.infosystems.www.authoring.html:

Lars Eighner wrote:
Tidy is a lint and a prettyprinter. It doesn't parse.
It reports possible errors, mistakes, etc.

It does: change from Strict to Frameset or Transitional and different
tags will give you warnings or not. Also the explanation section in
Tidy window is called "HTML Validator".

I don't know what "in Tidy window" is. I have been using Tidy for a long
time and have never seen a Tidy window. Tidy is not an HTML validator

HTML VALIDATOR (based on Tidy)
http://users.skynet.be/mgueury/mozilla/index.html
and the documentation is very clear on that point:

<http://www.w3.org/People/Raggett/tidy/>
Try to use say "wrap" attribute in form textarea and you'll get an error -
in "HTML Validator" section. Sorry, but it is much more than a
"prettyprinter" may do. It is a validator - or a program pretending to be
such.

What HTML Validator section? Tidy is a command-line program. You can run
it in a terminal window. Since you post to USENET with a web browser, perhaps
you are using some web interface to Tidy. That isn't Tidy. Try learning to
use a real computer.

Evidently you think programs have minds of their own. The author says in
the documentation that Tidy does not parse, he doesn't pretend otherwise,
but somehow you just know the program has pretenses of its own.

HTML VALIDATOR (based on Tidy)
http://users.skynet.be/mgueury/mozilla/index.html

Why don't you use loose then? If you have to have IFRAME there are DTDs
that have it, and 4.01 loose is one of them.
My question was not about what to use. My question was about two
different outcomes (valid / invalid) for the very same page using HTML
Tidy and HTML Validator.

And the answer was given to you here: Tidy is not a parser. Tidy cannot
read a DTD, and it would be totally perplexed by an arbitrary sgml document
with an arbitrary DTD.
If HTML Tidy is not a validator, then it should say nothing about
<textarea wrap="soft"...> as it is not his business.

Wrap is not an attribute for any element in strict, loose, or frameset 4.01.
Tidy does not have to parse the document to know that.

A spelling check does not have to know English syntax to flag "wraarkx."
There simply is no such word in the language. But that does not mean that a
document which passes a spelling checker is English.
If it is (besides anything else) a validator, then it should scream
both about wrap and iframe - in HTML Strict.

Iframe is an element in some 4.01 DTD; wrap is not an attribute in any.
The spelling checker that chokes on "wraarkx," will pass "principle."
That does not mean that principle is used correctly.

I really think the original poster had very *valid* - if I may use such
adjective :) - and relevant questions because I had the same questions
too when Marc Gueury started HTML validator based on Tidy.

The iCab and Dillo browsers, among others, report the errors they find
without being formal validators:

http://scholar.lib.vt.edu/staff/hand...ing/#toperrors

https://bugzilla.mozilla.org/show_bug.cgi?id=6211

I've made a similar request for IE 7:
"Implement a feature which will report back to the user if a page uses
valid code, has markup and/or parsing CSS errors: some sort of a Webpage
Quality indicator icon (smiley or green check for valid page, frown or
red 'X' when invalid) on the statusbar (or somewhere else) which when
clicked would report more info to the user and give him more options
among which one would be to validate the page with the World Wide Web
Consortium W3C validator. Implement something like HTML Tidy as an
extension or an option into IE 7 and for IE 7 users."
http://channel9.msdn.com/wiki/defaul...andardsSupport
If it's "mostly validator but just a prettyprinter in some selected
cases" then it should be spelled somewhere - with the list of
exceptions. Does it have sense?

It has documentation, which you obviously have not chosen to read.
I suspect you have not installed Tidy at all but are using a web
browser to run programs via web interfaces on other people's sites.

Some (many?) are using HTML validator based on Tidy
http://users.skynet.be/mgueury/mozilla/index.html
and I sure find it very useful. It does find errors or possible ones
that the W3C validator will not report. Yes, sometimes, it will also
report errors which are not errors but then it's possible to file bugs too.

E.g.:

<del> and <ins> as block-level elements:
http://sourceforge.net/tracker/index...59&atid=390963
Testcase:
http://www.gtalbot.org/NvuSection/TestingTidy.html

Warning about empty action attribute in Form
http://sourceforge.net/tracker/index...59&atid=390963

Gérard
--
remove blah to email me

Mar 5 '06 #10

In article <46************@uni-berlin.de>,
Gérard Talbot <ne***********@gtalbot.org> wrote:

HTML VALIDATOR (based on Tidy)
http://users.skynet.be/mgueury/mozilla/index.html
HTML VALIDATOR (based on Tidy)
http://users.skynet.be/mgueury/mozilla/index.html

Repeatedly calling it a validator does not make it one.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Mar 5 '06 #11

In our last episode,
<hs****************************@news.fv.fi>,
the lovely and talented Henri Sivonen
broadcast on comp.infosystems.www.authoring.html:

In article <46************@uni-berlin.de>,
Gérard Talbot <ne***********@gtalbot.org> wrote:
HTML VALIDATOR (based on Tidy)
http://users.skynet.be/mgueury/mozilla/index.html
HTML VALIDATOR (based on Tidy)
http://users.skynet.be/mgueury/mozilla/index.html

Repeatedly calling it a validator does not make it one.

What is more, someone's website interface is NOT Tidy, and what the
website interface calls itself is the responsibility of the author of the
website and no one else. What exactly does "based on" mean?

HTML VALIDATOR (based on Tidy) is not Tidy any more than Google Groups is
USENET.

--
Lars Eighner us****@larseighner.com http://www.larseighner.com/
War hath no fury like a noncombatant.
- Charles Edward Montague

Mar 5 '06 #12

Benjamin Niemann wrote:

'Validation' (in the context of HTML and other SGML applications) has a very
specific and well-defined meaning. It means basically that an HTML document
is syntactically correct and satisfies the rules as declared in the DTD (and the
absence of IFRAME in strict.dtd is one of these rules). The detailed
definition of validation is buried in the depths of the SGML specification.

'Valid' does not imply that the document is correct (conforming) HTML,
because there are more rules in the HTML spec that cannot be expressed in a
DTD. But on the other hand validation is required for conforming HTML
documents.

So briefly it is possible to have a valid(ated) HTML 4.01 Strict
document which is not conforming to HTML standards? Although I can
(vaguely) imagine what you're talking about, this fine semantic
distinction may be interesting only to some bored academic mind. The
regular user just wants to know if his page is standard-compliant or
not; does it give a green banner (green sign) or red banner (red sign).
Trying to bring more into that picture is a sophisticated abuse of
not-very-experienced users IMHO.
They came to <http://validator.w3.org/> or installed Html Validator
because someone says that "following standards is good, Microsoft is bad".
They did not do it to be cheated by a bunch of semantically experienced
lawyers, did they?

Mar 5 '06 #13

David Dorward

VK wrote:

So briefly it is possible to have a valid(ated) HTML 4.01 Strict
document which is not conforming to HTML standards?

Yes. For example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<title>Test</title>
<h1>Test</h1>
<blockquote cite="John F. Kennedy">
<p>Ich bin ein Berliner</p>
</blockquote>

Now let's see the spec:

cite = uri [CT]
The value of this attribute is a URI that designates a source
document or message.

It is valid, since it conforms to the syntactic rules expressed by the DTD,
but it is non-conforming since it doesn't meet the additional constraints
imposed by the prose.

Although I can (vaguely) imagine what you're talking about, this fine
semantic distinction may be interesting only to some bored academic
mind.

Given a tool which actually presents the cite to the user, this has
practical implications.

The regular user just wants to know if his page is standard-compliant or
not; does it give a green banner (green sign) or red banner (red sign).

The problem here, which isn't helped by the introductions to any of the
tools being discussed, is that the tools only check against a subset
of the rules for standards compliance.

Trying to bring more into that picture is a sophisticated abuse of
not-very-experienced users IMHO.

It isn't. It is a recognition that the tools are limited.

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is

Mar 5 '06 #14
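
The blockquote example above, valid against the DTD but violating the prose rule that cite holds a URI, is the kind of thing a small extra check can catch. The following is a minimal sketch, assuming a rough character-set heuristic rather than a full RFC 3986 parser; the names looks_like_uri, CiteChecker and suspicious_cites are made up for this illustration and belong to neither Tidy nor the W3C validator.

```python
import re
from html.parser import HTMLParser

# Rough heuristic: only characters that RFC 3986 permits somewhere in a
# URI reference. Whitespace is not among them, so free text such as a
# person's name is rejected. This is not a complete URI grammar.
_URI_CHARS = re.compile(r"^[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+$")


def looks_like_uri(value):
    """Return True if the value could plausibly be a URI reference."""
    return bool(_URI_CHARS.match(value))


class CiteChecker(HTMLParser):
    """Collect cite attributes on blockquote/q that do not look like URIs."""

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("blockquote", "q"):
            for name, value in attrs:
                if name == "cite" and not looks_like_uri(value or ""):
                    self.suspicious.append((tag, value))


def suspicious_cites(markup):
    """Return (tag, value) pairs whose cite value fails the heuristic."""
    checker = CiteChecker()
    checker.feed(markup)
    checker.close()
    return checker.suspicious


if __name__ == "__main__":
    sample = '<blockquote cite="John F. Kennedy"><p>Ich bin ein Berliner</p></blockquote>'
    print(suspicious_cites(sample))
```

A check like this sits alongside DTD validation rather than replacing it: the document would still pass an SGML validator, and the heuristic says nothing about whether a well-formed URI actually designates the quoted source.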

In our last episode,
<11**********************@u72g2000cwu.googlegroups .com>, the lovely and
talented VK broadcast on comp.infosystems.www.authoring.html:

So briefly it is possible to have a valid(ated) HTML 4.01 Strict
document which is not conforming to HTML standards?

Yes. But it is impossible to have a conforming document unless it
validates. A pretty good example is attribute values. href is
supposed to have a value that is a URI in some form, relative or absolute.
But you can put any nonsense in there, and it will validate. You can have
width="blue" and so forth. blockquote is supposed to be for lengthy
quotations, but if you use it just to make text indented, it will validate
because the validator has no way of knowing that you deliberately thwarted
the standard.

Although I can
(vaguely) imagine what you're talking about, this fine semantic
distinction may be interesting only to some bored academic mind. The
regular user just wants to know if his page is standard-compliant or
not; does it give a green banner (green sign) or red banner (red sign).
Trying to bring more into that picture is a sophisticated abuse of
not-very-experienced users IMHO.
They came to <http://validator.w3.org/> or installed Html Validator
because someone says that "following standards is good, Microsoft is bad".
They did not do it to be cheated by a bunch of semantically experienced
lawyers, did they?

Well, you know, they could download the standard, read it, understand it,
comply with it, and use the validator to check for careless errors. It's
one thing if they are just putting up a personal page for the amusement of
their friends (or mostly for their own amusement). But everyone thinks he
is a "Web Designer" or a "Webmaster" if he can put up a page that is not too
broken for him to view on his own browser.

The fact that you can download CAD software doesn't make you a bridge
designer. Our shores would be piled high with water-bloated bodies if
governments hired bridge designers the way some companies hire "webmasters"
and "web designers."

Is it really asking too much that these people learn something about what
they are doing?

Face it, someone who posts to USENET with google groups is a moron. Someone
who uses a web interface to a validator or a lint instead of downloading a
sgml checker or the lint and running it himself is an idiot. Woo-hoo! They
read a tutorial on the web, and now they are a Webmaster! What crap.

--
Lars Eighner us****@larseighner.com http://www.larseighner.com/
Cole's Law: Thinly sliced cabbage.

Mar 5 '06 #15

Lars Eighner wrote:

The fact that you can download CAD software doesn't make you a bridge
designer. Our shores would be piled high with water-bloated bodies if
governments hired bridge designers the way some companies hire "webmasters"
and "web designers."

Oh I see now - the old song to my ears :-) There must be Web standards
no one can fully understand except a narrow set of "carefully selected
people" (a quote from another place and time).

:-)

Unfortunately your bridge building analogy is flawed, because unlike
engineering, programming, medicine etc., "real understanding" of HTML
standards is neither self-contained publicly useful knowledge nor
a separate profession. I hope you have no illusions on this matter.

Coming back to the original question about Tidy and Html Validator:
Does anyone have a recent Tidy version? Does it validate HTML Strict
with <iframe> in it? Does it validate HTML Frameset with <iframe> in
it?

The outdated Windows version I have gives all kinds of crap ("seems
like it's HTML 3.1...")

I would still like to find out if (and where) the patch was applied: by
Tidy itself or by Html Validator.

Face it, someone who posts to USENET with google groups is a moron.

Or a person who prefers to take the disadvantages in order to protect
their identity - like me. I sometimes mention my customers'
oddities or project idiocies and I prefer not to be connected with
any particular person.

Someone who judges people by the medium they use seems kind of a moron to
me.

Someone
who uses a web interface to a validator or a lint instead of downloading a
sgml checker or the lint and running it himself is an idiot. Woo-hoo! They
read a tutorial on the web, and now they are a Webmaster! What crap.

Agree. Just do not fall into another illusion: that someone who has learned
all possible Web standards and their implications is a good webmaster
by definition. She may still not be able to produce a single functioning
page with an acceptable accessible design.

Mar 5 '06 #16

In our last episode,
<11**********************@j33g2000cwa.googlegroups .com>,
the lovely and talented VK
broadcast on comp.infosystems.www.authoring.html:

Coming back to the original question about Tidy and Html Validator:
Does anyone have a recent Tidy version? Does it validate HTML Strict
with <iframe> in it? Does it validate HTML Frameset with <iframe> in
it?

One more time: TIDY DOES NOT VALIDATE. It doesn't validate elements like
iframe; it doesn't validate documents in HTML; it doesn't validate
documents in XHTML; it doesn't validate strict, loose, or frameset; it
doesn't validate parking tickets. Tidy is not a validator.

Tidy will pass iframe when it is told it is linting html because iframe is
an element in some versions of html; Tidy will throw an exception to
wrap because wrap is not an attribute in any version of html.

If you want a validator, get SP or OpenSP. A typical job flow should
be something like:

editor -> preprocessor if you use one -> Tidy ->
untidy filter if you care to write one to cater to non-conforming
browsers, aka IE, that cannot handle white space correctly ->
sgml checker like nsgmls (in SP) or onsgmls (in OpenSP)

That still doesn't mean you have a standards-compliant page because
none of this will save you if you have used blockquote to indent
text or written nonsense as the value of href.

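Here is an example Makefile (GNU make):

# These targets do not name real files.
.PHONY: site all styles install clean

# Rebuild the whole site from scratch, including the archives subdirectory.
site:
	$(MAKE) clean
	$(MAKE) all
	cd archives && $(MAKE) clean && $(MAKE) all
	$(MAKE) install

objects = index.html policies.html support.html notfound.html homenews.html \
	2006-home.css 2006-formal.css

all : $(objects)

styles: 2006-home.css 2006-formal.css

# Generate each page, pass it through Tidy and the untidy filter, then
# check the result with the SGML parser (-s suppresses normal output).
%.html:
	php ~/allphp/main.php $@ | tidy -config /usr/local/data/etc/tidyrc | untidy.pl > $@
	onsgmls -s $@

%.css :
	php ~/allphp/main_css.php $@ > $@

install :
	-mv *.css stylescss/

clean :
	-rm *.html
	-rm images_used.txt
	-rm *.css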

Someone who judges people by the medium they use seems kind of a moron to
me.

And someone who refuses to identify himself cannot expect his opinions to
count for anything.

Someone
who uses a web interface to a validator or a lint instead of downloading a
sgml checker or the lint and running it himself is an idiot. Woo-hoo! They
read a tutorial on the web, and now they are a Webmaster! What crap.

Agree. Just do not fall into another illusion: that someone who has learned
all possible Web standards and their implications is a good webmaster
by definition. She may still not be able to produce a single functioning
page with an acceptable accessible design.

Yes, even bridges designed by engineers sometimes fall down. But I'd prefer
to take my chances with a professionally designed bridge.

--
Lars Eighner us****@larseighner.com http://www.larseighner.com/
War On Terrorism: Joe McCarthy Brigade
"The decadent left in its enclaves on the coasts is not dead -- and may well
mount a fifth column." Andrew Sullivan, _The New Republic_

Mar 5 '06 #17
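
The lint-then-validate flow from the post above can also be driven from a script. This is a minimal sketch, assuming tidy and onsgmls are installed and on the PATH; it uses only the options that appear in the Makefile (-config for Tidy, -s for onsgmls), checks an existing file instead of piping generated output, and the function names and file paths are hypothetical.

```python
import subprocess


def build_commands(html_path, tidy_config):
    """Return the argv lists for the lint step and the validation step."""
    return [
        # Lint / pretty-print step; -config is the option used in the Makefile.
        ["tidy", "-config", tidy_config, html_path],
        # Validation step; -s suppresses normal output so only errors appear.
        ["onsgmls", "-s", html_path],
    ]


def run_checks(html_path, tidy_config):
    """Run both tools and return a dict of {tool name: exit status}.

    A status of None means the tool could not be started at all.
    """
    results = {}
    for argv in build_commands(html_path, tidy_config):
        try:
            completed = subprocess.run(argv, capture_output=True, text=True)
            results[argv[0]] = completed.returncode
        except FileNotFoundError:
            # Tool is not installed or not on PATH.
            results[argv[0]] = None
    return results


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(run_checks("index.html", "tidyrc"))
```

Keeping the two steps as separate commands mirrors the point of the thread: a clean run of the first step says nothing about the second, and neither one checks the prose rules of the specification.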

Lars Eighner wrote:

One more time: TIDY DOES NOT VALIDATE. It doesn't validate elements like
iframe; it doesn't validate documents in HTML; it doesn't validate
documents in XHTML; it doesn't validate strict, loose, or frameset; it
doesn't validate parking tickets. Tidy is not a validator.

Are you one of Tidy project contributors?
If so then where could I find you on the sourceforge project page?
If not then let me go by what the official Tidy documentation says, not
by your personal impression.

One more time (emphasis is mine):
<http://tidy.sourceforge.net/docs/tidy_man.html>
tidy - <em>validate</em>, correct, and pretty-print HTML files
For HTML variants, it detects and corrects many common coding errors
and strives to produce visually equivalent markup that is both
<em>W3C compliant</em> and works on most browsers.

You are right though that Tidy doesn't validate HTML pages in the sense
of W3C Validator.
It seems it simply chokes on the DTD, at least based on how it was acting on
the OP iframe test: on Strict it says "looks like Transitional", on
Frameset it says "DTD is not matching content"(?!) etc.

Overall I'm kind of out of ideas what the f... it really is. The
documentation is laid out so nicely that one may conclude that it also
validates parking tickets and makes the best ice cream in the area,
though nothing is promised in strict terms.

Nevertheless one of my questions is answered: "iframe-tolerance" in
Html Validator is a proprietary patch of Html Validator authors, not a
Tidy one.

The second part of my question, "Is W3C aware and does it care?" (as Firefox
is W3C's beloved standards baby), seems impossible to answer. There seems to
be too much mutual support among all the parties involved (W3C,
Mr. Raggett, Tidy team, Mozilla Foundation team).

Mar 5 '06 #18

Henri Sivonen wrote :

In article <46************@uni-berlin.de>,
Gérard Talbot <ne***********@gtalbot.org> wrote:
HTML VALIDATOR (based on Tidy)
http://users.skynet.be/mgueury/mozilla/index.html
HTML VALIDATOR (based on Tidy)
http://users.skynet.be/mgueury/mozilla/index.html

Repeatedly calling it a validator does not make it one.

I agree. "HTML Validator" by M. Gueury is not an HTML validator in the
SGML sense (validation = verifying syntax and comparing with rules
provided by the DTD). I was replying specifically to Lars who, I thought,
had no clue regarding what triggered the OP to post his questions.

Lars was saying:
Try learning to
use a real computer.

Evidently you think programs have minds of their own. The author says in
the documentation that Tidy does not parse, he doesn't pretend otherwise,
but somehow you just know the program has pretenses of its own.
and
I suspect you have not installed Tidy at all but are using a web
browser to run programs via web interfaces on other people's sites.

I want to repeat that IMO the original poster had valid and relevant
questions about validation and Tidy to begin with.

Gérard
--
remove blah to email me

Mar 5 '06 #19

Lars Eighner wrote :

In our last episode,
<hs****************************@news.fv.fi>,
the lovely and talented Henri Sivonen
broadcast on comp.infosystems.www.authoring.html:
In article <46************@uni-berlin.de>,
Gérard Talbot <ne***********@gtalbot.org> wrote:
HTML VALIDATOR (based on Tidy)
http://users.skynet.be/mgueury/mozilla/index.html
HTML VALIDATOR (based on Tidy)
http://users.skynet.be/mgueury/mozilla/index.html

Repeatedly calling it a validator does not make it one.

What is more, someone's website interface is NOT Tidy, and what the
website interface calls itself is the responsibility of the author of the
website and no one else.

If you mean to say that Marc Gueury may have mis-chosen the name of his
extension, then I think you're right. Just like CSE HTML Validator is
not a validator in the SGML sense.

What exactly does "based on" mean?

In my mind, "based on" means that it started with Tidy and then had
more features added. Likewise.

HTML VALIDATOR (based on Tidy) is not Tidy any more than Google Groups is
USENET.

I really do not want to fight with you on this. Tidy is a piece of software.
And I'm sure Tidy is at the basis of Marc Gueury's Firefox extension
(whatever we call it) and its features.

Gérard
--
remove blah to email me

Mar 5 '06 #20

Lars Eighner wrote :

[snipped]

Well, you know, they could download the standard, read it, understand it,
comply with it, and use the validator to check for careless errors. It's
one thing if they are just putting up a personal page for the amusement of
their friends (or mostly for their own amusement). But everyone thinks he
is a "Web Designer" or a "Webmaster" if he can put up a page that is not too
broken for him to view on his own browser.

The fact that you can download CAD software doesn't make you a bridge
designer. Our shores would be piled high with water-bloated bodies if
governments hired bridge designers the way some companies hire "webmasters"
and "web designers."

I agree with the above.

Is it really asking too much that these people learn something about what
they are doing?

Now, that would be a very interesting post to discuss. Remember that
google.com recently examined over 1 billion webpages and reported stats
regarding the semantics of those 1 billion webpages.

My own webhost company has a help site for making webpages and they
recommend using Netscape Communicator's Composer. That's right: they
are recommending software which was designed more than 10 years
ago... I have asked my webhost company to at least modernize that
webpage but they refused.

Face it, someone who posts to USENET with google groups is a moron.

You're harsh.

Someone
who uses a web interface to a validator or a lint instead of downloading a
sgml checker or the lint and running it himself is an idiot. Woo-hoo!

You're excessively harsh here, I'd say.
In any case, there's no need for name-calling here.

They
read a tutorial on the web, and now they are a Webmaster! What crap.

What you don't say is that the inherent, intrinsic quality of so many
tutorials on the web is rather bad to begin with.

What you don't say is that the intrinsic quality and inherent merits of
pre-fabricated scripts coming from DreamWeaver 2, 3, 4 (and other HTML
editors like CoffeeCup something) are very debatable or plain wrong. It's
no surprise if we stumbled on these later.

Same thing with so many javascript copy-N-paste sites, popup-maker
sites, popup-generator sites.

What I do not agree with (do not accept) in your statements is that
they over-simplify the causes of the problems. Of course,
the newbie web author bears a part, even a large part, of the responsibility
(particularly when maintaining and trying to improve his sites) but he
should be better guided too (1)(2), and people like you and me should be
fighting harder against the bad sites, tutorials and software (even browser
manufacturers) giving bad, utterly bad advice and bad recommendations. You
can find do-as-you-please advice and Homer-Simpson-like recommendations
pretty much everywhere at microsoft.com, you know.

(1) I do my best with this:
http://www.gtalbot.org/NvuSection/

(2) Browser bug sections:
http://www.gtalbot.org/BrowserBugsSection/
Gérard
--
remove blah to email me

Mar 5 '06 #21

Lars Eighner wrote :

In our last episode,
<11**********************@u72g2000cwu.googlegroups .com>, the lovely and
talented VK broadcast on comp.infosystems.www.authoring.html:

Well, you know, they could download the standard, read it, understand it,
comply with it,

Not everyone can do this. That's one reason why Nvu was able to be a
moderately good success, I'd say.

and use the validator to check for careless errors. It's one thing if they
are just putting up a personal page for the amusement of
their friends (or mostly for their own amusement). But everyone thinks he
is a "Web Designer" or a "Webmaster" if he can put up a page that is not too
broken for him to view on his own browser.

The fact that you can download CAD software doesn't make you a bridge
designer. Our shores would be piled high with water-bloated bodies if
governments hired bridge designers the way some companies hire "webmasters"
and "web designers."

Is it really asking too much that these people learn something about what
they are doing?

For some people, yes. E.g.: John and Mary got married. John, an
assistant manager at Wal-Mart with no college education, uploads
the photos of the wedding onto a site where all the visitors and families
can view the images. The site markup code is awful but no one complains,
bothers or even notices because

a) the site achieves the expected results in the currently frequently
used browsers (MSIE 6, Firefox 1.5)
b) people feel that the rest is irrelevant, not that important, "we're
not techies", "we're not computer-savvy", "results is what matters", etc.

I want to underline here that, nevertheless, I strongly believe and advocate
that HTML editors (like CoffeeCup, MS FrontPage, Nvu, DreamWeaver)
should all completely comply with ATAG 1.0 priorities,
recommendations, guidelines and checkpoints. I have reported my complete
and detailed position on this (and elsewhere) at

Front Page Feedback webpage

http://channel9.msdn.com/wiki/defaul...ntPageFeedback

- Basic Authoring Tool Accessibility Guidelines 1.0 Priorities

- Built-in HTML validator and CSS validator in Frontpage

I have also denounced HTML editors which provide bad javascript/DHTML
code before. It's entirely unacceptable to me that CoffeeCup 2005
software does not provide entirely interoperable, cross-browser DHTML
pre-fabricated widget/code.

It is entirely unacceptable for documentation projects (like MSDN) to
have examples that don't work or don't interoperate (cross-browser),
don't trigger web standards compliant rendering mode in IE6, don't use
valid markup code, etc.
MSDN is possibly the most shameful aspect of Microsoft's so-called
commitment to comply with W3C web standards.

Gérard
--
remove blah to email me

Mar 5 '06 #22

Lars Eighner wrote :

What is more, someone's website interface is NOT Tidy, and what the
website interface calls itself is the responsibility of the author of the
website and no one else. What exactly does "based on" mean?

HTML VALIDATOR (based on Tidy) is not Tidy any more than Google Groups is
USENET.

"(...) This extension is based on HTML Tidy, it includes the code of
Tidy without changes. From my experience, the errors are the same in the
2 programs except that Tidy shows more errors. It shows errors about
attributes values. And it tries also to clean the page of useless, empty
tags and so on. (...)"

You can read more at question 13 of this page:
http://users.skynet.be/mgueury/mozilla/faq.html

Gérard
--
remove blah to email me

Mar 5 '06 #23

Ian Rastall wrote :

On 4 Mar 2006 05:02:27 -0800, "VK" <sc**********@yahoo.com> wrote:
After the response on my request from W3C I'm still unclear about Tidy
vs. Validator discrepancies.

HTML Tidy is a great utility for pretty-printing your HTML, and it may
clean it up a bit, though I wouldn't trust it to do too much of that.
It's not a validator, though. It's a linter. There is, orbiting
somewhere inside the W3C solar system, some validator software written
by one of the W3C guys. It's shareware, but works great. Sorry I can't
remember the name.

It must be "A Real Validator":

http://arealvalidator.com/
Offline HTMLHelp.com Validator
http://www.htmlhelp.com/tools/valida.../index.html.en
"The Offline Validator is geared toward Unix users. Windows users may
prefer A Real Validator by the same author."

One of its benefits is that it validates your entire site locally, which
you can't do with the W3C validator (or the WDG validator, if your site is
large enough.)

Ian

Gérard
--
remove blah to email me

Mar 5 '06 #24

Toby Inkster

VK wrote:

If not then let me go by what the official Tidy documentation says, not
by your personal impression.

The official Toby documentation says that I can fly, walk on water and
breathe fire.

Don't take the official documentation at face value. Evaluate the product
for yourself and see if it lives up to its advertising.

Tidy will process an HTML 4 Strict document with an IFRAME -- clearly an
invalid HTML 4 Strict document -- but does not complain that the document
is invalid. Hence it is not a validator, or if it is, must have at least
one major bug.
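
A minimal Python sketch of the kind of check a DTD-aware tool makes here. It is not how Tidy or the W3C Validator work internally; it only compares element names against a hand-written set of elements that HTML 4.01 defines in Transitional or Frameset but not in Strict. The names `NOT_IN_STRICT`, `ElementCollector` and `elements_outside_strict` are made up for this illustration.

```python
from html.parser import HTMLParser

# Elements present in HTML 4.01 Transitional/Frameset but absent from Strict.
# Hand-written for illustration; a real validator reads this from the DTD.
NOT_IN_STRICT = {
    "applet", "basefont", "center", "dir", "font", "iframe", "isindex",
    "menu", "s", "strike", "u", "noframes", "frame", "frameset",
}


class ElementCollector(HTMLParser):
    """Records the name of every start tag it sees (names are lowercased)."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(tag)


def elements_outside_strict(html_text):
    """Return the sorted element names in html_text that Strict does not define."""
    collector = ElementCollector()
    collector.feed(html_text)
    collector.close()
    return sorted({tag for tag in collector.seen if tag in NOT_IN_STRICT})
```

Fed the body of the OP's test page, `elements_outside_strict` returns `['iframe']`, which is the complaint a tool that silently accepts the page never makes. This only checks element names; content models and attributes would need the real DTD.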

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Mar 6 '06 #25

Eric B. Bednarz

"VK" <sc**********@yahoo.com> writes:

One more time
(Baby, baby...)

Britney Spears, right?
<http://tidy.sourceforge.net/docs/tidy_man.html>

You are right, though, that Tidy doesn't validate HTML pages in the sense
of the W3C Validator.

Tidy does not validate anything in the sense of a validating SGML
system.

It seems it simply chokes on the DTD,

It does not choke at all; it just does not read the document type
declaration subset at all.

"looks like Transitional"

....and beyond the infinite.

Irregular expressions considered harmful.
--
||| hexadecimal EBB
o-o decimal 3771
--oOo--( )--oOo-- octal 7273
205 goodbye binary 111010111011

Mar 6 '06 #26

Marc Gueury

Hi Lars, and all,

It seems I need to defend the name of my small HTML Validator extension
for Firefox :)

Although the current version has only the Tidy algorithm, the next
version will offer 2 algorithms to check a page: Tidy and OpenJade
(the same SGML parser that validator.w3.org uses).

Please read this for more info:

http://users.skynet.be/mgueury/mozilla/preview_080.html

Some interesting facts:
- It took me a lot more time to integrate OpenJade into Firefox cleanly.
- Despite what people might think, OpenJade is not a very well
maintained project. Nobody answers questions and the production
code does not compile on most platforms. It is really a pity :/
- Tidy, by contrast, has a good, active community.

Regards,

Marc
Lars Eighner wrote:

In our last episode,
<11**********************@j33g2000cwa.googlegroups .com>,
the lovely and talented VK
broadcast on comp.infosystems.www.authoring.html:

Coming back to the original question about Tidy and Html Validator:
Does anyone have a recent Tidy version? Does it validate HTML Strict
with <iframe> in it? Does it validate HTML Frameset with <iframe> in
it?

One more time: TIDY DOES NOT VALIDATE. It doesn't validate elements like
iframe; it doesn't validate documents in HTML; it doesn't validate
documents in XHTML; it doesn't validate strict, loose, or frameset; it
doesn't validate parking tickets. Tidy is not a validator.

Tidy will pass iframe when it is told it is linting html because iframe is
an element in some versions of html; Tidy will throw an exception to
wrap because wrap is not an attribute in any version of html.

If you want a validator, get SP or OpenSP. A typical job flow should
be something like:

editor -> preprocessor if you use one -> Tidy ->
untidy filter if you care to write one to cater to non-conforming
browsers, aka IE, that cannot handle white space correctly ->
sgml checker like nsgmls (in SP) or onsgmls (in OpenSP)

That still doesn't mean you have a standards-compliant page because
none of this will save you if you have used blockquote to indent
text or written nonsense as the value of href.

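Here is an example Makefile (GNU make):

# Rebuild everything from scratch, including the archives, then install.
site:
	$(MAKE) clean
	$(MAKE) all
	cd archives && $(MAKE) clean && $(MAKE) all
	$(MAKE) install

objects = index.html policies.html support.html notfound.html homenews.html \
	2006-home.css 2006-formal.css

all : $(objects)

styles: 2006-home.css 2006-formal.css

# Generate each page with PHP, run it through Tidy and the untidy filter,
# then check the result with onsgmls (-s suppresses normal output).
%.html:
	php ~/allphp/main.php $@ | tidy -config /usr/local/data/etc/tidyrc | untidy.pl > $@
	onsgmls -s $@

%.css :
	php ~/allphp/main_css.php $@ > $@

install :
	-mv *.css stylescss/

# A leading "-" tells make to ignore errors from the command.
clean :
	-rm *.html
	-rm images_used.txt
	-rm *.css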

Someone who judges people by the medium they use seems kind of a moron to
me.

And someone who refuses to identify himself cannot expect his opinions to
count for anything.

Someone
who uses a web interface to a validator or a lint instead of downloading a
sgml checker or the lint and running it himself is an idiot. Woo-hoo! They
read a tutorial on the web, and now they are a Webmaster! What crap.

Agree. Just do not fall into another illusion: that someone who has learned
all possible Web standards and their implications is a good web-master
by definition. She still may not be able to produce a single functioning
page with an acceptable accessible design.

Yes, even bridges designed by engineers sometimes fall down. But I'd prefer
to take my chances with a professionally designed bridge.

Mar 7 '06 #27

Marc Gueury <mg*****@skynet.be> wrote:

It seems I need to defend the name of my small HTML Validator extension
for Firefox :)

Although the current version has only the Tidy algorithm, the next
version will offer 2 algorithms to check a page: Tidy and OpenJade
(the same SGML parser that validator.w3.org uses).

A validator always and only checks against the DTD. A choice of parsing
with Tidy or SP means that your tool continues to falsely advertise
itself as a validator.

--
Spartanicus

Mar 7 '06 #28

Marc Gueury

Spartanicus wrote:

Marc Gueury <mg*****@skynet.be> wrote:

It seems I need to defend the name of my small HTML Validator extension
for Firefox :)

Although the current version has only the Tidy algorithm, the next
version will offer 2 algorithms to check a page: Tidy and OpenJade
(the same SGML parser that validator.w3.org uses).

A validator always and only checks against the DTD. A choice of parsing
with Tidy or SP means that your tool continues to falsely advertise
itself as a validator.

To be precise, as more than a validator :)

Marc

Mar 7 '06 #29

Marc Gueury <mg*****@skynet.be> wrote:

Although the current version has only the Tidy algorithm, the next
version will offer 2 algorithms to check a page: Tidy and OpenJade
(the same SGML parser that validator.w3.org uses).

A validator always and only checks against the DTD. A choice of parsing
with Tidy or SP means that your tool continues to falsely advertise
itself as a validator.

To be precise, as more than a validator :)

Again (this time with emphasis): A validator always and *only* checks
against the DTD.

A tool that does more than check against the DTD is not a validator.

--
Spartanicus

Mar 7 '06 #30

Spartanicus wrote :

Marc Gueury <mg*****@skynet.be> wrote:
It seems I need to defend the name of my small HTML Validator extension
for Firefox :)

Although the current version has only the Tidy algorithm, the next
version will offer 2 algorithms to check a page: Tidy and OpenJade
(the same SGML parser that validator.w3.org uses).

A validator always and only checks against the DTD. A choice of parsing
with Tidy or SP means that your tool continues to falsely advertise
itself as a validator.

Spartanicus, it seems that everyone is against the use of "Validator".
Then the normal, reasonable followup question (in the spirit of trying
to be constructive, positive) would seem to be: how would you
name/identify Marc Gueury's extension then?

a) HTML Tidy+
b) Extended HTML Tidy
c) Advanced HTML Tidy
d) Extended HTML Validator
e) Advanced HTML Validator
f) HTML Validator +
g) what? any idea?

Personally, I think this thread about HTML Tidy versus HTML Validator
has reached its goals and natural limits. And I would hate to have this
thread annoy uselessly or pointlessly a person trying to develop an
excellent tool which can (and has already) improved the web for everyone.

Gérard
--
remove blah to email me

Mar 8 '06 #31

Gérard Talbot <ne***********@gtalbot.org> wrote:

A validator always and only checks against the DTD. A choice of parsing
with Tidy or SP means that your tool continues to falsely advertise
itself as a validator.

Spartanicus, it seems that everyone is against the use of "Validator".
Then the normal, reasonable followup question (in the spirit of trying
to be constructive, positive) would seem to be: how would you
name/identify Marc Gueury's extension then?

Anything that doesn't contain "validator" in the tool name. Since
validation is one of the available functions in the future version of
the tool this can be mentioned in the prose.

Personally, I think this thread about HTML Tidy versus HTML Validator
has reached its goals and natural limits.

This isn't nagging about mere semantics of the tool's name. There is
already a huge amount of confusion about linters, validation, spec
conformance and the respective damage done, drawbacks, limitations and
benefits. I see no reason to add to that confusion by falsely labeling a
tool.

And I would hate to have this
thread annoy uselessly or pointlessly a person trying to develop an
excellent tool which can (and has already) improved the web for everyone.

If pointing out a mistake annoys him he needs to get over himself, it's
no reason not to mention it.

--
Spartanicus

Mar 8 '06 #32

In article
<cs********************************@news.spartanic us.utvinternet.ie>,
Spartanicus <in*****@invalid.invalid> wrote:

Again (this time with emphasis): A validator always and *only* checks
against the DTD.

A tool that does more than check against the DTD is not a validator.

Depends of what definition of "validator" is used. In the context of
RELAX NG, Schematron or W3C XML Schema validation does not involve a DTD.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Mar 9 '06 #33

Henri Sivonen <hs******@iki.fi> wrote:

A tool that does more than check against the DTD is not a validator.

Depends of what definition of "validator" is used. In the context of
RELAX NG, Schematron or W3C XML Schema validation does not involve a DTD.

RELAX NG, Schematron etc. checkers are not validators, they are
conformance checkers, a step up from validators.

Sadly some of these conformance checkers are also falsely labeling
themselves as validators.

--
Spartanicus

Mar 10 '06 #34

Eric B. Bednarz

Spartanicus <in*****@invalid.invalid> writes:

Henri Sivonen <hs******@iki.fi> wrote:

Depends of what definition of "validator" is used. In the context of
                                                   ^^^^^^^^^^^^^^^^^
RELAX NG, Schematron or W3C XML Schema validation does not involve a
DTD.

RELAX NG, Schematron etc. checkers are not validators

Putting schema validation in the same basket as, say, CSE and the likes
(which I suppose you mean by 'checkers') is plain silly. I cannot
connect to the ISO site at the moment for the actual official status of
either RELAX NG or Schematron, but I'm happy to quote from the RELAX NG
final draft all the same:

| ISO/IEC FDIS 19757-2:2002(E)

| 3.29
| validator
| software module that determine whether a schema is
| correct and whether an instance is valid with respect to a schema

So unlike in ISO 8879 (as far as I remember, 'validating system' is as
good as it gets), the term 'validator' is quite officially defined in
this context.

If you have objections to that, you should probably contact James Clark
(change the spec, relabel Jing, stick with SP :).
--
Good-bye, and keep cold

Mar 10 '06 #35

In article
<pk********************************@news.spartanic us.utvinternet.ie>,
Spartanicus <in*****@invalid.invalid> wrote:

Henri Sivonen <hs******@iki.fi> wrote:
A tool that does more than check against the DTD is not a validator.
Depends of what definition of "validator" is used. In the context of
RELAX NG, Schematron or W3C XML Schema validation does not involve a DTD.

RELAX NG, Schematron etc. checkers are not validators,

The ISO RELAX NG spec (well, the last draft thereof at least) disagrees
with you and defines 'validator'. W3C XML Schema and ISO Schematron talk
about 'valid' and 'validation' which imply 'validator'.

they are conformance checkers, a step up from validators.

They can act as conformance checkers if the conformance criteria you are
checking for can be expressed in schemas in the mentioned schema
languages and such a schema is given as the other input to the
validation function.
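
To make "the schema is the other input to the validation function" concrete, here is a toy Python sketch. It is not RELAX NG, Schematron or W3C XML Schema: the "schema" is just a hypothetical dict mapping each element name to the set of child element names it may contain, and `is_valid`, `STRICT_LIKE`, `LOOSE_LIKE` and `DOC` are names invented for this example.

```python
import xml.etree.ElementTree as ET


def is_valid(schema, instance_xml):
    """Return True if every element in instance_xml is known to schema
    and contains only the child elements schema allows for it."""
    root = ET.fromstring(instance_xml)
    pending = [root]
    while pending:
        element = pending.pop()
        allowed_children = schema.get(element.tag)
        if allowed_children is None:
            return False  # element not defined by this schema at all
        for child in element:
            if child.tag not in allowed_children:
                return False  # child not permitted inside this element
            pending.append(child)
    return True


# Two toy schemas that differ only in whether iframe is allowed in body.
STRICT_LIKE = {"body": {"p"}, "p": set()}
LOOSE_LIKE = {"body": {"p", "iframe"}, "p": set(), "iframe": set()}

DOC = "<body><p/><iframe/></body>"
```

With these definitions `is_valid(LOOSE_LIKE, DOC)` is true and `is_valid(STRICT_LIKE, DOC)` is false: the instance did not change, only the schema handed to the function did. Whether that counts as "validation" or "conformance checking" depends on what the schema is able to express.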

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Mar 10 '06 #36

Benjamin Niemann

Spartanicus wrote:

Again (this time with emphasis): A validator always and *only* checks
against the DTD.

A validator for *HTML and other SGML applications* always checks against the
DTD.

A tool that does more than check against the DTD is not a validator.

Depends of what definition of "validator" is used. In the context of
RELAX NG, Schematron or W3C XML Schema validation does not involve a DTD.

RELAX NG, Schematron etc. checkers are not validators, they are
conformance checkers, a step up from validators.

The word 'valid' has only this special meaning (as defined in the SGML spec)
in the context of SGML. Other standards can define different semantics for
the concept 'validation'.

(The discussion could now continue about whether XML is an SGML application
or if it forked far enough to be a document format on its own...)

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/

Mar 10 '06 #37

Henri Sivonen <hs******@iki.fi> wrote:

RELAX NG, Schematron etc. checkers are not validators,

The ISO RELAX NG spec (well, the last draft thereof at least) disagrees
with you and defines 'validator'.

Can't comment on that, I don't have access to ISO publications.

http://www.w3.org/TR/html4/sgml/intro.html#h-19.1 doesn't clarify it for
me. With the phrase "For better validation, you should check your
document against an SGML parser such as nsgmls" it suggests that all
forms of checking a document can be labeled as "validation" (the lesser
form of "validation" referred to is checking in browsers!). This would
open the door to claiming that CSE and Tidy can be labeled as a
validator of some sort after all.

But then it goes on to say "This is because an SGML parser relies solely
on the given SGML DTD which does not express all aspects of a valid HTML
4 document." which implies that document validity = spec compliance.
What results from the latter is that "validators" that check against a
DTD should not declare a document as valid or claim it has no errors.

Now the HTML4 spec is renowned for its sloppy language and loose
"definitions", so I'll hold off drawing a conclusion pending further
discussion.

--
Spartanicus

Mar 10 '06 #38

Benjamin Niemann

Spartanicus wrote:

Henri Sivonen <hs******@iki.fi> wrote:
RELAX NG, Schematron etc. checkers are not validators,
The ISO RELAX NG spec (well, the last draft thereof at least) disagrees
with you and defines 'validator'.

Can't comment on that, I don't have access to ISO publications.

http://www.w3.org/TR/html4/sgml/intro.html#h-19.1 doesn't clarify it for
me. With the phrase "For better validation, you should check your
document against an SGML parser such as nsgmls" it suggests that all
forms of checking a document can be labeled as "validation" (the lesser
form of "validation" referred to is checking in browsers!).

They say that people *assume* that if the browsers can render their
documents they are valid. IMHO they make it pretty clear that this is not
correct validation.

This would
open the door to claiming that CSE and Tidy can be labeled as a
validator of some sort after all.

But then it goes on to say "This is because an SGML parser relies solely
on the given SGML DTD which does not express all aspects of a valid HTML
4 document." which implies that document validity = spec compliance.

It correctly implies that document validity *!=* spec compliance.

What results from the latter is that "validators" that check against a
DTD should not declare a document as valid or claim it has no errors.

Validators can only claim that a document is valid, meaning it has no
errors that make it invalid. A validator cannot claim that a document
conforms to the HTML spec.

Now the HTML4 spec is renowned for its sloppy language and loose
"definitions", so I'll hold off drawing a conclusion pending further
discussion.

The HTML spec only refers to the SGML spec, which defines the details of the
term 'valid'. Section 4.1: "An HTML document is an SGML document that meets
the constraints of this specification."
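
One way to see the gap between "valid" and "conforms to the spec": the HTML 4.01 DTD declares href as %URI;, which is plain CDATA, so a DTD-based validator accepts any character data there. A rough Python sketch of a separate, non-DTD check follows. The whitespace rule is only a crude stand-in for real URI syntax checking, and `HrefCollector` and `suspicious_hrefs` are hypothetical names.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the value of every href attribute in the document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href":
                # A valueless attribute is reported as None; store "" instead.
                self.hrefs.append(value or "")


def suspicious_hrefs(html_text):
    """Return href values that contain whitespace inside the value.

    Assumption: unescaped whitespace is treated as a sign the value is not
    a URI. A DTD cannot express this, so a validator would not report it.
    """
    collector = HrefCollector()
    collector.feed(html_text)
    collector.close()
    return [h for h in collector.hrefs if any(ch.isspace() for ch in h.strip())]
```

For a page containing `<a href="not a uri at all">`, this returns `['not a uri at all']`, even though the same page can pass DTD validation without complaint.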

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/

Mar 10 '06 #39