
what kind of doctype is this?

I'm trying to validate my code and I can't figure out what kind of doctype
I have. The validator can't tell me anything because it can't move beyond
the doctype declaration.

http://www.wavian.com/clients/pugwash/

Is there anyway to tell what kind of doctype this is? I tried inserting a
few different types (please excuse me if this is the stupid way to do it,
I am learning...) but am unsuccessful.

Please, no flames at me for wrongdoing; I am trying to learn and make
things right.

Holly
Jul 20 '05 #1
39 Replies


Holly wrote:
I'm trying to validate my code and I can't figure out what kind of
doctype I have. http://www.wavian.com/clients/pugwash/
You don't have any. You need to decide what DTD (document type
definition) you want to use, that is, what DTD you want to write your
pages against. I suppose you could think of it as choosing a language in
which to write an essay. We don't know which language you want. You
choose. Here's a list of commonly used doc types.

http://www.w3.org/QA/2002/04/valid-dtd-list.html

I'd recommend html 4.01/strict. But note that this won't mean anything
until you learn what html 4 is, or perhaps more broadly, what html is.
It is not very useful as a desktop publishing program, that is, as a
screen layout program. It is useful as a document description program.
It sounds like what you need is a primer. Perhaps this will help?

http://tranchant.plus.com/web/html-start
Is there anyway to tell what kind of doctype this is? I tried
inserting a few different types (please excuse me if this is the
stupid way to do it, I am learning...) but am unsuccessful.
There is a bit of a learning curve to this. I hope you don't get too
frustrated and give up. In the end, I've found it's worth it to learn,
but YMMV (YMMV = your mileage may vary).
Please, no flames at me for wrongdoing; I am trying to learn and make
things right.


Understood. :-)

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/
Jul 20 '05 #2

Holly <me@here.edu> wrote:
I'm trying to validate my code and I can't figure out what kind of doctype
I have. The validator can't tell me anything because it can't move beyond
the doctype declaration.

http://www.wavian.com/clients/pugwash/

Is there anyway to tell what kind of doctype this is? I tried inserting a
few different types (please excuse me if this is the stupid way to do it,
I am learning...) but am unsuccessful.


That page isn't valid to any standard HTML doctype. The closest would
be HTML 4.01 Transitional but you still have 45 errors, mostly missing
alt attributes.

New pages should aim to be written to HTML 4.01 Strict, or to XHTML
1.0 Strict if you have some good reason to use XHTML.

Steve
--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net> <http://steve.pugh.net/>
Jul 20 '05 #3

Holly wrote:
I'm trying to validate my code and I can't figure out what kind of doctype
I have. The validator can't tell me anything because it can't move beyond
the doctype declaration.

http://www.wavian.com/clients/pugwash/


The Doctype is HTML 3.2; the document doesn't conform to any known standard.

HTML 4.01 Strict is best suited for most new documents, I suggest you use
that and then edit your document until it conforms.

http://www.w3.org/QA/2002/04/valid-dtd-list.html

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Jul 20 '05 #4

Holly wrote:
I'm trying to validate my code and I can't figure out what kind of doctype
I have. The validator can't tell me anything because it can't move beyond
the doctype declaration.

http://www.wavian.com/clients/pugwash/

Is there anyway to tell what kind of doctype this is? I tried inserting a
few different types (please excuse me if this is the stupid way to do it,
I am learning...) but am unsuccessful.

Please, no flames at me for wrongdoing; I am trying to learn and make
things right.


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

???

But it's a good site, anyhow!!!

Werner

--
-----------------------------------------------------------
Werner Partner * Tel +49 2366 886606 * Fax: 886608
mailto:ka****@sonoptikon.de * http://www.sonoptikon.de
Listen to classical music: http://www.drmk.ch/
Jul 20 '05 #5

Holly wrote:
I'm trying to validate my code and I can't figure out what kind of doctype
I have.
You claim it to be HTML 3.2. I think your tag soup is closest to HTML 4.01
Transitional.
The validator can't tell me anything because it can't move beyond
the doctype declaration.


Which validator?
http://www.htmlhelp.com/cgi-bin/validate.cgi?url=http://www.wavian.com/clients/pugwash/&warnings=yes

http://validator.w3.org/check?uri=ht...ients/pugwash/

Both validate it OK.

--
Lauri Raittila <http://www.iki.fi/lr> <http://www.iki.fi/zwak/fonts>
You may send email if the topic is unrelated to the group, or is private
or the like, but do not send the same message both by email and to the group.

Jul 20 '05 #6

Werner Partner scribbled something along the lines of:
Holly wrote:
I'm trying to validate my code and I can't figure out what kind of
doctype
I have. The validator can't tell me anything because it can't move beyond
the doctype declaration.

http://www.wavian.com/clients/pugwash/

Is there anyway to tell what kind of doctype this is? I tried inserting a
few different types (please excuse me if this is the stupid way to do it,
I am learning...) but am unsuccessful.

Please, no flames at me for wrongdoing; I am trying to learn and make
things right.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

???

But it's a good site, anyhow!!!


IIRC the correct Doctype for HTML 3.2 is

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"
"http://www.w3.org/MarkUp/Wilbur/HTML32.dtd">

or if you use lowercase elements (or at least write the root in lowercase):

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"
"http://www.w3.org/MarkUp/Wilbur/HTML32.dtd">

I'm not sure whether the requirement for a SYSTEM identifier came with
3.2 or 4.0 tho.
--
Alan Plum, WAD/WD, Mushroom Cloud Productions
http://www.mushroom-cloud.com/
Jul 20 '05 #7

Ashmodai scribbled something along the lines of:
Werner Partner scribbled something along the lines of:
Holly wrote:
I'm trying to validate my code and I can't figure out what kind of
doctype
I have. The validator can't tell me anything because it can't move
beyond
the doctype declaration.

http://www.wavian.com/clients/pugwash/

Is there anyway to tell what kind of doctype this is? I tried
inserting a
few different types (please excuse me if this is the stupid way to do
it,
I am learning...) but am unsuccessful.

Please, no flames at me for wrongdoing; I am trying to learn and make
things right.


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

???

But it's a good site, anyhow!!!

IIRC the correct Doctype for HTML 3.2 is

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"
"http://www.w3.org/MarkUp/Wilbur/HTML32.dtd">

or if you use lowercase elements (or at least write the root in lowercase):

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"
"http://www.w3.org/MarkUp/Wilbur/HTML32.dtd">

I'm not sure whether the requirement for a SYSTEM identifier came with
3.2 or 4.0 tho.


NM. My bad. It came with 4.01 or so:

http://www.w3.org/QA/2002/04/valid-dtd-list.html

--
Alan Plum, WAD/WD, Mushroom Cloud Productions
http://www.mushroom-cloud.com/
Jul 20 '05 #8

Ashmodai wrote:
Werner Partner scribbled something along the lines of:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
[...] IIRC the correct Doctype for HTML 3.2 is

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"
"http://www.w3.org/MarkUp/Wilbur/HTML32.dtd">


No, Werner is right. There is indeed no system identifier
specified/required for HTML 3.2:

<http://www.w3.org/TR/REC-html32.html#html>

However, the system identifier (URL) you stated is correct (it indeed
specifies a HTML 3.2 DTD resource). But it will not trigger "Full
Standards Mode" ("Standards Compliance Mode") in UAs with DOCTYPE switch.
So AFAIS you will not want/need the system identifier here for any good
reason:

<http://gutfeldt.ch/matthias/articles/doctypeswitch.html>
or if you use lowercase elements (or at least write the root in lowercase):

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"
"http://www.w3.org/MarkUp/Wilbur/HTML32.dtd">

I'm not sure whether the requirement for a SYSTEM identifier came with
3.2 or 4.0 tho.


In HTML, there is no requirement of system identifier whatsoever, instead
omitting it and using only the public identifier in the DOCTYPE declaration
is a standards compliant way to trigger Quirks/Compatibility Mode in user
agents that support the "DOCTYPE switch" (see above). W3C-validate a HTML
document without system identifier to see this confirmed.

Of course it is better to declare a HTML 4.01 than a HTML 3.2 DTD these days.
HTH

PointedEars
Jul 20 '05 #9
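
To make the public identifier / system identifier distinction in the posts above concrete, here is a minimal Python sketch that splits a DOCTYPE declaration into its parts. The names `Doctype` and `parse_doctype` are made up for illustration, and the regular expression is a simplification: it only handles the `PUBLIC "..." "..."` form with double quotes that appears in this thread, not the full SGML syntax.

```python
import re
from typing import NamedTuple, Optional


class Doctype(NamedTuple):
    root: str                  # document element name, e.g. "HTML"
    public_id: Optional[str]   # formal public identifier (FPI), if any
    system_id: Optional[str]   # system identifier (a URL here), if any


# Simplified pattern: root name, then an optional PUBLIC clause whose
# system identifier is itself optional.
_DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>[^\s>]+)'
    r'(?:\s+PUBLIC\s+"(?P<public>[^"]*)"(?:\s+"(?P<system>[^"]*)")?)?'
    r'\s*>',
    re.IGNORECASE,
)


def parse_doctype(text: str) -> Optional[Doctype]:
    """Return the first DOCTYPE declaration found in text, or None."""
    match = _DOCTYPE_RE.search(text)
    if match is None:
        return None
    return Doctype(match.group("root"), match.group("public"), match.group("system"))


if __name__ == "__main__":
    short = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">'
    full = ('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"\n'
            '"http://www.w3.org/MarkUp/Wilbur/HTML32.dtd">')
    print(parse_doctype(short))
    print(parse_doctype(full))
```

The first declaration yields a public identifier and no system identifier, which is the form Werner posted; the second carries both.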

Holly wrote:
I'm trying to validate my code and I can't figure out what kind of doctype
I have. The validator can't tell me anything because it can't move beyond
the doctype declaration.


Use HTML Tidy. It will not only try to detect the DOCTYPE that matches
your markup best, but also tidy up your markup based on that DOCTYPE
by criteria you specify if you want that.

<http://www.w3.org/People/Raggett/tidy/>
HTH

PointedEars
Jul 20 '05 #10

Holly wrote:
I'm trying to validate my code and I can't figure out what kind of doctype
I have. The validator can't tell me anything because it can't move beyond
the doctype declaration.


Use HTML Tidy. It will not only try to detect the DOCTYPE that matches
your markup best, but also tidy up your markup based on that DOCTYPE by
criteria you specify if you want that.

<http://www.w3.org/People/Raggett/tidy/>
HTH

PointedEars
Jul 20 '05 #11

Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
In HTML, there is no requirement of system identifier whatsoever


In HTML 4 there is, see <http://www.w3.org/TR/html4/struct/global.html#h-7.2>

--
David Håsäther
Jul 20 '05 #12

Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
In HTML, there is no requirement of system identifier whatsoever,
As David Håsäther pointed out, there is. Such arbitrary requirements have
contributed to widespread misunderstandings.
instead omitting it and using only the public identifier in the
DOCTYPE declaration is a standards compliant way to trigger
Quirks/Compatibility Mode in user agents that support the "DOCTYPE
switch" (see above).


Such trickery has nothing to do with any standards. It would be really
Münchhausenian to have a standards compliant way to make browsers violate
standards.

Doctype switching is absurdity. Learn how it works, for practical
authoring, but don't try to find a rational explanation for it, still
less a justification based on standards.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #13

Jukka K. Korpela wrote:
Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
In HTML, there is no requirement of system identifier whatsoever,


As David Håsäther pointed out, there is. [...]


Then why are HTML (4.01) documents without system identifier
Valid HTML (4.01) according to <http://validator.w3.org/>?
PointedEars
Jul 20 '05 #14

Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
In HTML, there is no requirement of system identifier whatsoever,


As David Håsäther pointed out, there is. [...]


Then why are HTML (4.01) documents without system identifier
Valid HTML (4.01) according to <http://validator.w3.org/>?


Because they are valid. It would in fact be impossible to impose the
requirement on the presence of a system identifier as a validity
constraint, since the DOCTYPE declaration is a method of specifying the
DTD, according to which validity will be judged.

The SGML terminology is odd. Being valid means nothing but conformance to
a formalized syntax specification. Requirements that are not expressed in
that syntax specification, the DTD, cannot affect the validity issue. In
particular, requirements on _how_ a DTD is to be specified do not affect
validity.

And XML makes things eXtra odd. The term "valid" is retained, and a new
term, "well-formed", is introduced in a yet another technical meaning,
which refers - very unintuitively - to compliance with low-level syntax
rules that are independent of DTDs.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #15

In article <Xn***************************@195.67.237.53>,
"David Håsäther" <ha******@msn.com> writes:
Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
In HTML, there is no requirement of system identifier whatsoever

Correct.
In HTML 4 it is, see <http://www.w3.org/TR/html4/struct/global.html#h-7.2>


Nope. In HTML 4 you must use an FPI specifying one of three DTDs.
But it says nothing about what form the FPI should take, beyond a
reference to ISO 8879 (SGML) from which the rules are derived.

--
Nick Kew

Nick's manifesto: http://www.htmlhelp.com/~nick/
Jul 20 '05 #16

In article <40**************@pointedears.de>,
Thomas 'PointedEars' Lahn <Po*********@web.de> writes:
Use HTML Tidy. It will not only try to detect the DOCTYPE that matches
your markup best, but also tidy up your markup based on that DOCTYPE by
criteria you specify if you want that.
Has it been fixed now to get rid of the annoying bug of calling everything
"transitional"? You might just as well declare it "transitional" yourself
as have Tidy tell you (rightly or wrongly) that's what it is.

Note that "transitional" is for legacy documents. That is to say,
documents that were considered legacy in 1998. It has no place in
new documents this century.

Of course, if a document is tag-soup and bears little resemblance
to HTML of any kind, then tidy can help fix it. That's what tidy
is good for.
<http://www.w3.org/People/Raggett/tidy/>


That's very old, and definitely has the bug. tidy.sf.net may be
more up-to-date.

--
Nick Kew

Nick's manifesto: http://www.htmlhelp.com/~nick/
Jul 20 '05 #17

ni**@hugin.webthing.com (Nick Kew) wrote:
In HTML, there is no requirement of system identifier whatsoever


Correct.
In HTML 4 it is, see <http://www.w3.org/TR/html4/struct/global.html#h-7.2>


Nope. In HTML 4 you must use an FPI specifying one of three DTDs.
But it says nothing about what form the FPI should take, beyond a
reference to ISO 8879 (SGML) from which the rules are derived.


Pardon? It clearly says: "authors must include one of the following
document type declarations in their documents" and then lists down three
<!DOCTYPE> declarations.

It fixes the allowed document type declarations, not just the document
type definitions (DTDs). This is odd, but it matches the oddness of
browser behavior in <!DOCTYPE> sniffing.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #18

Nick Kew wrote:
Thomas 'PointedEars' Lahn <Po*********@web.de> writes:
Use HTML Tidy. It will not only try to detect the DOCTYPE that matches
your markup best, but also tidy up your markup based on that DOCTYPE by
criteria you specify if you want that.
Has it been fixed now to get rid of the annoying bug of calling everything
"transitional"?


If there are no elements/attributes of HTML 4 in it, Tidy calls it HTML 3.2.
If I include such an element/attribute, it calls it HTML 4.01 Transitional.

$ tidy -v
HTML Tidy for Cygwin released on 1st September 2003

(This is the latest release for Cygwin, updated regularly.)
So, apparently the answer is No. But ...
You might just as well declare it "transitional" yourself
as have Tidy tell you (rightly or wrongly) that's what it is.
... I disagree that declaring HTML 4.01 Transitional is a bug,
not even with documents that would validate as HTML 4.01 Strict.
Note that "transitional" is for legacy documents. That is to say,
documents that were considered legacy in 1998.
XHTML 1.0 Transitional exists.

| XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition)
| A Reformulation of HTML 4 in XML 1.0
| W3C Recommendation 26 January 2000, revised 1 August 2002
It has no place in new documents this century.
Disagreed. There are good reasons why one might still want to declare
Transitional. For example, there may be good reason why one wants
to use frames or open new windows. Or a user agent needs to be
supported that does not know/care about CSS. You cannot do that
with plain HTML 4.01 Strict (without client-side scripting) since,
e.g., the "target" and "border" attributes are deprecated and are
therefore not defined by the Strict DTD(s).
Of course, if a document is tag-soup and bears little resemblance
to HTML of any kind, then tidy can help fix it. That's what tidy
is good for.


And such documents are better declared HTML 4.01 Transitional than
anything else. Tag-soup browsers will handle it, and if the author
chooses the better way of validating it, he/she will not be crushed
(and thus discouraged) by a load of error messages from the W3C
Validator that are not necessary.
<http://www.w3.org/People/Raggett/tidy/>


That's very old, and definitely has the bug. tidy.sf.net may be
more up-to-date.


tidy.sf.net *is* more up-to-date when it comes to file downloads. I used
the above URI since it provides decent documentation of what HTML Tidy can
do. It is linked from the SourceForge site, and the SourceForge site is in
turn described and linked at the top of that documentation.
PointedEars
Jul 20 '05 #19

Jukka K. Korpela wrote:
Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
In HTML, there is no requirement of system identifier whatsoever,
As David Håsäther pointed out, there is. [...]
Then why are HTML (4.01) documents without system identifier
Valid HTML (4.01) according to <http://validator.w3.org/>?


Because they are valid. It would in fact be impossible to impose the
requirement on the presence of a system identifier as a validity
constraint, since the DOCTYPE declaration is a method of specifying the
DTD, according to which validity will be judged.

The SGML terminology is odd. Being valid means nothing but conformance to
a formalized syntax specification. Requirements that are not expressed in
that syntax specification, the DTD, cannot affect the validity issue. In
particular, requirements on _how_ a DTD is to be specified do not affect
validity.


IBTD. The W3C Validator uses a SGML parser that validates against a DTD,
either the DTD you specified in the document or a selected DTD. It simply
cannot do this if there is no resource that provides the DTD. It can,
however, validate a document as HTML 4.01 Strict/Transitional/Frameset if
the system identifier is missing. And if you specify the DTD that should
be validated against to override the DOCTYPE you may have declared in your
document, it only includes the public identifier and the SGML parser still
does not fail. So it is to be asked: Where does this information come from?

I cannot see anything in the specification that says that the system
identifier is required for (valid) HTML (4) documents. Instead, means
to support user agents that use public identifiers in preference to
system identifiers are specified:

| The binding between public identifiers and files can be specified using a
| catalog file following the format recommended by the Oasis Open Consortium
| (see [OASISOPEN]). A sample catalog file for HTML 4.01[1] is included at
| the beginning of the section on SGML reference information for HTML. The
| last two letters of the declaration indicate the language of the DTD. For
| HTML, this is always English ("EN").
|
| [1] http://www.w3.org/TR/html4/sgml/intro.html#catalog

It is highly likely that the W3C validator makes use of that catalog
feature, and Mozilla/5.0 obviously makes use of that at least for XHTML
1.1 and MathML (search for .dtd files in the Mozilla program directory)

This clearly indicates that the system identifier is *not* required
(as a catalog file can provide the required information) for HTML (4;
as it was not in HTML 3.2 as I already showed), and in this context
Nick's statement makes perfect sense (to me).
And XML makes things eXtra odd. The term "valid" is retained, and a new
term, "well-formed", is introduced in a yet another technical meaning,
which refers - very unintuitively - to compliance with low-level syntax
rules that are independent of DTDs.


Firstly, you are wrong that "well-formedness" is independent of the DTD.
Invalid markup (i.e. markup that did not validate against a DTD) does not
fulfill the requirement of well-formedness, too.

Secondly, of course DTDs are not the only means of specifying a markup
language. There is a SGML definition of HTML and there is a SGML
definition of XML (the "low-level syntax rules").[1] The term "well-formed"
refers to compliance to both this definition and the specified DTD (e.g.
that elements must be closed explicitly is defined with OMITTAG NO in
the MINIMIZE feature section of the SGML definition of XML), and I cannot
find anything unintuitive about it.

[1] <http://www.w3.org/TR/NOTE-sgml-xml-971215> (looks best with IE since
that XML document has no style information associated with it)
PointedEars
Jul 20 '05 #20

Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
Jukka K. Korpela wrote:
Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
In HTML, there is no requirement of system identifier
whatsoever,
As David Håsäther pointed out, there is. [...]

Then why are HTML (4.01) documents without system identifier
Valid HTML (4.01) according to <http://validator.w3.org/>?
Because they are valid. It would in fact be impossible to impose
the requirement on the presence of a system identifier as a
validity constraint, since the DOCTYPE declaration is a method of
specifying the DTD, according to which validity will be judged.

The SGML terminology is odd. Being valid means nothing but
conformance to a formalized syntax specification. Requirements
that are not expressed in that syntax specification, the DTD,
cannot affect the validity issue. In particular, requirements on
_how_ a DTD is to be specified do not affect validity.


IBTD. The W3C Validator uses a SGML parser that validates against
a DTD, either the DTD you specified in the document or a selected
DTD. It simply cannot do this if there is no resource that
provides the DTD.


Right.
It can, however, validate a document as HTML
4.01 Strict/Transitional/Frameset if the system identifier is
missing. And if you specify the DTD that should be validated
against to override the DOCTYPE you may have declared in your
document, it only includes the public identifier and the SGML
parser still does not fail. So it is to be asked: Where does this
information come from?
It uses the FPI to look up the system-id in a catalog.
I cannot see anything in the specification that says that the
system identifier is required for (valid) HTML (4) documents.
The specification lists three DOCTYPE declarations, all with a system
identifier, and says "authors must include one of the following
document type declarations in their documents."
See: http://www.w3.org/TR/html4/struct/global.html#h-7.2
| The binding between public identifiers and files can be specified
| using a catalog file following the format recommended by the
| Oasis Open Consortium (see [OASISOPEN]). A sample catalog file
| for HTML 4.01[1] is included at the beginning of the section on
| SGML reference information for HTML. The last two letters of the
| declaration indicate the language of the DTD. For HTML, this is
| always English ("EN").
|
| [1] http://www.w3.org/TR/html4/sgml/intro.html#catalog

It is highly likely that the W3C validator makes use of that
catalog feature
I think that's what it does when you don't specify a system-id.
And XML makes things eXtra odd. The term "valid" is retained, and
a new term, "well-formed", is introduced in a yet another
technical meaning, which refers - very unintuitively - to
compliance with low-level syntax rules that are independent of
DTDs.


Firstly, you are wrong that "well-formedness" is independent of
the DTD. Invalid markup (i.e. markup that did not validate against
a DTD) does not fulfill the requirement of well-formedness, too.


Not necessarily.
Secondly, of course DTDs are not the only means of specifying a
markup language. There is a SGML definition of HTML and there is
a SGML definition of XML (the "low-level syntax rules").
SGML _declaration_.
<http://www.w3.org/TR/NOTE-sgml-xml-971215> (looks best with
IE since that XML document lacks a style information associated with
it)


Try http://www.w3.org/TR/NOTE-sgml-xml-971215.html :-)

--
David Håsäther
Jul 20 '05 #21
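
The catalog lookup described above ("it uses the FPI to look up the system-id in a catalog") can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SAMPLE_CATALOG`, `load_catalog` and `resolve` are hypothetical names, the three entries are modeled on the `PUBLIC "fpi" file` entry format rather than copied from any real installation, and the catalog entry is assumed to take precedence over an explicit system identifier. It is not how the W3C validator is actually implemented.

```python
import re
from typing import Dict, Optional

# Illustrative catalog text: each PUBLIC entry maps a formal public
# identifier (FPI) to a local resource holding the DTD.
SAMPLE_CATALOG = '''
PUBLIC "-//W3C//DTD HTML 4.01//EN" strict.dtd
PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" loose.dtd
PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" frameset.dtd
'''

_ENTRY_RE = re.compile(r'^\s*PUBLIC\s+"([^"]+)"\s+(\S+)\s*$', re.MULTILINE)


def load_catalog(text: str) -> Dict[str, str]:
    """Collect the PUBLIC entries of a catalog into an FPI -> resource map."""
    return {fpi: resource for fpi, resource in _ENTRY_RE.findall(text)}


def resolve(public_id: Optional[str], system_id: Optional[str],
            catalog: Dict[str, str]) -> Optional[str]:
    """Find the DTD resource for a DOCTYPE.

    Assumption: a catalog entry for the FPI wins; otherwise fall back to
    the system identifier written in the document, which may be None.
    """
    return catalog.get(public_id, system_id)


if __name__ == "__main__":
    catalog = load_catalog(SAMPLE_CATALOG)
    # A DOCTYPE with only a public identifier still resolves to a DTD.
    print(resolve("-//W3C//DTD HTML 4.01//EN", None, catalog))
```

This is why a parser can validate a document whose DOCTYPE has no system identifier: the DTD is located through the public identifier alone.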

David Håsäther wrote:
Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
IBTD. The W3C Validator uses a SGML parser that validates against
a DTD, either the DTD you specified in the document or a selected
DTD. It simply cannot do this if there is no resource that
provides the DTD.


Right.
It can, however, validate a document as HTML
4.01 Strict/Transitional/Frameset if the system identifier is
missing. And if you specify the DTD that should be validated
against to override the DOCTYPE you may have declared in your
document, it only includes the public identifier and the SGML
parser still does not fail. So it is to be asked: Where does this
information come from?


It uses the FPI to look up the system-id in a catalog.


Exactly. So how can something that can be obtained in other ways
be *required*? It simply cannot.
I cannot see anything in the specification that says that the
system identifier is required for (valid) HTML (4) documents.


The specification lists three DOCTYPE declarations, all with a system
identifier, and says "authors must include one of the following
document type declarations in their documents."
See: http://www.w3.org/TR/html4/struct/global.html#h-7.2


I think the specification is misleading/incomplete here, since HTML would
work without the explicit system ID (and indeed does) as pointed out *ibid*.
| [1] http://www.w3.org/TR/html4/sgml/intro.html#catalog

It is highly likely that the W3C validator makes use of that
catalog feature


I think that's what it does when you don't specify a system-id.


And if you select the DTD to validate against. See?
And XML makes things eXtra odd. The term "valid" is retained, and
a new term, "well-formed", is introduced in a yet another
technical meaning, which refers - very unintuitively - to
compliance with low-level syntax rules that are independent of
DTDs.


Firstly, you are wrong that "well-formedness" is independent of
the DTD. Invalid markup (i.e. markup that did not validate against
a DTD) does not fulfill the requirement of well-formedness, too.


Not necessarily.


-v please
Secondly, of course DTDs are not the only means of specifying a
markup language. There is a SGML definition of HTML and there is
a SGML definition of XML (the "low-level syntax rules").


SGML _declaration_.


Of course, my bad.
<http://www.w3.org/TR/NOTE-sgml-xml-971215> (looks best with
IE since that XML document lacks a style information associated with
it)


Try http://www.w3.org/TR/NOTE-sgml-xml-971215.html :-)


Ah, thanks a lot. Yet it's a pity that the XML version of such an important
document is easily readable in non-conforming UAs and not in conforming UAs
(by default everything in XML is an inline element, that's why mozilla.org
decided to show the document tree for such documents.)
PointedEars
Jul 20 '05 #22

* Nick Kew wrote in comp.infosystems.www.authoring.html:
Use HTML Tidy. It will not only try to detect the DOCTYPE that matches
your markup best, but also tidy up your markup based on that DOCTYPE by
criteria you specify if you want that.


Has it been fixed now to get rid of the annoying bug of calling everything
"transitional"?


At least a year ago. And "everything" is a bit of a stretch. It is still
not perfect though.
<http://www.w3.org/People/Raggett/tidy/>


That's very old, and definitely has the bug.


There are neither binaries nor source code, it rather refers to sf.net.
Jul 20 '05 #23

Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
So how can something that can be obtained in other ways
be *required*? It simply cannot.
They just did. The definition of a markup system can impose any rules
that its designers want. They can surely require conformance to one of
three DTDs _and_ require that one of three specific <!DOCTYPE>s be used
_and_ that the first character on line 42 must be an "A". If they so
wish.
I think the specification is misleading/incomplete here,
It is strange, but not incomplete.
since HTML
would work without the explicit system ID (and indeed does)
HTML would work, and actually works, without any doctypes or SGML or XML.
Whether a genuine SGML application works without system IDs would be a
different matter, and not relevant here.
as pointed out *ibid*.


So what? Besides, remember that NOTEs are non-normative. The normative
prose requires that one of three <!DOCTYPE> alternatives be used. And we
know that doctype sniffers take that seriously. It makes no sense in SGML
terms, but neither does doctype sniffing.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #24

* Thomas Lahn wrote in comp.infosystems.www.authoring.html:
If there are no elements/attributes of HTML 4 in it, Tidy calls it HTML 3.2.
If I include such an element/attribute, it calls it HTML 4.01 Transitional.


Are you sure you tested this properly?
Jul 20 '05 #25

(Jukka answered the rest)

Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
Firstly, you are wrong that "well-formedness" is independent of
the DTD. Invalid markup (i.e. markup that did not validate against
a DTD) does not fulfill the requirement of well-formedness, too.


Not necessarily.


-v please


(Actually, I don't know what "-v" means, but I can guess from the
context)

The following, for example, is invalid but well-formed:

<!DOCTYPE x [
<!ELEMENT x (#PCDATA)>
]>
<y>z</y>

--
David Håsäther
Jul 20 '05 #26
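
David's example can be checked with Python's standard `xml.parsers.expat` module, which is a non-validating parser: it reports well-formedness errors but does not check the document against its DTD. The sketch below (with the hypothetical helper name `check_document`) parses the document and also records the name in the DOCTYPE and the actual root element, so the mismatch that makes the document invalid is visible. Comparing those two names covers only one validity constraint; it is not a full validator.

```python
import xml.parsers.expat
from typing import Optional, Tuple

# The document from the post above: the DTD declares root element "x",
# but the document element is "y".
DAVID_EXAMPLE = "<!DOCTYPE x [\n<!ELEMENT x (#PCDATA)>\n]>\n<y>z</y>"


def check_document(doc: str) -> Tuple[bool, Optional[str], Optional[str]]:
    """Return (well_formed, doctype_name, root_element_name)."""
    seen = {"doctype": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen["doctype"] = name

    def on_start(name, attrs):
        # Only the first start tag is the document (root) element.
        if seen["root"] is None:
            seen["root"] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    try:
        parser.Parse(doc, True)
    except xml.parsers.expat.ExpatError:
        return (False, seen["doctype"], seen["root"])
    return (True, seen["doctype"], seen["root"])


if __name__ == "__main__":
    well_formed, doctype_name, root = check_document(DAVID_EXAMPLE)
    print("well-formed:", well_formed)
    print("root matches DOCTYPE:", doctype_name == root)
```

The document parses without error, so it is well-formed, yet the root element does not match the declared document type, so it cannot be valid. A document with mismatched tags such as `<y>z</x>` fails the well-formedness check outright.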

Bjoern Hoehrmann wrote:
* Thomas Lahn wrote in comp.infosystems.www.authoring.html:
If there are no elements/attributes of HTML 4 in it, Tidy calls it HTML 3.2.
If I include such an element/attribute, it calls it HTML 4.01 Transitional.


Are you sure you tested this properly?


Not any longer. Sorry, my bad. I had a table[align="center"] in my test.
Grmbl.

#v+

[2004-04-26 00:55:27] pointedears:~/bin
$ echo '<html>
<head>
<title></title>
</head>
<body><table><col width="100"><tr><td>foo</td></tr></table>
</body>
</html>' | tidy
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 5 column 7 - Warning: <table> lacks "summary" attribute
Info: Document content looks like HTML 4.01 Strict
2 warnings, 0 errors were found!

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
[...]

#v-

(Note that not even HTML Tidy which is created by someone working for
the W3C includes the system identifier in tidied documents ...)
PointedEars
Jul 20 '05 #27
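
If you want to pull Tidy's guess out of a report like the one above, a small Python sketch follows. It works purely on the text of the report; the message formats in the two regular expressions are taken from the transcript in this thread (a 2003 Tidy build) and may differ in other versions, and `summarize` is a made-up helper name, not part of Tidy.

```python
import re
from typing import List, Optional, Tuple

# Report text copied from the transcript in the post above.
TIDY_REPORT = """line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 5 column 7 - Warning: <table> lacks "summary" attribute
Info: Document content looks like HTML 4.01 Strict
2 warnings, 0 errors were found!
"""

_WARNING_RE = re.compile(r"^line (\d+) column (\d+) - Warning: (.+)$", re.MULTILINE)
_GUESS_RE = re.compile(r"^Info: Document content looks like (.+)$", re.MULTILINE)


def summarize(report: str) -> Tuple[Optional[str], List[Tuple[int, int, str]]]:
    """Return (guessed document type, [(line, column, message), ...])."""
    warnings = [(int(line), int(col), msg)
                for line, col, msg in _WARNING_RE.findall(report)]
    guess = _GUESS_RE.search(report)
    return (guess.group(1) if guess else None, warnings)


if __name__ == "__main__":
    guessed, warnings = summarize(TIDY_REPORT)
    print("Tidy's guess:", guessed)
    for line, col, msg in warnings:
        print(f"  {line}:{col} {msg}")
```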

On Fri, 23 Apr 2004, Brian wrote:
Holly wrote:
I'm trying to validate my code and I can't figure out what kind of
doctype I have.

http://www.wavian.com/clients/pugwash/


You don't have any. You need to decide what dtd (document type
declaration) you want to use, that is, what dtd you want to write your
pages against. I suppose you could think of it as choosing a language in
which to write an essay. We don't know which language you want. You
choose. Here's a list of commonly used doc types.


This sort of thing is always a good laugh. The fact that the "document type
declaration" is syntactically a COMMENT, i.e. it starts "<!", effectively means
that it should be ignored! :-) Some standard!
Jul 20 '05 #28

P: n/a
Brian wrote:
Holly wrote:
I'm trying to validate my code and I can't figure out what kind of
doctype I have.

http://www.wavian.com/clients/pugwash/


You don't have any.


Obviously, I was wrong; I swear I was looking at the right page, but
apparently was not. Sorry.

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/
Jul 20 '05 #29

P: n/a
D. Stussy wrote:
This sort of thing is always a good laugh. The fact that the "document
type declaration" is syntactically a COMMENT, i.e. it starts "<!",


Don't do much SGML, do you?

An SGML declaration begins with "<!" and ends with ">". Inside of that,
comments are the stuff between matching sets of '--' delimiters.
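
To make that concrete, a small sketch (the function name sgml_comments is made up for illustration, and it deliberately ignores the corner case of "--" inside a quoted literal):

```python
import re


def sgml_comments(declaration):
    """Return the comment texts found inside one markup declaration.

    Simplified on purpose: the declaration is everything between "<!" and
    ">", and each comment is the text between a matching pair of "--".
    """
    if not (declaration.startswith("<!") and declaration.endswith(">")):
        raise ValueError("not a markup declaration")
    body = declaration[2:-1]
    # Non-greedy match so that each "--" pair delimits its own comment.
    return re.findall(r"--(.*?)--", body, flags=re.DOTALL)
```

So "<!-- hello -->" is a declaration that contains nothing but one comment, while a DOCTYPE declaration is a declaration that happens to contain no comments at all.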

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
Jul 20 '05 #30

P: n/a
On Mon, 26 Apr 2004, Sherm Pendley wrote:
D. Stussy wrote:
This sort of thing is always a good laugh. The fact that the "document
type declaration" is syntactically a COMMENT, i.e. it starts "<!",
Don't do much SGML, do you?


No, I don't.
An SGML declaration begins with "<!" and ends with ">". Inside of that,
comments are the stuff between matching sets of '--' delimiters.


Then I stand corrected. However, it in itself still isn't a pure HTML
construct, which is what one SHOULD have if it were part of the specification of
the language. Instead, it appears to be an escape to a different language.
Jul 20 '05 #31

P: n/a
* Thomas Lahn wrote in comp.infosystems.www.authoring.html:
(Note that not even HTML Tidy which is created by someone working for
the W3C includes the system identifier in tidied documents ...)


Oh, it does when it thinks it makes sense. For example if the input
document had a system identifier or if the output is XHTML. Always
adding one is asking for trouble with doctype-switch disabled browsers.
Jul 20 '05 #32

P: n/a
Wow, I am overwhelmed by all the advice on how to validate sites and
declare document types. I appreciate, most of all, the kind way people
tried to help in their posts... I have some serious reading to do! I just
bookmarked a bunch of links and will be reading through these.

Holly

Jul 20 '05 #33

P: n/a
Bjoern Hoehrmann wrote:
* Thomas Lahn wrote in comp.infosystems.www.authoring.html:
(Note that not even HTML Tidy which is created by someone working for
the W3C includes the system identifier in tidied documents ...)
Oh, it does when it thinks it makes sense. For example if the input
document had a system identifier or if the output is XHTML.


ACK
Always adding one is asking for trouble with doctype-switch disabled
browsers.


You must be kidding.
PointedEars
Jul 20 '05 #34

P: n/a
D. Stussy wrote:
On Mon, 26 Apr 2004, Sherm Pendley wrote:
D. Stussy wrote:
This sort of thing is always a good laugh. The fact that the
"document type declaration" is syntactically a COMMENT, i.e. it
starts "<!",
Don't do much SGML, do you?


No, I don't.
An SGML declaration begins with "<!" and ends with ">". Inside of that,
comments are the stuff between matching sets of '--' delimiters.


Then I stand corrected. However, it in itself still isn't a pure HTML
construct,


As pointed out before, even "<!-- ... -->" is not a "pure HTML construct".
You won't debate that SGML comments (/-- (?!--|--(?!--)*--)* --/ [CMIIW]
within /<! [^>]* >/) are extremely useful in HTML, will you?
which is what one SHOULD have if it were part of the specification of
the language.
Don't do much HTML either, do you?

<http://www.w3.org/TR/html4/struct/global.html#h-7.2>
Instead, it appears to be an escape to a different language.


HTML is a markup language that is defined by SGML markup.

,-<http://www.w3.org/TR/html4/>
|
| [...]
| HTML 4 is an SGML application conforming to International Standard
| ISO 8879 -- Standard Generalized Markup Language [ISO8879].

There are both an SGML declaration of HTML:

<http://www.w3.org/TR/html4/sgml/sgmldecl.html>

and HTML DTDs that are written in SGML:

<http://www.w3.org/TR/html4/sgml/dtd.html>
<http://www.w3.org/TR/html4/sgml/loosedtd.html>
<http://www.w3.org/TR/html4/sgml/framesetdtd.html>
HTH

PointedEars
Jul 20 '05 #35

P: n/a
Jukka K. Korpela wrote:
Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
So how can something that can be obtained in other ways
be *required*? It simply cannot.


They just did. The definition of a markup system can impose any rules
that its designers want. They can surely require conformance to one of
three DTDs _and_ require that one of three specific <!DOCTYPE>s be used
_and_ that the first character on line 42 must be an "A". If they so
wish.


So *their* validator violates *their* specification.
Thanks for clarification ...
PointedEars
Jul 20 '05 #36

P: n/a
David Håsäther wrote:
Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
Firstly, you are wrong that "well-formedness" is independent of
the DTD. Invalid markup (i.e. markup that did not validate against
a DTD) does not fulfill the requirement of well-formedness, too.
Not necessarily.

-v please


(Actually, I don't know what "-v" means, [...]


Verbose. See for example
<http://www.trash.net/~reeler/iacd/doc/admin/node20.html> :)

Thanks for the example.
PointedEars
Jul 20 '05 #37

P: n/a
On Tue, 4 May 2004, Thomas 'PointedEars' Lahn wrote:
Jukka K. Korpela wrote:
They just did. The definition of a markup system can impose any rules
that its designers want.
Unless they attempt to impose rules which are mutually contradictory.
E.g., to define HTML as an application of SGML, and then to make rules
which contradict the rules of SGML. (Let's not go looking for
specific examples - I'm just making a general remark there.)
They can surely require conformance to one of
three DTDs _and_ require that one of three specific <!DOCTYPE>s be used
_and_ that the first character on line 42 must be an "A". If they so
wish.


I guess so.
So *their* validator violates *their* specification.
Does it? Remember, the term "validation" has a rather precise
technical meaning in an SGML context. It certainly does *not*
guarantee that it will point up all the issues on which a document
does not conform to an HTML specification.
Thanks for clarification ...


Sorry, my irony detector is in for maintenance at the mo.

Jul 20 '05 #38

P: n/a
Alan J. Flavell wrote:
On Tue, 4 May 2004, Thomas 'PointedEars' Lahn wrote:
So [the W3C's] validator violates [the W3C's] specification.
Does it?


Unfortunately, it does, if we assume that the system identifier is required.

If the DOCTYPE declaration of the resource to be validated lacks the system
identifier and there are no further errors, the document is called Valid
HTML (4.01). If we assume that section 19 of the HTML 4.01 Specification is
only informational (for which I fail to see proof), this is clearly a bug.
Remember, the term "validation" has a rather precise
technical meaning in an SGML context. It certainly does *not*
guarantee that it will point up all the issues on which a document
does not conform to an HTML specification.


If the DOCTYPE declaration is missing or uses an invalid public identifier
in the resource to be validated, then validation is impossible. So far, so
good. But if the user then chooses to select a DOCTYPE in the Validator's
form and revalidates the resource, the resulting DOCTYPE declaration does
not contain a system identifier (and the Validator calls the resource
"Tentatively Valid"). If we assume that the system identifier is required,
this is clearly a bug, too.

If we instead assume that the system identifier is _not_ required as section
19 of the HTML 4.01 Specification is not informational but instead either
contradictory or supplemental to subsection 7.2, neither of these two
behaviors can be considered a bug. Since I fail to see proof for the thesis
that section 19 is only informational, until further notice I will adhere to
the thesis that section 19 is supplemental to subsection 7.2 and thus an
*HTML 4.01* DOCTYPE declaration without a system identifier is correct.
PointedEars
Jul 20 '05 #39

P: n/a
Thomas 'PointedEars' Lahn <Po*********@web.de> wrote:
Alan J. Flavell wrote:
On Tue, 4 May 2004, Thomas 'PointedEars' Lahn wrote:
So [the W3C's] validator violates [the W3C's] specification.
Does it?


Unfortunately, it does, if we assume that the system identifier is
required.


Our assumptions do not change the SGML rules. The HTML specification may
require a system identifier, but this additional requirement does not
affect _validity_.
If the DOCTYPE declaration of the resource to be validated misses the
system identifier and there are no further errors, the document is
called Valid HTML (4.01).
That's misleading, but the document is valid. Whether it is HTML (4.01)
depends on interpretation of the HTML spec.
until further notice I will adhere to the thesis that section 19 is
supplemental to subsection 7.2 and thus a *HTML 4.01* DOCTYPE
declaration without system identifier is correct.


Section 19 is confused and confusing, but the situation is pretty simple:
a document that otherwise conforms to the HTML specification but does not
have a system identifier in the DOCTYPE declaration is valid (a valid
SGML document) but it is not a conforming HTML document, so it is
incorrect to call it "valid HTML" or "valid HTML 4.01".

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #40
