Root element specified by DTD ?

Andy Dingley

What specifies the permitted root element(s) for a document ? HTML,
SGML, XHTML or XML ?
Valid HTML documents need to have a well-known DTD and a doctypedecl in
each document like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

The document's root element is "HTML", and is specified by the
doctypedecl. For HTML and XHTML it's possible that the prose of their
recommendation restricts it too.
My question is, is there any way to author a non-HTML DTD (SGML or XML)
so as to restrict valid documents to only allow a certain subset of
their elements to be used as the root element? Can this restriction be
expressed _entirely_ within a DTD? Is this used within the HTML DTDs ?
(i.e. not just in the doctypedecl)

Is this fragment a valid HTML document ? If not, why isn't it? Just
which part of its definition is forbidding this fragmentary use?
<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<div>
<p>Foo</p>
</div>
Good tutorial refs on DTDs are also welcome. I don't know anything like
enough on DTD innards.

Thanks

Jun 2 '06 #1

Lachlan Hunt

Andy Dingley <di*****@codesmiths.com> wrote:

What specifies the permitted root element(s) for a document ? HTML,
SGML, XHTML or XML ?
Any element may be the root element. There is nothing in the DTD that
says which elements may or may not be the root element. The element
used as the root element is specified by the DOCTYPE, just like in the
example you gave.
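For the XML side of the question, XML 1.0 states this as the "Root Element Type" validity constraint: the name in the document type declaration must match the element type of the root element. The sketch below checks only that one constraint, using Python's standard `xml.dom.minidom`. It does not load the external DTD, so it is not a validator, and the function name `root_matches_doctype` is made up for illustration.

```python
from xml.dom import minidom


def root_matches_doctype(xml_text):
    """Return (doctype_name, root_name, matches) for a well-formed XML string."""
    doc = minidom.parseString(xml_text)
    root_name = doc.documentElement.tagName
    if doc.doctype is None:
        # No DOCTYPE, so nothing names the root and there is nothing to compare.
        return None, root_name, False
    doctype_name = doc.doctype.name
    # XML names are case-sensitive, so the comparison is exact.
    return doctype_name, root_name, doctype_name == root_name


fragment = ('<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN" '
            '"http://www.w3.org/TR/html4/strict.dtd">'
            '<div><p>Foo</p></div>')
print(root_matches_doctype(fragment))  # ('div', 'div', True)
```

Renaming both the DOCTYPE name and the root element to `p` would pass this check equally well; the DTD itself is never asked which root is allowed.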
Is this fragment a valid HTML document ?...
<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<div>
<p>Foo</p>
</div>

Yes, it's valid. The validator would have told you that.

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox

Jun 2 '06 #2

Chris Morris

Lachlan Hunt <sp***********@gmail.com> writes:

Andy Dingley <di*****@codesmiths.com> wrote:
Is this fragment a valid HTML document ?...
<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<div>
<p>Foo</p>
</div>

Yes, it's valid. The validator would have told you that.

It's valid, but is it a valid *HTML* document? I think not, since
http://www.w3.org/TR/html4/struct/global.html
requires HTML documents to have title elements
"Every HTML document *must* have a TITLE element in the HEAD section."

Those requirements can't be fully enforced at the DTD level, but are
in the specification. It's clearly a valid SGML document, but I think
describing it as HTML is dubious.
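Chris's distinction can be illustrated with a small check that runs after, not instead of, DTD validation. This is a sketch using Python's standard `html.parser`, assuming two spec-level rules discussed in this thread: the document type name must be HTML (the HTML element is the root whether or not its tags are written, since they are omissible), and a TITLE element must be present (its tags are not omissible). `TagCollector` and `spec_level_problems` are hypothetical names.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the DOCTYPE text and the name of every start tag seen."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.start_tags = []

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. 'DOCTYPE div PUBLIC ...'
        if decl.upper().startswith("DOCTYPE"):
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)  # html.parser lower-cases tag names


def spec_level_problems(html_text):
    """List HTML 4.01 prose requirements that DTD validity alone did not catch."""
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    problems = []
    parts = (collector.doctype or "").split()
    doctype_name = parts[1].lower() if len(parts) > 1 else None
    if doctype_name != "html":
        # HTML tags are omissible, so check the DOCTYPE name, not the first tag.
        problems.append("document type name is %r, not HTML" % doctype_name)
    if "title" not in collector.start_tags:
        problems.append("no TITLE element")
    return problems


fragment = ('<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
            '"http://www.w3.org/TR/html4/strict.dtd">\n'
            '<div>\n<p>Foo</p>\n</div>\n')
print(spec_level_problems(fragment))
```

For the fragment under discussion both problems are reported, even though an SGML parser accepts it as valid.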

--
Chris

Jun 2 '06 #3

Andy Dingley

Lachlan Hunt wrote:

Andy Dingley <di*****@codesmiths.com> wrote:

Is this fragment a valid HTML document ?...
<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<div>
<p>Foo</p>
</div>

Yes, it's valid. The validator would have told you that.

I don't know _what_ the validator is telling me. As an example (from
Tidy) it gives a warning
"inserting missing 'title' element"

Now to my mind, this suggests that it's seen as a valid serialisation
of a HTML document, but that after parsing it the HTML-specific tool
has implied the <html>, <head>, <title> and presumably <body> elements.
Now that's quite a different behaviour to "These documents are valid
as fragments based on any root element".

I also don't have a generic SGML parser to hand, just HTML ones. My
real interest here is in the XML or SGML cases, not anything
HTML-specific that is being implied by the context or HTTP headers.

Jun 2 '06 #4

Joe Kesselman

Chris Morris wrote:

It's valid, but is it a valid *HTML* document?

Please note: HTML is not an XML language; it's based on SGML, and its
DTDs follow somewhat different rules.

If you're talking about XML-validity and HTML in the same sentence, you
want to move to XHTML (and hope the tools you and your customers are
using support it). Or, work in XML at the source level, and then render
into HTML at the end for output to the user; XSLT can be used to do that.

Jun 2 '06 #5

Joe Kesselman

I don't know _what_ the validator is telling me. As an example (from
Tidy) it gives a warning
"inserting missing 'title' element"

Tidy isn't a validator. It's a tool for repairing broken documents.

Jun 2 '06 #6

Harlan Messinger

Chris Morris wrote:

Lachlan Hunt <sp***********@gmail.com> writes:
Andy Dingley <di*****@codesmiths.com> wrote:
Is this fragment a valid HTML document ?...
<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<div>
<p>Foo</p>
</div>

Yes, it's valid. The validator would have told you that.

It's valid, but is it a valid *HTML* document?

It isn't an HTML document at all. By its own declaration, it's a DIV
document.

Jun 2 '06 #7

Harlan Messinger

Harlan Messinger wrote:

Chris Morris wrote:
Lachlan Hunt <sp***********@gmail.com> writes:
Andy Dingley <di*****@codesmiths.com> wrote:
Is this fragment a valid HTML document ?...
<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<div>
<p>Foo</p>
</div>
Yes, it's valid. The validator would have told you that.

It's valid, but is it a valid *HTML* document?

It isn't an HTML document at all. By its own declaration, it's a DIV
document.

http://www.w3.org/TR/html4/struct/global.html#h-7.3

"After document type declaration, the remainder of an HTML document is
contained by the HTML element."

Jun 2 '06 #8

Peter Flynn

Andy Dingley <di*****@codesmiths.com> wrote:

What specifies the permitted root element(s) for a document ? HTML,
SGML, XHTML or XML ?
When using a DTD, any declared element type can be the root element.
It must be specified in the Document Type Declaration in the XML file.
The same is true for SGML, HTML, XHTML eg

<!DOCTYPE table PUBLIC "-//W3C//DTD HTML 4.01//EN">

specifies a document starting with <table> and containing anything
valid in HTML 4.01 tables.

Warning: *browsers* are not SGML conforming applications, so they won't
understand this. They *will* understand if you use XML or XHTML, but
I don't know what their reaction to a XHTML fragment would be.
My question is, is there any way to author a non-HTML DTD (SGML or XML)
so as to restrict valid documents to only allow a certain subset of
their elements to be used as the root element?
Yep, just use the element type name of your choice in the Document
Type Declaration. This is required to be supported by all conforming
editors using a DTD. If you use a Schema, all bets are off, as the
specification of a root element type is done quite differently there.
Can this restriction be
expressed _entirely_ within a DTD?
No, not at all. *Any* element type of a DTD can be used as the root
element type.

But conforming applications (eg editors) usually make a good guess
if they are worth anything, when they parse the DTD -- it's not
hard for them to spot that at least one element type is never used
in the content model of any other element type, and is therefore a
good choice for a default root element type. Oddly, some otherwise
very good editors fail to do this, possibly because their programmers
simply didn't grok XML markup.
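Peter's heuristic (an element type that never appears in the content model of any other element type is a good default root) can be sketched as follows. Assumptions: a small DTD in XML syntax with no parameter entities, comments, or SGML tag-omission flags, so it would not handle the published HTML 4.01 DTDs, which rely heavily on parameter entities. The memo DTD and the function name `candidate_roots` are invented for illustration.

```python
import re

# <!ELEMENT name content-model>, XML syntax only (no SGML "- O" flags).
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s>]+)\s+(.*?)>", re.DOTALL)
# A name not preceded by "#" (skips #PCDATA) or by another name character.
NAME_TOKEN = re.compile(r"(?<![#\w.:-])[A-Za-z_:][\w.:-]*")


def candidate_roots(dtd_text):
    """Element types that never appear in another element type's content model."""
    decls = ELEMENT_DECL.findall(dtd_text)
    declared = [name for name, _ in decls]
    referenced = set()
    for name, model in decls:
        model = model.strip()
        if model == "EMPTY":
            continue  # no child elements at all
        if model == "ANY":
            # ANY allows every declared element type as a child.
            referenced.update(n for n in declared if n != name)
            continue
        # Mentions of an element inside its own model do not count.
        referenced.update(n for n in NAME_TOKEN.findall(model) if n != name)
    return [name for name in declared if name not in referenced]


MEMO_DTD = """
<!ELEMENT memo (to, from, body)>
<!ELEMENT to   (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT body (para+)>
<!ELEMENT para (#PCDATA | em)*>
<!ELEMENT em   (#PCDATA)>
"""
print(candidate_roots(MEMO_DTD))  # ['memo']
```

This only suggests a default; as Peter says, any declared element type may still be named in the DOCTYPE.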
Is this used within the HTML DTDs ?
(i.e. not just in the doctypedecl)
Not explicitly.
Is this fragment a valid HTML document ?
Yes, perfectly.
If not, why isn't it? Just
which part of its definition is forbidding this fragmentary use?
<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<div>
<p>Foo</p>
</div>
You can test this by running it through any SGML validating parser
(eg nsgmls).
Good tutorial refs on DTDs are also welcome. I don't know anything like
enough on DTD innards.

The best by far is still Eve Maler and Jeanne El Andaloussi, "Developing
SGML DTDs -- from text to model to markup", Prentice Hall, 1996. You
just have to skip the bits which refer to those parts of SGML which were
dropped in the XML Specification (see the list in the FAQ on converting
DTDs to XML at http://xml.silmaril.ie/developers/dtdconv/).

But you should also bone up on Relax NG, which is a schema language with
a short (DTD-like) syntax as well as a verbose syntax, from which you
can generate DTDs, W3C Schemas, and more. This may be an easier way into
document modelling.

///Peter
--
XML FAQ: http://xml.silmaril.ie/

Jun 2 '06 #9

Peter Flynn

Chris Morris wrote:

Lachlan Hunt <sp***********@gmail.com> writes:
Andy Dingley <di*****@codesmiths.com> wrote:
Is this fragment a valid HTML document ?...
<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<div>
<p>Foo</p>
</div>

Yes, it's valid. The validator would have told you that.

It's valid, but is it a valid *HTML* document? I think not, since
http://www.w3.org/TR/html4/struct/global.html
requires HTML documents to have title elements
"Every HTML document *must* have a TITLE element in the HEAD section."

Those requirements can't be fully enforced at the DTD level, but are
in the specification. It's clearly a valid SGML document, but I think
describing it as HTML is dubious.

It's a HTML *fragment*. Browsers may gag on it. Properly conformant
software won't.

///Peter
--
XML FAQ: http://xml.silmaril.ie/

Jun 2 '06 #10

Jukka K. Korpela

Peter Flynn <pe********@m.silmaril.ie> scripsit:

Is this fragment a valid HTML document ?

Yes, perfectly.

No, it is a valid SGML document, but it is not an HTML document, as defined
in HTML specifications. (Of course, most "HTML documents" on the Web are not
HTML documents in that sense, but the question is meaningful only if
interpreted as relating to specifications. "HTML document" in the loose
sense - as well as "XML document" when well-formedness is not required - is
far too fuzzy a concept to be argued about.)

If not, why isn't it? Just
which part of its definition is forbidding this fragmentary use?
<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<div>
<p>Foo</p>
</div>

You can test this by running it through any SGML validating parser
(eg nsgmls).

That would indicate the validity, but the HTML 4.01 specification requires
that one of three specific DOCTYPE declarations be used - not just that one
of three DTDs be used. And this isn't one of them. Moreover, the
specification explicitly says:
"After document type declaration, the remainder of an HTML document is
contained by the HTML element."
http://www.w3.org/TR/REC-html40/stru...bal.html#h-7.3
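Jukka's two points (one of three listed DOCTYPE declarations, and HTML as the root) can be checked together, since the document type name in the DOCTYPE is what names the root. A minimal sketch, assuming the strict reading of section 7.2 in which the system identifier must also match; Alan's later objection about ISO HTML and custom DTDs applies to exactly this strictness. `uses_html401_doctype` is a hypothetical name.

```python
import re

# The three document type declarations listed in HTML 4.01, section 7.2.
HTML401_DOCTYPES = {
    ("-//W3C//DTD HTML 4.01//EN",
     "http://www.w3.org/TR/html4/strict.dtd"),
    ("-//W3C//DTD HTML 4.01 Transitional//EN",
     "http://www.w3.org/TR/html4/loose.dtd"),
    ("-//W3C//DTD HTML 4.01 Frameset//EN",
     "http://www.w3.org/TR/html4/frameset.dtd"),
}

DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(\S+)\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>',
    re.IGNORECASE,
)


def uses_html401_doctype(text):
    """True only if the DOCTYPE names HTML and matches one of the three listed."""
    match = DOCTYPE_RE.search(text)
    if match is None:
        return False
    name, public_id, system_id = match.groups()
    # The document type name is case-insensitive in SGML HTML;
    # the quoted identifiers are compared literally here.
    return name.upper() == "HTML" and (public_id, system_id) in HTML401_DOCTYPES
```

The fragment in this thread fails only because its DOCTYPE names `div`; its public and system identifiers are exactly those of the Strict declaration.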

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Jun 3 '06 #11

Joe Kesselman

In other words: As always, a DTD -- or a schema -- is only a partial
description of what makes a document correct and meaningful. Think of
these as "higher-level syntax checking"; the application is always going
to impose semantic constraints as well.

Having the schema or DTD describes the document's structure in a
machine-readable form that tools can take advantage of, so they don't
have to do *all* the checking themselves. That's valuable. But don't
expect it to be complete.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry

Jun 3 '06 #12

Jukka K. Korpela

Joe Kesselman <ke************@comcast.net> scripsit:

In other words:
In future, please quote or paraphrase the message that you are commenting
on.
As always, a DTD -- or a schema -- is only a partial
description of what makes a document correct and meaningful.
It depends. There's no law that requires additional rules, though pure
syntax as such _is_ somewhat boring.
Think of
these as "higher-level syntax checking"; the application is always
going to impose semantic constraints as well.

What's "higher-level" here? Anyway, in the issue discussed in this thread,
it is the additional _syntactic_ constraints that imply that a certain kind
of document is not an HTML document. There's nothing semantic in the
requirement that a document contain a specific DOCTYPE declaration or that a
document contain a <title> element. (Requiring that the <title> element
contain text that is a descriptive name for the document, especially for use
as a title for it in different contexts, would be a semantic requirement.
Whether HTML specifications make such a requirement is debatable; the prose
in the specs is a mixture of normative-looking prose, comments, hints,
wishful thinking, etc.)

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Jun 3 '06 #13

Henri Sivonen

In article <11**********************@i40g2000cwc.googlegroups.com>,
"Andy Dingley <di*****@codesmiths.com>" <di*****@codesmiths.com>
wrote:

My question is, is there any way to author a non-HTML DTD (SGML or XML)
so as to restrict valid documents to only allow a certain subset of
their elements to be used as the root element? Can this restriction be
expressed _entirely_ within a DTD?
No and no.

RELAX NG can restrict the allowed roots and does not allow the document
to override.
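To contrast with the DTD model, where the instance's DOCTYPE picks the root, here is a minimal Python sketch of the RELAX NG idea that the schema's `start` pattern fixes the root. It is not a RELAX NG validator (a real RELAX NG processor is needed for that); `ALLOWED_ROOTS` and `root_allowed` are hypothetical names, and the memo schema in the comment is an invented example.

```python
from xml.dom import minidom

# In RELAX NG compact syntax the choice of root lives in the schema, e.g.
#   start = element memo { element to { text }, element body { text } }
# Here that is reduced to a set of permitted root element names.
ALLOWED_ROOTS = frozenset({"memo"})


def root_allowed(xml_text, allowed=ALLOWED_ROOTS):
    """Check the root element against the schema's list, ignoring any DOCTYPE."""
    doc = minidom.parseString(xml_text)
    # Whatever the DOCTYPE claims, the instance cannot override the schema.
    return doc.documentElement.tagName in allowed
```

A document declaring `<!DOCTYPE to>` with a `to` root is rejected here, whereas under a DTD that same declaration would be enough to make `to` an acceptable root.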
Is this fragment a valid HTML document ?
<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<div>
<p>Foo</p>
</div>
Valid in the SGML sense but not conforming to the HTML 4.01 spec.
Validity is overrated. DTD-validity is especially overrated.
Good tutorial refs on DTDs are also welcome. I don't know anything like
enough on DTD innards.

Since you haven't invested in learning DTDs, unless you have a
non-negotiable requirement to use them, I suggest learning RELAX NG
Compact Syntax instead:
http://relaxng.org/compact-tutorial-20030326.html

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Validation Service for RELAX NG: http://hsivonen.iki.fi/validator/

Jun 3 '06 #14

Alan J. Flavell

On Sat, 3 Jun 2006, Joe Kesselman wrote:

In other words:
Who and what are you trying to restate? Your header says it's
<UH**************@reader1.news.jippii.net> by Jukka, but readers have
no idea which part(s) of that posting you are trying to comment on,
contradict, misquote, or whatever. Please observe customary usenet
courtesies.
As always, a DTD -- or a schema -- is only a partial
description of what makes a document correct and meaningful.
The W3C HTML specification requires the document root to be the <html>
element. That seems to me to be a syntactic constraint on anything
which lays claim to being an "HTML document" (as opposed to a
fragment). Which is part of what Jukka said, and which you appear to
be trying to obfuscate.
Think of these as "higher-level syntax checking"; the application is
always going to impose semantic constraints as well.
Of course; but your comment, far from being a restatement "in other
words" of the article you were following-up to, appears to be some
quite unrelated issue, that throws little or no light on what Jukka
said. By failing to quote the relevant parts on which you are
commenting, you give the unfortunate impression that you are making it
harder for readers to see just how the reasoning is being de-railed.
Having the schema or DTD describes the document's structure in a
machine-readable form that tools can take advantage of, so they
don't have to do *all* the checking themselves. That's valuable. But
don't expect it to be complete.

It seems to me that you could do well to distinguish between an "HTML
document", and an HTML fragment. The kind of HTML fragment under
discussion here is not (IMO) an "HTML document" within the meaning of
the applicable specifications, and that is on syntactic grounds.

Jukka is going a bit far at the point where he says:

|the HTML 4.01 specification requires that one of three specific
|DOCTYPE declarations be used ...

- since this would appear to rule out ISO HTML as being a bona fide
kind of HTML, quite apart from the various custom DTD which are
around, and which I think most folks would accept as *kinds* of HTML
document, albeit not approved by the W3C.

But the main argument does not hinge on that detail, as far as I can
tell. Their root element (express or implied) needs to be <html>
before they can be an "HTML document".

h t h

Jun 3 '06 #15

Henri Sivonen

In article <Pi*******************************@ppepc87.ph.gla.ac.uk>,
"Alan J. Flavell" <fl*****@physics.gla.ac.uk> wrote:

Jukka is going a bit far at the point where he says:

|the HTML 4.01 specification requires that one of three specific
|DOCTYPE declarations be used ...

- since this would appear to rule out ISO HTML as being a bona fide
kind of HTML,

I think it is quite appropriate to claim that ISO HTML is not conforming
HTML *4.01*.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Jun 3 '06 #16

Alan J. Flavell

On Sat, 3 Jun 2006, Henri Sivonen wrote:

"Alan J. Flavell" <fl*****@physics.gla.ac.uk> wrote:
Jukka is going a bit far at the point where he says:

|the HTML 4.01 specification requires that one of three specific
|DOCTYPE declarations be used ...

- since this would appear to rule out ISO HTML as being a bona
fide kind of HTML,

I think it is quite appropriate to claim that ISO HTML is not
conforming HTML *4.01*.

Oh, indeed. What Jukka said was entirely reasonable within its own
terms, but what light did it throw on a generic definition of the term
"HTML document"? I suppose I was griping more about what he didn't
say, than about what he did. Sorry.

Maybe we're losing sight of where this discussion came from:

|> > Just
|> > which part of its definition is forbidding this fragmentary use?
|> > <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
|> > "http://www.w3.org/TR/html4/strict.dtd">
|> > <div>
|> > <p>Foo</p>
|> > </div>

It seems entirely plausible to test *that* particular question against
the HTML/4.01 specification, since it calls-out the HTML/4.01 DTD [1]

But then we have to differentiate the question 'what defines an "HTML
document" according to this or that specific flavour of HTML?' from
the more general question of 'who is entitled to define the term "HTML
document" without reference to any specific flavour of HTML, and where
would we find such a definition?'.

I'm saying that - no matter which specific HTML DTD were to be called
out from the above DOCTYPE - the result could be an HTML fragment, but
it would be unreasonable to claim it as an "HTML document". But I'm
not sure that I would be able to give you chapter and verse to settle
that argument authoritatively. And no review of definitions of each
/individual version of HTML/ could suffice to define the term "HTML"
generically.

regards

[1] Yes, I've reviewed the historic arguments about an SGML DTD not
defining what we all had thought it did. But they relied on doing
things which HTML rules out, but which SGML does not allow to be ruled
out. Taken to its logical conclusion, that would result in HTML
disappearing entirely in a puff of logic. I didn't want to go there.

Jun 3 '06 #17

Jack

Henri Sivonen wrote:

In article <Pi*******************************@ppepc87.ph.gla.ac.uk>,
"Alan J. Flavell" <fl*****@physics.gla.ac.uk> wrote:
Jukka is going a bit far at the point where he says:

|the HTML 4.01 specification requires that one of three specific
|DOCTYPE declarations be used ...

- since this would appear to rule out ISO HTML as being a bona fide
kind of HTML,

I think it is quite appropriate to claim that ISO HTML is not
conforming HTML *4.01*.

Would you care to expand on this apparently rather odd statement?

As far as I am aware, ISO HTML is essentially a restatement of W3C HTML
4.01, with certain recommendations transformed into requirements, and
certain deprecations transformed into exclusions. Apart from that, the
recommended DTD declaration is different; but the exact DTD to be
declared is not a requirement of W3C HTML 4.01 anyway.

Please explain whatever I may have misunderstood!

--
Jack.

Jun 3 '06 #18

Henri Sivonen

In article <e5*******************@news.demon.co.uk>,
Jack <mr*********@nospam.jackpot.uk.net> wrote:

Henri Sivonen wrote:
In article <Pi*******************************@ppepc87.ph.gla.ac.uk>,
"Alan J. Flavell" <fl*****@physics.gla.ac.uk> wrote:
Jukka is going a bit far at the point where he says:

|the HTML 4.01 specification requires that one of three specific
|DOCTYPE declarations be used ...

- since this would appear to rule out ISO HTML as being a bona fide
kind of HTML,
I think it is quite appropriate to claim that ISO HTML is not
conforming HTML *4.01*.

Would you care to expand on this apparently rather odd statement?

The specs make incompatible requirements about the doctype, which means
conformance to the specs is mutually exclusive.
As far as I am aware, ISO HTML is essentially a restatement of W3C HTML
4.01, with certain recommendations transformed into requirements, and
certain deprecations transformed into exclusions. Apart from that, the
recommended DTD declaration is different; but the exact DTD to be
declared is not a requirement of W3C HTML 4.01 anyway.

But Jukka Korpela pointed out in the quoted part that W3C HTML 4.01 does
have a requirement of particular doctypes.

(Whether these requirements should be considered bogus or not is another
matter.)

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Jun 3 '06 #19

VK

Alan J. Flavell wrote:

I'm saying that - no matter which specific HTML DTD were to be called
out from the above DOCTYPE - the result could be an HTML fragment, but
it would be unreasonable to claim it as an "HTML document".

You have no choice but to claim it as an "HTML document". It is served from
the server with "Content-Type: text/html"; for local files it is served
as the same type by association .html,.htm... --> text/html.

So before any DTD you /have/ to explicitly declare what document you
are serving - this is the only way to make an application react to
it. However you twist the HTML code around, it is always an
/HTML document/ for the recipient; whether it is correctly formatted or
badly broken is another issue. Out of curiosity you can serve a page
from your server such as:

Content-Type: text/html\n\n
!@#$%&*

P.S. I'm really glad to see that the discussion at
<http://groups.google.com/group/comp.infosystems.www.authoring.html/browse_frm/thread/4fd4218808cd53ce>
triggered your curiosity and the thinking process as a whole.

Just try not to take your frustration out on Mr. Kesselman - he has nothing
to do with it.

Jun 3 '06 #20

Joe Kesselman

Jukka K. Korpela wrote:

In future, please quote or paraphrase the message that you are
commenting on.
I usually do. Apologies.
It depends. There's no law that requires additional rules
Granted. It's rare that there aren't any, in my experience, unless the
document type is pure structure.

Think of
these as "higher-level syntax checking"; the application is always
going to impose semantic constraints as well.

What's "higher-level" here?

Higher than the basic XML syntax.
Anyway, in the issue discussed in this
thread, it is the additional _syntactic_ constraints that imply that a
certain kind of document is not an HTML document.
That's what I was agreeing with, though apparently I may have phrased it
badly. The DTD is not always a completely constrained specification of
"a kind of document". That flexibility may in fact have been deliberate;
I strongly suspect the intent was that a single DTD could describe
several documents which share related structures.
Whether HTML specifications make such a
requirement is debatable; the prose in the specs is a mixture of
normative-looking prose, comments, hints, wishful thinking, etc.)

http://www.w3.org/TR/1999/REC-html40...bal.html#h-7.1

The complicating factor here is the use of the word "should". The HTML4
spec predates the W3C's adoption of the normative use of MAY, SHOULD,
and MUST to mean "optional", "don't violate this without extremely good
reason", and "required by the spec" respectively. So we need to
crosscheck that.

XHTML 1.0 does follow that convention, so we can backhandedly check the
intent by looking at that spec. There, a Strictly Conforming Document
must (!) have html as its root element, and this is *NOT* flagged as one
of the differences from HTML4 either in this spec or in the
compatibility guidelines (http://www.w3.org/TR/xhtml1/#guidelines). This
strongly suggests that the W3C intended that HTML4 docs follow this rule.

I agree, that's a less than ideal way to answer this question, but I can
tell you that even folks working on the W3C's specs often have to resort
to that kind of pointer chasing to nail things down.

If you need a fully official answer... I haven't checked; are any of us
members of the (X)HTML Working Group? If not, I'd suggest dropping a
quick note to ww******@w3.org and suggesting that it might be good to
have an erratum which clarifies whether this "should" was intended to be
"must" or not. (I checked; there isn't one.)

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry

Jun 3 '06 #21

Jukka K. Korpela

VK <sc**********@yahoo.com> scripsit:

Alan J. Flavell wrote:
I'm saying that - no matter which specific HTML DTD were to be called
out from the above DOCTYPE - the result could be an HTML fragment,
but it would be unreasonable to claim it as an "HTML document".
You have no choice but to claim it as an "HTML document".

Surely there's the option of being silent? And, in fact, saying that it is
not an HTML document.
It is served from
the server with "Content-Type: text/html",
So what? Serving it as image/gif would not make it a GIF image. The Internet
media type would be incorrectly declared. A Content-Type declaration does
not magically _make_ the data conform to the specification of a specific
media type.
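Jukka's image/gif example can be made concrete, because GIF data has a fixed signature: the label in the header and the bytes in the body are checked independently, and a mismatch means the label is wrong, not that the bytes have changed. A minimal sketch, assuming only the GIF signature rule; the function names are hypothetical, and since text/html has no such fixed signature the sketch returns None for any other type.

```python
def media_type_of(content_type):
    """'image/gif; x=y' -> 'image/gif' (parameters dropped, lower-cased)."""
    return content_type.split(";", 1)[0].strip().lower()


def body_matches_declared_type(content_type, body):
    """Return True/False for image/gif, None for types this sketch cannot check."""
    if media_type_of(content_type) == "image/gif":
        # Every GIF stream starts with the 6-byte signature GIF87a or GIF89a.
        return body[:6] in (b"GIF87a", b"GIF89a")
    return None


html_bytes = b"<div>\n<p>Foo</p>\n</div>\n"
print(body_matches_declared_type("image/gif", html_bytes))  # False
```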
for local files it is
served as the same type by association .html,.htm... --> text/html.
That's a rule that you just made up. Besides, nobody said the filename
suffix is .html or .htm. For all that you can know, it can be .gif or .foo.
So before any DTD you /have/ to explicitly declare what document you
are serving

Nope. Nobody forces you to serve a document on the Internet, or using HTTP
in particular.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Jun 3 '06 #22

Jukka K. Korpela

Joe Kesselman <ke************@comcast.net> scripsit:

http://www.w3.org/TR/1999/REC-html40...bal.html#h-7.1

The complicating factor here is the use of the word "should".
I don't see any "should" in the statement "An HTML 4 document is composed of
three parts:...", which explicitly mentions the <head> element, which by the
DTD must be nonempty. (The <head> and </head> tags are omissible, but the
<title> element is not.)

Besides, reading a bit further, under 7.2 you find
"HTML 4.01 specifies three DTDs, so authors must include one of the
following document type declarations in their documents."

Regarding the more abstract and more vague question what is an "HTML
document" in general, surely any reasonable definition would require
syntactic conformance to _some_ published specification (though not
necessarily one that uses a DTD, for example). The issue was a document that
contains a DOCTYPE declaration referring to an HTML 4.01 DTD, so what HTML
specification could it possibly comply with?
If you need a fully official answer... I haven't checked; are any of
us members of the (X)HTML Working Group? If not, I'd suggest dropping
a quick note to ww******@w3.org and suggesting that it might be good
to have an erratum which clarifies whether this "should" was intended
to be "must" or not. (I checked; there isn't one.)

They are clearly not interested in doing such things. Look at the errata:
http://www.w3.org/MarkUp/html4-updates/errata
(The absence of any additions since May 2001 does not mean that no errors
have been reported.)
HTML 4.01 is closed for all practical purposes, with all the flaws,
ambiguities, and vagueness.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Jun 3 '06 #23

Peter Flynn

Jukka K. Korpela wrote:

Peter Flynn <pe********@m.silmaril.ie> scripsit:
Is this fragment a valid HTML document ?
Yes, perfectly.

No, it is a valid SGML document, but it is not an HTML document, as
defined in HTML specifications.

Yes, if you need to reference the HTML Spec in addition to the DTD.
That would indicate the validity, but the HTML 4.01 specification
requires that one of three specific DOCTYPE declarations be used - not
just that one of three DTDs be used.
That's why it is unenforceable by a standard parser. Only browsers
implement this requirement, and they are not conforming SGML
applications.
And this isn't one of them.
Moreover, the specification explicitly says:
"After document type declaration, the remainder of an HTML document is
contained by the HTML element."
http://www.w3.org/TR/REC-html40/stru...bal.html#h-7.3

I'm not clear why you were asking this question if you already knew
the answer.

///Peter

Jun 3 '06 #24

Jukka K. Korpela

Peter Flynn <pe********@m.silmaril.ie> scripsit:

Jukka K. Korpela wrote:
Peter Flynn <pe********@m.silmaril.ie> scripsit:
Is this fragment a valid HTML document ?

Yes, perfectly.
No, it is a valid SGML document, but it is not an HTML document, as
defined in HTML specifications.

Yes, if you need to reference the HTML Spec in addition to the DTD.

I'm not sure I see what you are saying "Yes" to and what the if statement
relates to. Surely what is or is not an HTML document is to be defined in
HTML specifications, not in a DTD.

That would indicate the validity, but the HTML 4.01 specification
requires that one of three specific DOCTYPE declarations be used -
not just that one of three DTDs be used.

That's why it is unenforceable by a standard parser.

Yes, but the question was not whether something can be enforced.
Only browsers
implement this requirement, and they are not conforming SGML
applications.
They surely aren't, but they don't implement the requirement. They simply
started using the presence and exact form of a DOCTYPE declaration to decide
on the "quirks" vs. "standard" mode. They don't reject a document on the
grounds that it lacks a correct DOCTYPE; they simply process it differently.
(OK, you might say that "quirks" mode intentionally deviates from the
standards, but this is really just a difference in degree - the "standards"
mode isn't standard-conforming either. Besides, "quirks" mode largely means
intentionally broken CSS implementation rather than intentionally broken
HTML implementation.)
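The DOCTYPE switching Jukka describes was later written down in the WHATWG HTML specification, which names the modes "quirks", "limited-quirks" (Mozilla's "almost standards") and "no-quirks". The sketch below encodes only a handful of those rules, to illustrate that the DOCTYPE selects a processing mode rather than causing rejection. Real browsers check a much longer list of public identifiers, and 2006-era browsers each had their own variants; `rendering_mode` is a hypothetical name.

```python
TRANSITIONAL_OR_FRAMESET = (
    "-//w3c//dtd html 4.01 transitional//",
    "-//w3c//dtd html 4.01 frameset//",
)


def rendering_mode(name, public_id=None, system_id=None):
    """Pick a mode from DOCTYPE parts; name is None when there is no DOCTYPE."""
    if name is None or name.lower() != "html":
        # No DOCTYPE, or one naming another root such as "div".
        return "quirks"
    if (public_id or "").lower().startswith(TRANSITIONAL_OR_FRAMESET):
        # Same public identifier, different mode depending on the system id.
        return "limited-quirks" if system_id else "quirks"
    return "no-quirks"


print(rendering_mode("div", "-//W3C//DTD HTML 4.01//EN",
                     "http://www.w3.org/TR/html4/strict.dtd"))  # quirks
```

Under these rules the `<!DOCTYPE div ...>` fragment is still displayed, just in quirks mode.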
I'm not clear why you were asking this question if you already knew
the answer.

I wasn't. It wasn't me who asked the original question. I'm just commenting
on the answers.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Jun 3 '06 #25

Lachlan Hunt

Chris Morris wrote:

Lachlan Hunt <sp***********@gmail.com> writes:
Andy Dingley <di*****@codesmiths.com> wrote:
Is this fragment a valid HTML document ?...
<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<div>
<p>Foo</p>
</div>

Yes, it's valid. The validator would have told you that.

It's valid, but is it a valid *HTML* document?

There is a difference between validity and conformance. It is a valid
document, though it is not a conforming HTML document.

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox

Jun 4 '06 #26

Joe Kesselman

Jukka K. Korpela wrote:

HTML 4.01 is closed for all practical purposes, with all the flaws,
ambiguities, and vagueness.

Granted; new effort is going into XHTML. But in my experience, that
doesn't mean you can't get answers about HTML if you ask intelligent
questions.

I don't care enough to pursue it further. If you do, either try to get
an official ruling or live with ambiguity.
--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry

Jun 4 '06 #27

Andy Dingley

Henri Sivonen wrote:

Since you haven't invested in learning DTDs, unless you have a
non-negotiable requirement to use them, I suggest learning RELAX NG
Compact Syntax instead:
http://relaxng.org/compact-tutorial-20030326.html

Thanks to everyone for their contributions to this useful thread.

As for RELAX NG, I've been using it for a couple of years now and
found it an excellent format for human-readable definitions. However
most of my actual work is with Schema, simply because it's the
data-typing layer I use with some OWL work (although Relax is making
inroads there).

This particular job needs to be built around DTDs though, something
which so far I've managed to avoid bothering with.

Jun 5 '06 #28

Andy Dingley

Joe Kesselman wrote:

I don't know _what_ the validator is telling me. As an example (from
Tidy) it gives a warning
"inserting missing 'title' element"

Tidy isn't a validator. It's a tool for repairing broken documents.

Agreed. But it's already on my desktop and nsgmls isn't
(or at least is refusing to install and work right thus far)
Anyone care to comment on what Tidy thinks this document _is_ ?

Now I think we can agree that "<!doctype...><div><p>Foo</p></div>" is
probably a valid HTML fragment, but that it's not correct to serve such
things over the web.

Now what's Tidy trying to interpret it as? As far as I can judge, Tidy
thinks this is _also_ a valid HTML document, albeit one that needs a lot
of implicit content adding to <head> beforehand. Is this at all
justifiable, or is Tidy completely out to lunch here?
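One way to turn the fragment into something that is both valid and a conforming HTML 4.01 Strict document is to supply the missing pieces explicitly instead of leaving them implied. The sketch below does that by plain string assembly. It is not a description of what Tidy does internally, the title text has to come from the author, and `wrap_fragment` is a hypothetical helper.

```python
from html import escape

STRICT_DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
                  '"http://www.w3.org/TR/html4/strict.dtd">')


def wrap_fragment(body_fragment, title):
    """Embed body-level markup in a complete HTML 4.01 Strict document."""
    return "\n".join([
        STRICT_DOCTYPE,
        "<html>",
        "<head>",
        # TITLE is required; escape() keeps "&" and "<" in the title legal.
        "<title>%s</title>" % escape(title),
        "</head>",
        "<body>",
        body_fragment,  # assumed to be block-level content such as a DIV
        "</body>",
        "</html>",
    ])


print(wrap_fragment("<div>\n<p>Foo</p>\n</div>", "Foo"))
```

The HTML, HEAD and BODY tags written out here are all omissible in HTML 4.01; only the DOCTYPE and the TITLE element are strictly needed for the result to conform.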

Jun 5 '06 #29
