Root element specified by DTD ?

What specifies the permitted root element(s) for a document ? HTML,
SGML, XHTML or XML ?
Valid HTML documents need to have a well-known DTD and a doctypedecl in
each document like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

The document's root element is "HTML", and is specified by the
doctypedecl. For HTML and XHTML it's possible that the prose of their
recommendation restricts it too.
My question is, is there any way to author a non-HTML DTD (SGML or XML)
so as to restrict valid documents to only allow a certain subset of
their elements to be used as the root element? Can this restriction be
expressed _entirely_ within a DTD? Is this used within the HTML DTDs ?
(i.e. not just in the doctypedecl)

Is this fragment a valid HTML document ? If not, why isn't it? Just
which part of its definition is forbidding this fragmentary use?
<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<div>
<p>Foo</p>
</div>
Good tutorial refs on DTDs are also welcome. I don't know anything like
enough on DTD innards.

Thanks

Jun 2 '06
Peter Flynn <pe********@m.silmaril.ie> scripsit:
> > Is this fragment a valid HTML document ?
>
> Yes, perfectly.

No, it is a valid SGML document, but it is not an HTML document, as defined
in HTML specifications. (Of course, most "HTML documents" on the Web are not
HTML documents in that sense, but the question is meaningful only if
interpreted as relating to specifications. "HTML document" in the loose
sense - as well as "XML document" when well-formedness is not required - is
far too fuzzy a concept to be argued about.)
> > If not, why isn't it? Just
> > which part of its definition is forbidding this fragmentary use?
> > <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
> > "http://www.w3.org/TR/html4/strict.dtd">
> > <div>
> > <p>Foo</p>
> > </div>
>
> You can test this by running it through any SGML validating parser
> (eg nsgmls).

That would indicate the validity, but the HTML 4.01 specification requires
that one of three specific DOCTYPE declarations be used - not just that one
of three DTDs be used. And this isn't one of them. Moreover, the
specification explicitly says:
"After document type declaration, the remainder of an HTML document is
contained by the HTML element."
http://www.w3.org/TR/REC-html40/stru...bal.html#h-7.3
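
To make the distinction concrete, here is a minimal Python sketch of a check that goes beyond DTD validity. The helper name `looks_like_html401_doctype` and the regular expression are assumptions made for illustration only. It inspects just the DOCTYPE name and the public identifier, not the system identifier or the rest of the document, so it approximates the requirement rather than implementing it.

```python
import re

# Public identifiers of the three DTDs published with HTML 4.01.
HTML401_PUBLIC_IDS = {
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD HTML 4.01 Frameset//EN",
}

# Matches the start of: <!DOCTYPE name PUBLIC "public-id" ...
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+([A-Za-z][\w.-]*)\s+PUBLIC\s+"([^"]*)"',
    re.IGNORECASE,
)


def looks_like_html401_doctype(text):
    """Return True if the DOCTYPE names HTML and one of the three public ids."""
    match = DOCTYPE_RE.match(text.lstrip())
    if match is None:
        return False
    name, public_id = match.groups()
    # HTML element names are case-insensitive, so compare in lower case.
    return name.lower() == "html" and public_id in HTML401_PUBLIC_IDS


fragment = (
    '<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<div>\n<p>Foo</p>\n</div>"
)
document = fragment.replace("<!DOCTYPE div", "<!DOCTYPE HTML", 1)

print(looks_like_html401_doctype(fragment))   # False: the DOCTYPE names div
print(looks_like_html401_doctype(document))   # True: the DOCTYPE names HTML
```

An SGML validator may accept the fragment, while a check of this kind rejects it: the DTD is the same, only the declaration differs.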

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Jun 3 '06 #11
In other words: As always, a DTD -- or a schema -- is only a partial
description of what makes a document correct and meaningful. Think of
these as "higher-level syntax checking"; the application is always going
to impose semantic constraints as well.

A schema or DTD describes the document's structure in a
machine-readable form that tools can take advantage of, so they don't
have to do *all* the checking themselves. That's valuable. But don't
expect it to be complete.
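
A rough Python sketch of that layering, under stated assumptions: the standard library has no validating parser, so plain well-formedness stands in for the DTD or schema layer here. The rule that every <title> must contain visible text is a hypothetical application rule, picked because a #PCDATA content model cannot forbid empty content. All function names are made up for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def structural_errors(xml_text):
    """Layer 1: what a parser can check mechanically (here: well-formedness)."""
    try:
        minidom.parseString(xml_text)
    except ExpatError as exc:
        return [f"not well-formed: {exc}"]
    return []


def application_errors(xml_text):
    """Layer 2: a hypothetical application rule: every <title> has visible text."""
    doc = minidom.parseString(xml_text)
    errors = []
    for title in doc.getElementsByTagName("title"):
        text = "".join(
            node.data for node in title.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        if not text.strip():
            errors.append("empty <title>")
    return errors


def check(xml_text):
    """Run the mechanical layer first; apply application rules only if it passes."""
    errors = structural_errors(xml_text)
    return errors if errors else application_errors(xml_text)


print(check("<doc><title>Foo</title></doc>"))  # []
print(check("<doc><title> </title></doc>"))    # ['empty <title>']
```

The point of the split is that the second layer can assume a parseable tree, which is the work the machine-readable description saves it.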

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Jun 3 '06 #12
Joe Kesselman <ke************@comcast.net> scripsit:
> In other words:

In future, please quote or paraphrase the message that you are commenting
on.

> As always, a DTD -- or a schema -- is only a partial
> description of what makes a document correct and meaningful.

It depends. There's no law that requires additional rules, though pure
syntax as such _is_ somewhat boring.

> Think of
> these as "higher-level syntax checking"; the application is always
> going to impose semantic constraints as well.

What's "higher-level" here? Anyway, in the issue discussed in this thread,
it is the additional _syntactic_ constraints that imply that a certain kind
of document is not an HTML document. There's nothing semantic in the
requirement that a document contain a specific DOCTYPE declaration or that a
document contain a <title> element. (Requiring that the <title> element
contain text that is a descriptive name for the document, especially for use
as a title for it in different contexts, would be a semantic requirement.
Whether HTML specifications make such a requirement is debatable; the prose
in the specs is a mixture of normative-looking prose, comments, hints,
wishful thinking, etc.)

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Jun 3 '06 #13
In article <11**********************@i40g2000cwc.googlegroups.com>,
"Andy Dingley <di*****@codesmiths.com>" <di*****@codesmiths.com>
wrote:
> My question is, is there any way to author a non-HTML DTD (SGML or XML)
> so as to restrict valid documents to only allow a certain subset of
> their elements to be used as the root element? Can this restriction be
> expressed _entirely_ within a DTD?
No and no.

RELAX NG can restrict the allowed roots and does not allow the document
to override.
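
A minimal Python sketch of why a DTD alone cannot pin the root: the DTD only declares elements, and each document's own DOCTYPE names the root. The two-element DTD below is hypothetical and is written as an internal subset so that nothing is fetched over the network. Note that `xml.dom.minidom` does not validate content models; the sketch checks only XML's "Root Element Type" constraint, that the DOCTYPE name matches the root element.

```python
from xml.dom import minidom

# Hypothetical DTD: it declares two elements and names no root.
DTD_BODY = """
  <!ELEMENT doc (para+)>
  <!ELEMENT para (#PCDATA)>
"""


def root_matches_doctype(xml_text):
    """Check that the DOCTYPE name equals the root element's name."""
    doc = minidom.parseString(xml_text)
    return (
        doc.doctype is not None
        and doc.doctype.name == doc.documentElement.tagName
    )


# Same declarations, two different roots: each document picks its own.
as_doc = "<!DOCTYPE doc [" + DTD_BODY + "]><doc><para>Foo</para></doc>"
as_para = "<!DOCTYPE para [" + DTD_BODY + "]><para>Foo</para>"
# The only thing the constraint rejects is a name that disagrees with the root.
mismatch = "<!DOCTYPE doc [" + DTD_BODY + "]><para>Foo</para>"

print(root_matches_doctype(as_doc))    # True
print(root_matches_doctype(as_para))   # True
print(root_matches_doctype(mismatch))  # False
```

Both of the first two documents satisfy the constraint against identical declarations, which is the override that the DTD has no way to forbid.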
> Is this fragment a valid HTML document ?
> <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
> "http://www.w3.org/TR/html4/strict.dtd">
> <div>
> <p>Foo</p>
> </div>

Valid in the SGML sense but not conforming to the HTML 4.01 spec.
Validity is overrated. DTD-validity is especially overrated.

> Good tutorial refs on DTDs are also welcome. I don't know anything like
> enough on DTD innards.


Since you haven't invested in learning DTDs, unless you have a
non-negotiable requirement to use them, I suggest learning RELAX NG
Compact Syntax instead:
http://relaxng.org/compact-tutorial-20030326.html

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Validation Service for RELAX NG: http://hsivonen.iki.fi/validator/
Jun 3 '06 #14
On Sat, 3 Jun 2006, Joe Kesselman wrote:
> In other words:

Who and what are you trying to restate? Your header says it's
<UH**************@reader1.news.jippii.net> by Jukka, but readers have
no idea which part(s) of that posting you are trying to comment on,
contradict, misquote, or whatever. Please observe customary usenet
courtesies.

> As always, a DTD -- or a schema -- is only a partial
> description of what makes a document correct and meaningful.
The W3C HTML specification requires the document root to be the <html>
element. That seems to me to be a syntactic constraint on anything
which lays claim to being an "HTML document" (as opposed to a
fragment). Which is part of what Jukka said, and which you appear to
be trying to obfuscate.
> Think of these as "higher-level syntax checking"; the application is
> always going to impose semantic constraints as well.
Of course; but your comment, far from being a restatement "in other
words" of the article you were following-up to, appears to be some
quite unrelated issue, that throws little or no light on what Jukka
said. By failing to quote the relevant parts on which you are
commenting, you give the unfortunate impression that you are making it
harder for readers to see just how the reasoning is being de-railed.
> A schema or DTD describes the document's structure in a
> machine-readable form that tools can take advantage of, so they
> don't have to do *all* the checking themselves. That's valuable. But
> don't expect it to be complete.


It seems to me that you could do well to distinguish between an "HTML
document", and an HTML fragment. The kind of HTML fragment under
discussion here is not (IMO) an "HTML document" within the meaning of
the applicable specifications, and that is on syntactic grounds.

Jukka is going a bit far at the point where he says:

|the HTML 4.01 specification requires that one of three specific
|DOCTYPE declarations be used ...

- since this would appear to rule out ISO HTML as being a bona fide
kind of HTML, quite apart from the various custom DTDs which are
around, and which I think most folks would accept as *kinds* of HTML
document, albeit not approved by the W3C.

But the main argument does not hinge on that detail, as far as I can
tell. Their root element (express or implied) needs to be <html>
before they can be an "HTML document".

h t h

Jun 3 '06 #15
In article <Pi*******************************@ppepc87.ph.gla.ac.uk>,
"Alan J. Flavell" <fl*****@physics.gla.ac.uk> wrote:
> Jukka is going a bit far at the point where he says:
>
> |the HTML 4.01 specification requires that one of three specific
> |DOCTYPE declarations be used ...
>
> - since this would appear to rule out ISO HTML as being a bona fide
> kind of HTML,


I think it is quite appropriate to claim that ISO HTML is not conforming
HTML *4.01*.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jun 3 '06 #16
On Sat, 3 Jun 2006, Henri Sivonen wrote:
> "Alan J. Flavell" <fl*****@physics.gla.ac.uk> wrote:
> > Jukka is going a bit far at the point where he says:
> >
> > |the HTML 4.01 specification requires that one of three specific
> > |DOCTYPE declarations be used ...
> >
> > - since this would appear to rule out ISO HTML as being a bona
> > fide kind of HTML,
>
> I think it is quite appropriate to claim that ISO HTML is not
> conforming HTML *4.01*.


Oh, indeed. What Jukka said was entirely reasonable within its own
terms, but what light did it throw on a generic definition of the term
"HTML document"? I suppose I was griping more about what he didn't
say, than about what he did. Sorry.

Maybe we're losing sight of where this discussion came from:

|> > Just
|> > which part of its definition is forbidding this fragmentary use?
|> > <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
|> > "http://www.w3.org/TR/html4/strict.dtd">
|> > <div>
|> > <p>Foo</p>
|> > </div>

It seems entirely plausible to test *that* particular question against
the HTML/4.01 specification, since it calls-out the HTML/4.01 DTD [1]

But then we have to differentiate the question 'what defines an "HTML
document" according to this or that specific flavour of HTML?' from
the more general question of 'who is entitled to define the term "HTML
document" without reference to any specific flavour of HTML, and where
would we find such a definition?'.

I'm saying that - no matter which specific HTML DTD were to be called
out from the above DOCTYPE - the result could be an HTML fragment, but
it would be unreasonable to claim it as an "HTML document". But I'm
not sure that I would be able to give you chapter and verse to settle
that argument authoritatively. And no review of definitions of each
/individual version of HTML/ could suffice to define the term "HTML"
generically.

regards

[1] Yes, I've reviewed the historic arguments about an SGML DTD not
defining what we all had thought it did. But they relied on doing
things which HTML rules out, but which SGML does not allow to be ruled
out. Taken to its logical conclusion, that would result in HTML
disappearing entirely in a puff of logic. I didn't want to go there.
Jun 3 '06 #17
Henri Sivonen wrote:
> In article <Pi*******************************@ppepc87.ph.gla.ac.uk>,
> "Alan J. Flavell" <fl*****@physics.gla.ac.uk> wrote:
> > Jukka is going a bit far at the point where he says:
> >
> > |the HTML 4.01 specification requires that one of three specific
> > |DOCTYPE declarations be used ...
> >
> > - since this would appear to rule out ISO HTML as being a bona fide
> > kind of HTML,
>
> I think it is quite appropriate to claim that ISO HTML is not
> conforming HTML *4.01*.

Would you care to expand on this apparently rather odd statement?

As far as I am aware, ISO HTML is essentially a restatement of W3C HTML
4.01, with certain recommendations transformed into requirements, and
certain deprecations transformed into exclusions. Apart from that, the
recommended DTD declaration is different; but the exact DTD to be
declared is not a requirement of W3C HTML 4.01 anyway.

Please explain whatever I may have misunderstood!

--
Jack.

Jun 3 '06 #18
In article <e5*******************@news.demon.co.uk>,
Jack <mr*********@nospam.jackpot.uk.net> wrote:
> Henri Sivonen wrote:
> > In article <Pi*******************************@ppepc87.ph.gla.ac.uk>,
> > "Alan J. Flavell" <fl*****@physics.gla.ac.uk> wrote:
> > > Jukka is going a bit far at the point where he says:
> > >
> > > |the HTML 4.01 specification requires that one of three specific
> > > |DOCTYPE declarations be used ...
> > >
> > > - since this would appear to rule out ISO HTML as being a bona fide
> > > kind of HTML,
> >
> > I think it is quite appropriate to claim that ISO HTML is not
> > conforming HTML *4.01*.
>
> Would you care to expand on this apparently rather odd statement?


The specs make incompatible requirements about the doctype, which means
conformance to the specs is mutually exclusive.
> As far as I am aware, ISO HTML is essentially a restatement of W3C HTML
> 4.01, with certain recommendations transformed into requirements, and
> certain deprecations transformed into exclusions. Apart from that, the
> recommended DTD declaration is different; but the exact DTD to be
> declared is not a requirement of W3C HTML 4.01 anyway.


But Jukka Korpela pointed out in the quoted part that W3C HTML 4.01 does
have a requirement of particular doctypes.

(Whether these requirements should be considered bogus or not is another
matter.)

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jun 3 '06 #19
VK
Alan J. Flavell wrote:
> I'm saying that - no matter which specific HTML DTD were to be called
> out from the above DOCTYPE - the result could be an HTML fragment, but
> it would be unreasonable to claim it as an "HTML document".


You have no choice but to claim it as an "HTML document". It is served
from the server with "Content-Type: text/html"; for local files it is
treated as the same type by association: .html, .htm, ... --> text/html.

So before any DTD you /have/ to explicitly declare what kind of document
you are serving - this is the only way to make an application react to
it. So however you twist the HTML code around, it is always an
/HTML document/ for the recipient: whether it is correctly formatted or
badly broken is another issue. Out of curiosity you can serve a page from
your server such as:

Content-Type: text/html\n\n
!@#$%&*
P.S. I'm really glad to see that the discussion at
<http://groups.google.com/group/comp.infosystems.www.authoring.html/browse_frm/thread/4fd4218808cd53ce>
triggered your curiosity and the thinking process as a whole.

Just try not to take your frustration out on Mr. Kesselman - he has
nothing to do with it.

Jun 3 '06 #20
