What an (X)HTML document should look like

8,651 Expert Mod 8TB
Lately I have seen so much awful HTML that I would like to show what an HTML document should look like according to the requirements of the W3C.

the absolute minimum is defined as:
An HTML 4 document is composed of three parts:
  1. a line containing HTML version information,
  2. a declarative header section (delimited by the HEAD element),
  3. a body, which contains the document's actual content. The body may be implemented by the BODY element or the FRAMESET element.
or expressed in code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
   <HEAD>
      <TITLE>My first HTML document</TITLE>
   </HEAD>
   <BODY>
      <P>Hello world!
   </BODY>
</HTML>
note that the page title is a required HTML Element!

From my experience, there is at least one additional element you absolutely should use: the "content-type" meta element.
Not only does it provide the browser with the HTML MIME type, it also tells it which character encoding is used. While this may be of no concern for English content, it matters to anyone whose language uses more than the ASCII characters: Chinese, Japanese and other Asian languages; nearly every European language besides English, e.g. French (é, è, ç, à) or German (ä, ö, ü, ß); languages using Cyrillic (and related) characters, e.g. Russian; Arabic; and many more. And there is usually more than one character set for a language…
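
As a minimal sketch of why the declared encoding matters (the German sample text and the choice of UTF-8 are assumptions made for this example), the following page only displays its umlauts correctly if the file is really saved as UTF-8, matching the charset given in the meta element:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="de">
    <head>
        <!-- the declared charset must match the encoding the file is saved in -->
        <meta http-equiv="content-type" content="text/html;charset=utf-8">
        <title>Grüße aus München</title>
    </head>
    <body>
        <p>Käse, Brötchen und Weißbier</p>
    </body>
</html>
```

If the same UTF-8 bytes were declared as ISO-8859-1 instead, a browser would typically show something like "GrÃ¼ÃŸe" in place of "Grüße".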

OK, back to topic. This is the HTML template that I use for coding (variables in curly braces)
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="{en}">
    <head>
        <meta http-equiv="content-type" content="text/html;charset={utf-8}">
        <title>{your page title}</title>
        <link rel="stylesheet" href="{your.css}" media="screen">
        <style type="text/css">
{your CSS definitions}
        </style>
        <script type="text/javascript" src="{your.js}"></script>
        <script type="text/javascript">
{your JavaScript code}
        </script>
    </head>
    <body>
    </body>
</html>
additional notes:
  • some earlier browsers required the script content to be wrapped in comment tags (<!-- -->), so that the content is not displayed. Any decent browser nowadays knows the difference between <head> and <body>, so this is not necessary anymore.
  • HTML element names may be written upper or lower case. However, I recommend lower case.

Of course there is another upcoming HTML variant: XHTML. Before I get to the code, one word of caution: Internet Explorer does not support native XHTML (there are some hacks, though).
However, since XHTML is an application of XML, it is recommended to use an XHTML media type, which is "application/xhtml+xml" (it is only a recommendation, probably due to the compatibility issues, mainly with IE).

IMPORTANT: if you serve your XHTML with the "text/html" mediatype, it is not treated as XHTML at all. It is then processed as HTML, which makes the effort of writing XHTML futile.

Naturally, there are also requirements for XHTML (including a HTML/XHTML comparison).

This is my XHTML template. It is not intended to be HTML compatible in the first place, so it uses the somewhat more restrictive XHTML 1.1. (To serve these files to IE, I use a server script to transform them into HTML.)
<?xml version="1.0" encoding="{ISO-8859-1}" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="{en}">
    <head>
        <meta http-equiv="content-type" content="application/xhtml+xml;charset={ISO-8859-1}"/>
        <title>{your page title}</title>
        <link rel="stylesheet" href="{your.css}" media="screen"/>
        <style type="text/css">
/* <![CDATA[ */
{your CSS definitions}
/* ]]> */
        </style>
        <script type="text/javascript" src="{your.js}"></script>
        <script type="text/javascript">
// <![CDATA[
{your JavaScript code}
// ]]>
        </script>
    </head>
    <body>
    </body>
</html>
additional notes:
  • in contrast to HTML, XHTML files served as XHTML are checked for well-formedness by the browser (by its XML parser, to be exact). If your document is not well-formed, you get an error message instead of the page. Browsers do not check validity against the DTD; use a validator for that.
  • only if you use the UTF-8 or UTF-16 charset may you omit the XML declaration (these are the default encodings).
  • use " />" (with a leading space) for empty element tags when you’re writing HTML compatible XHTML*.
  • XHTML element names must be written in lower case.
  • the CDATA sections in <script> and <style> are required to keep the document well-formed as soon as the code contains "<" or "&"**.
  • the <body> element itself does not allow for plain text and inline elements***.
* but in this case you could write HTML as well…
** due to XML restrictions
*** as defined in the DTD
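
To illustrate the notes above, here is a hedged sketch of the XHTML template filled in. The page text, the UTF-8 encoding, and the inRange function are made up for this example; the point is only to show the CDATA section, the " />" form, and block-level content inside <body>:

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <title>Block-level content in body</title>
        <script type="text/javascript">
// <![CDATA[
// Outside a CDATA section, the "<" and "&&" below would make the XML parser fail.
function inRange(n) { return 0 < n && n < 10; }
// ]]>
        </script>
    </head>
    <body>
        <!-- plain text or inline elements directly inside body would be invalid -->
        <p>Text and <em>inline elements</em> belong inside a block-level element.<br />
        This is the second line of the same paragraph.</p>
    </body>
</html>
```

Remember that this only behaves as XHTML when it is served as "application/xhtml+xml".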

thanks for your attention so far and happy copy/paste

PS: don’t forget to save the document in the encoding you declared in the document.
Oct 27 '09 #1
884 Expert 512MB
Nice article.

I'm not an HTML expert, but when I try to validate the code in this article with the W3C's HTML validation service (http://validator.w3.org/), it returns some errors/warnings. Can you please elaborate on this? I'm designing a small personal website and I'm planning to use this article as a reference, as I don't know much HTML.

Dec 15 '09 #2
8,651 Expert Mod 8TB
not sure which ones you mean. the only note I get is due to the validation method: direct input validation automatically uses UTF-8, and it gives a warning (plus a note explaining it) if you set the charset value to something different.

the HTML template throws an error, because I omitted the body content (which is not part of the template).

and of course you have to remove the curly braces…

PS. and of course you can’t get the XHTML to run in IE…
Dec 15 '09 #3
884 Expert 512MB
Oops... my bad. I never read those error messages carefully, but then I don't know much HTML either. I must agree, though, that I should have at least spotted that 'curly braces' error.

If 'body' is empty, why is this flagged as an error? Or is it just that the specification does not allow body to be empty?

Never mind my previous reply now.

Anyway, when are you going to move this (and many other articles) to the Insights section? Many of your articles are nicely done, but they are still in the writing room. Is the writing room visible in Google search results?

PS. and of course you can’t get the XHTML to run in IE…
Does Internet Explorer work natively on Linux (Ubuntu) and BSD (OpenBSD)? ;-)
I'm using firefox on both of my *nix boxes.

Dec 15 '09 #4
884 Expert 512MB
me too.

EDIT: A few days ago, the HTML of my personal website was a good example of 'awful' HTML.
Dec 15 '09 #5
8,651 Expert Mod 8TB
excerpt from the DTDs
<!ELEMENT body %Block;>
this requires a block level element child (%Block; is the replacement group for the block elements).

<!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL) -- document body -->
not sure what the controls are for, but basically it’s the same, except that you can choose between <script> and block level elements.

I wish I knew, but there’s currently no chief editor and if there’s no response, you don’t know if there may be an error left.

*rofl* that's a good one…
ahem… no (and I hope it stays this way)
Dec 15 '09 #6
7,435 Expert 4TB
The xml declaration, the first line shown in the example, is entirely correct, however IE<8 does not know what to do with it and goes into quirks mode. Since most people don't serve XHTML as XHTML, that line can be safely removed.

The charset does not have to be set in the meta tag if it's sent in the http headers. Most servers do this but the validator will flag it as an error if it's not in the html. Some programs may look at the meta tag to determine the charset, too, but you'd be aware of that if it did.

Little known factoid: the html, body and head tags are optional.
Dec 18 '09 #7
8,651 Expert Mod 8TB
talking about IE issues would require an article of its own

Experience shows that not every server sends the correct charset, and usually you don’t know whether your server does it correctly or at all.

therefore, the least problematic charset is UTF-8, because in most cases it is the default.

Little known factoid: the html, body and head tags are optional.
care to prove?
Dec 18 '09 #8
7,435 Expert 4TB
Experience shows that not every server sends the correct charset, and usually you don’t know whether your server does it correctly or at all.
It's easy enough to see what the server is sending using many online tools or extensions in Firefox (LiveHttpHeaders).
care to prove?
W3C Index of Elements
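
As a hedged illustration of the optional tags: in HTML 4.01 (not in XHTML) the start and end tags of html, head and body may be omitted, so a document as short as the following sketch should still validate as HTML 4.01 Strict. The title and paragraph text are placeholders chosen for this example:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<title>Optional tags</title>
<p>Hello world!
```

The parser infers the html, head and body elements from the content; only the title and at least one block-level element have to be written out.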
Dec 20 '09 #9
"<!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL) -- document body -->"

What's the meaning of this code? When and where can we use it?

Ashwani Sharma
Feb 24 '10 #10
7,435 Expert 4TB
With HTML, you don't. That is a line of a DTD for HTML describing, in this case, the body element. In XML and XHTML, DTDs may be written for custom elements but that's another topic and another forum.
Feb 24 '10 #11
Thanks for your support Drhowarddrfine :)
Feb 24 '10 #12