HTML docs for browser writers (not users)??????

Nearly everything written about HTML falls into one of two categories:

1. Material written for HTML authors, or
2. Material written for user-agent implementors about standard HTML

However, programmers writing a browser need to know about invalid,
obsolete, and non-standard HTML also, because that's what shows up on the
web. For example, on the Yahoo site alone I see:

-- use of hex numbers for color attributes without the #
-- use of the <spacer> tag
-- use of the <image> tag (in addition to <img>, of course)

and more.
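
As a rough illustration of what coping with the first and third quirks might look like, here is a minimal Python sketch. The helper names (normalize_color, normalize_tag, TAG_ALIASES) are made up for this example, and the rules are far simpler than whatever a real browser applies: it only handles bare six-digit hex values and a one-entry alias table.

```python
import re

# Matches a six-digit hex color written without the leading '#'.
_BARE_HEX = re.compile(r"[0-9a-fA-F]{6}")

# Assumed alias table: non-standard tag name -> standard element name.
TAG_ALIASES = {"image": "img"}


def normalize_color(value):
    """Return '#rrggbb' for a bare six-digit hex value; leave anything else alone."""
    value = value.strip()
    if _BARE_HEX.fullmatch(value):
        return "#" + value.lower()
    return value


def normalize_tag(name):
    """Lower-case a tag name and apply the alias table."""
    name = name.lower()
    return TAG_ALIASES.get(name, name)
```

Something like normalize_color("FFCC00") would hand back "#ffcc00", while named colors and values that already carry the # pass through untouched. Tags with no standard equivalent, such as spacer, are left for the caller to deal with.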

Does anyone know of documentation about what one might call "reality HTML,"
versus "standard HTML" or "recommended HTML?" I don't mean documentation of
deprecated tags... any tag that was ever standardized is more than
adequately documented. What I'm looking for is documentation on tags or
attributes that are in widespread use, but were never standardized.

Does anyone know of any surveys of actual HTML that's in use, maybe in the
form of a report of frequency of occurrence of tags, whether standardized
or not? (If not, I'm thinking of conducting such a survey using a
custom-designed web crawler.)
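
The counting half of such a survey could be sketched roughly as follows, using Python's standard html.parser module, which tolerates most invalid markup. The names TagCounter and count_tags are invented for this example, and fetching the pages (the crawler proper) is left out entirely.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count every start tag seen, whether or not the tag is standard."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so IMG and img count together.
        self.counts[tag] += 1


def count_tags(markup):
    """Return a Counter mapping tag name to number of start tags in markup."""
    parser = TagCounter()
    parser.feed(markup)
    parser.close()
    return parser.counts
```

Calling count_tags on each fetched page and summing the resulting Counters would give an overall frequency table; comparing its keys against a list of standardized element names would then separate out the non-standard ones.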

--Marc
Jul 20 '05 #1
Marc Rochkind wrote:

[snip]
Does anyone know of documentation about what one might call "reality
HTML," versus "standard HTML" or "recommended HTML?" [snip]

I'm afraid not. That's part of the problem with tag soup, of course: UA
authors need to reverse-engineer popular browsers rather than just code to
a specification. There have been a few illuminating posts by Hixie & Hyatt
on their blogs, in particular:

<URL:http://ln.hixie.ch/?start=1037910467&count=1>
<URL:http://weblogs.mozillazine.org/hyatt/archives/2003_03.html#002904>

You'll want to have a skim of their sites for more stuff like this,
especially now that Hixie has started at Opera (Hyatt works on Safari, and
Mozilla before that). The bug databases of common browsers might be
helpful:

<URL:http://bugzilla.mozilla.org/> (Mozilla derivatives)
<URL:http://bugs.kde.org/> (Konqueror's bugs are in the main KDE bug
tracker, but fairly easy to pull out with a query)

You may want to join the relevant W3C mailing lists as well; I'm sure you
could get a few questions answered there:

<URL:http://lists.w3.org/>

html-tidy, www-validator, www-validator-css, www-html and www-style are
probably the ones I would monitor in your position. Reviewing the source
for tidy wouldn't be a bad idea either.

It might be worth lurking here and next door in ciwas as well to get a
handle on where the most common problems are.

Does anyone know of any surveys of actual HTML that's in use, maybe in the
form of a report of frequency of occurrence of tags, whether standardized
or not? (If not, I'm thinking of conducting such a survey using a
custom-designed web crawler.)


I haven't heard of anything like that. You'll want to make the distinction
between tags and elements if you are writing a UA though.
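
To make the distinction concrete: in HTML 4 the end tag of P is optional, so a fragment like <p>one<p>two contains two P elements but only two tags in total, both of them start tags. A minimal Python sketch (the names TagEvents and tag_events are made up for this example) that tallies start and end tags separately shows why a tag count alone doesn't tell you how many elements there are:

```python
from collections import Counter
from html.parser import HTMLParser


class TagEvents(HTMLParser):
    """Tally start tags and end tags separately; neither is an element count."""

    def __init__(self):
        super().__init__()
        self.start_tags = Counter()
        self.end_tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.start_tags[tag] += 1

    def handle_endtag(self, tag):
        self.end_tags[tag] += 1


def tag_events(markup):
    """Return (start_tags, end_tags) Counters for the given markup."""
    parser = TagEvents()
    parser.feed(markup)
    parser.close()
    return parser.start_tags, parser.end_tags
```

A UA has to go further than this and work out where each element begins and ends, including elements whose tags were omitted altogether.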

--
Jim Dabell

Jul 20 '05 #2
On Wed, Aug 13, Marc Rochkind inscribed on the eternal scroll:
Does anyone know of documentation about what one might call "reality HTML,"
versus "standard HTML" or "recommended HTML?"


I think you'd find http://www.blooberry.com/indexdot/index.html
to be a useful resource.

Jul 20 '05 #3
On 14 Aug 2003 09:09:23 -0700, Brian Wilson <us****@blooberry.com> wrote:

[snip]

Actually, I do mention it under the page for the IMG element. 8-}


[snip]

Indeed, I see it now....

Brian, yours is a sensational site! Last night I found myself just reading
pages at random, fascinated by all the detail. (And, if anyone ever tries
to tell me that the best sites are the best looking sites, I can use yours
to prove them wrong. ;-) )

I wonder if HTML wins some sort of award for the most precisely specified
yet most sloppily practiced standard? (In other areas, such as hardware
standards, OS standards, computer language standards, and the like, a non-
standard usage simply won't work. HTML is unusual in that the software --
mostly browsers -- is so flexible. Can you think of any other similar
situations?)

Naturally, the conspiracy-theorists among us can point out that it's in
Microsoft's interests to have web pages as difficult to process as
possible, so as to raise the cost of developing a browser. I learned from
my days working for a VC that the higher the development costs, the more
effective the barrier to entry.

What I don't understand is why some top sites, such as Yahoo, Google News,
CNN, etc., are so poorly coded. Yahoo is such a mess it's laughable. Where
does their HTML come from? It's obviously coded by hand, and by a weak hand
at that.

--Marc
Jul 20 '05 #4
On Thu, 14 Aug 2003 23:35:26 +0100, Nick Kew <ni**@fenris.webthing.com>
wrote:

[snip]
What I don't understand is why some top sites, such as Yahoo, Google
News, CNN, etc., are so poorly coded. Yahoo is such a mess it's
laughable. Where


Yahoo made its name early - before HTML standardisation - and got
themselves the strongest name amongst journos who had never actually
used the web - and hence the general public in the mid-90s. Now they
live on their name. [snip]


But my point was that, assuming Yahoo wants the widest possible readership,
why wouldn't they code their HTML in the most conforming possible way,
instead of using non-standard and invalid constructs?

It's a mystery to me... I find it hard to believe that with all their
resources they don't know better.

--Marc
Jul 20 '05 #5
In article <op**************@den.news.speakeasy.net>, Marc Rochkind wrote:
Does anyone know of any surveys of actual HTML that's in use, maybe in the
form of a report of frequency of occurrence of tags, whether standardized
or not? (If not, I'm thinking of conducting such a survey using a
custom-designed web crawler.)


<URL:http://www.ub.uib.no/elpub/2001/h/413001/>

--
Chris Hoess
Jul 20 '05 #6
On Fri, 15 Aug 2003 23:07:30 +0000 (UTC), Chris Hoess
<ch****@stwing.upenn.edu> wrote:
In article <op**************@den.news.speakeasy.net>, Marc Rochkind
wrote:
Does anyone know of any surveys of actual HTML that's in use, maybe in
the form of a report of frequency of occurrence of tags, whether
standardized or not? (If not, I'm thinking of conducting such a survey
using a custom-designed web crawler.)


<URL:http://www.ub.uib.no/elpub/2001/h/413001/>

Thanks! Exactly what I was looking for.

--Marc
Jul 20 '05 #7
In article <Vm********************@giganews.com>,
Jim Dabell <ji********@jimdabell.com> wrote:
Reviewing the source for tidy wouldn't be a bad idea either.


Also, reading the source of Mozilla's HTML parser might help, although
the code is neither pretty nor easy to follow. (Safari's HTML parser
doesn't handle as much quirkiness as Mozilla's but might be easier to
read.)

Then there's Tag Soup, a tag soup parser written in Java which from the
application point of view appears to be a SAX parser that is parsing
XHTML. http://mercury.ccil.org/~cowan/XML/tagsoup/

If I were to write a non-browser program that had to deal with
real-world HTML, I'd probably use Tag Soup.

--
Henri Sivonen
hs******@iki.fi
http://www.iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 20 '05 #8
In article <dq********************************@4ax.com>,
Tim <ad***@sheerhell.lan> wrote:
On Sat, 16 Aug 2003 12:38:36 +0300,
Henri Sivonen <hs******@iki.fi> wrote:
If I were to write a non-browser program that had to deal with
real-world HTML, I'd probably use Tag Soup.


It looks like quite a few do that.

e.g. Rather than properly parse a list in a document, they indent text
after a UL or OL tag, they bullet text after a LI tag, together
producing the common indented bulleted list, but separately still doing
something.


I meant I'd use the parser called "Tag Soup" which would allow me to
write my app code as if I was dealing with a SAX parser parsing XHTML.
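
The general idea can be sketched in Python, though this is not Tag Soup itself and makes no claim about how that parser works internally. The sketch forwards events from the standard html.parser module to an ordinary xml.sax ContentHandler and keeps start/end events balanced. The names SoupToSax, parse_soup, Recorder and the abbreviated VOID set are all invented for this example, and it does not apply HTML's implied-end-tag rules, so a second LI ends up nested inside the first rather than beside it.

```python
from html.parser import HTMLParser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl

# Elements that never have content (abbreviated, illustrative list).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class SoupToSax(HTMLParser):
    """Forward html.parser events to a SAX ContentHandler, keeping them balanced."""

    def __init__(self, handler):
        super().__init__()
        self.handler = handler
        self.open = []  # stack of elements started but not yet ended

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <td nowrap>) arrive as None; SAX expects strings.
        sax_attrs = AttributesImpl({k: v or "" for k, v in attrs})
        self.handler.startElement(tag, sax_attrs)
        if tag in VOID:
            self.handler.endElement(tag)
        else:
            self.open.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open:
            return  # stray end tag: ignore it
        # Close anything left open inside the element being closed.
        while self.open:
            top = self.open.pop()
            self.handler.endElement(top)
            if top == tag:
                break

    def handle_data(self, data):
        self.handler.characters(data)

    def close(self):
        super().close()
        # End whatever the document never closed.
        while self.open:
            self.handler.endElement(self.open.pop())


def parse_soup(markup, handler):
    """Drive a SAX ContentHandler from possibly invalid HTML."""
    handler.startDocument()
    parser = SoupToSax(handler)
    parser.feed(markup)
    parser.close()
    handler.endDocument()


class Recorder(ContentHandler):
    """Example application code: written purely against the SAX interface."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))
```

The point is that Recorder never sees the mess: it is ordinary SAX code, and every startElement it receives is eventually matched by an endElement, whatever the input looked like.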

--
Henri Sivonen
hs******@iki.fi
http://www.iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 20 '05 #9
