-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Nick Kew wrote on Thu, 25 Nov 2004 09:22:30 +0000:
>> I am currently reviewing some HTML parsing software.
> Does it claim to follow HTML (SGML) rules, XHTML (XML) rules, or tag-soup
> (whatever takes the author's fancy) rules?
It states: "supports HTML".
The software in question uses a very plain parser that only extracts
the plain text enclosed by CODE tags and then starts the real
processing.
- From what I can see: the soup rules! (Not only tag-soup but also entity-soup.)
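
To illustrate the kind of "extract the text inside CODE tags first" step described above, here is a minimal sketch in Python using the standard html.parser module. It is not the reviewed software's code; the names CodeTextExtractor and extract_code_text are made up for this example, and html.parser is itself a forgiving (tag-soup style) parser rather than an SGML or XML one.

```python
from html.parser import HTMLParser


class CodeTextExtractor(HTMLParser):
    """Hypothetical helper: collect the plain text inside <code> elements."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &lt; into characters
        super().__init__(convert_charrefs=True)
        self.depth = 0        # how many <code> elements are currently open
        self.chunks = []      # finished text, one entry per outermost <code>
        self._current = []    # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case
        if tag == "code":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "code" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.chunks.append("".join(self._current))
                self._current = []

    def handle_data(self, data):
        # Keep text only while inside a <code> element
        if self.depth > 0:
            self._current.append(data)


def extract_code_text(html):
    """Return the text of every outermost <code> element in html."""
    parser = CodeTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Text in a CODE element that is never closed is silently dropped by this sketch; a real tool would have to decide how to treat that case.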
>> # Netscape: <!-- --> comments nest
> Comments nest? Interesting thought. It could almost be a
> misinterpretation for doing the right thing - though that seems unlikely.
I've never read that comments could be nested inside of comments.
Have I missed something while reading the HTML & XHTML docs?
>> Does anyone know of post-1998 HTML documents that use the IE or
>> Netscape "features"?
> XML-style comments are valid both as HTML and XHTML as well as
> broken-parser-safe, and seem to be the norm. The only serious
> brokenness often seen in the wild is use of -- within what the
> author intends to be a comment.
Glad to hear that; now I can remove or clean up a lot of the parsing code.
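
A check for the one problem mentioned, "--" inside what is meant to be a comment, could look roughly like the sketch below. It assumes XML-style comments, where the body may neither contain "--" nor end in "-". The names COMMENT_RE and find_unsafe_comments are invented for this example, and the regular expression does not know about SCRIPT or CDATA content, so treat it as a rough lint rather than a parser.

```python
import re

# Matches "<!--", the shortest possible body, then "-->"
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def find_unsafe_comments(markup):
    """Return comments whose body contains "--" or ends with "-"."""
    unsafe = []
    for match in COMMENT_RE.finditer(markup):
        body = match.group(1)
        if "--" in body or body.endswith("-"):
            unsafe.append(match.group(0))
    return unsafe
```

For example, find_unsafe_comments("<!-- a -- b -->") reports that comment, while "<!-- a - b -->" passes.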
Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.9.9 (GNU/Linux)
iD8DBQFBpcS93w+/yD4P9tIRAjtUAJ4/xzgZGBhUTJzS0l7IgnI/ZAi1rACglE5v
Vwz/mhRNJ/WqumkUo7gpEd0=
=rAbX
-----END PGP SIGNATURE-----