legacy comments


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi,

I am currently reviewing some HTML parsing software.

One of the source code comments reads:
# Scan to end of comment.
# Comments are defined any of a number of ways.
# IE 5.0: <!-- followed by >
# "HTML The Definitive Guide": <!-- text with at least one space in it -->
# Netscape: <!-- --> comments nest
# w3c: whitespace can appear between -- and > of comment close

Does anyone know of post 1998 HTML documents that use the IE or
Netscape "features"?
Thanks for any hints and comments.

Thomas

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.9.9 (GNU/Linux)

iD8DBQFBpXeS3w+/yD4P9tIRAjW9AKDPiBf/lQ5N6w6ac+ok9Q2a29SzagCeNPgE
1DG2XNq7bSYI/omcUrC6tkA=
=GSzX
-----END PGP SIGNATURE-----
Jul 23 '05 #1
In article <jl************@kuehne.cn>,
Thomas Kuehne <st*************@example.com> writes:
> -----BEGIN PGP SIGNED MESSAGE-----

Isn't that singularly pointless?

> I am currently reviewing some HTML parsing software.

Does it claim to follow HTML (SGML) rules, XHTML (XML) rules, or tag-soup
(whatever takes the author's fancy) rules?

> # Scan to end of comment.
> # Comments are defined any of a number of ways.
> # IE 5.0: <!-- followed by >

That bears no relation to any form of HTML.

> # "HTML The Definitive Guide": <!-- text with at least one space in it -->

Why the space? The start and end are right for XML.

> # Netscape: <!-- --> comments nest

Comments nest? Interesting thought. It could almost be a
misinterpretation for doing the right thing - though that seems unlikely.

> # w3c: whitespace can appear between -- and > of comment close

Indeed, under SGML rules it can, but there's more to it than that.
Seems like the author of that software hasn't grasped SGML comments.

> Does anyone know of post 1998 HTML documents that use the IE or
> Netscape "features"?


XML-style comments are valid both as HTML and XHTML as well as
broken-parser-safe, and seem to be the norm. The only serious
brokenness often seen in the wild is use of -- within what the
author intends to be a comment.
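
To illustrate the point about "--" inside a comment, here is a minimal Python sketch of a check for the XML-style form. The function name is_xml_style_comment is hypothetical, and the rules it encodes (exact <!-- and --> delimiters, no "--" in the body, body not ending in "-") follow the XML comment grammar, not any parser discussed in this thread.

```python
def is_xml_style_comment(markup: str) -> bool:
    """Return True if markup is exactly one XML-style comment.

    Hypothetical helper for illustration: the delimiters must be the
    literal strings "<!--" and "-->", the body must not contain "--",
    and the body must not end with "-" (which would produce "--->").
    """
    # Fewer than 7 characters means the two delimiters would overlap.
    if len(markup) < 7:
        return False
    if not (markup.startswith("<!--") and markup.endswith("-->")):
        return False
    body = markup[4:-3]
    return "--" not in body and not body.endswith("-")


if __name__ == "__main__":
    samples = [
        "<!-- fine -->",
        "<!-- a -- b -->",          # "--" inside the intended comment
        "<!-- trailing dash --->",  # body ends with "-"
        "<!-- sgml only -- >",      # whitespace before ">" is SGML, not XML
    ]
    for sample in samples:
        print(repr(sample), is_xml_style_comment(sample))
```

A comment that passes this check is also a well-formed SGML comment declaration, which is why this form is the safe common subset.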

--
Nick Kew

Nick's manifesto: http://www.htmlhelp.com/~nick/
Jul 23 '05 #2

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Nick Kew wrote on Thu, 25 Nov 2004 09:22:30 +0000:
>> I am currently reviewing some HTML parsing software.
>
> Does it claim to follow HTML (SGML) rules, XHTML (XML) rules, or tag-soup
> (whatever takes the author's fancy) rules?

It states: "supports HTML".
The software in question uses a very plain parser that only extracts
the plain text enclosed by CODE tags and then starts the real
processing.
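
For comparison, extracting the text inside CODE elements can be sketched with Python's standard html.parser module. This is not the software under review; the class name CodeTextExtractor and the function extract_code_text are hypothetical, and the sketch simply relies on the library to skip comments and decode entities.

```python
from html.parser import HTMLParser


class CodeTextExtractor(HTMLParser):
    """Collect the text content of each top-level <code> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities in text
        self.depth = 0      # nesting depth of open <code> elements
        self.chunks = []    # finished text, one entry per outer <code>
        self._buf = []      # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <CODE> and <code> both match.
        if tag == "code":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "code" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.chunks.append("".join(self._buf))
                self._buf = []

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def extract_code_text(html: str) -> list:
    """Return the text of every <code> element found outside comments."""
    extractor = CodeTextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.chunks


if __name__ == "__main__":
    page = "<p>intro</p><code>a &lt; b</code><!-- <code>hidden</code> --><CODE>c</CODE>"
    print(extract_code_text(page))
```

The CODE element inside the comment is ignored because the parser hands the whole comment to handle_comment, which does nothing by default.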

From what I can see: the soup rules! (Not only tag soup but also entity soup.)

>> # Netscape: <!-- --> comments nest
>
> Comments nest? Interesting thought. It could almost be a
> misinterpretation for doing the right thing - though that seems unlikely.

I've never read that comments could be nested inside of comments.
Have I missed something while reading the HTML & XHTML docs?

>> Does anyone know of post 1998 HTML documents that use the IE or
>> Netscape "features"?
>
> XML-style comments are valid both as HTML and XHTML as well as
> broken-parser-safe, and seem to be the norm. The only serious
> brokenness often seen in the wild is use of -- within what the
> author intends to be a comment.


Glad to hear that, now I can remove/cleanup a lot of the parsing code.

Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.9.9 (GNU/Linux)

iD8DBQFBpcS93w+/yD4P9tIRAjtUAJ4/xzgZGBhUTJzS0l7IgnI/ZAi1rACglE5v
Vwz/mhRNJ/WqumkUo7gpEd0=
=rAbX
-----END PGP SIGNATURE-----
Jul 23 '05 #3
On Thu, 25 Nov 2004 07:11:31 +0100, Thomas Kuehne
<st*************@example.com> wrote:
[...]
> # Comments are defined any of a number of ways.

No; "comments" has a very specific definition.

> # IE 5.0: <!-- followed by >

Bullshit!

> # "HTML The Definitive Guide": <!-- text with at least one space in it -->

Ambiguous. It may come out right but it is not as per definition.

> # Netscape: <!-- --> comments nest

Bullshit!

> # w3c: whitespace can appear between -- and > of comment close

Yes, that's correct.

> Does anyone know of post 1998 HTML documents that use the IE or
> Netscape "features"?

No, we try our best to forget about those.

An SGML (and XML) comment is a special case of a "MARKUP DECLARATION"
that can be inserted in markup as follows...

<! = MDO = Markup Declaration Open

MDO must be directly followed by a NAME-START character,
or by a COM, where COM is defined as '--' i.e. two ASCII dashes.

A conforming processor that has found a correct MDO+COM in its data
stream shall treat anything that follows as "disregardable data", i.e.
it's a comment, up to the next occurrence of a COM.

Between balanced COMs there can be an arbitrary number of any
characters (other than another COM).

After the last balancing COM there can be an arbitrary number of
declared white space characters until the final MDC, "Markup Declaration
Close", is found in the data stream. MDC = '>'

And that would be where the "comment" ends.
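
The rule described above can be sketched as a small scanner in Python. The function name scan_comment_declaration and its return shape are hypothetical choices for illustration; the logic only follows the description given here (each COM toggles comment state, and the first '>' seen outside a comment is the MDC), not any particular SGML implementation.

```python
def scan_comment_declaration(text: str, start: int = 0):
    """Scan one SGML comment declaration beginning at text[start].

    Returns (comments, stray, end): the comment bodies, any
    non-whitespace text found between comments, and the index just
    past the closing '>'. Raises ValueError if no MDC is found.
    """
    if not text.startswith("<!--", start):
        raise ValueError("expected MDO '<!' directly followed by COM '--'")
    i = start + 2            # index of the first COM
    in_comment = False
    buf = []
    comments, stray = [], []
    while i < len(text):
        if text.startswith("--", i):
            # A COM ends the current run of text and toggles the state.
            chunk = "".join(buf)
            if in_comment:
                comments.append(chunk)
            elif chunk.strip():
                stray.append(chunk.strip())
            buf = []
            in_comment = not in_comment
            i += 2
        elif text[i] == ">" and not in_comment:
            # MDC: only recognised outside a comment.
            chunk = "".join(buf)
            if chunk.strip():
                stray.append(chunk.strip())
            return comments, stray, i + 1
        else:
            buf.append(text[i])
            i += 1
    raise ValueError("no MDC '>' found outside a comment")


if __name__ == "__main__":
    print(scan_comment_declaration("<!-- one -- -- two -- >"))
    print(scan_comment_declaration("<!-- a -- b -- c -->"))
```

Any entry in the stray list is text sitting between two comments, which a strict parser would report as an error; a '>' that appears while a comment is still open is treated as comment text, so the scan continues past it.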

Syntax description...

<! = MDO = Markup Declaration Open
-- = COM = Comment start or end
>  = MDC = Markup Declaration Close

Example...

<!-- this text is a good comment --
-- and so is this text too --
but this text is outside of a comment area
-- once again a good comment text --
>

Note the white space between that last COM and the MDC.

Well now, what about this...

<!--- Is this a good comment? -->

Yes it is, the content of the comment is...

- Is this a good comment?

(note that the third dash becomes content of the commentary text)

Further; is this a good comment?

<!---- Is this a good comment? -->

Nope, it's not since now the parser will find...

<! = MDO
-- = COM
-- = COM
Is this a good comment? -->

... where the text is outside of the comment area.

Another example...

<!-- Is this a good comment? --->

No it's not since it leaves a "hanging dash" to be handled by the parser
as in this parsing example...

<!
--
Is this a good comment?
--
-
>

That last "hanging" dash will give a parse error since it's not defined
to be a member of the NAMESTART or NAME character groups.
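
The dash-counting cases above can also be checked mechanically. The following Python sketch uses a regular expression as an approximation of the comment declaration syntax described in this post; the names COMMENT_DECLARATION and is_valid_comment_declaration are hypothetical, and the pattern requires at least one comment between MDO and MDC.

```python
import re

# "<!" then one or more comments, each "--" ... "--" with no "--" inside,
# each optionally followed by white space, then the closing ">".
COMMENT_DECLARATION = re.compile(r"<!(?:--(?:(?!--).)*--\s*)+>", re.DOTALL)


def is_valid_comment_declaration(markup: str) -> bool:
    """Return True if markup is exactly one valid comment declaration."""
    return COMMENT_DECLARATION.fullmatch(markup) is not None


if __name__ == "__main__":
    samples = [
        "<!--- Is this a good comment? -->",   # third dash is comment text
        "<!---- Is this a good comment? -->",  # empty comment, then stray text
        "<!-- Is this a good comment? --->",   # hanging dash before '>'
        "<!-- white space before MDC -- >",
    ]
    for sample in samples:
        print(repr(sample), is_valid_comment_declaration(sample))
```

The first and last samples match; the two in the middle do not, for the reasons walked through above.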

How is it that such a simple thing could become one of the most misused
things on the WWW? I mean, MS has "innovated" an unintended use of SGML
comments... (there should be a law... :-)

--
Rex
Jul 23 '05 #4
Thomas Kuehne wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1

Are you aware that each message starts with the above and ends with...

> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.9.9 (GNU/Linux)
>
> iD8DBQFBpcS93w+/yD4P9tIRAjtUAJ4/xzgZGBhUTJzS0l7IgnI/ZAi1rACglE5v
> Vwz/mhRNJ/WqumkUo7gpEd0=
> =rAbX
> -----END PGP SIGNATURE-----


Rather dumb, wouldn't you say? I can't find your newsreader in your
headers, but there must be a way to fix it.
Jul 23 '05 #5
Nick Kew wrote:
> In article <jl************@kuehne.cn>,
> Thomas Kuehne <st*************@example.com> writes:
>> # Scan to end of comment.
>> # Comments are defined any of a number of ways.
>> # IE 5.0: <!-- followed by >
>
> That bears no relation to any form of HTML.


Not standard HTML, though it might refer to WinIE conditional comments,
which do follow that general syntax.
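
For readers unfamiliar with them, Internet Explorer's "downlevel-hidden" conditional comments have the form <!--[if IE 5]> ... <![endif]-->, so the hidden markup does begin with <!-- and the condition does end with >. Whether the quoted source comment actually meant this is only a guess. A minimal Python sketch for spotting them follows; the names CONDITIONAL_COMMENT and find_conditional_comments are hypothetical.

```python
import re

# Matches the downlevel-hidden form: <!--[if CONDITION]> ... <![endif]-->
CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if\s+([^\]]+)\]>(.*?)<!\[endif\]-->",
    re.DOTALL | re.IGNORECASE,
)


def find_conditional_comments(html: str):
    """Return (condition, hidden_markup) pairs for each conditional comment."""
    return [
        (match.group(1).strip(), match.group(2))
        for match in CONDITIONAL_COMMENT.finditer(html)
    ]


if __name__ == "__main__":
    page = "<p>a</p><!--[if IE 5]><p>IE 5 only</p><![endif]--><!-- plain -->"
    print(find_conditional_comments(page))
```

To every other parser the whole construct is one ordinary comment, which is why it is harmless outside Internet Explorer.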

--
Reply email address is a bottomless spam bucket.
Please reply to the group so everyone can share.
Jul 23 '05 #6
