Bytes IT Community

XmlTextWriter Encodes HTML Entities?

Can anybody make sense of these crazy and inconsistent results?

// With IE7 Feed Reading View disabled, IE7 displays this raw XML
<?xml version="1.0" encoding="utf-8" ?>
<!-- AT&T HTML entities & XML <elements> are displayed -->
<rss version="2.0">
<channel>
<title>AT&T HTML entities & XML <elements> are displayed</title>
....
<description>
<![CDATA[ AT&T HTML entities & XML <elements> using CDATA ]]>
</description>
....

The XML comment data comes directly from the TextBox on the Form
as text. The XmlTextWriter call writer.WriteElementString("title", title)
generates the <title> element, and writer.WriteCData(description) generates
the CDATA section in the <description> element.

// Dragging the testRSS.xml file into NotePad displays
<?xml version="1.0" encoding="utf-8"?>
<!--AT&T HTML entities & XML <elements> are displayed-->
<rss version="2.0">
<channel>
<title>AT&amp;T HTML entities &amp; XML &lt;elements&gt; are
displayed</title>
<description><![CDATA[AT&T HTML entities & XML <elements> using
CDATA]]></description>
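
A minimal sketch that reproduces the file above, assuming a current .NET SDK with C# top-level statements and XmlWriter.Create in place of the XmlTextWriter constructor (for text content the escaping of '&', '<' and '>' should be the same). It shows that the three write calls treat the TextBox strings differently: WriteComment and WriteCData copy their text verbatim, while WriteElementString escapes it. The XML declaration is omitted only because a StringWriter would report utf-16 rather than utf-8.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Strings as typed into the form (comment and <title> share the first one).
string title = "AT&T HTML entities & XML <elements> are displayed";
string description = "AT&T HTML entities & XML <elements> using CDATA";

var sb = new StringBuilder();
var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
using (XmlWriter writer = XmlWriter.Create(new StringWriter(sb), settings))
{
    writer.WriteComment(title);                // copied verbatim between <!-- and -->
    writer.WriteStartElement("rss");
    writer.WriteAttributeString("version", "2.0");
    writer.WriteStartElement("channel");
    writer.WriteElementString("title", title); // '&', '<' and '>' are escaped
    writer.WriteStartElement("description");
    writer.WriteCData(description);            // copied verbatim inside a CDATA section
    writer.WriteEndElement();                  // </description>
    writer.WriteEndElement();                  // </channel>
    writer.WriteEndElement();                  // </rss>
}
string xml = sb.ToString();
Console.WriteLine(xml);
```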

// Enable IE7 Feed Reading View and observe that IE7
// either violates XML by encoding HTML entities and XML elements
// or encodes unencoded XML data for display of RSS
AT&T HTML entities & XML <elements> are displayed
It's bad enough that IE7 is likely still a sloppy parser that will violate XML
validity rules by encoding unencoded feed data, which really makes life all
FUBAR for an application developer. But worse yet, what is encoding the HTML
entities and the XML element in the <title> element when the testRSS.xml file
is dragged into NotePad?

Does the XmlTextWriter encode HTML and XML? How does the data in the
<title> element in the file end up encoded?
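
The escaped form is only how the characters are serialized in the file. A sketch, assuming a current .NET SDK with C# top-level statements, that loads the two elements exactly as NotePad shows them and reads them back with XmlDocument: the parser resolves &amp;, &lt; and &gt; and drops the CDATA markers, so both elements yield plain text containing literal '&' and '<'. A feed view built on a conforming XML parser would presumably display the same strings.

```csharp
using System;
using System.Xml;

// The <title> and <description> exactly as saved in testRSS.xml.
string saved =
    "<channel>" +
    "<title>AT&amp;T HTML entities &amp; XML &lt;elements&gt; are displayed</title>" +
    "<description><![CDATA[AT&T HTML entities & XML <elements> using CDATA]]></description>" +
    "</channel>";

var doc = new XmlDocument();
doc.LoadXml(saved);

// InnerText is the parsed character data: escapes resolved, CDATA markers removed.
string title = doc.SelectSingleNode("/channel/title").InnerText;
string description = doc.SelectSingleNode("/channel/description").InnerText;

Console.WriteLine(title);
Console.WriteLine(description);
```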

<%= Clinton Gallagher
NET csgallagher AT metromilwaukee.com
URL http://clintongallagher.metromilwaukee.com/

May 28 '07 #1


clintonG wrote:
>Does the XmlTextWriter encode HTML and XML? How does the data in the
><title> element in the file end up encoded?
With XmlWriter (or XmlTextWriter) you can ensure that your XML
markup is well-formed, as methods like WriteElementString make sure that
'&' is escaped as '&amp;' and '<' as '&lt;'. For example,
xmlWriter.WriteElementString("title",
"AT & T, <element>content</element>");
yields
<title>AT &amp; T, &lt;element&gt;content&lt;/element&gt;</title>

That has nothing to do with HTML or HTML entities; rather, XML itself
defines entities like amp, gt and lt.
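
A runnable version of the example above, as a minimal sketch assuming a current .NET SDK with C# top-level statements. OmitXmlDeclaration is set only so the output is the bare element.

```csharp
using System;
using System.IO;
using System.Xml;

var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
var sw = new StringWriter();
using (XmlWriter xmlWriter = XmlWriter.Create(sw, settings))
{
    // The whole string is character data, never markup.
    xmlWriter.WriteElementString("title", "AT & T, <element>content</element>");
}
string result = sw.ToString();
Console.WriteLine(result);
```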

If you want the 'title' element to have a child 'element', then you
need to use
xmlWriter.WriteStartElement("title");
xmlWriter.WriteString("AT & T");
xmlWriter.WriteElementString("element", "content");
xmlWriter.WriteEndElement();
which yields

<title>AT &amp; T<element>content</element></title>
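
The same calls as a self-contained sketch (again assuming C# top-level statements on a current .NET SDK). Re-parsing the output with XmlDocument confirms that this time <element> is a real child node rather than escaped text.

```csharp
using System;
using System.IO;
using System.Xml;

var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
var sw = new StringWriter();
using (XmlWriter xmlWriter = XmlWriter.Create(sw, settings))
{
    xmlWriter.WriteStartElement("title");
    xmlWriter.WriteString("AT & T");                    // character data: '&' becomes &amp;
    xmlWriter.WriteElementString("element", "content"); // a real child element
    xmlWriter.WriteEndElement();                        // closes <title>
}
string result = sw.ToString();
Console.WriteLine(result);

// Parsing the result back finds <element> as markup, not as text.
var doc = new XmlDocument();
doc.LoadXml(result);
string childText = doc.DocumentElement["element"].InnerText;
Console.WriteLine(childText);
```
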
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
May 29 '07 #2

Thanks for confirming that the XmlTextWriter methods escape and encode
specific text characters as HTML character entities. The HTML character
entity naming conventions you attempt to clarify are defined by the W3C (24.4.1
The list of characters, Special characters for HTML [1]). My question should
have asked whether the methods escape and encode "text characters" as HTML
entities. Nitpicker ;-)

Anyhow, I haven't seen the MSDN documentation mention this inherent
feature of the class; the escaping and encoding behavior is not explicitly
documented on any page I have read so far. There is a pithy comment within
the narrative of the "Writing XML with the XmlWriter" document [2], but the
narrative is poorly written and easily misunderstood.

<%= Clinton Gallagher

[1] http://www.w3.org/TR/html401/sgml/entities.html
[2] http://msdn2.microsoft.com/en-us/lib...hb(VS.80).aspx

May 29 '07 #3

clintonG wrote:
>Thanks for confirming that the XmlTextWriter methods escape and encode
>specific text characters as HTML character entities. The HTML character
>entity naming conventions you attempt to clarify are defined by the W3C (24.4.1
>The list of characters, Special characters for HTML [1]). My question should
>have asked whether the methods escape and encode "text characters" as HTML
>entities. Nitpicker ;-)
XML defines its own entities, and what XmlWriter does is based on the XML
specification and _not_ on the HTML specification.
See <http://www.w3.org/TR/REC-xml/#sec-predefined-ent>.
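
The predefined entities in the section linked above can be checked directly. This sketch (C# top-level statements, System.Xml only) parses each of the five with XmlDocument, with no DTD at all, and prints the character it stands for.

```csharp
using System;
using System.Text;
using System.Xml;

// Names of the five entities that XML 1.0 predefines.
string[] names = { "lt", "gt", "amp", "apos", "quot" };

var decoded = new StringBuilder();
var doc = new XmlDocument();
foreach (string name in names)
{
    // No DTD is declared: a conforming parser must know these five anyway.
    doc.LoadXml("<t>&" + name + ";</t>");
    string value = doc.DocumentElement.InnerText;
    decoded.Append(value);
    Console.WriteLine("&" + name + "; -> " + value);
}
string all = decoded.ToString();
```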

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
May 29 '07 #4

I kept following links and finally found the arcane documentation: the
XmlWriterSettings.CheckCharacters property [1]. So it seems to me ASP.NET
developers don't have to fool around with regular expressions to validate
and replace text characters that would be illegal when the document is saved
as XML (RSS feeds, for example).
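
One detail worth a sketch: CheckCharacters validates rather than replaces, and XmlWriterSettings only applies to writers created with XmlWriter.Create, not to an XmlTextWriter constructed directly. With CheckCharacters set to true (the default), the writer throws an ArgumentException when the text contains a character XML 1.0 does not allow, such as U+0001, so stripping or replacing such characters still takes some code. The helper StripInvalidXmlChars below is hypothetical, built on XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair (available since .NET Framework 4.0), and assumes that silently dropping the characters is acceptable for the feed. Assumes C# top-level statements.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// U+0001 is not a legal XML 1.0 character.
string input = "Feed title\u0001 with a control character";

// CheckCharacters is true by default; it is set explicitly here for clarity.
var settings = new XmlWriterSettings { CheckCharacters = true, OmitXmlDeclaration = true };
bool rejected = false;
try
{
    using (XmlWriter writer = XmlWriter.Create(new StringWriter(), settings))
    {
        writer.WriteElementString("title", input);
    }
}
catch (ArgumentException ex)
{
    // The writer refuses the character instead of replacing it.
    rejected = true;
    Console.WriteLine("Rejected: " + ex.Message);
}

// Hypothetical helper: drops every character XML 1.0 does not allow.
static string StripInvalidXmlChars(string s)
{
    var sb = new StringBuilder(s.Length);
    for (int i = 0; i < s.Length; i++)
    {
        char c = s[i];
        if (XmlConvert.IsXmlChar(c))
        {
            sb.Append(c);
        }
        else if (char.IsHighSurrogate(c) && i + 1 < s.Length
                 && XmlConvert.IsXmlSurrogatePair(s[i + 1], c))
        {
            // Keep valid surrogate pairs (characters outside the BMP) intact.
            sb.Append(c).Append(s[i + 1]);
            i++;
        }
        // Anything else (control characters, lone surrogates) is skipped.
    }
    return sb.ToString();
}

string cleaned = StripInvalidXmlChars(input);
Console.WriteLine(cleaned);
```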

I understand what the W3C documents say, but XML and HTML both derive from
SGML, and there are some semantic ambiguities in this context in the W3C
documents. Most of us, and most documentation including W3C documentation,
define &amp; as an HTML character entity. When we get to the W3C page(s) for
XML, they drop the verbiage "HTML" when describing character entities.

As I'm sure you'll agree after reading the EBNF, the DTDs indicate we're
talking about the same thing using context-specific nomenclature, so we
really don't need to quibble about semantics. All I want to do is write code
that generates valid XML RSS feeds that will be parsed by the greatest number
of aggregators. That in itself requires a personal relationship with all the
blessings of Heaven, because everybody has been so FUBAR in their respective
implementations.
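
A sketch of a minimal RSS 2.0 document written with XmlWriter, assuming C# top-level statements; the feed data and the example.com URL are placeholders. RSS 2.0 requires title, link and description on <channel>, and re-parsing the result is a cheap well-formedness check before publishing. When writing to a file with XmlWriter.Create(path, settings) and OmitXmlDeclaration left false, the writer emits a UTF-8 declaration by default.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Placeholder feed data; example.com stands in for a real site.
string channelTitle = "AT&T <news>";
string channelLink = "https://example.com/";
string channelDescription = "Items & updates";

var sb = new StringBuilder();
var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
using (XmlWriter w = XmlWriter.Create(new StringWriter(sb), settings))
{
    w.WriteStartElement("rss");
    w.WriteAttributeString("version", "2.0");
    w.WriteStartElement("channel");
    // RSS 2.0 requires these three elements on <channel>.
    w.WriteElementString("title", channelTitle);
    w.WriteElementString("link", channelLink);
    w.WriteElementString("description", channelDescription);
    w.WriteStartElement("item");
    w.WriteElementString("title", "First item");
    w.WriteElementString("description", "Uses <b>markup</b> & entities");
    w.WriteEndElement(); // </item>
    w.WriteEndElement(); // </channel>
    w.WriteEndElement(); // </rss>
}
string feed = sb.ToString();
Console.WriteLine(feed);

// Re-parse as a well-formedness check; the escaped title must round-trip.
var doc = new XmlDocument();
doc.LoadXml(feed);
string parsedTitle = doc.SelectSingleNode("/rss/channel/title").InnerText;
Console.WriteLine(parsedTitle == channelTitle); // True
```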

<%= Clinton Gallagher

[1]
http://msdn2.microsoft.com/en-us/lib...rs(VS.80).aspx

May 29 '07 #5

* clintonG wrote in microsoft.public.dotnet.xml:
>I understand what the W3C documents say, but XML and HTML both derive from
>SGML, and there are some semantic ambiguities in this context in the W3C
>documents. Most of us, and most documentation including W3C documentation,
>define &amp; as an HTML character entity. When we get to the W3C page(s) for
>XML, they drop the verbiage "HTML" when describing character entities.
It would be very confusing otherwise. As an example, &apos; is valid in
XML but not part of HTML, while &ouml; is part of HTML but not of XML;
so if you speak about the pre-defined entities in XML you refer to five,
if you speak about those in HTML you refer to hundreds of them.
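
Björn's point can be checked with a parser. This sketch (C# top-level statements, System.Xml only, with a hypothetical helper TryParse) shows that &apos; resolves without any DTD, &ouml; is rejected as an undeclared entity, and the numeric character reference &#246; yields the same ö with no declaration at all.

```csharp
using System;
using System.Xml;

// Hypothetical helper: parses a document and reports its text or the parser error.
static string TryParse(string xml)
{
    try
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return "ok: " + doc.DocumentElement.InnerText;
    }
    catch (XmlException ex)
    {
        return "error: " + ex.Message;
    }
}

string apos = TryParse("<t>&apos;</t>");    // one of XML's five predefined entities
string ouml = TryParse("<t>&ouml;</t>");    // HTML entity, undeclared in plain XML
string numeric = TryParse("<t>&#246;</t>"); // character reference, needs no declaration

Console.WriteLine(apos);
Console.WriteLine(ouml);
Console.WriteLine(numeric);
```
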
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
May 29 '07 #6

Nobody argues that point, Björn, except to say that correct use of the English
language in a formal document requires both "narrative" and "expository"
writing, which we native speakers of English are taught in grade school.

I value consistency in technical documentation, which is a formal use of the
language. Consistency should not be compromised for the sake of brevity,
which in this context results in obfuscated terminology. I mean, what is
actually needed here? A single paragraph of narrative supported by a single
expository table of five rows to resolve an apparent contradiction that is
not a contradiction at all?

The people on the W3C working groups do not always make the best decisions
and are not necessarily known for their mastery of the English language,
which is said to be the most difficult language to master. That said, having
observed over the years how software developers will quibble with one another
for weeks or even months about a single term and its meaning, I'm genuinely
surprised this discrepancy has been overlooked.

<%= Clinton
Jun 1 '07 #7
