XmlTextWriter Encodes HTML Entities?

Can anybody make sense of these crazy and inconsistent results?

// With IE7 Feed Reading View disabled, IE7 displays this raw XML
<?xml version="1.0" encoding="utf-8" ?>
<!-- AT&T HTML entities & XML <elements> are displayed -->
<rss version="2.0">
<channel>
<title>AT&T HTML entities & XML <elements> are displayed</title>
....
<description>
<![CDATA[ AT&T HTML entities & XML <elements> using CDATA ]]>
</description>
....

The XML comment data comes directly from the TextBox on the Form as text.
The XmlTextWriter call writer.WriteElementString("title", title) generates
the <title> element and writer.WriteCData(description) generates the
<description> element.

// Dragging the testRSS.xml file into Notepad displays
<?xml version="1.0" encoding="utf-8"?>
<!--AT&T HTML entities & XML <elements> are displayed-->
<rss version="2.0">
<channel>
<title>AT&amp;T HTML entities &amp; XML &lt;elements&gt; are displayed</title>
<description><![CDATA[AT&T HTML entities & XML <elements> using CDATA]]></description>

// Enable IE7 Feed Reading View and observe that IE7
// either violates XML by encoding HTML entities and XML elements
// or encodes unencoded XML data for display of RSS
AT&T HTML entities & XML <elements> are displayed

It's bad enough that IE7 is likely still a sloppy parser that will violate
XML validity rules by encoding unencoded feed data, which really makes life
FUBAR for an application developer. But worse yet, what is encoding the HTML
entities and the XML element in the <title> element when the testRSS.xml
file is dragged into Notepad?

Does the XmlTextWriter encode HTML and XML? How does the data in the
<title> element in the file end up encoded?

<%= Clinton Gallagher
NET csgallagher AT metromilwaukee.com
URL http://clintongallagher.metromilwaukee.com/

May 28 '07 #1
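The behaviour described in this question can be reproduced with a small C# sketch. It is not the poster's code. The class name RssRoundTrip is hypothetical, and the sketch makes two further assumptions: it uses XmlWriter.Create over a StringBuilder rather than an XmlTextWriter writing to a file, and it targets C# 7 or later for the tuple return. It writes the same text once through WriteElementString and once through WriteCData, then parses the result with XmlDocument. The escaped title and the CDATA description should both decode to the original text, which is what a conforming XML reader such as a feed view is expected to display.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper, not from the thread: writes a tiny RSS-like fragment
// the way the question describes, then parses it back.
public static class RssRoundTrip
{
    // Serializes `text` twice: as ordinary element content (escaped by the
    // writer) and inside a CDATA section (written verbatim).
    public static string Write(string text)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("channel");
            writer.WriteElementString("title", text);   // & < > become &amp; &lt; &gt;
            writer.WriteStartElement("description");
            writer.WriteCData(text);                     // kept as-is in <![CDATA[...]]>
            writer.WriteEndElement();
            writer.WriteEndElement();
        }
        return sb.ToString();
    }

    // Parses the markup and returns the decoded text of both elements,
    // i.e. what an XML reader hands to the application for display.
    public static (string Title, string Description) Read(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return (doc.SelectSingleNode("/channel/title").InnerText,
                doc.SelectSingleNode("/channel/description").InnerText);
    }
}
```

The escaped form in the file (what Notepad shows) and the decoded form (what IE7's feed view shows) are two views of the same data. Both Read values should equal the input string.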
clintonG wrote:
> Does the XmlTextWriter encode HTML and XML? How does the data in the
> <title> element in the file end up encoded?
With XmlWriter or XmlTextWriter you can ensure that your XML markup is
well-formed, as methods like WriteElementString make sure that '&' is
escaped as '&amp;' and '<' is escaped as '&lt;'. So, for example,
xmlWriter.WriteElementString("title",
"AT & T, <element>content</element>");
yields
<title>AT &amp; T, &lt;element&gt;content&lt;/element&gt;</title>

That has nothing to do with HTML or HTML entities; rather, XML itself
defines entities like amp, gt, and lt.

If you wanted that 'title' element to have a child 'element', then you
need to use
xmlWriter.WriteStartElement("title");
xmlWriter.WriteString("AT & T");
xmlWriter.WriteElementString("element", "content");
xmlWriter.WriteEndElement();
which yields

<title>AT &amp; T<element>content</element></title>
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
May 29 '07 #2
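Martin's two snippets can be checked with a self-contained sketch. The names EscapingDemo, Capture, AsText and AsChildElement are hypothetical. Writing through XmlWriter.Create over a StringBuilder with OmitXmlDeclaration is an assumption, chosen so that each result is a bare fragment that is easy to compare.

```csharp
using System;
using System.Text;
using System.Xml;

public static class EscapingDemo
{
    // Runs the callback against a fresh writer and returns the markup it
    // produced (no XML declaration, no indentation).
    static string Capture(Action<XmlWriter> write)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter w = XmlWriter.Create(sb, settings))
        {
            write(w);
        }
        return sb.ToString();
    }

    // Markup-looking text passed as a string value is escaped, not interpreted.
    public static string AsText() => Capture(w =>
        w.WriteElementString("title", "AT & T, <element>content</element>"));

    // A real child element has to be written with the element API.
    public static string AsChildElement() => Capture(w =>
    {
        w.WriteStartElement("title");
        w.WriteString("AT & T");
        w.WriteElementString("element", "content");
        w.WriteEndElement();
    });
}
```

AsText() should return <title>AT &amp; T, &lt;element&gt;content&lt;/element&gt;</title>. AsChildElement() should return <title>AT &amp; T<element>content</element></title>.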
Thanks for confirming that the XmlTextWriter methods escape and encode
specific text characters as HTML character entities. The HTML character
entity naming conventions you attempt to clarify are defined by the W3C
(24.4.1 The list of characters, Special characters for HTML [1]). My
question should have asked whether the methods escape and encode "text
characters" as HTML entities. Nitpicker ;-)

Anyhow, I didn't see the MSDN documentation mention this inherent feature
of the class, as the escaping and encoding behavior is not explicitly
documented on any page I have read so far. There is a pithy comment within
the narrative of the "Writing XML with the XmlWriter" document [2], but the
narrative is poorly written and easily misunderstood.

<%= Clinton Gallagher

[1] http://www.w3.org/TR/html401/sgml/entities.html
[2] http://msdn2.microsoft.com/en-us/lib...hb(VS.80).aspx
May 29 '07 #3
clintonG wrote:
> Thanks for confirming that the XmlTextWriter methods escape and encode
> specific text characters as HTML character entities. The HTML character
> entity naming conventions you attempt to clarify are defined by the W3C
> (24.4.1 The list of characters, Special characters for HTML [1]). My
> question should have asked whether the methods escape and encode "text
> characters" as HTML entities. Nitpicker ;-)

XML defines its own entities, and what XmlWriter does is based on the XML
specification and _not_ on the HTML specification.
See <http://www.w3.org/TR/REC-xml/#sec-predefined-ent>.

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
May 29 '07 #4
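Martin's point that the writer follows XML's rules rather than HTML's can be seen by comparing element content with an attribute value. The sketch below uses a hypothetical class name, XmlEscapingRules, and default XmlWriterSettings apart from OmitXmlDeclaration. As far as I know, '&' and '<' are escaped in both places, while the double quote is escaped as &quot; only inside the double-quoted attribute, where it would otherwise end the value. No HTML-only entity names are ever produced.

```csharp
using System.Text;
using System.Xml;

// Hypothetical demo class: writes the same value as an attribute and as
// element content so the two escaping contexts can be compared.
public static class XmlEscapingRules
{
    public static string Write(string value)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter w = XmlWriter.Create(sb, settings))
        {
            w.WriteStartElement("e");
            w.WriteAttributeString("a", value); // attribute value, delimited by "
            w.WriteString(value);               // element content
            w.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

For the input "x" 'y' <& the attribute should come out as a="&quot;x&quot; 'y' &lt;&amp;". The element content should keep both quote characters literally and escape only < and &.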
I kept following links and finally found the arcane documentation: the
XmlWriterSettings.CheckCharacters Property [1]. So it seems to me ASP.NET
developers don't have to fool around with regular expressions to validate
and replace text characters that would be illegal when the document is saved
as XML, e.g. for RSS feeds.

I understand what the W3C documents say, but XML and HTML both derive from
SGML, and there are some semantic ambiguities in this context in the W3C
documents. Most of us, and most documentation including W3C documentation,
define &amp; as an HTML character entity. When we get to the W3C page(s) for
XML, they drop the word "HTML" when describing character entities.

As I'm sure you'll agree after reading the EBNF, the DTDs indicate we're
talking about the same thing using context-specific nomenclature, so we
really don't need to quibble about semantics. All I want to do is write code
that generates valid XML RSS feeds that will be parsed by the greatest
number of aggregators, which in itself requires a personal relationship with
all the blessings of Heaven, because everybody has been so FUBAR in their
respective implementations.

<%= Clinton Gallagher

[1] http://msdn2.microsoft.com/en-us/lib...rs(VS.80).aspx
May 29 '07 #5
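Here is a sketch of what XmlWriterSettings.CheckCharacters does with a character that XML 1.0 does not allow, such as U+0001. The class and method names (CharacterCheck, TryWriteTitle) are hypothetical. With the default CheckCharacters = true, the writer validates by throwing an ArgumentException; it does not substitute anything for the offending character. Any replacement policy is still left to the application.

```csharp
using System;
using System.Text;
using System.Xml;

public static class CharacterCheck
{
    // Returns true and the markup if the writer accepted the text,
    // or false and null if it rejected an illegal XML character.
    public static bool TryWriteTitle(string text, out string xml)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            CheckCharacters = true   // the default, spelled out for clarity
        };
        try
        {
            using (XmlWriter w = XmlWriter.Create(sb, settings))
            {
                w.WriteElementString("title", text);
            }
            xml = sb.ToString();
            return true;
        }
        catch (ArgumentException)
        {
            // Control characters such as U+0001 are not legal in XML 1.0
            // text, so the writer refuses them instead of writing them out.
            xml = null;
            return false;
        }
    }
}
```

For example, TryWriteTitle("AT&T", out var ok) should succeed with <title>AT&amp;T</title>, while TryWriteTitle("bad\u0001char", out _) should return false.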
* clintonG wrote in microsoft.public.dotnet.xml:
> I understand what the W3C documents say, but XML and HTML both derive from
> SGML, and there are some semantic ambiguities in this context in the W3C
> documents. Most of us, and most documentation including W3C documentation,
> define &amp; as an HTML character entity. When we get to the W3C page(s)
> for XML, they drop the word "HTML" when describing character entities.

It would be very confusing otherwise. As an example, &apos; is valid in
XML but not part of HTML, while &ouml; is part of HTML but not of XML;
so if you speak about the predefined entities in XML you refer to five,
whereas if you speak about those in HTML you refer to hundreds of them.
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
May 29 '07 #6
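Björn's distinction can be checked against an XML parser. The sketch below uses a hypothetical class name, EntityScope, and parses with XmlDocument.LoadXml. It should decode &apos;, because that is one of XML's five predefined entities. It should reject &ouml; with an XmlException, because no DTD declares that entity. The numeric character reference &#246; is the plain-XML way to write the same character.

```csharp
using System.Xml;

public static class EntityScope
{
    // Returns the decoded text content, or null if the parser rejects the
    // markup (for example because of an undeclared entity such as &ouml;).
    public static string TryParse(string xml)
    {
        try
        {
            var doc = new XmlDocument();
            doc.LoadXml(xml);
            return doc.DocumentElement.InnerText;
        }
        catch (XmlException)
        {
            return null;
        }
    }
}
```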
Nobody argues that point, Björn, except to say that the correct use of the
English language in a formal document requires the "narrative" and
"expository" use of grammar which we native speakers of English are taught
in grade school.

I value consistency in technical documentation, which is considered a formal
use of the language. Consistency should not be compromised for the sake of
brevity, which in this context results in obfuscated terminology. I mean,
what are we talking about being needed here? A single paragraph of narrative
supported by a single expository table of five rows, to resolve an apparent
contradiction which is not a contradiction at all?

Sometimes the people on the W3C working groups do not make the best
decisions, and they are not necessarily known for their mastery of the
English language, which is said to be the most difficult language to master.
That said, having observed over the years how software developers will
quibble with one another for weeks or perhaps months about a single term and
its meaning, I'm genuinely surprised this discrepancy has been overlooked.

<%= Clinton
Jun 1 '07 #7
