XmlTextWriter Encodes HTML Entities?

Can anybody make sense of these crazy and inconsistent results?

// IE7 Feed Reading View disabled displays this raw XML
<?xml version="1.0" encoding="utf-8" ?>
<!-- AT&T HTML entities & XML <elements> are displayed -->
<rss version="2.0">
<channel>
<title>AT&T HTML entities & XML <elements> are displayed</title>
....
<description>
<![CDATA[ AT&T HTML entities & XML <elements> using CDATA ]]>
</description>
....

The XML comment data comes directly from the TextBox on the Form
as text. The XmlTextWriter call writer.WriteElementString("title", title)
generates the <title> element and writer.WriteCData(description) generates
the <description> element.

// Dragging the testRSS.xml file into NotePad displays
<?xml version="1.0" encoding="utf-8"?>
<!--AT&T HTML entities & XML <elements> are displayed-->
<rss version="2.0">
<channel>
<title>AT&amp;T HTML entities &amp; XML &lt;elements&gt; are displayed</title>
<description><![CDATA[AT&T HTML entities & XML <elements> using CDATA]]></description>

// Enable IE7 Feed Reading View and observe that IE7
// either violates XML by encoding HTML entities and XML elements
// or encodes unencoded XML data for display of RSS
AT&T HTML entities & XML <elements> are displayed

It's bad enough that IE7 is likely still a sloppy parser that will violate
XML validity rules by encoding unencoded feed data, which really makes life
FUBAR for an application developer. But worse yet, what is encoding the HTML
entities and the XML element in the <title> element when the testRSS.xml
file is dragged into NotePad?

Does the XmlTextWriter encode HTML and XML? How does the data in the
<title> element in the file end up encoded?

<%= Clinton Gallagher
NET csgallagher AT metromilwaukee.com
URL http://clintongallagher.metromilwaukee.com/

May 28 '07 #1
clintonG wrote:
> Does the XmlTextWriter encode HTML and XML? How does the data in the
> <title> element in the file end up encoded?

With XmlWriter (or XmlTextWriter) you can ensure that your XML markup is
well-formed, as methods like WriteElementString make sure that '&' is
escaped as '&amp;' and '<' is escaped as '&lt;'. For example,
xmlWriter.WriteElementString("title",
"AT & T, <element>content</element>");
yields
<title>AT &amp; T, &lt;element&gt;content&lt;/element&gt;</title>

That has nothing to do with HTML or HTML entities; rather, XML itself
defines entities like amp, gt and lt.

If you wanted that 'title' element to have a child element named 'element',
then you need to use
xmlWriter.WriteStartElement("title");
xmlWriter.WriteString("AT & T");
xmlWriter.WriteElementString("element", "content");
xmlWriter.WriteEndElement();
which yields

<title>AT &amp; T<element>content</element></title>
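
For reference, the two snippets above can be assembled into a complete program. This is only a sketch: the class and method names (TitleEscapingSketch, WriteAsText, WriteWithChild) are made up for illustration, and it writes to a StringWriter rather than to a file so the result can be inspected directly.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, not part of the original posts.
public static class TitleEscapingSketch
{
    // Writes <title> with the whole string as text content.
    // Markup characters in the string are escaped by the writer.
    public static string WriteAsText(string title)
    {
        var sw = new StringWriter();
        using (var w = new XmlTextWriter(sw))
        {
            w.WriteElementString("title", title);
        }
        return sw.ToString();
    }

    // Writes <title> with a real child element by making one
    // writer call per node instead of passing markup as a string.
    public static string WriteWithChild()
    {
        var sw = new StringWriter();
        using (var w = new XmlTextWriter(sw))
        {
            w.WriteStartElement("title");
            w.WriteString("AT & T");
            w.WriteElementString("element", "content");
            w.WriteEndElement();
        }
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteAsText("AT & T, <element>content</element>"));
        Console.WriteLine(WriteWithChild());
    }
}
```

The first call treats the angle brackets as data and escapes them; the second produces actual child markup, because the writer was told about the element rather than handed a string that merely looks like one.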
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
May 29 '07 #2
Thanks for confirming that the XmlTextWriter methods escape and encode
specific text characters as HTML character entities. The HTML character
entity naming conventions you attempt to clarify are defined by W3C (24.4.1
The list of characters, Special characters for HTML [1]). My question should
have asked if the methods escape and encode "text characters" as HTML
entities. Nitpicker ;-)

Anyhow, I didn't see the MSDN documentation make note of this inherent
feature of the class; the escaping and encoding are not explicitly
documented on any page I have read so far. There is a pithy comment within
the narrative of the "Writing XML with the XmlWriter" document [2], but the
narrative is poorly written and easily misunderstood.

<%= Clinton Gallagher

[1] http://www.w3.org/TR/html401/sgml/entities.html
[2] http://msdn2.microsoft.com/en-us/lib...hb(VS.80).aspx
May 29 '07 #3
clintonG wrote:
> Thanks for confirming that the XmlTextWriter methods escape and encode
> specific text characters as HTML character entities. The HTML character
> entity naming conventions you attempt to clarify are defined by W3C (24.4.1
> The list of characters, Special characters for HTML [1]). My question should
> have asked if the methods escape and encode "text characters" as HTML
> entities. Nitpicker ;-)

XML defines its own entities and what XmlWriter does is based on the XML
specification and _not_ on the HTML specification.
See <http://www.w3.org/TR/REC-xml/#sec-predefined-ent>.
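
Because the escaping follows the XML specification, a conforming XML parser undoes it when the file is read back. A rough sketch of that round trip, using the sample text from the first post; RoundTripSketch, WriteFeed and ReadBack are hypothetical names, and XmlDocument stands in for whatever parser a feed reader might use:

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, not part of the original posts.
public static class RoundTripSketch
{
    public const string Text = "AT&T HTML entities & XML <elements> are displayed";

    // Writes the same text three ways: as a comment, as element text,
    // and as a CDATA section. Only the element text needs escaping.
    public static string WriteFeed(string text)
    {
        var sw = new StringWriter();
        using (var w = new XmlTextWriter(sw))
        {
            w.WriteComment(text);
            w.WriteStartElement("rss");
            w.WriteAttributeString("version", "2.0");
            w.WriteStartElement("channel");
            w.WriteElementString("title", text);
            w.WriteStartElement("description");
            w.WriteCData(text);
            w.WriteEndElement(); // description
            w.WriteEndElement(); // channel
            w.WriteEndElement(); // rss
        }
        return sw.ToString();
    }

    // Parses the serialized XML and returns the text of one node.
    public static string ReadBack(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectSingleNode(xpath).InnerText;
    }

    public static void Main()
    {
        string xml = WriteFeed(Text);
        Console.WriteLine(xml);
        Console.WriteLine(ReadBack(xml, "/rss/channel/title"));
        Console.WriteLine(ReadBack(xml, "/rss/channel/description"));
    }
}
```

Both lookups return the original string. The &amp; and &lt; forms exist only in the serialized file, which is what NotePad shows; a parser-based view shows the decoded text.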

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
May 29 '07 #4
I kept following links and finally found the arcane documentation: the
XmlWriterSettings.CheckCharacters property [1]. So it seems to me ASP.NET
developers don't have to fool around with regular expressions to validate
and replace text characters that would be illegal when the document is saved
as XML, e.g. as an RSS feed.
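
A small sketch of how CheckCharacters appears to behave, assuming the XmlWriter.Create factory from .NET 2.0 or later. The setting controls whether characters that XML 1.0 does not allow at all (such as U+0001) are rejected; escaping of '&' and '<' happens whether it is on or off. CheckCharactersSketch, Write and IsRejected are illustrative names, and the exact exception type is an assumption, so the catch accepts either ArgumentException or XmlException:

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, not part of the original posts.
public static class CheckCharactersSketch
{
    // Serializes <title>text</title> with the given CheckCharacters setting.
    public static string Write(string text, bool checkCharacters)
    {
        var settings = new XmlWriterSettings
        {
            CheckCharacters = checkCharacters,
            OmitXmlDeclaration = true
        };
        var sw = new StringWriter();
        using (var w = XmlWriter.Create(sw, settings))
        {
            w.WriteElementString("title", text);
        }
        return sw.ToString();
    }

    // Returns true when the writer refuses the text with checking enabled.
    public static bool IsRejected(string text)
    {
        try
        {
            Write(text, true);
            return false;
        }
        // Assumption: one of these two types is thrown for an illegal character.
        catch (Exception ex) when (ex is ArgumentException || ex is XmlException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Write("AT&T", true));           // escaped, not rejected
        Console.WriteLine(Write("AT&T", false));          // still escaped
        Console.WriteLine(IsRejected("bad \u0001 char")); // illegal XML character
    }
}
```

So the property is a validity check rather than the thing doing the escaping: text that contains a control character is refused, and the caller still has to decide how to clean such input.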

I understand what W3C documents say but XML and HTML derive from SGML and
there are some semantic ambiguities in this context in the W3C documents.
Most of us and most documentation including W3C documentation define &amp;
as an HTML character entity. When we get to the W3C page(s) for XML they
drop the verbiage "HTML" when describing character entities.

As I'm sure you'll agree after reading the EBNF and the DTDs, we're talking
about the same thing using context-specific nomenclature. So we really don't
need to quibble about semantics. All I want to do is write code that
generates valid XML RSS feeds that will be parsed by the greatest number of
aggregators, which in itself requires a personal relationship with all the
blessings of Heaven because everybody has been so FUBAR in their respective
implementations.

<%= Clinton Gallagher

[1]
http://msdn2.microsoft.com/en-us/lib...rs(VS.80).aspx
May 29 '07 #5
* clintonG wrote in microsoft.public.dotnet.xml:
> I understand what W3C documents say but XML and HTML derive from SGML and
> there are some semantic ambiguities in this context in the W3C documents.
> Most of us and most documentation including W3C documentation define &amp;
> as an HTML character entity. When we get to the W3C page(s) for XML they
> drop the verbiage "HTML" when describing character entities.

It would be very confusing otherwise. As an example, &apos; is valid in
XML but not part of HTML, while &ouml; is part of HTML but not of XML;
so if you speak about the pre-defined entities in XML you refer to five,
if you speak about those in HTML you refer to hundreds of them.
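
The difference can be checked with a parser. In this sketch (PredefinedEntitiesSketch and Resolve are illustrative names), each reference is parsed in a document with no DTD, so only the entities that XML itself predefines resolve:

```csharp
using System;
using System.Xml;

// Hypothetical helper class, not part of the original posts.
public static class PredefinedEntitiesSketch
{
    // Parses <t>&name;</t> and returns the decoded text, or null when
    // the parser rejects the reference as undeclared.
    public static string Resolve(string entityName)
    {
        var doc = new XmlDocument();
        try
        {
            doc.LoadXml("<t>&" + entityName + ";</t>");
            return doc.DocumentElement.InnerText;
        }
        catch (XmlException)
        {
            return null;
        }
    }

    public static void Main()
    {
        foreach (var name in new[] { "amp", "lt", "gt", "apos", "quot", "ouml" })
        {
            Console.WriteLine(name + " -> " + (Resolve(name) ?? "(undeclared in XML)"));
        }
    }
}
```

The five XML names come back as single characters, while ouml is refused unless a DTD declares it, which is why a feed generator cannot simply reuse HTML entity names in XML output.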
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
May 29 '07 #6
Nobody argues that point, Björn, except to say that the correct use of the
English language in a formal document requires both "narrative" and
"expository" writing, which we native speakers of English are taught in
grade school.

I value consistency in technical documentation, which is considered a formal
use of the language. Consistency should not be compromised for the sake of
brevity, which in this context results in the obfuscation of terminology. I
mean, what are we talking about being needed here? A single paragraph of
narrative supported by a single expository table of five rows to resolve an
apparent contradiction which is not a contradiction at all?

The people on the W3C working groups do not always make the best decisions
and are not necessarily known for their mastery of the English language,
which is said to be the most difficult language to master. That said, having
observed over the years how software developers will quibble with one
another for weeks or perhaps months about a single term and its meaning, I'm
genuinely surprised this discrepancy has been overlooked.

<%= Clinton
Jun 1 '07 #7
