* Glen wrote in microsoft.public.dotnet.xml:
I'm new to XML, so this is a newbie question. I'm reading in XML docs via a
VB.NET application and extracting node data and I find that one of my blocks
or lines has a copyright character in it. I'm using the .NET XMLTextReader
class and the Reader won't parse this character at all; just throws an
exception and boom the application quits. The character always appears in
the same place in the document so I thought to detect the node and skip over
it using the skip function. Nope, it still blows up.
Here's the node. Notice the copyright char shows up as a block. Can anyone
point me in the right direction? TIA...
Unless a document declares a different encoding (or starts with a
UTF-16 byte order mark), XML processors assume that it is UTF-8
encoded. If that causes trouble, the document is most likely stored in
some other encoding. In Windows-1252, for example, the copyright sign
is the single byte 0xA9, which is not a valid UTF-8 sequence on its
own, so the reader rejects it. You can solve your problem either by
fixing the document before passing it to the XML processor (by
declaring the right encoding) or by transcoding the document to UTF-8
(see System.Text.Encoding for routines that could do that). I suspect
the documents are actually in the Windows-1252 encoding, so declaring
the encoding would look like
<?xml version="1.0" encoding="windows-1252"?>
<x>...</x>
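
The transcoding route can be sketched as follows. This is only an
illustration in Python rather than VB.NET, and it assumes the source
really is Windows-1252 (which is my guess, not something the post
confirms). The sample bytes and the helper names transcode_to_utf8 and
parses are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a document saved as Windows-1252 without an
# encoding declaration. The copyright sign is the single byte 0xA9.
RAW = b"<x>Copyright \xa9 2005</x>"


def transcode_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from the assumed source encoding, re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def parses(data: bytes) -> bool:
    """Return True if the XML parser accepts the bytes, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parses(RAW))                     # raw bytes are read as UTF-8
    print(parses(transcode_to_utf8(RAW)))  # same document after transcoding
```

In .NET the equivalent step would presumably go through
System.Text.Encoding, for instance by opening the file with a
StreamReader constructed with Encoding.GetEncoding(1252) and handing
that reader to the XML reader, so that the bytes are decoded before the
parser sees them.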
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/