Thanks for the heads-up. Turns out the encoding is actually
ISO 8859-1.
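
For anyone who finds this thread later, here is a minimal sketch of the two fixes suggested below. It is in Python rather than VB.NET purely to show the idea; the sample bytes and the function names are made up for illustration. In .NET the equivalent route would go through System.Text.Encoding, as Bjoern mentions. The sample stores the copyright sign as the single byte 0xA9, which is valid ISO 8859-1 but not valid UTF-8 on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the copyright sign is the single byte 0xA9
# (ISO 8859-1) and there is no encoding declaration.
raw = b"<x>\xa9 2005 Example</x>"


def parse_fails(data: bytes) -> bool:
    """Return True if the parser rejects the bytes when left to assume UTF-8."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return True
    return False


def parse_with_declared_encoding(data: bytes) -> str:
    """Fix 1: prepend an XML declaration naming the real encoding."""
    declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n' + data
    return ET.fromstring(declared).text


def parse_after_transcoding(data: bytes) -> str:
    """Fix 2: decode as ISO 8859-1, re-encode as UTF-8, then parse."""
    utf8 = data.decode("iso-8859-1").encode("utf-8")
    return ET.fromstring(utf8).text
```

Both fixes should hand back the same text with the copyright sign intact. Declaring the encoding leaves the bytes untouched, while transcoding rewrites them so that the parser's UTF-8 default is actually true.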
"Bjoern Hoehrmann" <bjoern@hoehrmann.de> wrote in message
news:41f8890b.283516140@news.bjoern.hoehrmann.de...
> * Glen wrote in microsoft.public.dotnet.xml:
> >I'm new to XML, so this is a newbie question. I'm reading in XML docs via a
> >VB.NET application and extracting node data and I find that one of my blocks
> >or lines has a copyright character in it. I'm using the .NET XMLTextReader
> >class and the Reader won't parse this character at all; just throws an
> >exception and boom the application quits. The character always appears in
> >the same place in the document so I thought to detect the node and skip over
> >it using the skip function. Nope, it still blows up.
> >
> >Here's the node. Notice the copyright char shows up as a block. Can anyone
> >point me in the right direction? TIA...
>
> If you do not declare a different encoding, XML processors assume
> that XML documents are UTF-8 encoded (or UTF-16, if the document
> starts with a byte order mark). If that causes any trouble, it would
> seem that the document is actually in a different encoding. You can
> solve your problem either by fixing the document before passing it to
> the XML processor (by declaring the right encoding) or by transcoding
> the document to UTF-8 (see System.Text.Encoding for routines that
> could do that). I suspect the documents are actually in the
> Windows-1252 encoding, so declaring the encoding would look like
>
> <?xml version="1.0" encoding="windows-1252"?>
> <x>...</x>
> --
> Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
> Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
> 68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/