Invalid characters in an XML Doc


Question posted by: Glen (Guest) on November 12th, 2005 04:13 AM
I'm new to XML, so this is a newbie question. I'm reading in XML docs via a
VB.NET application and extracting node data and I find that one of my blocks
or lines has a copyright character in it. I'm using the .NET XmlTextReader
class and the reader won't parse this character at all; it just throws an
exception and boom, the application quits. The character always appears in
the same place in the document, so I thought to detect the node and skip over
it using the Skip method. Nope, it still blows up.

Here's the node. Notice the copyright char shows up as a block. Can anyone
point me in the right direction? TIA...
<body.end>

<tagline>Copyright ? 2005 </tagline> the cp

</body.end>






2 Answers Posted
Bjoern Hoehrmann November 12th, 2005 04:13 AM
#2: Re: Invalid characters in an XML Doc

* Glen wrote in microsoft.public.dotnet.xml:
>I'm new to XML, so this is a newbie question. I'm reading in XML docs via a
>VB.NET application and extracting node data and I find that one of my blocks
>or lines has a copyright character in it. I'm using the .NET XMLTextReader
>class and the Reader won't parse this character at all; just throws an
>exception and boom the application quits. The character always appears in
>the same place in the document so I thought to detect the node and skip over
>it using the skip function. Nope, it still blows up.
>
>Here's the node. Notice the copyright char shows up as a block. Can anyone
>point me in the right direction? TIA...

If a document does not declare a different encoding (and has no UTF-16
byte order mark), XML processors assume it is UTF-8 encoded. If that
causes trouble, the document is most likely stored in some other
encoding. You can solve your problem either by fixing the document
before passing it to the XML processor (by declaring the right encoding)
or by transcoding the document to UTF-8 (see System.Text.Encoding for
routines that could do that). I suspect the documents are actually in
the Windows-1252 encoding, so declaring the encoding would look like

<?xml version="1.0" encoding="windows-1252"?>
<x>...</x>
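
The transcoding option mentioned above can be sketched as follows. This is a minimal illustration in Python rather than VB.NET, so it is not the System.Text.Encoding API itself; the helper name transcode_to_utf8, the sample bytes, and the assumption that the source is Windows-1252 are all hypothetical.

```python
import xml.etree.ElementTree as ET


def transcode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical sample: no encoding declaration, and the copyright sign is
# stored as the single byte 0xA9 (its value in Windows-1252).
raw = b"<body.end><tagline>Copyright \xa9 2005 </tagline></body.end>"

# Without a declaration the parser assumes UTF-8, where a lone 0xA9 is invalid.
try:
    ET.fromstring(raw)
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# After transcoding, the same document parses and the character survives.
root = ET.fromstring(transcode_to_utf8(raw))
tagline = root.find("tagline").text
```

The sketch only works if the guessed source encoding is right; decoding with the wrong single-byte encoding usually succeeds silently and yields the wrong characters.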
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Glen
#3: Re: Invalid characters in an XML Doc

Thanks for the heads-up. Turns out the encoding is actually
ISO 8859-1.
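
For the copyright sign specifically, ISO 8859-1 and Windows-1252 agree: both store it as the byte 0xA9, which is not valid UTF-8 on its own. A small Python sketch of that, with a made-up one-element document, and assuming the file really is ISO 8859-1 as reported:

```python
import xml.etree.ElementTree as ET

copyright_byte = b"\xa9"

# The byte maps to U+00A9 in both single-byte encodings.
as_latin1 = copyright_byte.decode("iso-8859-1")
as_cp1252 = copyright_byte.decode("windows-1252")

# As UTF-8 the same byte is a stray continuation byte and cannot be decoded.
try:
    copyright_byte.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Declaring the encoding in the XML declaration lets the parser read the byte.
doc = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<tagline>Copyright \xa9 2005 </tagline>"
)
text = ET.fromstring(doc).text
```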




 