Can XmlDocument.Load() method handle unicode characters?

Dear all,

I've spent a long time trying to get the XmlDocument.Load() method
to handle UTF-8 characters, but with no luck. Every time it loads a
document that contains European characters (such as the one below,
output from the Google Maps API), it reports an invalid character at
position 229, which I believe is the "ß" character.

Can anyone point me in the right direction on how to load such
documents with the XmlDocument.Load() method, or suggest a better
way to do this?

Thanks!

---------------sample XML file------------------
<?xml version="1.0" encoding="UTF-8" ?>
<kml xmlns="http://earth.google.com/kml/2.0">
  <Response>
    <name>germaniastr 134, berlin berlin</name>
    <Status>
      <code>200</code>
      <request>geocode</request>
    </Status>
    <Placemark>
      <address>Germaniastraße 134, 12099 Tempelhof, Berlin, Germany</address>
      <AddressDetails Accuracy="8" xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0">
        <Country>
          <CountryNameCode>DE</CountryNameCode>
          <AdministrativeArea>
            <AdministrativeAreaName>Berlin</AdministrativeAreaName>
            <SubAdministrativeArea>
              <SubAdministrativeAreaName>Berlin</SubAdministrativeAreaName>
              <Locality>
                <LocalityName>Berlin</LocalityName>
                <DependentLocality>
                  <DependentLocalityName>Tempelhof</DependentLocalityName>
                  <Thoroughfare>
                    <ThoroughfareName>Germaniastraße 134</ThoroughfareName>
                  </Thoroughfare>
                  <PostalCode>
                    <PostalCodeNumber>12099</PostalCodeNumber>
                  </PostalCode>
                </DependentLocality>
              </Locality>
            </SubAdministrativeArea>
          </AdministrativeArea>
        </Country>
      </AddressDetails>
      <Point>
        <coordinates>13.399486,52.464476,0</coordinates>
      </Point>
    </Placemark>
  </Response>
</kml>

Jan 30 '07 #1
10 Replies


* la*****@gmail.com wrote in microsoft.public.dotnet.xml:
> I've spent a long time trying to get the XmlDocument.Load() method
> to handle UTF-8 characters, but with no luck. Every time it loads a
> document that contains European characters (such as the one below,
> output from the Google Maps API), it reports an invalid character at
> position 229, which I believe is the "ß" character.
Then it is most likely that your document is not UTF-8 encoded. You will
have to check which bytes are actually at that position, for example with
a hex editor (in Visual Studio, use File.OpenFile ... /e:Binary). If
the ß is encoded as the two bytes C3 9F, then either that is not the
offending character or you have some other encoding problem (for example,
you might have told the XML processor the document is US-ASCII encoded).
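
As a rough illustration of that byte-level check, here is a minimal sketch in Python rather than .NET. The helper name first_invalid_utf8_offset and the sample string are made up for illustration. It shows that "ß" is the two bytes C3 9F in UTF-8 but the single byte DF in Windows-1252, and it reports the offset of the first byte sequence that is not valid UTF-8.

```python
# "ß" (U+00DF) as bytes in two encodings.
utf8_bytes = "ß".encode("utf-8")      # two bytes: C3 9F
cp1252_bytes = "ß".encode("cp1252")   # one byte: DF


def first_invalid_utf8_offset(data: bytes) -> int:
    """Return the byte offset of the first invalid UTF-8 sequence, or -1."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return -1


# Hypothetical sample fragments, encoded correctly and incorrectly.
good = "<address>Germaniastraße 134</address>".encode("utf-8")
bad = "<address>Germaniastraße 134</address>".encode("cp1252")

print(utf8_bytes.hex(), cp1252_bytes.hex())   # c39f df
print(first_invalid_utf8_offset(good))        # -1
print(first_invalid_utf8_offset(bad))         # 21
```

If the reported offset lines up with the position in the parser's error message, the document is probably being served in a single-byte encoding while claiming to be UTF-8.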

Note that loading XML documents in Internet Explorer and copying and
pasting the results does not help in any way to debug this kind of
problem; compressing the document and uploading it to some web server
is a more sensible approach.
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Jan 31 '07 #2

Thanks for your reply, Björn. Since this file comes from a dynamic
URL online, I just used the XmlDocument.Load(URL) method to load the
XML file. In this case, how do I tell the XML processor what encoding
the file uses before I load the document? I saved the sample XML file
(dynamically generated by Google Maps) with IE's File->Save As... and
uploaded it to http://www.usctimes.com/gmap/geo.xml . It seems to open
fine in the browser; does that mean anything?

Jan 31 '07 #3

la*****@gmail.com wrote:
> Since this file is coming from a
> dynamic URL online, I just used the XmlDocument.Load(URL) method to
> load the XML file. In this case, how do I tell the XML processor what
> encoding the file would be before I load the document?
You don't have to specify the encoding. Pass the URL to the Load method
and the XML parser will check the XML declaration for the declared
encoding, or check for a byte order mark, and will then use that
information to decode the bytes served into characters. If that is not
possible, you get an error.
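
The detection order described above can be sketched roughly as follows. This is a simplified Python illustration, not what the .NET parser actually does internally: sniff_xml_encoding is a hypothetical helper, it only looks at UTF-8 and UTF-16 byte order marks and ASCII-compatible XML declarations, and it falls back to UTF-8 when neither is present.

```python
import codecs
import re

# Matches an ASCII-compatible XML declaration and captures the encoding name.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding from a BOM or the XML declaration (sketch only)."""
    # A byte order mark takes precedence (UTF-32 is ignored in this sketch).
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Otherwise look for encoding="..." in the XML declaration.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declared encoding: XML defaults to UTF-8.
    return "utf-8"


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="UTF-8" ?><kml/>'))
print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><kml/>'))
print(sniff_xml_encoding(b"<kml/>"))
```

Note that this only tells you what the document claims. If the bytes that follow do not match the claimed encoding, decoding fails, which is the error case mentioned above.
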
> I've saved the sample XML file (dynamically generated from Google Maps)
> from IE's File->Save As... , and uploaded the file to
> http://www.usctimes.com/gmap/geo.xml . It seems to open fine in the
> browser, does that mean anything?
It also loads fine with .NET and the Load method of
System.Xml.XmlDocument so that file is properly encoded. And .NET parses
it just fine (tested with .NET 1.x and 2.0).

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Jan 31 '07 #4

Hi Martin,

Thanks for the test result. It seems that if I load the file I
saved earlier using XmlDocument.Load(), it works fine. But when I
try to load the dynamically generated file directly from Google Maps'
server, it causes the "invalid character in the given encoding,
line 1, position 228" error. Does that mean Google Maps uses the wrong
encoding for that XML file? I don't think I can post the complete
Google Maps link here, as the URL contains the Google Maps API key, but
the URL looks something like this:
http://maps.google.com/maps/geo?q=ge...&key=GOOGLEKEY

Any thoughts?
Chris

Jan 31 '07 #5

la*****@gmail.com wrote:
> It seems that if I load the file I
> saved earlier using XmlDocument.Load(), it works fine. But when I
> try to load the dynamically generated file directly from Google Maps'
> server, it causes the "invalid character in the given encoding,
> line 1, position 228" error. Does that mean Google Maps uses the wrong
> encoding for that XML file?
It means that the XML is not properly encoded.
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Jan 31 '07 #6

Martin,

Do you have any suggestions on how I can load this dynamic file, or how
to make the XML document properly encoded?

Thanks!

Jan 31 '07 #7

* la*****@gmail.com wrote in microsoft.public.dotnet.xml:
> Do you have any suggestions on how I can load this dynamic file, or how
> to make the XML document properly encoded?
If the XML document is really not properly encoded, you should contact
Google to have their service fixed. Until then all you can do is try to
fix the XML document before parsing. For example, you could remove all
non-ASCII octets or you could transcode the document from Windows-1252
to UTF-8 using System.Text.Encoding.
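
Both workarounds can be sketched in a few lines. This is a Python illustration of the idea, not .NET code; the function names are made up, and Windows-1252 as the real source encoding is only an assumption until the actual bytes have been inspected.

```python
def strip_non_ascii(data: bytes) -> bytes:
    """Drop every byte above 0x7F (lossy: 'straße' becomes 'strae')."""
    return bytes(b for b in data if b < 0x80)


def cp1252_to_utf8(data: bytes) -> bytes:
    """Reinterpret the bytes as Windows-1252 and re-encode them as UTF-8."""
    # Raises UnicodeDecodeError for the few byte values cp1252 leaves undefined.
    return data.decode("cp1252").encode("utf-8")


# Hypothetical fragment as a mis-encoded server might send it.
raw = "<address>Germaniastraße 134</address>".encode("cp1252")

print(strip_non_ascii(raw))   # the ß is gone
print(cp1252_to_utf8(raw))    # the ß is now the two bytes C3 9F
```

Stripping is the cruder option because it silently loses characters; transcoding keeps them but only gives the right result if the guess about the source encoding is correct.
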
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Feb 1 '07 #8

Hi Björn, can you provide an example of how to save an online XML
document and transcode it to UTF-8 with System.Text.Encoding? Thanks!

Feb 1 '07 #9

First you have to find out which encoding the dynamic document uses.
XmlDocument/XmlTextReader uses UTF-8 by default unless there is a BOM or
an encoding attribute in the XML declaration that says something else. Once
you find out the encoding, create a StreamReader over the input stream and
specify the document's encoding in its constructor. Then create an XmlReader
over this StreamReader and use XmlDocument.Load to load the document.
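
The same pattern (decode the bytes with an explicitly chosen encoding, then hand the resulting text to the parser) looks roughly like this in Python. It is only an analogy to the .NET steps above: io.TextIOWrapper plays the role of StreamReader, ElementTree stands in for XmlDocument, parse_with_encoding is a made-up helper, and the sample assumes the bytes are really Windows-1252 despite the UTF-8 declaration.

```python
import io
import xml.etree.ElementTree as ET


def parse_with_encoding(data: bytes, encoding: str) -> ET.Element:
    """Decode the raw bytes with an explicit encoding, then parse the text."""
    # The text wrapper decodes the byte stream, much like a StreamReader would.
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding) as text_reader:
        text = text_reader.read()
    # The parser now receives characters, so the byte encoding no longer matters.
    return ET.fromstring(text)


# Declared as UTF-8 but actually Windows-1252 bytes (an assumed failure mode).
raw = ('<?xml version="1.0" encoding="UTF-8" ?>'
       '<address>Germaniastraße 134</address>').encode("cp1252")

try:
    ET.fromstring(raw)  # parsing the bytes directly trusts the declaration
except ET.ParseError as err:
    print("strict parse failed:", err)

print(parse_with_encoding(raw, "cp1252").text)
```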

If you are sure that the document's encoding is indeed UTF-8 and there is an
invalid character in it, you can create an instance of UTF8Encoding that will
ignore invalid characters (see the UTF8Encoding constructor).
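
For comparison, here is what strict versus lenient UTF-8 decoding looks like in Python. This is a sketch of the concept, not the .NET API; in .NET the corresponding switch is presumably the throwOnInvalidBytes argument of the UTF8Encoding constructor. The sample bytes are made up.

```python
data = b"<name>stra\xdfe</name>"  # 0xDF followed by "e" is not valid UTF-8

try:
    data.decode("utf-8")  # strict (the default): raises on the bad byte
except UnicodeDecodeError as err:
    print("strict decode failed at byte", err.start)

replaced = data.decode("utf-8", errors="replace")  # bad byte becomes U+FFFD
ignored = data.decode("utf-8", errors="ignore")    # bad byte is dropped

print(replaced)
print(ignored)
```

Either lenient mode lets the document load, but the original character is lost, so this only makes sense when the damaged text is not needed.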

-Helena
Feb 6 '07 #10

Help!
I have the same problem and need to remove invalid characters from my
source XML file. Could someone please supply an example?

Tim Heap
Software & Database Manager
POSTAR Ltd
www.postar.co.uk
ti*@postar.co.uk

Mar 22 '07 #11
