Can XmlDocument.Load() method handle unicode characters?

Dear all,

I've spent a long time trying to get the XmlDocument.Load method
to handle UTF-8 characters, but no luck. Every time it loads a
document that contains European characters (such as the one below, output
from the Google Maps API), it reports an invalid character at position
229, which I believe is the "ß" character.

Can anyone point me in the right direction on how to load such
documents using the XmlDocument.Load() method, or suggest a better
way to do this?

Thanks!

---------------sample XML file------------------
<?xml version="1.0" encoding="UTF-8" ?>
<kml xmlns="http://earth.google.com/kml/2.0">
  <Response>
    <name>germaniastr 134, berlin berlin</name>
    <Status>
      <code>200</code>
      <request>geocode</request>
    </Status>
    <Placemark>
      <address>Germaniastraße 134, 12099 Tempelhof, Berlin, Germany</address>
      <AddressDetails Accuracy="8" xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0">
        <Country>
          <CountryNameCode>DE</CountryNameCode>
          <AdministrativeArea>
            <AdministrativeAreaName>Berlin</AdministrativeAreaName>
            <SubAdministrativeArea>
              <SubAdministrativeAreaName>Berlin</SubAdministrativeAreaName>
              <Locality>
                <LocalityName>Berlin</LocalityName>
                <DependentLocality>
                  <DependentLocalityName>Tempelhof</DependentLocalityName>
                  <Thoroughfare>
                    <ThoroughfareName>Germaniastraße 134</ThoroughfareName>
                  </Thoroughfare>
                  <PostalCode>
                    <PostalCodeNumber>12099</PostalCodeNumber>
                  </PostalCode>
                </DependentLocality>
              </Locality>
            </SubAdministrativeArea>
          </AdministrativeArea>
        </Country>
      </AddressDetails>
      <Point>
        <coordinates>13.399486,52.464476,0</coordinates>
      </Point>
    </Placemark>
  </Response>
</kml>

Jan 30 '07 #1
* la*****@gmail.com wrote in microsoft.public.dotnet.xml:
> I've spent a long time trying to get the XmlDocument.Load method
> to handle UTF-8 characters, but no luck. Every time it loads a
> document that contains European characters (such as the one below, output
> from the Google Maps API), it reports an invalid character at position
> 229, which I believe is the "ß" character.
Then it is most likely that your document is not UTF-8 encoded. You will
have to check which bytes are actually at that position, e.g. using a
hex editor (e.g., use File.OpenFile ... /e:Binary in Visual Studio). If
the ß is encoded as two bytes C3 9F then that's either not the offending
character, or you have other encoding problems (for example, you might
have told the XML processor the document is US-ASCII encoded).
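
The byte check described above can be sketched in a few lines. This is only an illustration in Python rather than .NET, and the helper name `first_invalid_utf8_offset` is made up for this example; it assumes you already have the raw bytes of the response in hand.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first sequence that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad byte
    except UnicodeDecodeError as err:
        return err.start
    return None


# "ß" is two bytes (C3 9F) in UTF-8 but a single byte (DF) in Windows-1252.
good = "Germaniastraße".encode("utf-8")
bad = "Germaniastraße".encode("cp1252")

print(first_invalid_utf8_offset(good))  # None: every byte sequence is valid UTF-8
print(first_invalid_utf8_offset(bad))   # 12: the lone DF byte is not valid UTF-8
```

If the reported offset lines up with the position in the parser's error message, the document is probably not UTF-8 at that point, whatever its XML declaration claims.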

Note that loading XML documents in Internet Explorer and copying and
pasting the results does not help in any way to debug this kind of
problem; compressing the document and uploading it to some web server
is a more sensible approach.
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Jan 31 '07 #2
Thanks for your reply, Björn. Since this file is coming from a
dynamic URL online, I just used the XmlDocument.Load(URL) method to
load the XML file. In this case, how do I tell the XML processor what
encoding the file uses before I load the document? I've saved the
sample XML file (dynamically generated by Google Maps) using IE's
File->Save As..., and uploaded the file to
http://www.usctimes.com/gmap/geo.xml . It seems to open fine in the
browser; does that mean anything?
Jan 31 '07 #3
la*****@gmail.com wrote:
> Since this file is coming from a
> dynamic URL online, I just used the XmlDocument.Load(URL) method to
> load the xml file. In this case, how do I tell the XML processor what
> encoding the file would be before I load the document?
You don't have to specify the encoding. Pass the URL to the Load method
and the XML parser will check the XML declaration for the declared
encoding, or check for a byte order mark, and then decode the bytes
served into characters based on that information. If that is not
possible you get an error.
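
A simplified sketch of that detection step, in Python rather than .NET and only to illustrate the idea: the function name `sniff_xml_encoding` is hypothetical, and a real parser follows the fuller procedure in the XML specification (for example, it also handles UTF-32 and UTF-16 without a byte order mark).

```python
import codecs
import re


def sniff_xml_encoding(data: bytes) -> str:
    """Guess an XML document's encoding from its BOM or XML declaration."""
    # A byte order mark, if present, takes priority.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Otherwise look for encoding="..." in an ASCII-compatible XML declaration.
    match = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', data)
    if match:
        return match.group(1).decode("ascii").lower()
    # With neither a BOM nor a declared encoding, XML defaults to UTF-8.
    return "utf-8"


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="UTF-8" ?><kml/>'))
```

Note that the declaration is only a claim about the bytes. If the server declares UTF-8 but sends something else, decoding fails in the way described in this thread.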
> I've saved the
> sample XML file (dynamically generated by Google Maps) using IE's
> File->Save As..., and uploaded the file to
> http://www.usctimes.com/gmap/geo.xml . It seems to open fine in the
> browser; does that mean anything?
It also loads fine with .NET and the Load method of
System.Xml.XmlDocument so that file is properly encoded. And .NET parses
it just fine (tested with .NET 1.x and 2.0).

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Jan 31 '07 #4
Hi Martin,

Thanks for the test result. It seems that if I load the file I
saved earlier using XmlDocument.Load(), it works fine. But when I
try to load the dynamically generated file directly from Google Maps'
server, it causes the "invalid character in the given encoding,
line 1, position 228" error. Does that mean Google Maps uses the wrong
encoding for that XML file? I don't think I can post the complete
Google Maps link here as the URL contains the Google Maps API key. But
the URL goes something like this:
http://maps.google.com/maps/geo?q=ge...&key=GOOGLEKEY

Any thoughts?
Chris

Jan 31 '07 #5
la*****@gmail.com wrote:
> It seems that if I load the file I
> saved earlier using XmlDocument.Load(), it works fine. But when I
> try to load the dynamically generated file directly from Google Maps'
> server, it causes the "invalid character in the given encoding,
> line 1, position 228" error. Does that mean Google Maps uses the wrong
> encoding for that XML file?
It means that the XML is not properly encoded.
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Jan 31 '07 #6
Martin,

Do you have any suggestions on how I can load this dynamic file, or how
to make the XML document properly encoded?

Thanks!

Jan 31 '07 #7
* la*****@gmail.com wrote in microsoft.public.dotnet.xml:
> Do you have any suggestions on how I can load this dynamic file, or how
> to make the XML document properly encoded?
If the XML document is really not properly encoded, you should contact
Google to have their service fixed. Until then all you can do is try to
fix the XML document before parsing. For example, you could remove all
non-ASCII octets or you could transcode the document from Windows-1252
to UTF-8 using System.Text.Encoding.
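
Both workarounds can be sketched as follows. This is an illustration in Python, not .NET code, and the helper names are invented for the example; it also assumes the server really is sending Windows-1252, which the thread has not confirmed.

```python
def strip_non_ascii(data: bytes) -> bytes:
    """Drop every byte above 0x7F. Lossy: a Windows-1252 "ß" simply disappears."""
    return bytes(b for b in data if b < 0x80)


def cp1252_to_utf8(data: bytes) -> bytes:
    """Reinterpret the bytes as Windows-1252 text and re-encode them as UTF-8."""
    # Assumption: the input is Windows-1252. Python raises UnicodeDecodeError
    # for the few byte values that code page leaves undefined.
    return data.decode("cp1252").encode("utf-8")


raw = b"<address>Germaniastra\xdfe 134</address>"  # DF is "ß" in Windows-1252
print(strip_non_ascii(raw))  # b'<address>Germaniastrae 134</address>'
print(cp1252_to_utf8(raw))   # b'<address>Germaniastra\xc3\x9fe 134</address>'
```

Transcoding keeps the character, so it is the better choice whenever the real source encoding is known. Stripping is a last resort because it silently changes the data.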
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Feb 1 '07 #8
Hi Björn, can you provide an example of how to save an online XML
document and transcode it to UTF-8 with System.Text.Encoding? Thanks!

Feb 1 '07 #9
First you have to find out which encoding the dynamic document uses.
XmlDocument/XmlTextReader uses UTF-8 by default unless there is a byte
order mark or an encoding attribute in the XML declaration that says
otherwise. Once you know the encoding, create a StreamReader over the input
stream and specify the document's encoding in its constructor. Then create
an XmlReader over this StreamReader and use XmlDocument.Load to load the
document.

If you are sure that the document's encoding is indeed UTF-8 and there is an
invalid character in it, you can create an instance of UTF8Encoding that will
ignore invalid characters (see the UTF8Encoding constructor).
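
The same two ideas, sketched in Python purely for illustration (this is not the .NET API, and the function names are hypothetical): decode with the encoding the bytes actually use before parsing, or decode as UTF-8 while tolerating invalid sequences.

```python
import xml.etree.ElementTree as ET


def load_xml(raw: bytes, actual_encoding: str) -> ET.Element:
    """Decode with the encoding the bytes really use, then parse as UTF-8."""
    text = raw.decode(actual_encoding)
    # Assumption: the XML declaration says UTF-8 (or is absent), so the
    # re-encoded bytes agree with what the document declares.
    return ET.fromstring(text.encode("utf-8"))


def load_xml_lenient(raw: bytes) -> ET.Element:
    """Treat the bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    text = raw.decode("utf-8", errors="replace")
    return ET.fromstring(text.encode("utf-8"))


# Declared as UTF-8, but the "ß" was sent as the single Windows-1252 byte DF.
raw = b'<?xml version="1.0" encoding="UTF-8" ?><address>Germaniastra\xdfe 134</address>'
print(load_xml(raw, "cp1252").text)  # Germaniastraße 134
print(load_xml_lenient(raw).text)    # the bad byte becomes U+FFFD
```

The first variant recovers the original text but needs the correct source encoding. The second always loads, at the cost of losing whichever characters were mis-encoded.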

-Helena
Feb 6 '07 #10
Help!
I have the same problem and need to remove the invalid characters from my
source XML file. Can someone please supply an example?

Tim Heap
Software & Database Manager
POSTAR Ltd
www.postar.co.uk
ti*@postar.co.uk

*** Sent via Developersdex http://www.developersdex.com ***
Mar 22 '07 #11
