Can the XmlDocument.Load() method handle Unicode characters?

Dear all,

I've spent a long time trying to get the XmlDocument.Load method
to handle UTF-8 characters, with no luck. Every time it loads a
document containing European characters (such as the one below, output
from the Google Maps API), it reports an invalid character at position
229, which I believe is the "ß" character.

Can anyone point me in the right direction on how to load such
documents using the XmlDocument.Load() method, or suggest a better
way to do this?

Thanks!

---------------sample XML file------------------
<?xml version="1.0" encoding="UTF-8" ?>
<kml xmlns="http://earth.google.com/kml/2.0">
  <Response>
    <name>germaniastr 134, berlin berlin</name>
    <Status>
      <code>200</code>
      <request>geocode</request>
    </Status>
    <Placemark>
      <address>Germaniastraße 134, 12099 Tempelhof, Berlin, Germany</address>
      <AddressDetails Accuracy="8" xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0">
        <Country>
          <CountryNameCode>DE</CountryNameCode>
          <AdministrativeArea>
            <AdministrativeAreaName>Berlin</AdministrativeAreaName>
            <SubAdministrativeArea>
              <SubAdministrativeAreaName>Berlin</SubAdministrativeAreaName>
              <Locality>
                <LocalityName>Berlin</LocalityName>
                <DependentLocality>
                  <DependentLocalityName>Tempelhof</DependentLocalityName>
                  <Thoroughfare>
                    <ThoroughfareName>Germaniastraße 134</ThoroughfareName>
                  </Thoroughfare>
                  <PostalCode>
                    <PostalCodeNumber>12099</PostalCodeNumber>
                  </PostalCode>
                </DependentLocality>
              </Locality>
            </SubAdministrativeArea>
          </AdministrativeArea>
        </Country>
      </AddressDetails>
      <Point>
        <coordinates>13.399486,52.464476,0</coordinates>
      </Point>
    </Placemark>
  </Response>
</kml>

Jan 30 '07 #1
* la*****@gmail.com wrote in microsoft.public.dotnet.xml:
>I've spent a long time trying to get the XmlDocument.Load method
>to handle UTF-8 characters, with no luck. Every time it loads a
>document containing European characters (such as the one below, output
>from the Google Maps API), it reports an invalid character at position
>229, which I believe is the "ß" character.
Then it is most likely that your document is not UTF-8 encoded. You will
have to check which bytes are actually at that position, e.g. using a
hex editor (e.g., use File.OpenFile ... /e:Binary in Visual Studio). If
the ß is encoded as two bytes C3 9F then that's either not the offending
character, or you have other encoding problems (for example, you might
have told the XML processor the document is US-ASCII encoded).
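A minimal C# sketch of that byte check, assuming modern .NET (.NET 5 or later, top-level statements) with CodePagesEncodingProvider registered, which .NET Core and .NET 5+ need for code page 1252. It shows how "ß" looks in UTF-8 (C3 9F) versus Windows-1252 (DF). The helper names HexAround and IsValidUtf8 are hypothetical, and the file passed on the command line should hold the raw bytes as received from the server, not a copy saved from the browser.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Code page 1252 is not built into .NET Core/.NET 5+; register the provider that supplies it.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper: format up to `count` bytes starting at `start` as hex, e.g. "C3 9F".
string HexAround(byte[] bytes, int start, int count) =>
    string.Join(" ", bytes.Skip(Math.Max(0, start)).Take(count).Select(b => b.ToString("X2")));

// Hypothetical helper: true if the whole buffer decodes as UTF-8.
// A decoder created with throwOnInvalidBytes = true throws on the first bad sequence.
bool IsValidUtf8(byte[] bytes)
{
    try
    {
        new UTF8Encoding(false, true).GetString(bytes);
        return true;
    }
    catch (DecoderFallbackException)
    {
        return false;
    }
}

// How "ß" (U+00DF) looks on the wire in the two candidate encodings.
byte[] utf8 = Encoding.UTF8.GetBytes("ß");
byte[] cp1252 = Encoding.GetEncoding(1252).GetBytes("ß");
Console.WriteLine($"UTF-8:        {HexAround(utf8, 0, utf8.Length)}");     // C3 9F
Console.WriteLine($"Windows-1252: {HexAround(cp1252, 0, cp1252.Length)}"); // DF

// Usage: dotnet run -- <saved-response-file> <approximate-offset>
if (args.Length >= 2)
{
    byte[] raw = File.ReadAllBytes(args[0]);
    Console.WriteLine($"valid UTF-8: {IsValidUtf8(raw)}");
    Console.WriteLine(HexAround(raw, int.Parse(args[1]) - 8, 16));
}
```

The parser reports a character position rather than a byte offset, so the dump deliberately starts a few bytes early.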

Note that loading XML documents in Internet Explorer and copying and
pasting the results does not help at all in debugging this kind of
problem; compressing the document and uploading it to a web server
is a more sensible approach, because that preserves the exact bytes.
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Phone: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Jan 31 '07 #2
Thanks for your reply, Björn. Since this file comes from a
dynamic URL, I just used the XmlDocument.Load(URL) method to
load the XML file. In this case, how do I tell the XML processor what
encoding the file uses before I load the document? I've saved the
sample XML file (dynamically generated by Google Maps) via IE's
File > Save As..., and uploaded the file to http://www.usctimes.com/gmap/geo.xml.
It seems to open fine in the browser; does that mean anything?

Jan 31 '07 #3
la*****@gmail.com wrote:
>Since this file comes from a
>dynamic URL, I just used the XmlDocument.Load(URL) method to
>load the XML file. In this case, how do I tell the XML processor what
>encoding the file uses before I load the document?
You don't have to specify the encoding. Pass the URL to the Load method
and the XML parser will check for a byte order mark or for the encoding
named in the XML declaration, and will then decode the bytes it
receives into characters based on that information. If that is not
possible, you get an error.
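A small C# sketch of this auto-detection, assuming modern .NET (top-level statements). The sample element and the helper names Encode and LoadText are hypothetical, and ISO-8859-1 merely stands in for any non-UTF-8 encoding. Loading from a Stream roughly mirrors what Load(URL) does with the downloaded bytes: a document whose declaration matches its bytes loads in either encoding, while Latin-1 bytes labelled UTF-8 are rejected.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical sample: the same element, encoded with `actual` but labelled `declared`.
byte[] Encode(string declared, Encoding actual) =>
    actual.GetBytes($"<?xml version=\"1.0\" encoding=\"{declared}\"?><address>Germaniastraße 134</address>");

// Load from raw bytes and let the parser choose the decoding (BOM, then XML declaration).
string LoadText(byte[] bytes)
{
    var doc = new XmlDocument();
    using var stream = new MemoryStream(bytes);
    doc.Load(stream);
    return doc.DocumentElement!.InnerText;
}

var latin1 = Encoding.GetEncoding("ISO-8859-1");

// Declaration matches the bytes: both documents yield the same text.
string fromUtf8 = LoadText(Encode("UTF-8", Encoding.UTF8));
string fromLatin1 = LoadText(Encode("ISO-8859-1", latin1));
Console.WriteLine(fromUtf8 == fromLatin1); // True

// Declaration says UTF-8 but the bytes are Latin-1: the lone 0xDF byte is rejected.
bool rejected = false;
try
{
    LoadText(Encode("UTF-8", latin1));
}
catch (XmlException ex)
{
    rejected = true;
    Console.WriteLine(ex.Message);
}
```

The last case produces the same kind of "invalid character in the given encoding" error described later in this thread.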
>I've saved the
>sample XML file (dynamically generated by Google Maps) via IE's
>File > Save As..., and uploaded the file to http://www.usctimes.com/gmap/geo.xml.
>It seems to open fine in the browser; does that mean anything?
It also loads fine with the Load method of System.Xml.XmlDocument
(tested with .NET 1.x and 2.0), so that file is properly encoded.

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Jan 31 '07 #4
Hi Martin,

Thanks for the test result. It seems that if I load the file I
saved earlier using XmlDocument.Load(), it works fine. But when I
try to load the dynamically generated file directly from Google Maps'
server, it causes the "invalid character in the given encoding,
line 1, position 228" error. Does that mean Google Maps uses the wrong
encoding for that XML file? I don't think I can post the complete
Google Maps link here, as the URL contains my Google Maps API key. But
the URL goes something like this:
http://maps.google.com/maps/geo?q=ge...&key=GOOGLEKEY

Any thoughts?
Chris


Jan 31 '07 #5
la*****@gmail.com wrote:
>It seems that if I load the file I
>saved earlier using XmlDocument.Load(), it works fine. But when I
>try to load the dynamically generated file directly from Google Maps'
>server, it causes the "invalid character in the given encoding,
>line 1, position 228" error. Does that mean Google Maps uses the wrong
>encoding for that XML file?
It means that the XML is not properly encoded: the bytes the server
sends do not match the encoding the document declares.
Jan 31 '07 #6
Martin,

Do you have any suggestions on how I can load this dynamic file, or how
to make the XML document properly encoded?

Thanks!

Jan 31 '07 #7
* la*****@gmail.com wrote in microsoft.public.dotnet.xml:
>Do you have any suggestions on how I can load this dynamic file, or how
>to make the XML document properly encoded?
If the XML document is really not properly encoded, you should contact
Google to have their service fixed. Until then, all you can do is try to
fix the XML document before parsing. For example, you could remove all
non-ASCII octets, or you could transcode the document from Windows-1252
to UTF-8 using System.Text.Encoding.
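Both workarounds can be sketched in C# as follows, assuming modern .NET (top-level statements) with CodePagesEncodingProvider registered for code page 1252. The helper names and the sample bytes are hypothetical; the sample simulates a response that declares UTF-8 but actually carries Windows-1252. The transcoding step assumes every non-ASCII byte is Windows-1252: bytes that were already valid UTF-8 would be double-encoded, so it is best applied only after a normal load has failed.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;

// Code page 1252 is not built into .NET Core/.NET 5+; register the provider that supplies it.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding cp1252 = Encoding.GetEncoding(1252);

// Option 1: reinterpret every byte as Windows-1252 and re-encode as UTF-8, so the bytes
// match the document's encoding="UTF-8" declaration. Assumes the input is pure Windows-1252.
byte[] TranscodeCp1252ToUtf8(byte[] raw) =>
    Encoding.Convert(cp1252, new UTF8Encoding(false), raw);

// Option 2: drop every non-ASCII octet; always parses, but "ß" is lost.
byte[] StripNonAscii(byte[] raw) => raw.Where(b => b < 0x80).ToArray();

// Parse raw bytes via XmlDocument.Load(Stream), as Load(URL) would after downloading.
string LoadText(byte[] bytes)
{
    var doc = new XmlDocument();
    using var stream = new MemoryStream(bytes);
    doc.Load(stream);
    return doc.DocumentElement!.InnerText;
}

// Downloading first (hypothetical `url`, elided in the thread):
//   byte[] raw = await new System.Net.Http.HttpClient().GetByteArrayAsync(url);

// Hypothetical response: declares UTF-8 but actually carries Windows-1252 bytes.
byte[] broken = cp1252.GetBytes(
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?><address>Germaniastraße 134</address>");

string fixedText = LoadText(TranscodeCp1252ToUtf8(broken)); // "Germaniastraße 134"
string strippedText = LoadText(StripNonAscii(broken));      // "Germaniastrae 134"
Console.WriteLine(fixedText);
Console.WriteLine(strippedText);
```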
--
Björn Höhrmann · mailto:bj****@h oehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Feb 1 '07 #8

Hi Björn, can you provide an example of how to download an online XML
document and transcode it to UTF-8 with System.Text.Encoding? Thanks!

Feb 1 '07 #9
First you have to find out which encoding the dynamic document uses.
XmlDocument/XmlTextReader uses UTF-8 by default unless there is a byte order
mark or an encoding attribute in the XML declaration that says otherwise. Once you
know the encoding, create a StreamReader over the input stream and
specify the document's encoding in its constructor. Then create an XmlReader
over this StreamReader and use XmlDocument.Load to load the document.
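A sketch of these steps in C#, assuming modern .NET (top-level statements) with CodePagesEncodingProvider registered for code page 1252. LoadWithEncoding is a hypothetical helper name, and Windows-1252 is only a guess at what the server actually sends; the network call is left as a comment because the real URL and key are elided in this thread. Since the XmlReader receives characters rather than bytes, the encoding named in the XML declaration is not used for decoding.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Code page 1252 is not built into .NET Core/.NET 5+; register the provider that supplies it.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding cp1252 = Encoding.GetEncoding(1252);

// Decode the byte stream with an encoding chosen by the caller, then let the XML parser
// read characters. A byte order mark, if present, still overrides `encoding`
// (StreamReader detects BOMs by default).
XmlDocument LoadWithEncoding(Stream input, Encoding encoding)
{
    using var text = new StreamReader(input, encoding);
    using var reader = XmlReader.Create(text);
    var doc = new XmlDocument();
    doc.Load(reader);
    return doc;
}

// Against the live service this would look something like (hypothetical `url`):
//   using var http = new System.Net.Http.HttpClient();
//   using var body = await http.GetStreamAsync(url);
//   XmlDocument live = LoadWithEncoding(body, cp1252);

// Offline stand-in: bytes that say UTF-8 but are really Windows-1252 (hypothetical sample).
byte[] served = cp1252.GetBytes(
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?><address>Germaniastraße 134</address>");
XmlDocument geo = LoadWithEncoding(new MemoryStream(served), cp1252);
Console.WriteLine(geo.DocumentElement!.InnerText == "Germaniastraße 134"); // True
```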

If you are sure that the document's encoding is indeed UTF-8 and it contains
invalid bytes, you can create an instance of UTF8Encoding that does not throw
on invalid bytes (see the throwOnInvalidBytes parameter of the UTF8Encoding constructor).
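A short C# sketch contrasting the two UTF8Encoding settings, assuming modern .NET (top-level statements); the sample bytes are hypothetical. In current .NET versions an invalid byte is replaced with U+FFFD rather than silently dropped, so the parse succeeds but the "ß" is lost.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical response: declared as UTF-8, but "ß" arrives as the single Latin-1 byte 0xDF.
byte[] served = Encoding.GetEncoding("ISO-8859-1").GetBytes(
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>straße 134</a>");

// throwOnInvalidBytes: true  -> invalid bytes raise DecoderFallbackException.
// throwOnInvalidBytes: false -> invalid bytes are replaced instead of raising an error.
var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
var lenient = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: false);

bool strictFailed = false;
try { strict.GetString(served); }
catch (DecoderFallbackException) { strictFailed = true; }

// Decode leniently through a StreamReader, then parse characters rather than bytes.
var doc = new XmlDocument();
using var sr = new StreamReader(new MemoryStream(served), lenient);
using var reader = XmlReader.Create(sr);
doc.Load(reader);
string text = doc.DocumentElement!.InnerText;

Console.WriteLine(strictFailed);                 // True
Console.WriteLine(text == "stra\uFFFDe 134");    // True: the 0xDF byte became U+FFFD
```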

-Helena

Feb 6 '07 #10
