>What's wrong with "Schönbühl"?
Well, I see I was oversimplifying my question in an attempt to avoid
discussing other issues. The point is that the XML file is provided by
guys from another company. These guys seem to be very fond of UTF-8, so they
encode everything in UTF-8. They create UTF-8 files and leave them on an
FTP server.
I'm writing an application that logs into the FTP server and gets the XML
files. I then end up with the file contained in a string. I don't have any
control over the content of the file or its format, so I have to accept it
"as is".
Now I have a very long string containing a lot of data (about 20 KB). Its
header is
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
And there are plenty of elements, some of them containing special chars like
those in the "Schönbühl" city name.
So I want to load this XML file into an XmlDocument. To do this I thought
that the easiest way was to load it from memory:
[... Lot of code here ...]
[... A lot of code here too ...]
[... And a lot more, just to ...]
[... obtain a string with ...]
[... the content of my Xml file in it ...]
XmlDocument doc=new XmlDocument();
doc.LoadXml(s);
At this point I had some issues with badly formatted strings, so I started
to investigate. I investigated the file on the FTP server, I investigated
the TCP communication between the server and my application, and I
investigated the Encoding in my application. In the end I found that
everything seemed to be correct, BUT the special chars were still broken.
So, as I didn't have any clue about what was wrong, I started reducing my
application in order to obtain the simplest and shortest code reproducing
my error. And I ended up with this:
----------------------------------------------
XmlDocument doc=new XmlDocument();
String s="<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?><a>Schönbühl</a>";
doc.LoadXml(s);
doc.Save("d:\\temp\\test.xml");
----------------------------------------------
My objective is not to have a file called "test.xml", in a folder called
"temp", containing the name of a bizarre city. It is just that this small
piece of code happens to behave in a way I'm not sure I understand.
In my mind, if you load the string s as an XML document and then save it to
an XML file, the file and the string should have identical content.
Or, at least, equivalent content. I still believe it.
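One way to test that expectation without relying on a text editor is to save the document to memory and parse the saved bytes back. The following is only a minimal sketch: the class name RoundTripSketch and the helper SaveToBytes are made up for illustration, and the city name is written with \u escapes so that the encoding of the source file itself cannot interfere.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class RoundTripSketch
{
    // Hypothetical helper: loads XML text and returns the bytes XmlDocument.Save writes.
    public static byte[] SaveToBytes(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        using (MemoryStream stream = new MemoryStream())
        {
            // The output encoding follows the XML declaration (utf-8 here).
            doc.Save(stream);
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        // \u00F6 is "ö" and \u00FC is "ü".
        string s = "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>"
                 + "<a>Sch\u00F6nb\u00FChl</a>";
        byte[] saved = SaveToBytes(s);

        // Parse the saved bytes again and compare the element text.
        XmlDocument reloaded = new XmlDocument();
        reloaded.Load(new MemoryStream(saved));
        bool same = reloaded.DocumentElement.InnerText == "Sch\u00F6nb\u00FChl";
        Console.WriteLine("Round trip preserved the text: " + same);

        // Dump the raw bytes to see exactly what ended up in the output.
        Console.WriteLine(BitConverter.ToString(saved));
    }
}
```

If the comparison holds, the saved bytes are equivalent to the string, and any difference seen afterwards would come from how those bytes are displayed or from how the string was built in the first place.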
But instead, if I use a text editor to read the "test.xml" file, I obtain
this:
----------------------------------------------
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<a>SchÃ¶nbÃ¼hl</a>
----------------------------------------------
where I expected to see
----------------------------------------------
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<a>Schönbühl</a>
----------------------------------------------
Thanks to you, I now understand the meaning of the byte order mark ("ï»¿")
at the beginning. But I'm still puzzled about the double encoding of the
special chars in "Schönbühl".
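To make clear what I mean by double encoding, here is a minimal sketch of the mechanism. It is not the code of my application: the class name DoubleEncodingSketch and the helper MisdecodeAsLatin1 are made up for illustration, and it assumes that somewhere the UTF-8 bytes get decoded as ISO-8859-1 before being encoded to UTF-8 a second time.

```csharp
using System;
using System.Text;

public static class DoubleEncodingSketch
{
    // Hypothetical helper: simulates UTF-8 bytes being read as ISO-8859-1 (Latin-1) text.
    public static string MisdecodeAsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    public static void Main()
    {
        // \u00F6 is "ö" and \u00FC is "ü".
        string original = "Sch\u00F6nb\u00FChl";

        // Each special char becomes two chars: U+00C3 followed by U+00B6 or U+00BC.
        string misdecoded = MisdecodeAsLatin1(original);

        byte[] encodedOnce = Encoding.UTF8.GetBytes(original);
        byte[] encodedTwice = Encoding.UTF8.GetBytes(misdecoded);

        // Single encoding: "ö" is the two bytes C3-B6.
        Console.WriteLine(BitConverter.ToString(encodedOnce));
        // Double encoding: the same "ö" is now the four bytes C3-83-C2-B6.
        Console.WriteLine(BitConverter.ToString(encodedTwice));
    }
}
```

The 9-character name takes 11 bytes when encoded once and 15 bytes when encoded twice, so comparing byte counts is a quick way to tell which of the two cases a given file is in.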
Regards,
Jean-Michel Gonet.
"Oleg Tkachenko [MVP]" <oleg@NO!SPAM!PLEASEtkachenko.com> wrote in message
news:us**************@TK2MSFTNGP12.phx.gbl...
jmgonet wrote:
By double encoded I mean that the chars are encoded twice to UTF-8:
Originally the string contained in the xml was "Schönbühl".
To put it into the UTF standard, I've transformed it to "SchÃ¶nbÃ¼hl",
where the "Ã¶"="ö" and the "Ã¼"="ü".
I think that's a bad idea. UTF-8 defines how Unicode characters are
represented in bytes. By doubling characters you get just two characters.
What's wrong with "Schönbühl"? Just use it as is.
--
Oleg Tkachenko [XML MVP, MCP]
http://blog.tkachenko.com