Jon Skeet [C# MVP] <skeet@pobox.com> wrote in
news:MPG.1a52adec48ab0b15989cbf@msnews.microsoft.com:
OK, thanks Jon, for clearing that up. :)
Frans
> Frans Bouma <perseus.usenetNOSPAM@xs4all.nl> wrote:
>> > What do you mean by this, exactly?
>>
>> That I had the same XML data in two files, one written with
>> Encoding.Default and the other with Encoding.Unicode. Both looked the
>> same in Notepad, and neither had an encoding specification. However, one
>> couldn't be loaded due to an 'ae' character, while the other one could be
>> loaded (or better: be serialized back). I found this very odd, because
>> there was NO encoding specifier in the XML, so the encoding has to be
>> stored somewhere else.
>
> It's not odd at all - it would have been preferable to have the
> encoding specifier in the XML, but Notepad wouldn't have used it
> anyway.
>
> In fact, it seems that Notepad on XP *does* read UTF-8 files. If you use
> the following code:
>
> using System;
> using System.IO;
> using System.Text;
>
> public class Test
> {
>     static void Main()
>     {
>         // StreamWriter(string) writes UTF-8 without a byte-order mark.
>         using (StreamWriter sw = new StreamWriter("test.txt"))
>         {
>             // U+00E9 is e-acute.
>             sw.WriteLine("\u00e9");
>         }
>     }
> }
>
> to generate a file test.txt, which has contents 0xc3 0xa9 0x0d 0x0a,
> then if you open it in Notepad with encoding UTF-8, it correctly
> displays an e-acute. If you open it in Notepad with encoding ANSI, it
> displays Ã© (again, correctly).
>
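As a minimal sketch of the byte-level point above (not from the original thread): the class and method names `EAcuteBytes` and `Hex` are made up for illustration, and ISO-8859-1 is used as a stand-in for the "ANSI" code page, on the assumption that it agrees with windows-1252 for the two bytes involved.

```csharp
using System;
using System.Text;

public static class EAcuteBytes
{
    // Hex dump of the bytes produced by encoding text with the given encoding.
    public static string Hex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string eAcute = "\u00e9";
        byte[] utf8 = Encoding.UTF8.GetBytes(eAcute);

        // Assumption: ISO-8859-1 stands in for the ANSI code page here.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        Console.WriteLine(Hex(eAcute, Encoding.UTF8)); // two bytes in UTF-8
        Console.WriteLine(Hex(eAcute, latin1));        // one byte in a single-byte encoding

        // The same two UTF-8 bytes, read back under each interpretation.
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == eAcute);
        Console.WriteLine(latin1.GetString(utf8) == "\u00c3\u00a9"); // A-tilde, copyright sign
    }
}
```

The comparisons are printed as booleans rather than as characters so the result does not depend on the console's own encoding.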
> Now, if your XML didn't include an encoding specifier, the XML parser
> should have assumed UTF-8. If you used Encoding.Default (instead of
> UTF-8) then you would indeed get an error if the file was not a valid
> UTF-8 file. From the XML specification:
>
> <quote>
> In the absence of information provided by an external transport
> protocol (e.g. HTTP or MIME), it is an error for an entity including an
> encoding declaration to be presented to the XML processor in an
> encoding other than that named in the declaration, or for an entity
> which begins with neither a Byte Order Mark nor an encoding declaration
> to use an encoding other than UTF-8.
> </quote>
>
> When you used the Unicode encoding, I suspect you got a byte-order mark
> which allowed the parser to tell that it was using that encoding.
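To make the byte-order-mark suspicion concrete, here is a rough sketch (not from the original thread) that writes the same text through a StreamWriter with two encodings and dumps the raw bytes. `BomDemo` and `Write` are hypothetical names, and ISO-8859-1 again stands in for an ANSI code page as an assumption.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Writes text through a StreamWriter and returns the raw bytes produced.
    public static byte[] Write(string text, Encoding encoding)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (StreamWriter sw = new StreamWriter(ms, encoding))
            {
                sw.Write(text);
            }
            // ToArray still works after the stream has been closed.
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        // Encoding.Unicode is UTF-16 little-endian; its preamble is FF FE.
        Console.WriteLine(BitConverter.ToString(Write("<a/>", Encoding.Unicode)));

        // Assumption: ISO-8859-1 stands in for an ANSI code page; it has no preamble.
        Console.WriteLine(BitConverter.ToString(
            Write("<a/>", Encoding.GetEncoding("iso-8859-1"))));
    }
}
```

The first dump should start with FF-FE, which is what lets a parser recognise UTF-16 without any declaration; the second has nothing in front of the `<`.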
>
>> >> I was bitten by the same thing. I had to explicitly state
>> >> Encoding.Unicode. When I used Encoding.Default, it should work
>> >> according to the docs, but it didn't. It did save stuff like
>> >> Scandinavian characters to the file, but it couldn't read them back
>> >> correctly, even if I stated UTF-8 as the encoding or whatever in the
>> >> XML header. So I think he's right.
>> >
>> > I really don't think so - please provide a complete example stating
>> > *exactly* what you expected, and what you got.
>>
>> write:
>> XmlTextWriter writer = new XmlTextWriter(
>>     Path.Combine(Application.StartupPath,
>>         ApplicationConstants.PreferencesFilename),
>>     System.Text.Encoding.Unicode);
>>
>> try
>> {
>>     writer.WriteStartElement("Preferences");
>>     writer.WriteStartElement("preferedProjectFolder");
>>     writer.WriteAttributeString("value",
>>         _preferences.PreferedProjectFolder);
>>     writer.WriteEndElement();
>>     // etc.
>>
>> THIS works (the Unicode encoding).
>> However, when I change that to Default, it doesn't. I even added a UTF-8
>> encoding specification to the XML file, no luck.
>
> No, it wouldn't - for the reasons given above.
>
>> Now, the docs state that the code page of the local system is used
>> with 'Default'. I did set the code page of my system to all kinds of
>> wicked pages, but also no luck. Unicode solved it (obviously). However,
>> 'Default' THUS doesn't work for characters other than plain ASCII.
>
> It does, but not when you've told the XML parser to expect UTF-8 and
> then don't give it UTF-8!
>
>> >> In both cases, the files do
>> >> NOT have an XML heading explaining the encoding.
>> >
>> > Notepad isn't going to look at the XML header anyway, of course. I
>> > don't see what the XML header has to do with anything, here, to be
>> > honest. What relevance do you think it has to how a file is opened in
>> > Notepad?
>>
>> I wasn't talking about Notepad :) I write an XML file and read it back
>> the next time the app starts. It crashed then (it didn't while saving
>> the XML). However, because it is XML, I thought an encoding
>> specification would be better in the XML header. But if you add that
>> (UTF-8) and you've saved with 'Default', the file can't be opened with
>> the XmlTextReader because of some byte encoding issue (IIRC).
>
> Yup, that makes perfect sense, in the same way that if you tell someone
> that you're going to talk English and then you talk French they may
> well get confused. You've got to actually use the encoding you specify
> in the XML header.
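A small sketch of that mismatch (not from the original thread): the same XML text, declaring UTF-8, is encoded once as UTF-8 and once with a single-byte encoding, and then handed to a parser. `DeclaredVsActual` and `Loads` are hypothetical names, ISO-8859-1 stands in for Encoding.Default as an assumption, and the attribute value is the 'ae' character (U+00E6) mentioned earlier.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class DeclaredVsActual
{
    // Encodes the XML text with `actual` and reports whether a parser accepts it.
    public static bool Loads(string xml, Encoding actual)
    {
        byte[] bytes = actual.GetBytes(xml);
        try
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(new MemoryStream(bytes));
            return true;
        }
        catch (XmlException)
        {
            // The bytes were not valid in the declared encoding.
            return false;
        }
    }

    public static void Main()
    {
        string xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
                     "<Preferences value=\"\u00e6\" />";

        // Assumption: ISO-8859-1 stands in for an ANSI code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        Console.WriteLine(Loads(xml, new UTF8Encoding(false))); // declaration matches the bytes
        Console.WriteLine(Loads(xml, latin1));                  // declaration does not match
    }
}
```

In the second case the single byte 0xE6 followed by a quote character is not a valid UTF-8 sequence, so the parser is expected to reject the file even though the text looks identical in an editor.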
>
>> >> The actual encoding is in
>> >> the bytes in the file (and probably in a meta-data property in NTFS).
>> >
>> > The encoding isn't "in" the bytes of the file - it's perfectly possible
>> > to have a file which means two different things when considered as
>> > being in two different encodings. How would it be in the meta-data
>> > anyway? As far as the file system is concerned, it's just a stream of
>> > bytes.
>>
>> That's what I was thinking too, but the errors I had made me draw that
>> conclusion. I can be wrong, but what I DO know is that characters in
>> extended ASCII can't be handled with Encoding.Default.
>
> a) There's no such thing as "extended ASCII". There are various
> encodings which are 8-bit extensions to ASCII, but they are all
> different, and there's no one true "extended ASCII".
> b) Characters within an ANSI code-page *can* be used if you correctly
> specify the character encoding in the XML header. I suspect that an
> encoding of "windows-1252" would have worked. I haven't tried it
> and I wouldn't recommend it though - I'd just stick to UTF-8.
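Point (b) can be sketched as follows (not from the original thread, and not a test of windows-1252 itself): if the writer emits a declaration that names the encoding it actually uses, a single-byte encoding round-trips. `MatchingDeclaration` and `RoundTrip` are hypothetical names, and ISO-8859-1 is used instead of windows-1252 as an assumption, since it is available without extra setup.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class MatchingDeclaration
{
    // Writes a one-element document with the given encoding and reads the
    // attribute value back. WriteStartDocument emits the XML declaration,
    // including the name of the writer's encoding.
    public static string RoundTrip(string value, Encoding encoding)
    {
        MemoryStream ms = new MemoryStream();
        XmlTextWriter writer = new XmlTextWriter(ms, encoding);
        writer.WriteStartDocument();
        writer.WriteStartElement("Preferences");
        writer.WriteAttributeString("value", value);
        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Flush();

        XmlDocument doc = new XmlDocument();
        doc.Load(new MemoryStream(ms.ToArray()));
        return doc.DocumentElement.GetAttribute("value");
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for an ANSI code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        Console.WriteLine(RoundTrip("\u00e6", latin1) == "\u00e6");
        Console.WriteLine(RoundTrip("\u00e6", Encoding.UTF8) == "\u00e6");
    }
}
```

Letting the writer produce the declaration, rather than adding one by hand afterwards, is what keeps the declared and actual encodings in step.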
--
Get LLBLGen Pro, the new O/R mapper for .NET:
http://www.llblgen.com