BUG in StreamWriter

Hi,

When constructing a StreamWriter like this:
FileStream f = new FileStream(..);
StreamWriter s = new StreamWriter(f);

Then, when we write out åäö letters, they come out as garbage.

BUT

If we construct the StreamWriter like this instead:
FileStream f = new FileStream(..);
StreamWriter s = new StreamWriter(f, System.Text.Encoding.Default);

It's OK. So why isn't Default the actual DEFAULT, as it says on the ctor?

It seems to me either the ctor is wrong or the name .Default is misleading.

Thanks.
Jul 21 '05 #1
<di********@discussion.microsoft.com> wrote:

When constructing a StreamWriter like this:
FileStream f = new FileStream(..);
StreamWriter s = new StreamWriter(f);

Then, when we write out åäö letters, they come out as garbage.

BUT

If we construct the StreamWriter like this instead:
FileStream f = new FileStream(..);
StreamWriter s = new StreamWriter(f, System.Text.Encoding.Default);

It's OK. So why isn't Default the actual DEFAULT, as it says on the ctor?

It seems to me either the ctor is wrong or the name .Default is misleading.


.Default is *slightly* misleading, although all the information is in
the documentation. The docs for new StreamWriter(Stream) say:

<quote>
This constructor creates a StreamWriter with UTF-8 encoding whose
GetPreamble method returns an empty byte array. The BaseStream property
is initialized using the stream parameter.
</quote>

However, the brief summary saying that it uses "the default" encoding
is misleading (I'll mail MS about it).

.Default means the default *platform* encoding - but pretty much
everything in .NET itself uses UTF-8 by default.
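The difference is easy to see in the bytes themselves. The following C# sketch is an illustration added here, not code from the thread: it writes "åäö" through each constructor into a MemoryStream (standing in for the FileStream) and returns the raw bytes. ByteCapture is a hypothetical name, and ISO-8859-1 (code page 28591) is an assumed stand-in for Encoding.Default on a Western European Windows machine, whose ANSI code page (Windows-1252) encodes these three letters identically. On .NET Core and later, Encoding.Default always returns UTF-8, so it would not show the contrast there.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: pushes text through a StreamWriter and returns the raw bytes.
static class ByteCapture
{
    public static byte[] Capture(Func<Stream, StreamWriter> makeWriter, string text)
    {
        using (var buffer = new MemoryStream())
        {
            // Disposing the writer flushes it (and closes the MemoryStream;
            // ToArray still works on a closed MemoryStream).
            using (StreamWriter writer = makeWriter(buffer))
            {
                writer.Write(text);
            }
            return buffer.ToArray();
        }
    }

    // Formats bytes as "C3-A5-..." for easy comparison.
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes);
}
```

With text = "\u00E5\u00E4\u00F6", Capture(s => new StreamWriter(s), text) gives C3-A5-C3-A4-C3-B6 (two UTF-8 bytes per letter and no byte-order mark), while Capture(s => new StreamWriter(s, Encoding.GetEncoding(28591)), text) gives E5-E4-F6. A viewer that assumes the ANSI code page shows those six UTF-8 bytes as six separate characters, which is the "garbage" described above.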

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #2
So UTF-8 can't handle umlaut characters, it seems.
Jul 21 '05 #3
<di********@discussion.microsoft.com> wrote:
So UTF-8 can't handle umlaut characters, it seems.


Yes it can. It's just that whatever you were using to read the file
presumably wasn't aware that it was encoded in UTF-8.
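One way to check this without any viewer is to read the bytes back with the encoding they were written in. A minimal sketch, assuming a MemoryStream in place of the real file; RoundTrip is a hypothetical name:

```csharp
using System.IO;
using System.Text;

static class RoundTrip
{
    // Writes text with new StreamWriter(stream) (UTF-8, no BOM), then decodes
    // the resulting bytes with the caller-supplied encoding.
    public static string WriteThenRead(string text, Encoding readAs)
    {
        byte[] bytes;
        using (var buffer = new MemoryStream())
        {
            using (var writer = new StreamWriter(buffer))
            {
                writer.Write(text);
            }
            bytes = buffer.ToArray();
        }

        using (var reader = new StreamReader(new MemoryStream(bytes), readAs))
        {
            return reader.ReadToEnd();
        }
    }
}
```

RoundTrip.WriteThenRead("\u00E5\u00E4\u00F6", Encoding.UTF8) should return the original three letters unchanged, because writer and reader agree on UTF-8. Passing a single-byte encoding such as Encoding.GetEncoding(28591) instead decodes each UTF-8 byte as its own character, which mimics a tool that assumes the wrong encoding.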

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #4
According to the Windows file system, it's ASCII :D

I thought that was standard enough :D I used the same format all the way
through the code and the umlauts are fine, but when it's writing (using the
default ctors) they're garbled. I wiped the file, changed it to construct the
StreamWriter with Encoding.Default, and now it saves the umlaut characters. How
come the usual ctor with FileStream doesn't save umlaut characters, given that
nowhere else did I specify any form of encoding until this change to fix it?
Jul 21 '05 #5
<?xml version="1.0" encoding="utf-8"?>

was even declared in the XML file that I got the string from. It's even stored
correctly in the String type; it's only when writing to the file that it breaks.

Normal calls specified WITHOUT encoding parameters did NOT save the umlaut
characters.
Jul 21 '05 #6
Opening the text file in Notepad and selecting Save As shows it's ANSI, not
UTF-8. How come the file created when appending is not stored as UTF-8,
since that's supposed to be the default, as you state?

That would cause the mismatch if the file is created as ANSI and all
methods default to UTF-8.
Jul 21 '05 #7
<di********@discussion.microsoft.com> wrote:
According to the Windows file system, it's ASCII :D
What do you mean by "according to the Windows file system"?
I thought that was standard enough :D
ASCII doesn't have any characters with accents.
I used the same format all the way
through the code and the umlauts are fine, but when it's writing (using the
default ctors) they're garbled. I wiped the file, changed it to construct the
StreamWriter with Encoding.Default, and now it saves the umlaut characters. How
come the usual ctor with FileStream doesn't save umlaut characters, given that
nowhere else did I specify any form of encoding until this change to fix it?
It *does* save umlaut characters, it's just that what you're using to
read the file isn't recognising that it's UTF-8. You later say:
Opening the text file in Notepad and selecting Save As shows it's ANSI,
not UTF-8


That's just notepad being confused.

UTF-8 works fine, the framework works fine - but some of your tools may
not be doing what you want them to.

See http://www.pobox.com/~skeet/csharp/unicode.html for more
information about encodings.
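If the file has to open cleanly in tools that guess the encoding, one common approach (an assumption about those tools, not something tested in the thread) is to write UTF-8 with a byte-order mark, by passing new UTF8Encoding(true) to the StreamWriter. The sketch below, with the hypothetical class BomDemo, writes such a file into memory and then reads it through a StreamReader whose fallback encoding is ISO-8859-1. That is roughly analogous to a viewer that assumes the ANSI code page unless a BOM says otherwise; Notepad's actual detection rules are not modelled here.

```csharp
using System.IO;
using System.Text;

static class BomDemo
{
    // Writes text as UTF-8 preceded by the byte-order mark EF BB BF.
    public static byte[] WriteUtf8WithBom(string text)
    {
        using (var buffer = new MemoryStream())
        {
            using (var writer = new StreamWriter(buffer, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true)))
            {
                writer.Write(text);
            }
            return buffer.ToArray();
        }
    }

    // Uses 'fallback' unless a byte-order mark at the start identifies another encoding.
    public static string ReadWithFallback(byte[] bytes, Encoding fallback)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), fallback, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The bytes from WriteUtf8WithBom start with EF BB BF, and ReadWithFallback recovers "åäö" from them; the same letters written without a BOM come back as "Ã¥Ã¤Ã¶".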

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #8
You're right, because Notepad isn't standard at all for reading text files.
Nobody in their right mind uses it or Wintail etc. to view logs. No, no, not
at all :D

It's fine when I specify Encoding.Default on the StreamWriter, yet it's NOT when I
don't specify ANY encoding anywhere in the app.
Jul 21 '05 #9
Jon Skeet [C# MVP] <sk***@pobox.com> wrote in
news:MP************************@msnews.microsoft.c om:
<di********@discussion.microsoft.com> wrote:
I used the same format all the
way through the code and the umlauts are fine, but when it's writing (using the
default ctors) they're garbled. I wiped the file, changed it to construct
the StreamWriter with Encoding.Default, and now it saves the umlaut characters.
How come the usual ctor with FileStream doesn't save umlaut characters, given
that nowhere else did I specify any form of encoding until this change to
fix it?


It *does* save umlaut characters, it's just that what you're using to
read the file isn't recognising that it's UTF-8. You later say:


The byte specification in the actual raw data misses UTF-8
specification when you use Default. I was bitten by the same thing. I had
to explicitly state Encoding.Unicode. When I used Encoding.Default, which
should work according to the docs, it didn't. It did save stuff like
Scandinavian characters to the file, but it couldn't read them back
correctly, even if I stated UTF-8 as the encoding or whatever in the XML
header. So I think he's right.
Opening the text file in Notepad and selecting Save As shows it's ANSI,
not UTF-8


That's just notepad being confused.
UTF-8 works fine, the framework works fine - but some of your tools may
not be doing what you want them to.


If you specify Encoding.Unicode, it will work; if you specify
Encoding.Default, it will not in some cases. In both cases, the files do
NOT have an XML heading explaining the encoding. The actual encoding is in
the bytes in the file (and probably in a meta-data property in NTFS). That
specification is not read back or written correctly when you use
Default. I think that's the reason for his complaint, and I have to admit
he's right; I had exactly the same thing.

Frans

--
Get LLBLGen Pro, the new O/R mapper for .NET: http://www.llblgen.com
Jul 21 '05 #10
I've experienced similar problems too using the default encoding.

--
venlig hilsen / with regards
anders borum
--
Jul 21 '05 #11
And it works if you explicitly state UTF-8 as encoding?

--
venlig hilsen / with regards
anders borum
--
Jul 21 '05 #12
<di********@discussion.microsoft.com> wrote:
You're right, because Notepad isn't standard at all for reading text files.
Nobody in their right mind uses it or Wintail etc. to view logs. No, no, not
at all :D
That doesn't mean that notepad will automatically detect UTF-8 encoded
files. (I don't know whether or not it can cope with UTF-8 at all.)
It's fine when I specify Encoding.Default on the StreamWriter, yet it's NOT when I
don't specify ANY encoding anywhere in the app.


Yes, as you keep saying. That's because Encoding.Default is the default
ANSI encoding for the platform, but the default if you don't specify
any encoding is UTF-8, as I keep saying.
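Which encoding a writer actually chose can be inspected through the StreamWriter.Encoding property rather than inferred from a viewer. A small sketch with a hypothetical helper class, EncodingInfoDemo:

```csharp
using System.IO;
using System.Text;

static class EncodingInfoDemo
{
    // Summarises an encoding as its web name plus the length of its preamble (BOM).
    public static string Describe(Encoding encoding) =>
        encoding.WebName + ", preamble " + encoding.GetPreamble().Length + " bytes";

    // Reports what new StreamWriter(stream) uses when no encoding is passed.
    public static string DescribeStreamWriterDefault()
    {
        using (var writer = new StreamWriter(new MemoryStream()))
        {
            return Describe(writer.Encoding);
        }
    }
}
```

DescribeStreamWriterDefault() should return "utf-8, preamble 0 bytes", whereas Describe(Encoding.UTF8) returns "utf-8, preamble 3 bytes". Describe(Encoding.Default) depends on the machine: on .NET Framework it reports the ANSI code page, and on .NET Core and later it reports utf-8.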

We seem to be going round and round here - which part are you not
understanding?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #13
Frans Bouma <pe******************@xs4all.nl> wrote:
It *does* save umlaut characters, it's just that what you're using to
read the file isn't recognising that it's UTF-8. You later say:
The byte specification in the actual raw data misses UTF-8
specification when you use Default.


What do you mean by this, exactly?
I was bitten by the same thing. I had
to explicitly state Encoding.Unicode. When I used Encoding.Default, which
should work according to the docs, it didn't. It did save stuff like
Scandinavian characters to the file, but it couldn't read them back
correctly, even if I stated UTF-8 as the encoding or whatever in the XML
header. So I think he's right.
I really don't think so - please provide a complete example stating
*exactly* what you expected, and what you got.
Opening the text file in Notepad and selecting Save As shows it's ANSI,
not UTF-8


That's just notepad being confused.
UTF-8 works fine, the framework works fine - but some of your tools may
not be doing what you want them to.


If you specify Encoding.Unicode, it will work; if you specify
Encoding.Default, it will not in some cases.


That's because Notepad recognises UCS-2 (Unicode) files, which .NET writes
with a byte-order mark, but doesn't reliably detect UTF-8 written without one.
In both cases, the files do
NOT have an XML heading explaining the encoding.
Notepad isn't going to look at the XML header anyway, of course. I
don't see what the XML header has to do with anything, here, to be
honest. What relevance do you think it has to how a file is opened in
notepad?
The actual encoding is in
the bytes in the file (and probably in a meta-data property in NTFS).
The encoding isn't "in" the bytes of the file - it's perfectly possible
to have a file which means two different things when considered as
being in two different encodings. How would it be in the meta-data
anyway? As far as the file system is concerned, it's just a stream of
bytes.
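The point that the encoding isn't stored in the bytes can be shown with a single character. In this sketch (SameBytes is a hypothetical name, and ISO-8859-1 again stands in for a Western ANSI code page), the two bytes UTF-8 uses for 'å' are decoded twice:

```csharp
using System.Text;

static class SameBytes
{
    // The UTF-8 encoding of U+00E5 ('å').
    public static readonly byte[] Bytes = { 0xC3, 0xA5 };

    // Nothing in the bytes themselves says which reading is the intended one.
    public static string DecodeAs(Encoding encoding) => encoding.GetString(Bytes);
}
```

DecodeAs(Encoding.UTF8) gives "å" and DecodeAs(Encoding.GetEncoding(28591)) gives "Ã¥". Both are valid decodings; only outside knowledge (a byte-order mark, an XML declaration, or an agreement between writer and reader) says which one was meant.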
That specification is not read back or written correctly when you use
Default. I think that's the reason for his complaint, and I have to admit
he's right; I had exactly the same thing.


I don't think he's right at all. When you say "Default" do you mean
"the default encoding if you don't specify one" or "Encoding.Default"?
I believe both work exactly as intended - but I suspect you're missing
something about the intention.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #14
Anders Borum <na@na.na> wrote:
I've experienced similar problems too using the default encoding.


What problems, exactly? People are being very woolly about what they're
seeing and how they're testing it.

To recap:

o If you don't specify an encoding, you'll get UTF-8
o If you specify Encoding.Default, you'll get the platform's default
ANSI encoding (e.g. Windows-1252 on a Western European machine)
o Notepad doesn't reliably detect UTF-8 files written without a byte-order
mark (which is what you get when you don't specify an encoding), so if you
open such a file in it you may see garbage. This doesn't mean it's not a
perfectly valid UTF-8 file, it just means Notepad guessed wrongly.

Now, given the above, what exactly do you think is wrong?
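Whether a file is valid UTF-8 can be checked directly rather than through Notepad or Wintail: decode it with a UTF8Encoding constructed with throwOnInvalidBytes set to true, which throws DecoderFallbackException on malformed input instead of silently substituting U+FFFD. A sketch (Utf8Check is hypothetical; for a real log the bytes would come from File.ReadAllBytes):

```csharp
using System.Text;

static class Utf8Check
{
    // encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true
    private static readonly UTF8Encoding Strict = new UTF8Encoding(false, true);

    public static bool IsValidUtf8(byte[] bytes)
    {
        try
        {
            Strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

The UTF-8 bytes for "åäö" (C3 A5 C3 A4 C3 B6) pass, while the single-byte ANSI form (E5 E4 F6) fails, because E5 announces a three-byte sequence that E4 cannot continue. The check is one-directional: pure ASCII passes too, so a valid result shows the bytes can be read as UTF-8, not that the writer intended it.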

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #15
Originally I did NOT specify any encoding anywhere, and the umlaut åäö characters
were fine everywhere except in the file save.

When I specify Encoding.Default on the StreamWriter with a fresh file,
everything is OK. If .NET defaults to UTF-8 when I specify NO encoding, how
come it can't save the characters?
Jul 21 '05 #16
It affects Wintail too: www.wintail.com
Jul 21 '05 #17
It's not an XML file; the XML file is only used as the input string. The actual
output that's being corrupted is a normal text file.

The program had NO reference to encoding (thereby using the default .NET
mechanism), and that was corrupting the output using StreamWriter with
FileStream. The solution was to construct the StreamWriter with
Encoding.Default, yet this was my actual issue: why is this called Default when
in fact it's not? It was confusing to me, and why can't the default .NET
mechanism (not specifying an encoding) handle umlaut characters correctly (if it's
UTF-8 as you say)?
Jul 21 '05 #18
Actually, when I specify Encoding.Default, Wintail and Notepad show these
characters correctly.

It's the actual C# save that doesn't.
Jul 21 '05 #19
<di********@discussion.microsoft.com> wrote:
Originally I did NOT specify any encoding anywhere, and the umlaut åäö characters
were fine everywhere except in the file save.

When I specify Encoding.Default on the StreamWriter with a fresh file,
everything is OK. If .NET defaults to UTF-8 when I specify NO encoding, how
come it can't save the characters?


It can. It's just that the tool you're using to check for them can't
read them.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #20
<di********@discussion.microsoft.com> wrote:
It affects Wintail too: www.wintail.com


That doesn't mean .NET isn't writing it properly though...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #21
<di********@discussion.microsoft.com> wrote:
It's not an XML file; the XML file is only used as the input string. The actual
output that's being corrupted is a normal text file.

The program had NO reference to encoding (thereby using the default .NET
mechanism), and that was corrupting the output using StreamWriter with
FileStream. The solution was to construct the StreamWriter with
Encoding.Default, yet this was my actual issue: why is this called Default when
in fact it's not?
It's not the default for StreamWriter, it's the default encoding for
the Windows box you're running it on.
It was confusing to me, and why can't the default .NET
mechanism (not specifying an encoding) handle umlaut characters correctly (if it's
UTF-8 as you say)?


It can. You just can't read it properly.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #22
Internet Explorer displays it as äåö
Jul 21 '05 #23
OK, Notepad shows it fine, and so does the VS editor.

Wintail and INTERNET EXPLORER (which is surprising) do not.
Jul 21 '05 #24
<di********@discussion.microsoft.com> wrote:
Actually, when I specify Encoding.Default, Wintail and Notepad show these
characters correctly.
Yes, because they're assuming the Windows default encoding.
It's the actual C# save that doesn't.


<sigh>

How many times do I need to explain it? C# is working fine - it's just
that your tools don't understand UTF-8. Find a text editor which lets
you pick a UTF-8 encoding, and load the file - you'll see the
characters just fine.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #25
<di********@discussion.microsoft.com> wrote:
Internet Explorer displays it as äåö


Internet Explorer is probably assuming the Windows default encoding as
well.

How well do you actually understand encodings? You might like to read
http://www.pobox.com/~skeet/csharp/unicode.html for more information.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #26
Nothing like plugging your own site.

<sigh> If you get tired of explaining, nobody forces you to reply to each and
every post out there; there's no need to step on others to make your ego
bigger. I've seen you post before, and you do the same every time.
Jul 21 '05 #27
<di********@discussion.microsoft.com> wrote:
Nothing like plugging your own site.
I wrote that page (and various others) to save me from explaining
things in detail repeatedly. It's not like I get money from them or
anything - they're just meant to be helpful.
<sigh> If you get tired of explaining, nobody forces you to reply to each and
every post out there; there's no need to step on others to make your ego
bigger. I've seen you post before, and you do the same every time.


I don't reply to each and every post out there, but it *is*
disconcerting when people clearly don't really read answers. This
thread is pointless - no-one's really saying what .NET is supposedly
doing wrong except in terms of what Notepad/Wintail etc can cope with.
I've explained what's going on numerous times now...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #28
<sigh>
Jul 21 '05 #29
Discussion,

If all you want to do is complain about the people who are here to help you,
perhaps you would feel more at home in this forum instead:
alt.complainers.bitch-n-moan

Others here use this forum for help, and it is invaluable for their jobs. I,
for one, find you quite distasteful, to say the least, and am asking nicely
for you to be, at the very least, respectful to the people who take the time
to answer your questions.

Marco.
Jul 21 '05 #30
Jon Skeet [C# MVP] <sk***@pobox.com> wrote in
news:MP************************@msnews.microsoft.c om:
Frans Bouma <pe******************@xs4all.nl> wrote:
> It *does* save umlaut characters, it's just that what you're using to
> read the file isn't recognising that it's UTF-8. You later say:


The byte specification in the actual raw data misses UTF-8
specification when you use Default.


What do you mean by this, exactly?


That I had the same XML data in two files, one written with
Encoding.Default and the other with Encoding.Unicode. Both looked the same
in Notepad, and I had NO encoding specification. However, one couldn't be
loaded due to an 'ae' character, while the other one could be loaded (or
rather: deserialized). I found this very odd, because there was NO encoding
specifier in the XML, so the encoding has to be stored somewhere else.
I was bitten by the same thing. I had
to explicitly state Encoding.Unicode. When I used Encoding.Default, which
should work according to the docs, it didn't. It did save stuff like
Scandinavian characters to the file, but it couldn't read them back
correctly, even if I stated UTF-8 as the encoding or whatever in the XML
header. So I think he's right.


I really don't think so - please provide a complete example stating
*exactly* what you expected, and what you got.


write:
XmlTextWriter writer = new XmlTextWriter(
    Path.Combine(Application.StartupPath, ApplicationConstants.PreferencesFilename),
    System.Text.Encoding.Unicode);

try
{
    writer.WriteStartElement("Preferences");
    writer.WriteStartElement("preferedProjectFolder");
    writer.WriteAttributeString("value",
        _preferences.PreferedProjectFolder);
    writer.WriteEndElement();
    // etc.
THIS works. (the Unicode encoding).
However when I change that to Default, it doesn't. I even added UTF-8
encoding specification to the XML file, no luck. Now, the docs state that
the codepage of the local system is used with 'default'. I did set the
codepage of my system to all kinds of wicked pages, but also no luck.
Unicode solved it (obviously). However, 'Default' THUS doesn't work for
characters other than plain ASCII.

read:
XmlTextReader reader = new XmlTextReader(Path.Combine
(Application.StartupPath, ApplicationConstants.PreferencesFilename));

try
{
// Read the nodes and store the values as they are found in the preferences object.
while(reader.Read())
{
switch(reader.Name)
{
case "preferedProjectFolder":
_preferences.PreferedProjectFolder =
reader.GetAttribute("value"); // <-- crash here, character could not be loaded. Character was a Scandinavian character 'ae' (combined to 1 char).
break;
// etc..
In both cases, the files do
NOT have an XML heading explaining the encoding.


Notepad isn't going to look at the XML header anyway, of course. I
don't see what the XML header has to do with anything, here, to be
honest. What relevance do you think it has to how a file is opened in
notepad?


I wasn't talking about notepad :) I write an XML file and read it
back the next time the app starts. It crashed then (it didn't while saving
the XML). However because it is XML, I thought an encoding specification
would be better in the XML header. But if you add that (UTF-8) and you've
saved with 'Default' the file can't be opened with the XmlTextReader
because of some byte encoding issue. (IIRC).
The actual encoding is in
the bytes in the file (and probably in a meta-data property in NTFS).


The encoding isn't "in" the bytes of the file - it's perfectly possible
to have a file which means two different things when considered as
being in two different encodings. How would it be in the meta-data
anyway? As far as the file system is concerned, it's just a stream of
bytes.


That's what I was thinking too, however the errors I had made me
draw that conclusion. However, I could be wrong. What I DO know is that
characters in extended ASCII can't be handled with Encoding.Default.

FB
--
Get LLBLGen Pro, the new O/R mapper for .NET: http://www.llblgen.com
Jul 21 '05 #31
Cor
Hi Jon,

A question to you.
I was seeing (not following) your impossible struggle to get this right.
I completely agree with the message from Marco Martin.

But this whole thread is cross-posted.
And although you did your best, and maybe it is useful, the
reactions turned it into trash.

Next time, maybe you can remove the newsgroups you are not
answering in, when these are the kinds of answers you get.

:-))

Cor
Jul 21 '05 #32
Frans Bouma <pe******************@xs4all.nl> wrote:
What do you mean by this, exactly?
That I had the same XML data in the file, one written away with
Encoding.Default and the other with Encoding.Unicode. Both looked the same
in notepad, and I had NO encoding specification. However, one couldn't be loaded
due to an 'ae' character, while the other one could be loaded (or better: be
serialized back). I found this very odd, because there was NO encoding
specifier in the XML, so the encoding has to be stored somewhere else.


It's not odd at all - it would have been preferable to have the
encoding specifier in the XML, but Notepad wouldn't have used it
anyway.

In fact, it seems that Notepad on XP *does* read UTF-8 files. If you use
the following code:

using System;
using System.IO;
using System.Text;

public class Test
{
    static void Main()
    {
        using (StreamWriter sw = new StreamWriter ("test.txt"))
        {
            sw.WriteLine ("\u00e9");
        }
    }
}

to generate a file test.txt, which has contents 0xc3 0xa9 0x0d 0x0a,
then if you open it in Notepad with encoding UTF-8, it correctly
displays an e-acute. If you open it in Notepad with encoding ANSI, it
displays Ã© (again, correctly).
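A minimal sketch that reproduces those bytes and then decodes the very same bytes both ways. Two assumptions: it sets NewLine to "\r\n" explicitly so the output matches Windows on any OS, and it uses ISO-8859-1 as a stand-in for the Windows ANSI code page (windows-1252), because ISO-8859-1 is built into every .NET runtime and the two agree for bytes 0xA0-0xFF. The class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8BytesDemo
{
    // Writes U+00E9 plus a line break with StreamWriter's default encoding
    // (UTF-8 without a byte-order mark) and returns the raw file bytes.
    public static byte[] WriteSample(string path)
    {
        using (StreamWriter sw = new StreamWriter(path))
        {
            sw.NewLine = "\r\n"; // assumption: force Windows line endings on every OS
            sw.WriteLine("\u00e9");
        }
        return File.ReadAllBytes(path);
    }

    public static void Main()
    {
        byte[] bytes = WriteSample("test.txt");
        Console.WriteLine(BitConverter.ToString(bytes)); // C3-A9-0D-0A

        // Same bytes, two interpretations. Nothing in the file says which is "right".
        string asUtf8 = Encoding.UTF8.GetString(bytes);
        string asLatin1 = Encoding.GetEncoding("iso-8859-1").GetString(bytes);

        Console.WriteLine(asUtf8.Length);   // 3: e-acute, '\r', '\n'
        Console.WriteLine(asLatin1.Length); // 4: 'Ã', '©', '\r', '\n'
    }
}
```

This is also why the encoding can't be "in" the file: the decoder you pick decides what the bytes mean.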

Now, if your XML didn't include an encoding specifier, the XML parser
should have assumed UTF-8. If you used Encoding.Default (instead of
UTF-8) then you would indeed get an error if the file was not a valid
UTF-8 file. From the XML specification:

<quote>
In the absence of information provided by an external transport
protocol (e.g. HTTP or MIME), it is an error for an entity including an
encoding declaration to be presented to the XML processor in an
encoding other than that named in the declaration, or for an entity
which begins with neither a Byte Order Mark nor an encoding declaration
to use an encoding other than UTF-8.
</quote>

When you used the Unicode encoding, I suspect you got a byte-order mark
which allowed the parser to tell that it was using that encoding.
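To see what such a byte-order mark looks like, here is a small sketch that writes a single 'æ' (U+00E6) with three encodings and dumps the bytes. It uses StreamWriter rather than XmlTextWriter, so it only illustrates the idea; the names are hypothetical. Encoding.Default is deliberately left out because its meaning depends on the runtime (the ANSI code page on the .NET Framework, UTF-8 on newer .NET).

```csharp
using System;
using System.IO;
using System.Text;

public static class PreambleDemo
{
    // Writes one 'æ' through a StreamWriter using the given encoding
    // and returns the resulting bytes as a hex string.
    public static string HexOf(Encoding encoding)
    {
        var ms = new MemoryStream();
        using (var sw = new StreamWriter(ms, encoding))
        {
            sw.Write("\u00e6");
        }
        // MemoryStream.ToArray still works after the writer has closed the stream.
        return BitConverter.ToString(ms.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(HexOf(Encoding.Unicode));        // FF-FE-E6-00 (UTF-16LE BOM, then the char)
        Console.WriteLine(HexOf(Encoding.UTF8));           // EF-BB-BF-C3-A6 (UTF-8 BOM, then the char)
        Console.WriteLine(HexOf(new UTF8Encoding(false))); // C3-A6 (no BOM, like new StreamWriter(stream))
    }
}
```

A parser that sees FF FE at the start knows it is reading UTF-16LE; with no BOM and no declaration, the XML rules leave it only UTF-8 to assume.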
I was bitten by the same thing. I had
to explicitly state Encoding.Unicode. When I used Encoding.Default, it
should have worked according to the docs, but it didn't. It did save things like
Scandinavian characters to the file, but it couldn't read them back
correctly, even if I stated UTF-8 or some other encoding in the XML
header. So I think he's right.


I really don't think so - please provide a complete example stating
*exactly* what you expected, and what you got.


write:
XmlTextWriter writer = new XmlTextWriter(Path.Combine
(Application.StartupPath, ApplicationConstants.PreferencesFilename),
System.Text.Encoding.Unicode);

try
{
writer.WriteStartElement("Preferences");
writer.WriteStartElement("preferedProjectFolder");
writer.WriteAttributeString("value",
_preferences.PreferedProjectFolder);
writer.WriteEndElement();
// etc.


THIS works. (the Unicode encoding).
However when I change that to Default, it doesn't. I even added UTF-8
encoding specification to the XML file, no luck.


No, it wouldn't - for the reasons given above.
Now, the docs state that
the codepage of the local system is used with 'default'. I did set the
codepage of my system to all kinds of wicked pages, but also no luck.
Unicode solved it (obviously). However, 'Default' THUS doesn't work for
characters other than plain ASCII.


It does, but not when you've told the XML parser to expect UTF-8 and
then don't give it UTF-8!
In both cases, the files do
NOT have an XML heading explaining the encoding.


Notepad isn't going to look at the XML header anyway, of course. I
don't see what the XML header has to do with anything, here, to be
honest. What relevance do you think it has to how a file is opened in
notepad?


I wasn't talking about notepad :) I write an XML file and read it
back the next time the app starts. It crashed then (it didn't while saving
the XML). However because it is XML, I thought an encoding specification
would be better in the XML header. But if you add that (UTF-8) and you've
saved with 'Default' the file can't be opened with the XmlTextReader
because of some byte encoding issue. (IIRC).


Yup, that makes perfect sense, in the same way that if you tell someone
that you're going to talk English and then you talk French they may
well get confused. You've got to actually use the encoding you specify
in the XML header.
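On the writing side, one way to keep the header and the bytes in step is to let the writer produce the declaration itself. A sketch, assuming XmlTextWriter as in the code quoted above: calling WriteStartDocument() makes it emit an XML declaration naming the writer's own encoding (its WebName), so the two can't disagree. The class name and the folder value are invented for the example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class DeclarationDemo
{
    // Writes a tiny preferences document and returns it as text.
    public static string WritePreferences(Encoding encoding)
    {
        var ms = new MemoryStream();
        var writer = new XmlTextWriter(ms, encoding);
        writer.WriteStartDocument(); // emits <?xml version="1.0" encoding="..."?>
        writer.WriteStartElement("Preferences");
        writer.WriteStartElement("preferedProjectFolder");
        writer.WriteAttributeString("value", "C:\\\u00e6"); // hypothetical value
        writer.WriteEndElement();
        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Flush();
        return encoding.GetString(ms.ToArray());
    }

    public static void Main()
    {
        // UTF-8 without a BOM, so the decoded text starts directly with "<?xml".
        Console.WriteLine(WritePreferences(new UTF8Encoding(false)));
        Console.WriteLine(WritePreferences(Encoding.GetEncoding("iso-8859-1")));
    }
}
```

The first document should begin with encoding="utf-8" and the second with encoding="iso-8859-1", each matching the bytes actually written.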
The actual encoding is in
the bytes in the file (and probably in a meta-data property in NTFS).


The encoding isn't "in" the bytes of the file - it's perfectly possible
to have a file which means two different things when considered as
being in two different encodings. How would it be in the meta-data
anyway? As far as the file system is concerned, it's just a stream of
bytes.


That's what I was thinking too, however the errors I had made me
draw that conclusion. However, I could be wrong. What I DO know is that
characters in extended ASCII can't be handled with Encoding.Default.


a) There's no such thing as "extended ASCII". There are various
encodings which are 8-bit extensions to ASCII, but they are all
different, and there's no one true "extended ASCII".
b) Characters within an ANSI code-page *can* be used if you correctly
specify the character encoding in the XML header. I suspect that an
encoding of "windows-1252" would have worked. I haven't tried it
and I wouldn't recommend it though - I'd just stick to UTF-8.
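The reading side can be sketched too. This is not tested code from the thread: the names are hypothetical, and it uses "iso-8859-1" instead of "windows-1252" because windows-1252 is not built into newer .NET runtimes unless a code-pages encoding provider is registered. It encodes the same tiny document as single-byte "ANSI"-style bytes with no declaration, with a UTF-8 declaration, and with a matching declaration, then checks which ones XmlReader accepts.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ReadBackDemo
{
    // Builds the document, optionally with an encoding declaration, and encodes
    // it with 'actual', which may disagree with what the declaration claims.
    public static byte[] Build(string declaredEncoding, Encoding actual)
    {
        string decl = declaredEncoding == null
            ? ""
            : "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>";
        string xml = decl + "<preferedProjectFolder value=\"\u00e6\" />";
        return actual.GetBytes(xml);
    }

    // Returns the attribute value, or null if the parser rejects the bytes.
    public static string TryRead(byte[] bytes)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new MemoryStream(bytes)))
            {
                reader.MoveToContent();
                return reader.GetAttribute("value");
            }
        }
        catch (XmlException e)
        {
            Console.WriteLine("XmlException: " + e.Message);
            return null;
        }
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Encoding utf8 = new UTF8Encoding(false);

        // Mismatch: single-byte bytes, but the parser assumes (or is told) UTF-8.
        Console.WriteLine(TryRead(Build(null, latin1)) == null);
        Console.WriteLine(TryRead(Build("utf-8", latin1)) == null);

        // Declaration and bytes agree, so the 'æ' comes back intact.
        Console.WriteLine(TryRead(Build("iso-8859-1", latin1)) == "\u00e6");
        Console.WriteLine(TryRead(Build("utf-8", utf8)) == "\u00e6");
    }
}
```

All four comparisons are expected to print True; the first two also log the XmlException the parser raised.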

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #33
Jon Skeet [C# MVP] <sk***@pobox.com> wrote in
news:MP************************@msnews.microsoft.com:

Ok Thanks Jon, for clearing that up. :)

Frans


--
Get LLBLGen Pro, the new O/R mapper for .NET: http://www.llblgen.com
Jul 21 '05 #34
