
Encoding.Default - reliable ???

I am having a lot of difficulty with text files in .NET when they have special
characters like *, ó, ç etc...

When I read a text file with them and then write it back out, it ignores all
of those characters completely.

I tried all the encoding types and it seems only Encoding.Default does it
right... but it sounds dangerous... can I rely on Encoding.Default behaving
like this for all other machines?
Nov 2 '06 #1
7 Replies


Thus wrote MrNobody,

> I am having a lot of difficulty with text files in .NET when they have
> special characters like *, ó, ç etc...
>
> When I read a text file with them and then write it back out, it
> ignores all of those characters completely.
>
> I tried all the encoding types and it seems only Encoding.Default does
> it right...

The question is how do you check the resulting file? It's pretty likely that
your editor wasn't up to the task of decoding the file correctly.

> but it sounds dangerous... can I rely on Encoding.Default
> behaving like this for all other machines?

Depends on what reach you require, but around the globe, certainly not. UTF-8
or UTF-16 are much better choices.

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Nov 2 '06 #2

Hi Nobody,

According to the docs of the .NET Framework, Encoding.Default is the current
ANSI code page, so it can easily change, especially if the application runs on
another system.
Encoding.Unicode and Encoding.UTF8 can encode any character, so they should
work fine.
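
As a minimal sketch (the file name and sample text below are made up, not
taken from the original post), round-tripping such characters with an explicit
UTF-8 encoding behaves the same on every machine, regardless of the local ANSI
code page:

using System;
using System.IO;
using System.Text;

class Utf8RoundTrip
{
    static void Main()
    {
        string path = "sample.txt";   // hypothetical file name
        string text = "ó, ç, café";   // characters outside plain ASCII

        // Write and read with the same explicit encoding;
        // UTF-8 can represent every Unicode character.
        File.WriteAllText(path, text, Encoding.UTF8);
        string roundTripped = File.ReadAllText(path, Encoding.UTF8);

        Console.WriteLine(roundTripped == text); // True
    }
}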

"MrNobody" <Mr******@discussions.microsoft.comschrieb im Newsbeitrag
news:90**********************************@microsof t.com...
> I am having a lot of difficulty with text files in .NET when they have
> special characters like *, ó, ç etc...
>
> When I read a text file with them and then write it back out, it ignores
> all of those characters completely.
>
> I tried all the encoding types and it seems only Encoding.Default does it
> right... but it sounds dangerous... can I rely on Encoding.Default
> behaving like this for all other machines?

Nov 2 '06 #3

OK, but I have a problem when I use either Encoding.Unicode or Encoding.UTF8.

string src = @"..."; // path to source file
string tgt = @"..."; // path to write the new file to

string txt = System.IO.File.ReadAllText(src, Encoding.Unicode);

Console.WriteLine("index = " + txt.IndexOf("something"));

System.IO.File.WriteAllText(tgt, txt, Encoding.UTF8); // or Encoding.Unicode
When I run that code, the index is always -1 for a string which is
definitely in the file, and the file it writes out has complete data loss;
it is just full of these little black boxes...
Nov 2 '06 #4

How did you generate the text file beforehand? Obviously it is not stored in
UTF-16 encoding. If you used Notepad and simply hit 'Save', I guess it is
stored in the standard code page of your system, so Encoding.Default would be
right here. But if the text file was created on a machine with a different
default code page, even this will not work. In Notepad, though, you can choose
the encoding in the 'Save As' dialog. Other editors have similar features.

In any case you will first have to know in which encoding the file was
stored. Alas, there is no general way to detect it, at least none that I know
of.

The best situation is when you can agree with the creator of the source file
about the encoding. The next best situation is when someone knows the encoding
used while creating the file.
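
One partial workaround, sketched below with a made-up file name: if the file
happens to start with a byte order mark, StreamReader can detect UTF-8 or
UTF-16 from it and otherwise fall back to the encoding you pass in. Files
without a BOM still cannot be identified this way.

using System;
using System.IO;
using System.Text;

class BomSniffer
{
    static void Main()
    {
        string path = "input.txt"; // hypothetical file name

        // The third argument tells StreamReader to inspect a leading BOM
        // and switch to UTF-8 or UTF-16 if it finds one; otherwise the
        // fallback encoding given here (the ANSI code page) is used.
        using (StreamReader reader = new StreamReader(path, Encoding.Default, true))
        {
            string text = reader.ReadToEnd();
            // CurrentEncoding reflects the detected encoding only after reading.
            Console.WriteLine("Decoded as: " + reader.CurrentEncoding.WebName);
        }
    }
}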

"MrNobody" <Mr******@discussions.microsoft.comschrieb im Newsbeitrag
news:6C**********************************@microsof t.com...
> OK, but I have a problem when I use either Encoding.Unicode or
> Encoding.UTF8.
>
> string src = @"..."; // path to source file
> string tgt = @"..."; // path to write the new file to
>
> string txt = System.IO.File.ReadAllText(src, Encoding.Unicode);
>
> Console.WriteLine("index = " + txt.IndexOf("something"));
>
> System.IO.File.WriteAllText(tgt, txt, Encoding.UTF8); // or Encoding.Unicode
>
> When I run that code, the index is always -1 for a string which is
> definitely in the file, and the file it writes out has complete data loss;
> it is just full of these little black boxes...


Nov 2 '06 #5

Well, all I know is the files are created on Windows machines only, using
such programs as Notepad.

And this app is targeted at Windows machines only, but it will be used by
people across the globe who may need those special characters.

So is it safe then, given these restrictions, to rely on Encoding.Default?

I still really don't understand why I can get the files to read/write OK
using Encoding.Default but using any of the specific encodings fails...
Nov 2 '06 #6

MrNobody <Mr******@discussions.microsoft.com> wrote:

> Well, all I know is the files are created on Windows machines only, using
> such programs as Notepad.
>
> And this app is targeted at Windows machines only, but it will be used by
> people across the globe who may need those special characters.
>
> So is it safe then, given these restrictions, to rely on Encoding.Default?
>
> I still really don't understand why I can get the files to read/write OK
> using Encoding.Default but using any of the specific encodings fails...

You won't be able to correctly read a file unless you know its
encoding. For instance, if you try to read a UTF-8 encoded file using
Encoding.Default, then any characters outside the ASCII range are
likely to end up being corrupted.
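
A quick illustration of that kind of corruption (just a sketch; Windows-1252
stands in here for whatever Encoding.Default happens to be on a typical
Western-European system):

using System;
using System.Text;

class MojibakeDemo
{
    static void Main()
    {
        string original = "ó, ç";

        // The bytes as they would sit in a UTF-8 encoded file.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

        // Decoding those bytes with a single-byte ANSI code page turns
        // each non-ASCII character into two wrong characters.
        string mangled = Encoding.GetEncoding(1252).GetString(utf8Bytes);

        Console.WriteLine(original); // ó, ç
        Console.WriteLine(mangled);  // Ã³, Ã§
    }
}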

It sounds like you might be a bit fuzzy on what encodings are about.
See if http://www.pobox.com/~skeet/csharp/unicode.html helps.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 2 '06 #7

Thus wrote MrNobody,

> OK, but I have a problem when I use either Encoding.Unicode or
> Encoding.UTF8.
>
> string src = @"..."; // path to source file
> string tgt = @"..."; // path to write the new file to
>
> string txt = System.IO.File.ReadAllText(src, Encoding.Unicode);
>
> Console.WriteLine("index = " + txt.IndexOf("something"));
>
> System.IO.File.WriteAllText(tgt, txt, Encoding.UTF8); // or Encoding.Unicode
>
> When I run that code, the index is always -1 for a string which is
> definitely in the file, and the file it writes out has complete data
> loss; it is just full of these little black boxes...

If you're consuming text files that have been authored outside of your application,
you have to use the encoding that was used to create the file in order to
read it.

Notepad for example can create both UTF-8 and UTF-16 encoded files, but neither
is its default encoding. So if you've created your test files in Notepad
without considering the encoding, they will end up encoded as something that
is compatible with or equal to Encoding.Default.
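
So a conversion along these lines should work for the files described in this
thread. This is only a sketch with placeholder paths, and it assumes the input
really was saved in the machine's ANSI code page:

using System;
using System.IO;
using System.Text;

class AnsiToUtf8
{
    static void Main()
    {
        string src = @"..."; // path to the file Notepad saved with a plain "Save"
        string tgt = @"..."; // path to write the converted file to

        // Decode with the encoding the file was actually written in...
        string text = File.ReadAllText(src, Encoding.Default);

        // ...then write it back out as UTF-8, which is unambiguous everywhere.
        File.WriteAllText(tgt, text, Encoding.UTF8);

        Console.WriteLine("index = " + text.IndexOf("something"));
    }
}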

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Nov 3 '06 #8
