Encoding.Default - reliable ???

I am having a lot of difficulty with text files in .NET when they have special
characters like í, ó, ç etc...

When I read a text file with them and then write it back out, it ignores all
of those characters completely.

I tried all the encoding types and it seems only Encoding.Default does it
right... but it sounds dangerous... can I rely on Encoding.Default behaving
like this on all other machines?
Nov 2 '06 #1
Thus wrote MrNobody,
> I am having a lot of difficulty with text files in .NET when they have
> special characters like í, ó, ç etc...
>
> When I read a text file with them and then write it back out it
> ignores all of those characters completely.
>
> I tried all the encoding types and it seems only Encoding.Default does
> it right...

The question is, how do you check the resulting file? It's pretty likely that
your editor wasn't up to the task of decoding the file correctly.

> but it sounds dangerous... can I rely on Encoding.Default
> behaving like this for all other machines?

That depends on what reach you require, but around the globe, certainly not.
UTF-8 or UTF-16 are much better choices.

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Nov 2 '06 #2
Hi Nobody,

According to the .NET Framework docs, Encoding.Default is the system's current
ANSI code page. So it can easily change, above all if the application runs on
another system.
Encoding.Unicode and Encoding.UTF8 can encode any character, so they should
work fine.

"MrNobody" <Mr******@discussions.microsoft.com> wrote in message
news:90**********************************@microsof t.com...
> I am having a lot of difficulty with text files in .NET when they have
> special characters like í, ó, ç etc...
>
> When I read a text file with them and then write it back out it ignores
> all of those characters completely.
>
> I tried all the encoding types and it seems only Encoding.Default does it
> right... but it sounds dangerous... can I rely on Encoding.Default
> behaving like this for all other machines?

Nov 2 '06 #3
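To make the point about Encoding.Unicode and Encoding.UTF8 concrete, here is a minimal sketch (not taken from the thread) that round-trips a string through an encoding. The class and method names are made up for illustration, and Encoding.ASCII is used as the example of an encoding that cannot represent the accented letters.

```csharp
using System;
using System.Text;

class RoundTripDemo
{
    // Hypothetical helper: encode text to bytes, then decode the bytes again.
    static string RoundTrip(string text, Encoding enc)
    {
        return enc.GetString(enc.GetBytes(text));
    }

    static void Main()
    {
        string text = "\u00ED \u00F3 \u00E7"; // the characters í ó ç

        // UTF-8 and UTF-16 can represent every Unicode character.
        Console.WriteLine(RoundTrip(text, Encoding.UTF8) == text);    // True
        Console.WriteLine(RoundTrip(text, Encoding.Unicode) == text); // True

        // ASCII cannot: each accented letter is replaced by '?'.
        Console.WriteLine(RoundTrip(text, Encoding.ASCII));           // ? ? ?
    }
}
```

Note that this only shows which encodings can store the characters. It says nothing about which encoding an existing file was saved in.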
OK, but I have a problem when I use either Encoding.Unicode or Encoding.UTF8.

string src = // path to source file
string tgt = // path where to write new file to

string txt = System.IO.File.ReadAllText(src, Encoding.Unicode);

Console.WriteLine("index = " + txt.IndexOf("something"));

System.IO.File.WriteAllText(tgt, txt, Encoding.UTF8); // or Encoding.Unicode

When I run that code, the index is always -1 for a string which is
definitely in the file, and the file it writes out has complete data loss: it
is just full of these little black boxes...
Nov 2 '06 #4
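The symptom described above can be reproduced with a small sketch. It assumes the source file was saved in a single-byte code page; ISO-8859-1 is used here as a stand-in for a Western ANSI code page because it is available on every .NET runtime. The file content and class name are invented for the demonstration.

```csharp
using System;
using System.IO;
using System.Text;

class WrongEncodingDemo
{
    static void Main()
    {
        // Assumption: ISO-8859-1 stands in for a typical Western ANSI code page.
        Encoding ansi = Encoding.GetEncoding("iso-8859-1");
        string path = Path.GetTempFileName();

        try
        {
            // Simulate a file saved by an editor in a single-byte code page.
            File.WriteAllText(path, "find something here: \u00ED \u00F3 \u00E7", ansi);

            // Wrong: UTF-16 combines every two bytes into one character,
            // so the decoded text no longer contains the word at all.
            string wrong = File.ReadAllText(path, Encoding.Unicode);
            Console.WriteLine("index = " + wrong.IndexOf("something", StringComparison.Ordinal)); // index = -1

            // Right: decode with the encoding the file was written in.
            string right = File.ReadAllText(path, ansi);
            Console.WriteLine("index = " + right.IndexOf("something", StringComparison.Ordinal)); // index = 5
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The data loss happens at read time. Once the bytes have been decoded with the wrong encoding, writing the string back out in UTF-8 or UTF-16 only preserves the garbage.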
How did you generate the text file beforehand? Obviously it is not stored in
UTF-16 encoding. If you used Notepad and simply hit 'Save', I guess it is
stored in the standard code page of your system. So Encoding.Default would be
right here. But if the text file was created on a machine with a different
default code page, even this will not work. In Notepad you can choose the
encoding in the 'Save As' dialog. Other editors will have similar features.

In any case you will first have to know in which encoding the file was
stored. Alas, there is no general way to detect it, at least none that I know of.

The best situation is when you can agree with the creator of the source file
about the encoding. The next best situation is when someone knows the encoding
that was used to create the file.

"MrNobody" <Mr******@discussions.microsoft.com> wrote in message
news:6C**********************************@microsof t.com...
> OK, but I have a problem when I use either Encoding.Unicode or
> Encoding.UTF8.
>
> string src = // path to source file
> string tgt = // path where to write new file to
>
> string txt = System.IO.File.ReadAllText(src, Encoding.Unicode);
>
> Console.WriteLine("index = " + txt.IndexOf("something"));
>
> System.IO.File.WriteAllText(tgt, txt, Encoding.UTF8); // or
> Encoding.Unicode
>
> When I run that code, the index is always -1 for a string which is
> definitely in the file and the file it prints out has complete data loss,
> it is just full of these little black boxes...

Nov 2 '06 #5
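Although there is no general way to detect an encoding, a common heuristic gets reasonably far: look for a byte order mark, then check whether the bytes are strictly valid UTF-8, and only then fall back to a code page chosen by the caller. The sketch below is one possible version of that heuristic, not something proposed in the thread. `EncodingGuess.Guess` is a hypothetical helper, and its result is a guess, never a guarantee.

```csharp
using System;
using System.Text;

static class EncodingGuess
{
    // Hypothetical helper: returns a best guess for the encoding of 'data'.
    public static Encoding Guess(byte[] data, Encoding fallback)
    {
        // 1. A byte order mark is the only explicit hint a plain text file carries.
        //    (UTF-32 marks are not checked here to keep the sketch short.)
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return Encoding.UTF8;
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16, little endian
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16, big endian

        // 2. No mark: accept UTF-8 only if every byte sequence is valid UTF-8.
        try
        {
            new UTF8Encoding(false, true).GetString(data); // throws on invalid bytes
            return Encoding.UTF8;
        }
        catch (DecoderFallbackException)
        {
            // 3. Otherwise assume the code page the caller considers most likely.
            return fallback;
        }
    }
}

class Program
{
    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        byte[] ansiBytes = latin1.GetBytes("\u00F3");        // single byte F3
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("\u00F3"); // bytes C3 B3

        Console.WriteLine(EncodingGuess.Guess(ansiBytes, latin1).WebName); // iso-8859-1
        Console.WriteLine(EncodingGuess.Guess(utf8Bytes, latin1).WebName); // utf-8
    }
}
```

The weak spot is step 3: two different single-byte code pages cannot be told apart from the bytes alone, which is why agreeing on the encoding with the file's creator remains the best option.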
Well, all I know is the files are created on Windows machines only, using
programs such as Notepad.

And this app is targeted at Windows machines only, but it will be used by
people across the globe who may need those special characters.

So is it safe then, given these restrictions, to rely on Encoding.Default?

I still really don't understand why I can get the files to read/write OK
using Encoding.Default but using any of the specific encodings fails...
Nov 2 '06 #6
MrNobody <Mr******@discussions.microsoft.com> wrote:
> Well, all I know is the files are created on Windows machines only, using
> programs such as Notepad.
>
> And this app is targeted at Windows machines only, but it will be used by
> people across the globe who may need those special characters.
>
> So is it safe then, given these restrictions, to rely on Encoding.Default?
>
> I still really don't understand why I can get the files to read/write OK
> using Encoding.Default but using any of the specific encodings fails...

You won't be able to correctly read a file unless you know its
encoding. For instance, if you try to read a UTF-8 encoded file using
Encoding.Default, then any characters outside the ASCII range are
likely to end up being corrupted.

It sounds like you might be a bit fuzzy on what encodings are about.
See if
http://www.pobox.com/~skeet/csharp/unicode.html helps.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 2 '06 #7
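The corruption described above is easy to see at the byte level. In this minimal sketch, ISO-8859-1 again stands in for Encoding.Default (an assumption, since the real default depends on the machine), and the results are compared against escaped strings rather than printed, so the console's own encoding does not get in the way.

```csharp
using System;
using System.Text;

class MojibakeDemo
{
    static void Main()
    {
        // Assumption: ISO-8859-1 stands in for a Western Encoding.Default.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // "aób" in UTF-8 is four bytes: 61 C3 B3 62.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("a\u00F3b");

        // A single-byte decoder turns each byte into its own character.
        string misread = latin1.GetString(utf8Bytes);
        Console.WriteLine(misread.Length);              // 4
        Console.WriteLine(misread == "a\u00C3\u00B3b"); // True ("a", "Ã", "³", "b")

        // Decoding with the encoding that produced the bytes restores the text.
        Console.WriteLine(Encoding.UTF8.GetString(utf8Bytes) == "a\u00F3b"); // True
    }
}
```

The ASCII letters survive either way, which is why this kind of bug often goes unnoticed until the first accented character shows up.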
Thus wrote MrNobody,
> OK, but I have a problem when I use either Encoding.Unicode or
> Encoding.UTF8.
>
> string src = // path to source file
> string tgt = // path where to write new file to
> string txt = System.IO.File.ReadAllText(src, Encoding.Unicode);
>
> Console.WriteLine("index = " + txt.IndexOf("something"));
>
> System.IO.File.WriteAllText(tgt, txt, Encoding.UTF8); // or
> Encoding.Unicode
>
> When I run that code, the index is always -1 for a string which is
> definitely in the file and the file it prints out has complete data
> loss, it is just full of these little black boxes...

If you're consuming text files that have been authored outside of your application,
you have to use the encoding that was used to create the file in order to
read it.

Notepad for example can create both UTF-8 and UTF-16 encoded files, but neither
is its default encoding. So if you've created your test files in Notepad
without considering the encoding, they will end up encoded as something that
is compatible with or equal to Encoding.Default.

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Nov 3 '06 #8
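One way to act on this advice is to honour a byte order mark when the file has one, fall back to Encoding.Default when it does not, and always write the result as UTF-8. The following is a sketch under those assumptions, not a complete solution: `ConvertToUtf8` is a hypothetical helper, and the fallback is only correct when the file really was saved in the ANSI code page of the machine running the code. On .NET Core and later, Encoding.Default is UTF-8 rather than an ANSI code page, so a specific code page would have to be passed in instead.

```csharp
using System;
using System.IO;
using System.Text;

class NormalizeToUtf8
{
    // Hypothetical helper: decode 'src' and rewrite it to 'tgt' as UTF-8.
    static void ConvertToUtf8(string src, string tgt, Encoding fallback)
    {
        string text;

        // The third argument asks StreamReader to detect a byte order mark.
        // If a mark is present it takes precedence over 'fallback'.
        using (var reader = new StreamReader(src, fallback, true))
        {
            text = reader.ReadToEnd();
        }

        // Encoding.UTF8 emits a byte order mark, so the output identifies itself.
        File.WriteAllText(tgt, text, Encoding.UTF8);
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: NormalizeToUtf8 <source> <target>");
            return;
        }

        // Assumption: files without a mark were saved in this machine's ANSI code page.
        ConvertToUtf8(args[0], args[1], Encoding.Default);
    }
}
```

Files saved with an editor's UTF-8 or UTF-16 options are handled by the mark detection when the editor writes a mark; everything else depends on the fallback being the right guess.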
