Bug in CurrentEncoding.EncodingName?

Nick

Hi,

I have used the following code to test the encoding of a file:

    public string DetermineFileType(string aFileName)
    {
        string sEncoding = string.Empty;

        StreamReader oSR = new StreamReader(aFileName, true);
        oSR.ReadToEnd(); // Add this line to read the file.
        sEncoding = oSR.CurrentEncoding.EncodingName;

        return sEncoding;
    }

from:
http://groups.google.com.hk/groups?h...3DN%26tab%3Dwg

But the encoding is always showing Unicode? What's wrong?

Thanks

Nick

Nov 16 '05 #1


Jon Skeet [C# MVP]

Nick <ni*****@heha.net.tw> wrote:

But the encoding is always showing Unicode? What's wrong?

As I replied in the thread you quoted there, you shouldn't expect code
like that to correctly determine a file's encoding.

It may be able to work out byte order and encoding for Unicode/UTF-8
files which include byte order marks, but it's unlikely to work for
other files and other encodings.
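
To illustrate the point, here is a minimal sketch (the class and method names are hypothetical, not from the thread) that writes a short temporary file in a chosen encoding and reports what StreamReader concludes. It assumes the "iso-8859-1" encoding is available on the runtime in use. A file with a byte order mark is identified; a file without one is simply reported as UTF-8, whatever it really contains.

```csharp
using System.IO;
using System.Text;

public static class BomDemo
{
    // Writes sample text in the given encoding (including its preamble,
    // if it has one) and returns what StreamReader detects on reading.
    public static Encoding Detect(Encoding writeEncoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "caf\u00e9", writeEncoding);
            using (StreamReader reader = new StreamReader(path, true))
            {
                reader.ReadToEnd(); // detection only happens once data is read
                return reader.CurrentEncoding;
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Calling `BomDemo.Detect(Encoding.Unicode)` should report UTF-16, because the file starts with a byte order mark. Calling it with `Encoding.GetEncoding("iso-8859-1")` should still report UTF-8, because there is no mark to find and the reader falls back to its default.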

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Nov 16 '05 #2

Jay B. Harlow [MVP - Outlook]

Nick,
In addition to the other comments:

Read the help for StreamReader(path, detectEncodingFromByteOrderMarks)
more closely. ;-)

"The detectEncodingFromByteOrderMarks parameter detects the
encoding by looking at the first three bytes of the stream. It
automatically recognizes UTF-8, little-endian Unicode, and
big-endian Unicode text if the file starts with the appropriate
byte order marks. Otherwise, the user-provided encoding is used.
See the Encoding.GetPreamble method for more information."

Remember that if you call the StreamReader constructor without an
Encoding object, UTF8Encoding is used. I would expect the same rule to
apply here. In other words, Encoding.Default is not considered unless you
pass it to the constructor.

Ergo, your code always returns a Unicode encoding.

Have you tried the StreamReader(path, encoding,
detectEncodingFromByteOrderMarks) constructor?
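
A rough sketch of that three-argument constructor in use follows. The helper name `ReadAll` and its `out` parameter are made up for illustration. A byte order mark, if present, wins; otherwise the fallback encoding passed by the caller is used.

```csharp
using System.IO;
using System.Text;

public static class FallbackReader
{
    // Reads the whole file. If it starts with a byte order mark, that
    // decides the encoding; otherwise the supplied fallback is used.
    public static string ReadAll(string path, Encoding fallback, out Encoding used)
    {
        using (StreamReader reader = new StreamReader(path, fallback, true))
        {
            string text = reader.ReadToEnd();
            used = reader.CurrentEncoding; // only meaningful after a read
            return text;
        }
    }
}
```

Passing Encoding.Default as the fallback gives the behaviour discussed above, although what Encoding.Default means depends on the runtime and the machine. Note that this still does not detect anything for files without a byte order mark: it just reports whatever fallback was supplied.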

Hope this helps
Jay

Nov 16 '05 #3

Nick

Hi Jon,

So is there any other method that can do that?

Thanks,

Nick

Nov 16 '05 #4

Jon Skeet [C# MVP]

Nick <ni*****@heha.net.tw> wrote:

So is there any other method that can do that?

You can't do it reliably. There's no way to tell (for instance)
whether something is using one 8-bit code page or another. The best you
can do is make heuristic guesses. For instance, if every other byte is 0
most of the time, that *probably* means it's a Unicode (UTF-16)
encoding. If the whole file is valid UTF-8, that may well be the
encoding, but it's still very dodgy, to be honest.
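
Those heuristics could be sketched roughly as below. This is an illustration only: the class name, the thresholds and the order of the checks are arbitrary assumptions, and the result is a guess rather than an answer. It looks for a byte order mark first, then for zero bytes at alternating positions, then tries a strict UTF-8 decode.

```csharp
using System.Text;

public static class EncodingGuesser
{
    // Returns a best guess, or null when none of the heuristics applies.
    public static Encoding Guess(byte[] data)
    {
        // 1. A byte order mark is the firmest evidence available.
        //    (UTF-32 marks are ignored here to keep the sketch short.)
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return Encoding.UTF8;
        if (StartsWith(data, 0xFF, 0xFE)) return Encoding.Unicode;
        if (StartsWith(data, 0xFE, 0xFF)) return Encoding.BigEndianUnicode;

        // 2. Zero bytes at only odd or only even offsets hint at UTF-16.
        int zerosEven = 0, zerosOdd = 0;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] != 0) continue;
            if (i % 2 == 0) zerosEven++; else zerosOdd++;
        }
        int pairs = data.Length / 2;
        if (pairs > 0)
        {
            if (zerosOdd > pairs / 2 && zerosEven == 0) return Encoding.Unicode;
            if (zerosEven > pairs / 2 && zerosOdd == 0) return Encoding.BigEndianUnicode;
        }

        // 3. A strict decode that succeeds means the bytes are valid UTF-8
        //    (which includes plain ASCII).
        try
        {
            new UTF8Encoding(false, true).GetString(data);
            return Encoding.UTF8;
        }
        catch (DecoderFallbackException)
        {
            return null; // probably some 8-bit code page; no way to tell which
        }
    }

    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }
}
```

A caller would pass in something like `File.ReadAllBytes(path)` and treat a null result as "unknown". Even a non-null result can be wrong, since plenty of text in 8-bit code pages also happens to be valid UTF-8.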

Nov 16 '05 #5
