Help!! Convert file encoding

Sun

Hi everyone,

I have two files named a.txt and b.txt.
I open a.txt with ultraeditor.exe. Here is the first row of the file:
neu für

Then I switch to hex mode:
00000000h: FF FE 6E 00 65 00 75 00 20 00 66 00 FC 00 72 00 20 00 0A 00 0D 00
I open b.txt with ultraeditor.exe as well. Here is the first row of b.txt:
neu für

Then I switch to hex mode again:
00000000h: 6E 65 75 20 66 FC 72 20 0A 0D

a.txt starts with the bytes FF FE, so I think it is a Unicode
(little-endian) encoded file. b.txt has no BOM at the start, so I think
it is ANSI encoded.
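
The BOM reasoning above can be written down as a small check. This is only a minimal sketch: the class and method names (BomSniffer, Describe, ReadHead) are hypothetical, and it deliberately ignores UTF-32, whose little-endian BOM also begins with FF FE.

```csharp
using System;
using System.IO;

public static class BomSniffer
{
    // Hypothetical helper: names the encoding implied by a leading
    // byte order mark, or reports that there is none.
    public static string Describe(byte[] head)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16 little endian";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16 big endian";
        return "no BOM";
    }

    // Reads at most the first three bytes of the file at path.
    public static byte[] ReadHead(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            byte[] buffer = new byte[3];
            int read = fs.Read(buffer, 0, buffer.Length);
            Array.Resize(ref buffer, read);
            return buffer;
        }
    }
}
```

For the two hex dumps shown, Describe would return "UTF-16 little endian" for FF FE 6E and "no BOM" for 6E 65 75. A missing BOM does not by itself prove a file is ANSI, since UTF-8 files are often written without one.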

Then I use the following C# code to print the first bytes of each file:
using System;
using System.IO;

DirectoryInfo di = new DirectoryInfo(Directory.GetCurrentDirectory());
foreach (FileInfo fi in di.GetFiles("*.txt"))
{
    // Open by full path so the file read is exactly the one enumerated.
    using (FileStream fs = new FileStream(fi.FullName, FileMode.Open,
        FileAccess.Read, FileShare.Read))
    {
        Console.WriteLine(fi.Name);
        for (int i = 0; i < 10; i++)
        {
            int b = fs.ReadByte(); // ReadByte returns -1 at end of file
            if (b < 0) break;
            Console.WriteLine(i + " : " + b);
        }
    } // the using block closes the stream
}

Here is the result (I only show the first row of each file):
a.txt
0 : 110
1 : 101
2 : 117
3 : 32
4 : 102
5 : 195
6 : 188
7 : 114
8 : 32
9 : 13

b.txt
0 : 110
1 : 101
2 : 117
3 : 32
4 : 102
5 : 252
6 : 114
7 : 32
8 : 13
9 : 10
So, I have three questions:
1. Why does ultraeditor show the BOM at the start of a.txt when the code
does not? In the editor every character is two bytes long, but the C#
stream does not return the high byte of each character.
2. The character ü in a.txt is an extended (non-ASCII) character from
code page 1252. I don't understand why the bytes I get from the code are
195 and 188 (decimal): one byte has turned into two. How does byte 252
become bytes 195 and 188?
3. Either way, I want to convert both files to UTF-8. How should I do
that? Every character should be converted correctly and should also
display correctly when the file is opened in Notepad.

If anyone has any suggestions, many thanks.

Sep 2 '08 #1

Peter Duniho

On Mon, 01 Sep 2008 20:09:21 -0700, Sun <Su******@gmail.com> wrote:

[...]
So, I have three questions:
1. Why does ultraeditor show the BOM at the start of a.txt when the code
does not? In the editor every character is two bytes long, but the C#
stream does not return the high byte of each character.
2. The character ü in a.txt is an extended (non-ASCII) character from
code page 1252. I don't understand why the bytes I get from the code are
195 and 188 (decimal): one byte has turned into two. How does byte 252
become bytes 195 and 188?
3. Either way, I want to convert both files to UTF-8. How should I do
that? Every character should be converted correctly and should also
display correctly when the file is opened in Notepad.

If anyone has any suggestions, many thanks.

Unless you can provide links where the actual files can be downloaded, I'm
not sure anyone here will be able to offer much information.

I agree that your observations are inconsistent. But there's no reason
that a FileStream shouldn't return the exact bytes found in the file. So
that leaves a few possibilities: 1) the code you posted isn't actually the
exact code you're using to read the files, 2) the files you're reading
with the code aren't the same files being opened in this "UltraEditor"
program, or 3) the "UltraEditor" program is doing something unexpected (at
least by you...it's possible whatever it's doing is completely intentional
and expected for other users) with the files.

Just a wild guess: the first file is already UTF-8, and "UltraEditor"
detects that based on the 2-byte ü character and internally converts that
to a plain UTF-16 file with BOM at the beginning.
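
One way to test that guess is to encode ü (U+00FC) in a few encodings and compare the bytes with the ones reported in the question. The following is a minimal sketch, not code from the thread: the names UmlautBytes and Describe are made up for illustration, and ISO-8859-1 stands in for code page 1252 because both map ü to the same single byte.

```csharp
using System;
using System.Text;

public static class UmlautBytes
{
    // Returns the bytes of "ü" (U+00FC) in the given encoding,
    // as space-separated decimal values.
    public static string Describe(Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes("\u00FC");
        return string.Join(" ", bytes);
    }

    public static void Main()
    {
        Console.WriteLine("UTF-8:      " + Describe(Encoding.UTF8));      // 195 188
        Console.WriteLine("UTF-16 LE:  " + Describe(Encoding.Unicode));   // 252 0
        Console.WriteLine("ISO-8859-1: " + Describe(Encoding.GetEncoding("iso-8859-1"))); // 252
    }
}
```

The pair 195 188 (hex C3 BC) matches what the code printed for a.txt, which is consistent with the file on disk being UTF-8 without a BOM. The FC 00 pair in the editor's hex view is what the same character looks like in UTF-16 little endian.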

You'll need to post the exact code, a concise-but-complete code sample
that reproduces the issue you're seeing, as well as provide a couple of
links to copies of the files so people can use the exact data you're
using. Alternatively, include in your sample some setup code to create
the files appropriately; but I'm guessing that if you could do that
easily, you wouldn't have the question in the first place. :)

As far as your third question goes: the simplest approach to converting
the file would be to use the .NET text I/O classes, StreamReader and
StreamWriter. Let .NET auto-detect the input file encoding, or specify it
yourself, and then explicitly specify the encoding for the output (though,
actually...my recollection is that the default is already UTF-8, and if so
you don't really have to specify it). Then just use ReadLine() and
WriteLine() to go through the file and convert it.
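
A minimal sketch of that approach follows. The names Utf8Converter and ConvertToUtf8 and the fallback parameter are hypothetical. It copies characters in blocks rather than using ReadLine() and WriteLine(), which is a substitution made here so that the original line endings pass through unchanged. It also writes a UTF-8 BOM, on the assumption that this helps Notepad recognise the result.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8Converter
{
    // Re-encodes the file at inputPath as UTF-8 (with BOM) at outputPath.
    // fallback is the encoding assumed when the input has no BOM;
    // if a BOM is present, StreamReader uses the encoding it indicates.
    public static void ConvertToUtf8(string inputPath, string outputPath, Encoding fallback)
    {
        using (var reader = new StreamReader(inputPath, fallback, true))
        using (var writer = new StreamWriter(outputPath, false, new UTF8Encoding(true)))
        {
            char[] buffer = new char[4096];
            int read;
            // Copy decoded characters; the writer re-encodes them as UTF-8.
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                writer.Write(buffer, 0, read);
            }
        }
    }
}
```

The fallback has to be right for files without a BOM. For a file like b.txt it would be a single-byte encoding such as Encoding.GetEncoding("iso-8859-1"). For a file that is already UTF-8 without a BOM it would be Encoding.UTF8. Passing the wrong one decodes ü incorrectly before anything is written.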

Pete

Sep 2 '08 #2

Ken Foskey

Yes, one file is ASCII and the other is UTF as you suggest.

I would assume that streams understand the differences and open the files
converting into UTF characters.

Sep 2 '08 #3

Arne Vajhøj

Ken Foskey wrote:

Yes, one file is ASCII and the other is UTF as you suggest.

That would be CP-1252, ISO-8859-1 or similar, not ASCII.
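
The distinction is easy to see by decoding the bytes 66 FC 72 from b.txt both ways. This is only an illustrative sketch. The class and method names (AsciiVersusLatin1, Decode) are hypothetical, and ISO-8859-1 is used as the single-byte encoding because it agrees with CP-1252 for byte FC.

```csharp
using System;
using System.Text;

public static class AsciiVersusLatin1
{
    // Decodes the same bytes with the given encoding.
    public static string Decode(byte[] data, Encoding encoding)
    {
        return encoding.GetString(data);
    }

    public static void Main()
    {
        byte[] data = { 0x66, 0xFC, 0x72 }; // "für" as stored in b.txt

        // ASCII only defines bytes 0 to 127, so 0xFC is replaced by '?'.
        Console.WriteLine(Decode(data, Encoding.ASCII));                      // f?r

        // ISO-8859-1 maps 0xFC to U+00FC.
        Console.WriteLine(Decode(data, Encoding.GetEncoding("iso-8859-1"))); // für
    }
}
```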

Arne

Sep 3 '08 #4
