Read Input, Write Output (File) with Umlaute

P: n/a
I can't manage to copy a simple 'input.txt' with the following content:
Jürg (Hex: 4a fc 72 67)
to an identical 'output.txt'.

I do the following (and tried with tons of different encodings):
private static void WriteFile() {
    StreamWriter sr = File.CreateText("Output.txt");
    try
    {
        using (TextReader tr = new StreamReader(
            new FileStream("Input.txt", FileMode.Open), Encoding.ASCII))
        {
            string iniLine = "";
            while ((iniLine = tr.ReadLine()) != null)
            {
                if (iniLine.Length > 0)
                    sr.WriteLine(iniLine);
            }
            tr.Close();
        }
    }
    catch
    {
        sr.Close();
    }
    sr.Flush();
    sr.Close();
}
But in Output I NEVER get exactly the same hex values as in Input. Isn't
there a way to say "take the same encoding as the input"?
Thanks for your help.
Nov 19 '05 #1
4 Replies


P: n/a
Carlo Marchesoni wrote:
But in Output I NEVER get exactly the same hex values as in Input.
Isn't there a way to say "take the same encoding as the input"?


There's generally no way of identifying a text file's character
encoding; the few exceptions include files that start with a byte order
mark. And regarding your code sample, note that ASCII is a 7-bit
encoding and doesn't include Umlaut characters. Thus, your StreamReader
cannot decode the byte 0xfc and simply loses the ü.
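
To make the point about ASCII concrete, here is a minimal sketch that decodes the four sample bytes from the question in memory. The class and method names are made up for illustration, and ISO-8859-1 is only an assumption about the input file (the bytes would decode the same way as Windows-1252).

```csharp
using System;
using System.Text;

public static class AsciiDecodeDemo
{
    // Decodes the four sample bytes from the question with the given encoding.
    public static string Decode(Encoding encoding)
    {
        byte[] input = { 0x4a, 0xfc, 0x72, 0x67 };
        return encoding.GetString(input);
    }

    public static void Main()
    {
        // 0xfc is outside ASCII's 7-bit range; current .NET versions
        // substitute '?' for it, so this prints "J?rg".
        Console.WriteLine(Decode(Encoding.ASCII));

        // Assumption: the file is ISO-8859-1, where 0xfc is 'ü'.
        Console.WriteLine(Decode(Encoding.GetEncoding("ISO-8859-1")));
    }
}
```

Once the ü has been replaced during decoding, no choice of output encoding can bring it back.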

But the real issue is that File.CreateText() always writes UTF-8, but
your sample text 0x4a 0xfc 0x72 0x67 is in an 8-bit encoding, most
likely Windows-1252 or ISO-8859-1. Even if you open the source file with
the correct encoding, the output will still differ at the byte level as
long as it is written with File.CreateText(), because UTF-8 encodes
Umlaut characters differently: ü is the single byte 0xfc in those 8-bit
encodings, but the two bytes 0xc3 0xbc in UTF-8.
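
The byte-level difference can be checked without touching any files. The following is a small sketch; the helper name ToHex is hypothetical, and ISO-8859-1 again stands in for whatever 8-bit encoding the input really uses.

```csharp
using System;
using System.Text;

public static class UmlautBytesDemo
{
    // Returns the bytes of 'text' in the given encoding as lowercase hex.
    public static string ToHex(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return BitConverter.ToString(bytes).Replace("-", " ").ToLowerInvariant();
    }

    public static void Main()
    {
        string name = "J\u00fcrg"; // "Jürg", escaped to keep this source file ASCII-only

        Console.WriteLine(ToHex(name, Encoding.GetEncoding("ISO-8859-1"))); // 4a fc 72 67
        Console.WriteLine(ToHex(name, Encoding.UTF8));                      // 4a c3 bc 72 67
    }
}
```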

But why decode and encode anyway? Your code is a simple file copy. If
that's all you need, File.Copy() or using FileStreams will work just
fine with all encoding combinations.
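
For the plain-copy case, a sketch along these lines should be enough. The wrapper class and method name are only illustrative; File.Copy itself moves the bytes as they are, so no encoding is involved at any point.

```csharp
using System.IO;

public static class RawCopyDemo
{
    // Copies the file byte for byte; the third argument allows
    // overwriting an existing target file.
    public static void CopyUnchanged(string sourcePath, string targetPath)
    {
        File.Copy(sourcePath, targetPath, true);
    }

    public static void Main()
    {
        // File names taken from the question.
        CopyUnchanged("Input.txt", "Output.txt");
    }
}
```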

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 19 '05 #2

P: n/a

Thank you for your answer. I know that for this sample File.Copy() would
be much better, but in my real application I obviously have a much larger
input file, and I have to change a couple of things before writing it to
the output.

Nov 19 '05 #3

P: n/a
Carlo Marchesoni wrote:

Thank you for your answer. I know that for this sample
File.Copy() would be much better, but in my real application I
obviously have a much larger input file, and I have to change a
couple of things before writing it to the output.


In this case, make sure to create a StreamReader and a StreamWriter
that use the same encoding.
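
As a rough sketch of that advice: pass one Encoding object to both constructors. The class name, the Rewrite helper, and its transform parameter are hypothetical, and ISO-8859-1 is an assumption about the input file; substitute whatever 8-bit encoding the file really uses.

```csharp
using System;
using System.IO;
using System.Text;

public static class SameEncodingDemo
{
    // Reads inputPath line by line, applies 'transform' to each non-empty
    // line, and writes the result to outputPath. Reader and writer share
    // one encoding, so unchanged characters keep their original bytes.
    public static void Rewrite(string inputPath, string outputPath,
                               Encoding encoding, Func<string, string> transform)
    {
        using (StreamReader reader = new StreamReader(inputPath, encoding))
        using (StreamWriter writer = new StreamWriter(outputPath, false, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length > 0)
                    writer.WriteLine(transform(line));
            }
        }
    }

    public static void Main()
    {
        // Assumption: the input is ISO-8859-1.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        // Identity transform here; real code would modify the line.
        Rewrite("Input.txt", "Output.txt", latin1, line => line);
    }
}
```

The using blocks flush and close both streams even if an exception is thrown, which replaces the manual Flush()/Close() calls. Note that, like the code in the question, this drops empty lines, and WriteLine ends every line with Environment.NewLine, so the line-break bytes can still differ from the input.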

Cheers,
Nov 19 '05 #4

P: n/a
Thanks a lot for your hint - now it works.

Nov 19 '05 #5
