Using detectEncodingFromByteOrderMarks while copying a text file

After copying a text file line by line and comparing the result with the
original, I noticed that the original has several bytes at the beginning
that denote its encoding. How do I reproduce them in my copy?
My original code, shown below, didn't produce a perfect copy, so I switched
to the StreamReader constructor that takes detectEncodingFromByteOrderMarks.
But I need to pass the detected encoding to the constructor of my
StreamWriter, so I need some way to find out which encoding was detected.
How, please?

string InputPath = Path.GetDirectoryName(Application.ExecutablePath)
    + @"\intext.txt";
string OutputPath = Path.GetDirectoryName(Application.ExecutablePath)
    + @"\outtext.txt";
string In;
string Out;

using (StreamReader Input = new StreamReader(InputPath))
// using (StreamReader Input = new StreamReader(InputPath, true)) << constructor
{
    using (StreamWriter Output = new StreamWriter(OutputPath))
    {
        while ((In = Input.ReadLine()) != null)
        {
            Out = DoSomethingTo(In);
            Output.WriteLine(Out);
        }
    }
}
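
For reference, the "several bytes at the beginning" are the byte order mark (BOM), which .NET calls the encoding's preamble. A minimal sketch that prints the preamble of a few common encodings, so the leading bytes of the original file can be compared against them; the class and method names (BomTable, PreambleHex) are made up for illustration:

```csharp
using System;
using System.Text;

static class BomTable
{
    // Returns the BOM (preamble) bytes of an encoding as a hex string,
    // for example "EF BB BF" for UTF-8.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble()).Replace("-", " ");
    }

    static void Main()
    {
        Console.WriteLine("UTF-8:     " + PreambleHex(Encoding.UTF8));
        Console.WriteLine("UTF-16 LE: " + PreambleHex(Encoding.Unicode));
        Console.WriteLine("UTF-16 BE: " + PreambleHex(Encoding.BigEndianUnicode));
        Console.WriteLine("UTF-32 LE: " + PreambleHex(Encoding.UTF32));
    }
}
```

A StreamWriter created without an explicit encoding does not write any of these bytes, which would explain why the line-by-line copy above is a few bytes shorter than the original.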

Jun 27 '08 #1
I'm guessing - tell the writer about it?

using (StreamWriter Output = new StreamWriter(OutputPath, false,
    Input.CurrentEncoding)) {...}

Marc
Jun 27 '08 #2
Correction - CurrentEncoding is not valid until the reader has read some
data, so read first and create the writer afterwards; perhaps something
like below. Note that detection relies on the byte order mark, so it can't
detect every possible encoding...

Marc

using (StreamReader reader = new StreamReader(path1, true))
{
    // Read the first line before creating the writer, so that
    // CurrentEncoding reflects the detected byte order mark.
    string line = reader.ReadLine();
    using (StreamWriter writer = new StreamWriter(path2, false,
        reader.CurrentEncoding))
    {
        Console.WriteLine("Reading {0} with {1}", path1,
            reader.CurrentEncoding.EncodingName);
        Console.WriteLine("Writing {0} with {1}", path2,
            writer.Encoding.EncodingName);

        while (line != null)
        {
            string t = Transform(line);
            Console.WriteLine(t);
            writer.WriteLine(t);
            line = reader.ReadLine();
        }
    }
}
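
To see why the first read has to happen before the writer is created, here is a small sketch that reports CurrentEncoding before and after the first ReadLine. The scratch file name (bom-demo.txt) and the helper name (CodePagesBeforeAndAfterRead) are hypothetical and only serve the demonstration:

```csharp
using System;
using System.IO;
using System.Text;

static class CurrentEncodingDemo
{
    // Returns the code pages reported by CurrentEncoding before and after
    // the first read from the file.
    public static int[] CodePagesBeforeAndAfterRead(string path)
    {
        using (StreamReader reader = new StreamReader(path, true))
        {
            // Nothing has been read yet, so this is the constructor default.
            int before = reader.CurrentEncoding.CodePage;

            // The first read lets the reader inspect the byte order mark.
            reader.ReadLine();
            int after = reader.CurrentEncoding.CodePage;

            return new int[] { before, after };
        }
    }

    static void Main()
    {
        // Hypothetical scratch file, written as UTF-16 LE with a BOM.
        string path = Path.Combine(Path.GetTempPath(), "bom-demo.txt");
        File.WriteAllText(path, "hello\r\n", Encoding.Unicode);

        int[] pages = CodePagesBeforeAndAfterRead(path);
        Console.WriteLine("Before first read: code page {0}", pages[0]);
        Console.WriteLine("After first read:  code page {0}", pages[1]);

        File.Delete(path);
    }
}
```

For this UTF-16 file the value before the read is the UTF-8 default (code page 65001) and the value after it is UTF-16 LE (code page 1200). One caveat, as far as I know: if the input has no BOM at all, CurrentEncoding stays at that UTF-8 default, and a writer created from it will add a UTF-8 BOM that the original did not have.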
Jun 27 '08 #3
"Marc Gravell" <ma**********@gmail.com> wrote in message
news:u4**************@TK2MSFTNGP03.phx.gbl...
> Correction - the CurrentEncoding is not valid until it has read some data;
> perhaps something like below; note that it also can't detect every
> encoding possible...

That's great! Thank you :)

Jun 27 '08 #4
> Using detectEncodingFromByteOrderMarks while copying a text file

Unless you process the text somehow, it is not worth the trouble to
copy a text file as a text file (with encoding detection, line endings,
and so on).
Just copy it as binary. The routine can also be reused for any type
of file, and there is no risk of data corruption if you "guess" the
encoding wrong.
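
A minimal sketch of such a byte-for-byte copy, assuming no transformation of the text is needed; the class and method names (BinaryCopy, CopyBytes) and the buffer size are illustrative choices:

```csharp
using System;
using System.IO;

static class BinaryCopy
{
    // Copies the source file to the destination byte for byte, so any BOM
    // and the original line endings survive without knowing the encoding.
    public static void CopyBytes(string sourcePath, string destinationPath)
    {
        using (FileStream source = File.OpenRead(sourcePath))
        using (FileStream destination = File.Create(destinationPath))
        {
            byte[] buffer = new byte[81920];
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, read);
            }
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: BinaryCopy <source> <destination>");
            return;
        }
        CopyBytes(args[0], args[1]);
    }
}
```

When both ends are plain files, File.Copy(sourcePath, destinationPath, true) does the same job in one call; the explicit loop is only useful when the source or destination is some other kind of stream.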
--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
Jun 27 '08 #5
I very nearly said the same thing - but if you look carefully, there is
a transform hidden in the code:

Out = DoSomethingTo(In);
Output.WriteLine(Out);

Marc
Jun 27 '08 #6
> I very nearly said the same thing - but if you look carefully, there is
> a transform hidden in the code:

Right, I missed that one. Got fooled by the subject :-)
--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
Jun 27 '08 #7
