
Reading UTF data from c#

Sivaraj G via .NET 247
#1: Nov 16 '05
We created a Unicode file using a Java application. It uses methods like writeUTF() and writeInt() of the java.io.DataOutputStream class to write the content of the file. We are able to read the data back using the java.io.DataInputStream.readUTF() method. It works well in the Java environment.

When we tried to read the same file in the .NET environment, we received junk content instead of the original content.

We tried a sample program in C# and also used the System.Text.UTF8Encoding(true) option. Any help is highly appreciated.

-----------------------
Posted by a user from .NET 247 (http://www.dotnet247.com/)


Jon Skeet [C# MVP]
#2: Nov 16 '05

re: Reading UTF data from c#


Sivaraj G via .NET 247 <anonymous@dotnet247.com> wrote:
> We created a unicode file using java application. It uses methods
> like writeUTF(), writeInt() of java.io.DataOutputStream class to
> write the content of the file. We are able to read data using
> java.io.DataInputStream.readUTF() method. It's working well in java
> environment.
>
> When we tried to read the above unicode file in .net environment. We
> received junk content not the original content.
>
> Actually we tried a sample program in c# and used
> System.Text.UTF8Encoding(true) option also. Any help highly
> appreciated.

writeUTF first writes a pair of bytes to give the number of bytes to
follow. Those aren't UTF-8 characters, but .NET would be expecting them
to be.
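
To make that layout concrete, here is a minimal Java sketch. The class and method names are hypothetical, chosen only for illustration. It captures the bytes writeUTF produces and then decodes them by hand, the way a reader on another platform would have to: take the first two bytes as an unsigned big-endian length, then decode that many bytes. One assumption to note: writeUTF uses Java's "modified UTF-8", which differs from standard UTF-8 for U+0000 and for characters outside the Basic Multilingual Plane, so a plain UTF-8 decode like this one is only safe for text that contains neither.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative and not from the thread.
class WriteUtfLayout {

    // Returns the exact bytes DataOutputStream.writeUTF produces for s.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        }
        return buffer.toByteArray();
    }

    // Reads the two-byte, big-endian, unsigned length prefix.
    static int lengthPrefix(byte[] data) {
        return ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
    }

    // Decodes one writeUTF record by hand.
    // Assumption: the text has no U+0000 and no characters outside the BMP,
    // so the payload is also valid standard UTF-8.
    static String decodeByHand(byte[] data) {
        return new String(data, 2, lengthPrefix(data), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode("h\u00e9llo");
        System.out.println("total bytes:   " + data.length);
        System.out.println("length prefix: " + lengthPrefix(data));
        System.out.println("decoded:       " + decodeByHand(data));
    }
}
```

A reader that treats the whole file as UTF-8 text will try to decode those two prefix bytes as characters, which is one likely source of the junk. The same care applies to writeInt, which writes four bytes in big-endian order, whereas .NET's BinaryReader.ReadInt32 reads little-endian.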

Effectively, writeUTF and readUTF are only designed to work with
DataInputStream/DataOutputStream. You can probably read that pair of
bytes before reading the rest, but it's not ideal. If you're just
creating a text file in Java, I suggest you use OutputStreamWriter
wrapped round a FileOutputStream, and specify UTF-8 as the encoding. If
you're writing a file with mixed binary and text data, you need to make
sure you know *exactly* what you're writing, and then read it very
carefully from the other platform.
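
For the plain-text case, a minimal sketch of that suggestion might look like the following. The class name, method name and temporary file are hypothetical, used only for illustration. It wraps a FileOutputStream in an OutputStreamWriter with an explicit UTF-8 charset, so the file contains only the encoded characters, with no length prefix.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; the names are illustrative and not from the thread.
class PlainUtf8Writer {

    // Writes text as plain UTF-8: only the encoded characters, no length prefix.
    static void write(Path file, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for whatever path the real application uses.
        Path file = Files.createTempFile("utf8-demo", ".txt");
        write(file, "h\u00e9llo\r\n");
        System.out.println("bytes on disk: " + Files.size(file));
        Files.delete(file);
    }
}
```

A file written this way is ordinary UTF-8 text, which is what a .NET StreamReader constructed with System.Text.Encoding.UTF8 expects to read.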

--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Nurchi BECHED
#3: Nov 16 '05

re: Reading UTF data from c#


Hello, Sivaraj!

Here is a block of code that might help (or give some ideas),
if I understood you correctly:
string
s=string.Empty,
path=@"c:\file.txt", //... or whatever file you'll be using
NewLine=string.Empty; //this variable doesn't have to be initialized, but
//it is a good habit

System.IO.StreamReader MyStreamReader=
new System.IO.StreamReader(path, System.Text.Encoding.UTF8);

while((NewLine=MyStreamReader.ReadLine())!=null)
s+=NewLine+"\r\n";

MyStreamReader.Close();

//Now do something with 's'

Regards

You wrote on Fri, 06 Aug 2004 06:51:09 -0700:

SGN> When we tried to read the above unicode file in .net environment. We
SGN> received junk content not the original content.

SGN> Actually we tried a sample program in c# and used
SGN> System.Text.UTF8Encoding(true) option also. Any help highly
SGN> appreciated.

SGN> -----------------------
SGN> Posted by a user from .NET 247 (http://www.dotnet247.com/)


With best regards, Nurchi BECHED.


Jon Skeet [C# MVP]
#4: Nov 16 '05

re: Reading UTF data from c#


Nurchi BECHED <nurchi@telus.net> wrote:
> Here is a block of code that might help (or give some ideas),
> if I understood you correctly:
> string
> s=string.Empty,
> path=@"c:\file.txt", //... or whatever file you'll be using
> NewLine=string.Empty; //this variable doesn't have to be initialized, but
> //it is a good habit

I disagree on that point - if an assignment isn't required because
there'll be another assignment before the first "read" of the variable,
I'd rather the extraneous assignment isn't present in the first place.
It implies that the assigned value has some purpose, when it doesn't.

> System.IO.StreamReader MyStreamReader=
> new System.IO.StreamReader(path, System.Text.Encoding.UTF8);
>
> while((NewLine=MyStreamReader.ReadLine())!=null)
> s+=NewLine+"\r\n";

That's a horrible way of building up a string. Use StringBuilder
instead.

In this case though, just MyStreamReader.ReadToEnd() would be a better
solution still.

> MyStreamReader.Close();

You should use a using statement instead - that way if an exception is
thrown, the StreamReader still gets disposed.

--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too