Need help reading UTF-16 files ...

nnimod@gmail.com
#1: Jan 13 '06
Hi. I'm having trouble reading some Unicode files. Basically, I have to
parse certain files. Some of those files are being input in Japanese,
Chinese etc. The easiest way, I figured, to distinguish between plain
ASCII files I receive and the Unicode ones would be to check if the
first two bytes read 0xFFFE.

But nothing I do seems to be able to do that.

I tried reading it in binary mode and reading two characters in:

FILE *fin; char ch [2];
fin.open (filename, "rb");
if (fin) { fopen (ch, sizeof (char), 2, fin); ......

I tried reading it in binary mode and read a wchar_t in:

FILE *fin; wchar_t wch;
fin.open (filename, "rb");
if (fin) { fopen (&wch, sizeof (wchar_t), 1, fin); ....

I tried using ifstream for two characters/wifstream for wchar_t but to
no avail.

All of them seem to skip the so-called byte order mark. I am quite
lost for ideas. I saw a few examples using MFC Class CStdioFile etc.
but I don't want to use those. I'm sure there's a perfectly simple
method to do this.

Sorry about the long msg for such a simple problem, but it is getting
quite frustrating.... Any help would be very much appreciated.

Cheers,
Nemo.

PS. I know the mark is there. I viewed the files using a hex editor.
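
For reference, the check described in this post (look at the first two bytes of the file) can be sketched in standard C as below. This is only a sketch under stated assumptions: the function name `detect_utf16_bom` and its return codes are hypothetical and do not come from the thread, and it only recognises the two UTF-16 byte order marks (bytes FF FE for little-endian, FE FF for big-endian). Two details may matter for the snippets above. Plain `char` is signed on many compilers, so a byte of 0xFF stored in a `char` can compare unequal to `0xFF`. And reading the bytes FF FE into a 16-bit integer on a little-endian machine gives the value 0xFEFF rather than 0xFFFE, while `wchar_t` may be wider than two bytes on some platforms.

```c
#include <stdio.h>

/* Hypothetical helper, not from the thread: classifies a file by its
   first two bytes.
   Returns  1 for a UTF-16 little-endian BOM (bytes FF FE),
            2 for a UTF-16 big-endian BOM (bytes FE FF),
            0 if no UTF-16 BOM is found,
           -1 if the file cannot be opened. */
int detect_utf16_bom(const char *filename)
{
    unsigned char bom[2];  /* unsigned, so a 0xFF byte compares equal to 0xFF */
    size_t n;
    FILE *fin = fopen(filename, "rb");  /* binary mode: bytes are read untranslated */

    if (fin == NULL)
        return -1;

    /* fread returns the number of items actually read (0, 1 or 2 here) */
    n = fread(bom, 1, sizeof bom, fin);
    fclose(fin);

    if (n == 2 && bom[0] == 0xFF && bom[1] == 0xFE)
        return 1;
    if (n == 2 && bom[0] == 0xFE && bom[1] == 0xFF)
        return 2;
    return 0;
}
```

A caller would treat a return value of 1 or 2 as "UTF-16 with a BOM" and 0 as "no BOM, possibly plain ASCII". A file without a BOM is not necessarily ASCII, so this is a heuristic rather than a guarantee.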

P.J. Plauger
#2: Jan 13 '06

<nnimod@gmail.com> wrote in message
news:1137127103.941480.294450@f14g2000cwb.googlegroups.com...

> Hi. I'm having trouble reading some Unicode files. Basically, I have to
> parse certain files. Some of those files are being input in Japanese,
> Chinese etc. The easiest way, I figured, to distinguish between plain
> ASCII files I receive and the Unicode ones would be to check if the
> first two bytes read 0xFFFE.
>
> But nothing I do seems to be able to do that.
>
> I tried reading it in binary mode and reading two characters in:
>
> FILE *fin; char ch [2];
> fin.open (filename, "rb");
> if (fin) { fopen (ch, sizeof (char), 2, fin); ......
>
> I tried reading it in binary mode and read a wchar_t in:
>
> FILE *fin; wchar_t wch;
> fin.open (filename, "rb");
> if (fin) { fopen (&wch, sizeof (wchar_t), 1, fin); ....
>
> I tried using ifstream for two characters/wifstream for wchar_t but to
> no avail.
>
> All of them seem to skip the so-called byte order mark. I am quite
> lost for ideas. I saw a few examples using MFC Class CStdioFile etc.
> but I don't want to use those. I'm sure there's a perfectly simple
> method to do this.

See our CoreX library, at our web site. It has exactly what you need.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com


Richard Herring
#3: Jan 17 '06

In message <1137127103.941480.294450@f14g2000cwb.googlegroups.com>,
nnimod@gmail.com writes
>Hi. I'm having trouble reading some Unicode files. Basically, I have to
>parse certain files. Some of those files are being input in Japanese,
>Chinese etc. The easiest way, I figured, to distinguish between plain
>ASCII files I receive and the Unicode ones would be to check if the
>first two bytes read 0xFFFE.
>
>But nothing I do seems to be able to do that.
>
>I tried reading it in binary mode and reading two characters in:
>
>FILE *fin; char ch [2];
>fin.open (filename, "rb");
>if (fin) { fopen (ch, sizeof (char), 2, fin); ......

Try posting the *actual* code that causes the problem. The above is
clearly not it.

--
Richard Herring