Need help reading UTF-16 files ...

Hi. I'm having trouble reading some Unicode files. Basically, I have to
parse certain files. Some of those files are being input in Japanese,
Chinese etc. The easiest way, I figured, to distinguish between plain
ASCII files I receive and the Unicode ones would be to check if the
first two bytes read 0xFFFE.

But nothing I do seems to be able to do that.

I tried reading it in binary mode and reading two characters in:

FILE *fin; char ch [2];
fin.open (filename, "rb");
if (fin) { fopen (ch, sizeof (char), 2, fin); ......

I tried reading it in binary mode and read a wchar_t in:

FILE *fin; wchar_t wch;
fin.open (filename, "rb");
if (fin) { fopen (&wch, sizeof (wchar_t), 1, fin); ....

I tried using ifstream for two characters/wifstream for wchar_t but to
no avail.

All of them seem to skip the so-called byte-order mark. I am quite
lost for ideas. I saw a few examples using MFC Class CStdioFile etc.
but I don't want to use those. I'm sure there's a perfectly simple
method to do this.

Sorry about the long msg for such a simple problem, but it is getting
quite frustrating.... Any help would be very much appreciated.

Cheers,
Nemo.

PS. I know the mark is there. I viewed the files using a hex editor.

Jan 13 '06 #1
<nn****@gmail.com> wrote in message
news:11**********************@f14g2000cwb.googlegroups.com...
> Hi. I'm having trouble reading some Unicode files. Basically, I have to
> parse certain files. Some of those files are being input in Japanese,
> Chinese etc. The easiest way, I figured, to distinguish between plain
> ASCII files I receive and the Unicode ones would be to check if the
> first two bytes read 0xFFFE.
>
> But nothing I do seems to be able to do that.
>
> I tried reading it in binary mode and reading two characters in:
>
> FILE *fin; char ch [2];
> fin.open (filename, "rb");
> if (fin) { fopen (ch, sizeof (char), 2, fin); ......
>
> I tried reading it in binary mode and read a wchar_t in:
>
> FILE *fin; wchar_t wch;
> fin.open (filename, "rb");
> if (fin) { fopen (&wch, sizeof (wchar_t), 1, fin); ....
>
> I tried using ifstream for two characters/wifstream for wchar_t but to
> no avail.
>
> All of them seem to skip the so-called byte-order mark. I am quite
> lost for ideas. I saw a few examples using MFC Class CStdioFile etc.
> but I don't want to use those. I'm sure there's a perfectly simple
> method to do this.


See our CoreX library, at our web site. It has exactly what you need.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Jan 13 '06 #2
In message <11**********************@f14g2000cwb.googlegroups.com>,
nn****@gmail.com writes
> Hi. I'm having trouble reading some Unicode files. Basically, I have to
> parse certain files. Some of those files are being input in Japanese,
> Chinese etc. The easiest way, I figured, to distinguish between plain
> ASCII files I receive and the Unicode ones would be to check if the
> first two bytes read 0xFFFE.
>
> But nothing I do seems to be able to do that.
>
> I tried reading it in binary mode and reading two characters in:
>
> FILE *fin; char ch [2];
> fin.open (filename, "rb");
> if (fin) { fopen (ch, sizeof (char), 2, fin); ......


Try posting the *actual* code that causes the problem. The above is
clearly not it.

--
Richard Herring
Jan 17 '06 #3
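
For anyone landing on this thread later: the check described in the first post (look at the first two bytes of the file) can be sketched roughly as below. This is not the original poster's code and not the CoreX approach; it is a minimal sketch that assumes the files are either plain ASCII or UTF-16 with a byte-order mark. The names Bom and detect_utf16_bom are made up for the example. Two things that commonly trip up this kind of test: comparing a plain char against 0xFF fails on platforms where char is signed, and reading into a wchar_t depends on its size and on the host byte order, so the sketch reads into unsigned char and compares byte by byte.

```cpp
#include <fstream>
#include <string>

// Hypothetical names for this sketch; nothing here comes from the thread.
enum class Bom { None, Utf16LE, Utf16BE };

// Opens the file in binary mode, reads its first two bytes and classifies them.
Bom detect_utf16_bom(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    unsigned char b[2] = {0, 0};
    // read() takes a char*, hence the cast; the buffer itself stays unsigned
    // so that the comparisons against 0xFF and 0xFE behave as expected.
    if (!in.read(reinterpret_cast<char*>(b), 2)) {
        return Bom::None;  // could not open, or fewer than two bytes
    }
    if (b[0] == 0xFF && b[1] == 0xFE) return Bom::Utf16LE;  // bytes FF FE
    if (b[0] == 0xFE && b[1] == 0xFF) return Bom::Utf16BE;  // bytes FE FF
    return Bom::None;  // assumed here to mean plain ASCII
}
```

A caller would branch on the result, for example treating Bom::None as an ASCII file and the other two values as UTF-16 text that starts after the first two bytes. Note that a UTF-32 little-endian file also begins with FF FE, so this test alone cannot tell it apart from UTF-16LE.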
