Need help reading UTF-16 files ...

Hi. I'm having trouble reading some Unicode files. Basically, I have to
parse certain files. Some of those files contain Japanese, Chinese etc.
text. The easiest way, I figured, to distinguish between the plain
ASCII files I receive and the Unicode ones would be to check whether
the first two bytes are 0xFF 0xFE.

But nothing I do seems to be able to do that.

I tried reading it in binary mode and reading two characters in:

FILE *fin; char ch [2];
fin.open (filename, "rb");
if (fin) { fopen (ch, sizeof (char), 2, fin); ......

I tried reading it in binary mode and reading a wchar_t in:

FILE *fin; wchar_t wch;
fin.open (filename, "rb");
if (fin) { fopen (&wch, sizeof (wchar_t), 1, fin); ....

I tried using ifstream for two characters/wifstream for wchar_t but to
no avail.

All of them seem to skip the so-called byte-order mark (BOM). I am quite
lost for ideas. I saw a few examples using MFC Class CStdioFile etc.
but I don't want to use those. I'm sure there's a perfectly simple
method to do this.

Sorry about the long msg for such a simple problem, but it is getting
quite frustrating.... Any help would be very much appreciated.

Cheers,
Nemo.

PS. I know the mark is there. I viewed the files using a hex editor.

Jan 13 '06 #1
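
A note on the two attempts above: as posted they would not compile (FILE* has no open member, and fopen is being called with fread's arguments), so what follows is only a guess at the intent. Assuming the real code used fread, two things commonly make the comparison fail even though the bytes are read correctly. If plain char is signed, a byte of 0xFF is stored as -1 and never compares equal to 0xFF. And a 2-byte wchar_t filled from the bytes FF FE on a little-endian machine holds 0xFEFF, not 0xFFFE (wchar_t is also often 4 bytes outside Windows). Here is a minimal C++ sketch that sidesteps both by reading into unsigned char; the names Bom, classify_bom and detect_bom are made up for illustration and do not come from the thread:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical result type: which UTF-16 byte-order mark, if any, was found.
enum class Bom { None, Utf16LE, Utf16BE };

// Classify the first n bytes of a file. unsigned char is used on purpose:
// with a signed plain char, 0xFF would be stored as -1 and the comparison
// against 0xFF would always be false.
Bom classify_bom(const unsigned char* b, std::size_t n) {
    if (n < 2) return Bom::None;
    if (b[0] == 0xFF && b[1] == 0xFE) return Bom::Utf16LE;  // bytes FF FE
    if (b[0] == 0xFE && b[1] == 0xFF) return Bom::Utf16BE;  // bytes FE FF
    return Bom::None;
}

// Open the file in binary mode, read two bytes, and classify them.
// A file that cannot be opened is reported as Bom::None in this sketch.
Bom detect_bom(const char* filename) {
    std::FILE* fin = std::fopen(filename, "rb");
    if (!fin) return Bom::None;
    unsigned char buf[2] = {0, 0};
    std::size_t n = std::fread(buf, 1, sizeof buf, fin);
    std::fclose(fin);
    return classify_bom(buf, n);
}
```

A caller would treat Bom::None as "probably plain ASCII" and anything else as UTF-16. Note that a missing BOM does not prove the file is ASCII; it only means no UTF-16 mark was found.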

See our CoreX library, at our web site. It has exactly what you need.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Jan 13 '06 #2
In message <11**********************@f14g2000cwb.googlegroups.com>,
nn****@gmail.com writes
I tried reading it in binary mode and reading two characters in:

FILE *fin; char ch [2];
fin.open (filename, "rb");
if (fin) { fopen (ch, sizeof (char), 2, fin); ......


Try posting the *actual* code that causes the problem. The above is
clearly not it.

--
Richard Herring
Jan 17 '06 #3
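
For anyone landing on this thread later: the question mentions ifstream and wifstream as well. A wifstream with the default locale typically widens each byte individually rather than decoding UTF-16, so it is usually simpler to open a plain ifstream in binary mode and assemble the 16-bit code units by hand. The following is a rough sketch of that idea, not code from any poster. It assumes a C++11 compiler (for char16_t and std::u16string), and read_utf16_file is a hypothetical name:

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: reads a BOM-prefixed UTF-16 file into code units.
// Returns false if the file cannot be opened or does not start with a
// UTF-16 byte-order mark. Surrogate pairs are kept as two code units;
// a trailing odd byte, if any, is silently ignored.
bool read_utf16_file(const std::string& path, std::u16string& out) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return false;

    // Read the two BOM bytes as unsigned values.
    unsigned char bom[2];
    if (!in.read(reinterpret_cast<char*>(bom), 2)) return false;

    bool little_endian;
    if (bom[0] == 0xFF && bom[1] == 0xFE) {
        little_endian = true;       // FF FE: UTF-16 little-endian
    } else if (bom[0] == 0xFE && bom[1] == 0xFF) {
        little_endian = false;      // FE FF: UTF-16 big-endian
    } else {
        return false;               // no UTF-16 BOM found
    }

    // Combine each pair of bytes according to the detected byte order.
    out.clear();
    unsigned char pair[2];
    while (in.read(reinterpret_cast<char*>(pair), 2)) {
        char16_t unit = little_endian
            ? static_cast<char16_t>(pair[0] | (pair[1] << 8))
            : static_cast<char16_t>((pair[0] << 8) | pair[1]);
        out.push_back(unit);
    }
    return true;
}
```

Building the code units from individual bytes keeps the result independent of the host's own byte order and of the size of wchar_t.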
