iostream question: How to open unicode file name.

I'm trying to upgrade some old code that used old iostreams.

At one place in the code, I have a path/filename in a wchar_t string
(unicode utf-16).

I need to open an ifstream to that file. But the open() on ifstream only
takes char * strings (mbcs?).

In old iostreams, I could _wopen() the file, get the filedesc, and call
attach() on the ifstream.
But filedescs and attach() don't exist in standard iostreams.

I can't just convert the unicode string to a mbcs string, because the user's
default codepage might not allow for all the characters that are in the
unicode string.

I can of course convert the unicode string to utf-8, but I can't find any
way to get the iostream open() to believe that the string is utf-8.

Trying to imbue() a stream with a custom locale that has a custom codecvt
seems to affect only the data passing through the stream, not how file names
passed to open() are interpreted.

Does anyone know how I can open an ifstream on a file with a Unicode
name?

Why doesn't ifstream have a wopen() ?

Nov 16 '05 #1
Charles F McDevitt wrote:
[...]
Why doesn't ifstream have a wopen() ?


Because the C++ community can't agree on how such a function should work.
For a system such as Windows that has a Unicode file system, the desired
behavior seems obvious: just pass the string to the filesystem. But since
C++ tries to address a much wider range of systems, many of which don't
support Unicode filesystems, we're left with a C++ standard in which there
is no standard-compliant, portable way to open a file given a Unicode file
name.

The library implementation supplied with VC does provide a way to do it,
however. Open the file using the C runtime library _wfopen, then create the
ifstream by passing the FILE* that _wfopen() returns to the ifstream
constructor.

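#include <cstdio>   // _wfopen (Microsoft CRT extension)
#include <fstream>

void foo(const wchar_t* pwsz)
{
    // The ifstream(FILE*) constructor is a VC library extension, not standard C++.
    std::ifstream stm(_wfopen(pwsz, L"rb"));
    if (!stm.is_open())
        return; // _wfopen failed

    // Do stuff with the stream
}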

-cd
Nov 16 '05 #2
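
For readers on a C++17 or later toolchain, here is a minimal sketch of a route that avoids the vendor-specific FILE* constructor: std::filesystem::path can be built from a wide string, and the standard file streams accept a path. The helper name open_wide is made up for illustration. On Windows the path keeps the wide name as-is; on other systems the library converts it to a narrow native encoding, and that conversion is implementation-defined.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: open a binary input stream for a wide-character file name.
// Requires C++17 (std::filesystem and the path-taking stream constructors).
std::ifstream open_wide(const std::wstring& name)
{
    const std::filesystem::path p(name);        // wide string -> native path
    return std::ifstream(p, std::ios::binary);  // file streams are movable since C++11
}
```

The caller still has to check is_open() on the returned stream before reading from it.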

"Carl Daniel [VC++ MVP]" <cp******@nospam.mvps.org> wrote in message
news:%2****************@TK2MSFTNGP09.phx.gbl...
[...]

Thanks.. I guess that's my only choice.
Nov 16 '05 #3
"Carl Daniel [VC++ MVP]" <cp******@nospam.mvps.org> wrote:
[...]
Why doesn't ifstream have a wopen() ?
Because the C++ community can't agree on how such a function should work.
For a system such as Windows that has a Unicode file system, the desired
behavior seems obvious: just pass the string to the filesystem. But since
C++ tries to address a much wider range of systems, many of which don't
support unicode filesystems, we're left with a C++ standard in which there
is no standard compliant, portable way to open a file given a unicode file
name.


I consider that a weak argument. After all,
C++ also tries to address systems where there's
no file system at all.
(Carl, I know you're not the one to argue with
about that. However, I just couldn't let that
pass uncommented. <g>)
[...]
-cd


Schobi

--
Sp******@gmx.de is never read
I'm Schobi at suespammers org

"And why should I know better by now/When I'm old enough not to?"
Beth Orton
Nov 16 '05 #4
Hendrik Schober wrote:
"Carl Daniel [VC++ MVP]" <cp******@nospam.mvps.org> wrote:
[...]
I consider that a weak argument. After all,
C++ also tries to address systems where there's
no file system at all.
(Carl, I know you're not the one to argue with
about that. However, I just couldn't let that
pass uncommented. <g>)


Quite so - I consider it a very weak argument as well, but that's the state
of affairs, unfortunately. At least there is a workaround, however ugly and
non-portable it might be.

-cd
Nov 16 '05 #5

"Hendrik Schober" <Sp******@gmx.de> wrote in message
news:uL**************@TK2MSFTNGP10.phx.gbl...
"Carl Daniel [VC++ MVP]" <cp******@nospam.mvps.org> wrote:
[...]
I consider that a weak argument. After all,
C++ also tries to address systems where there's
no file system at all.
(Carl, I know you're not the one to argue with
about that. However, I just couldn't let that
pass uncommented. <g>)
[...]


One of the reasons more and more people are moving to Java and C#... At
least those languages work in international environments, unlike C++,
where it is "implementation-defined" whether things will work sensibly.
Nov 16 '05 #6
"Charles F McDevitt" <Ch************@m-s-n.com> wrote:
[...]
One of the reasons more and more people are moving to Java and C#.... At
least, those languages work in International environments... Unlike C++,
where it is "implementation defined" if things will work sensibly.

Well, I don't know much about Java or C#.
But AFAIK, in Java, the Unicode char is fixed
to 16 bits. While that certainly is platform
independent, it isn't very good either.

Schobi

Nov 16 '05 #7

"Hendrik Schober" <Sp******@gmx.de> wrote in message
news:uX*************@TK2MSFTNGP11.phx.gbl...
"Charles F McDevitt" <Ch************@m-s-n.com> wrote:
[...]
Well, I don't know much about Java or C#.
But AFAIK, in Java, the Unicode char is fixed
to 16 bits. While that certainly is platform
independent, it isn't very good either.


It's only fixed at 16-bit UTF-16 internally, within your Java code.
You can write or read it from there any way you want,
although the default is to keep it 16-bit UTF-16.
Java has built-in conversion routines for most character sets you'd want.

In C++, if your characters are wchar_t, it's implementation-defined
whether they are 16-bit, 32-bit, or some other size,
and implementation-defined whether they are Unicode or not.
They could be any conceivable character set that fits in the size.

When writing out, say with a wofstream, C++ mandates that the
default behaviour is to convert to narrow characters.

But what the conversion is, is implementation defined.
Even if you happen to have unicode in your wchar_t string,
the default conversion via wofstream could convert to any
character set, and again, it's implementation defined.

Microsoft's choice is to convert to the local code page
(which makes sense on Windows, except when the local code page can't handle the
Unicode characters),
Linux seems to convert to UTF-8,
and different UNIXes do whatever they think is sensible,
but you can't count on any consistent behaviour.
All of this makes it a pain to write portable C++ code.
Nov 16 '05 #8
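
Because the default wide-to-narrow conversion is implementation-defined, one way to get the same bytes everywhere is to do the conversion explicitly and write the result through a narrow stream. A minimal sketch, assuming the wide string holds UTF-16 when wchar_t is 16 bits and UTF-32 otherwise (an assumption, since the standard promises neither). The names append_utf8 and wide_to_utf8 are made up for illustration, and unpaired surrogates are passed through unchecked.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 and append it to out.
void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a wide string to UTF-8.
// Assumption: UTF-16 content if wchar_t is 16 bits wide, UTF-32 otherwise.
std::string wide_to_utf8(const std::wstring& ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(ws[i]);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            const std::uint32_t lo = static_cast<std::uint32_t>(ws[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Writing wide_to_utf8(text) to a std::ofstream opened in binary mode then yields UTF-8 regardless of the platform's default conversion.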

"Carl Daniel [VC++ MVP]" <cp******@nospam.mvps.org> wrote in message
news:%2****************@TK2MSFTNGP09.phx.gbl...
[...]


One issue with this approach (other than the need to rewrite a lot of code):

Is it OK to call imbue() after the constructor opens the file?

I need a custom locale (well, a custom codecvt facet). Normally, I would
construct the stream, call imbue(), and then call open().

But if the only way to open a Unicode-named file is by opening it in the
constructor, I need to imbue() my locale after the open has happened.
Is that legal?
Nov 16 '05 #9
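
On the imbue() question: going by the standard's description of basic_filebuf::imbue, changing the locale appears to be permitted while the file is still positioned at its beginning, so imbuing right after the opening constructor and before any read should be fine. A minimal sketch, assuming a placeholder facet PassThroughCodecvt (hypothetical, it only inherits the default no-conversion behaviour) in place of the real custom codecvt, and a standard narrow-name constructor in place of the FILE* one:

```cpp
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

// Placeholder for the custom facet; a real one would override do_in, do_out, etc.
struct PassThroughCodecvt : std::codecvt<char, char, std::mbstate_t> {};

// Hypothetical helper: open in the constructor, imbue afterwards, then read.
std::string first_line_after_imbue(const std::string& path)
{
    std::ifstream in(path.c_str());  // the file is opened here

    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(std::locale::classic(), new PassThroughCodecvt));

    std::string line;
    std::getline(in, line);          // the first I/O happens only after imbue()
    return line;
}
```

The point of the ordering is that no characters are read or written between the open and the imbue() call.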
"Charles F McDevitt" <Ch************@m-s-n.com> wrote:
[...]
It's only fixed 16-bits UTF-16 internal to your Java Code.
If it really is UTF-16, it's all right.
I was told it only takes Unicode < 2^16
(as Windows does).
[...]


Schobi

Nov 16 '05 #10
