iostream question: How to open unicode file name.

I'm trying to upgrade some old code that used old iostreams.

At one place in the code, I have a path/filename in a wchar_t string
(unicode utf-16).

I need to open an ifstream to that file. But the open() on ifstream only
takes char * strings (mbcs?).

In old iostreams, I could _wopen() the file, get the filedesc, and call
attach() on the ifstream.
But filedescs and attach() don't exist in standard iostreams.

I can't just convert the unicode string to a mbcs string, because the user's
default codepage might not allow for all the characters that are in the
unicode string.

I can of course convert the unicode string to utf-8, but I can't find any
way to get the iostream open() to believe that the string is utf-8.

Trying to imbue() a stream with a custom locale that has a custom codecvt
seems to only affect data written to the stream, not how file names passed
to open() are interpreted.

Does anyone know how I can accomplish opening an ifstream to a unicode named
file?

Why doesn't ifstream have a wopen() ?

Nov 16 '05 #1
Charles F McDevitt wrote:
Why doesn't ifstream have a wopen() ?


Because the C++ community can't agree on how such a function should work.
For a system such as Windows that has a Unicode file system, the desired
behavior seems obvious: just pass the string to the file system. But since
C++ tries to address a much wider range of systems, many of which don't
support Unicode file systems, we're left with a C++ standard in which there
is no standard-compliant, portable way to open a file given a Unicode file
name.

The library implementation supplied with VC does provide a way to do it,
however. Open the file using the C runtime library function _wfopen(), then
create the ifstream by passing the FILE* that _wfopen() returns to the
ifstream constructor.

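#include <cstdio>   // _wfopen (Microsoft C runtime)
#include <fstream>

void foo(const wchar_t* pwsz)
{
    // Non-standard: the VC library's ifstream has a constructor taking a FILE*.
    std::ifstream stm(_wfopen(pwsz, L"rb"));

    // Do stuff with the stream
}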

-cd
Nov 16 '05 #2

"Carl Daniel [VC++ MVP]" <cp******@nospam.mvps.org> wrote in message
news:%2****************@TK2MSFTNGP09.phx.gbl...
[...]

Thanks.. I guess that's my only choice.
Nov 16 '05 #3
"Carl Daniel [VC++ MVP]" <cp******@nospam.mvps.org> wrote:
[...]
Why doesn't ifstream have a wopen() ?
Because the C++ community can't agree on how such a function should work.
For a system such as Windows that has a Unicode file system, the desired
behavior seems obvious: just pass the string to the filesystem. But since
C++ tries to address a much wider range of systems, many of which don't
support unicode filesystems, we're left with a C++ standard in which there
is no standard compliant, portable way to open a file given a unicode file
name.


I consider that a weak argument. After all,
C++ also tries to address systems where there's
no file system at all.
(Carl, I know you're not the one to argue with
about that. However, I just couldn't let that
pass uncommented. <g>)
[...]
-cd


Schobi

--
Sp******@gmx.de is never read
I'm Schobi at suespammers org

"And why should I know better by now/When I'm old enough not to?"
Beth Orton
Nov 16 '05 #4
Hendrik Schober wrote:
"Carl Daniel [VC++ MVP]" <cp******@nospam.mvps.org> wrote:
[...]
Why doesn't ifstream have a wopen() ?


Because the C++ community can't agree on how such a function should
work. For a system such as Windows that has a Unicode file system,
the desired behavior seems obvious: just pass the string to the
filesystem. But since C++ tries to address a much wider range of
systems, many of which don't support unicode filesystems, we're left
with a C++ standard in which there is no standard compliant,
portable way to open a file given a unicode file name.


I consider that a weak argument. After all,
C++ also tries to address systems where there's
no file system at all.
(Carl, I know you're not the one to argue with
about that. However, I just couldn't let that
pass uncommented. <g>)


Quite so - I consider it a very weak argument as well, but that's the state
of affairs, unfortunately. At least there is a workaround, however ugly and
non-portable it might be.

-cd
Nov 16 '05 #5

"Hendrik Schober" <Sp******@gmx.de> wrote in message
news:uL**************@TK2MSFTNGP10.phx.gbl...
"Carl Daniel [VC++ MVP]" <cp******@nospam.mvps.org> wrote:
[...]
Why doesn't ifstream have a wopen() ?


Because the C++ community can't agree on how such a function should work. For a system such as Windows that has a Unicode file system, the desired
behavior seems obvious: just pass the string to the filesystem. But since C++ tries to address a much wider range of systems, many of which don't
support unicode filesystems, we're left with a C++ standard in which there is no standard compliant, portable way to open a file given a unicode file name.


I consider that a weak argument. After all,
C++ also tries to address systems where there's
no file system at all.
(Carl, I know you're not the one to argue with
about that. However, I just couldn't let that
pass uncommented. <g>)
[...]


One of the reasons more and more people are moving to Java and C#.... At
least, those languages work in international environments... Unlike C++,
where it is "implementation defined" if things will work sensibly.
Nov 16 '05 #6
"Charles F McDevitt" <Ch************@m-s-n.com> wrote:
[...]
One of the reasons more and more people are moving to Java and C#.... At
least, those languages work in international environments... Unlike C++,
where it is "implementation defined" if things will work sensibly.

Well, I don't know much about Java or C#.
But AFAIK, in Java, the Unicode char is fixed
to 16 bits. While that certainly is platform
independent, it isn't very good either.

Schobi

Nov 16 '05 #7

"Hendrik Schober" <Sp******@gmx.de> wrote in message
news:uX*************@TK2MSFTNGP11.phx.gbl...
"Charles F McDevitt" <Ch************@m-s-n.com> wrote:
[...]
One of the reasons more and more people are moving to Java and C#.... At
least, those languages work in international environments... Unlike C++,
where it is "implementation defined" if things will work sensibly.

Well, I don't know much about Java or C#.
But AFAIK, in Java, the Unicode char is fixed
to 16 bits. While that certainly is platform
independent, it isn't very good either.


It's only fixed 16-bit UTF-16 internal to your Java code.
You can write or read it from there any way you want,
although the default is to keep it 16-bit UTF-16.
Java has built-in conversion routines for most character sets you'd want.

In C++, if your characters are wchar_t, it's implementation
defined whether they are 16-bit, 32-bit, or some other size,
and implementation defined whether they are Unicode or not.
They could be any conceivable character set that fits in the size.

When writing out, say with a wofstream, C++ mandates that the
default behaviour is to convert to narrow characters.

But what the conversion is, is implementation defined.
Even if you happen to have Unicode in your wchar_t string,
the default conversion via wofstream could convert to any
character set, and again, it's implementation defined.

Microsoft's choice is to convert to the local code page
(which makes sense on Windows except when the local code page can't handle
the Unicode characters),
Linux seems to convert to UTF-8,
and different UNIXes do whatever they think is sensible,
but you can't count on any consistent behaviour.
All of this makes it a pain to write portable C++ code.
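
One way to sidestep the implementation-defined default conversion is to do the encoding yourself and write plain bytes through a narrow stream. A rough sketch, assuming the wchar_t string holds UTF-16 when wchar_t is 16 bits wide and UTF-32 otherwise; wide_to_utf8 and append_utf8 are made-up helper names, and unpaired surrogates are not validated:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to `out` as UTF-8 (1 to 4 bytes).
inline void append_utf8(std::string& out, std::uint32_t cp)
{
    assert(cp <= 0x10FFFF); // outside the Unicode range is a caller error
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a wide string to UTF-8.
// Assumption: 16-bit wchar_t means UTF-16, anything wider means UTF-32.
inline std::string wide_to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        if (sizeof(wchar_t) == 2) {
            cp &= 0xFFFFu;
            // Combine a high/low surrogate pair into one code point.
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
                const std::uint32_t lo =
                    static_cast<std::uint32_t>(in[i + 1]) & 0xFFFFu;
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                    ++i; // the low surrogate has been consumed
                }
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The resulting std::string can be written with an ordinary ofstream opened in binary mode, so the bytes on disk no longer depend on the default wide-to-narrow conversion.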
Nov 16 '05 #8

"Carl Daniel [VC++ MVP]" <cp******@nospam.mvps.org> wrote in message
news:%2****************@TK2MSFTNGP09.phx.gbl...
[...]

One issue with this approach (other than the need to rewrite a lot of code):

Is it OK to call imbue() after the constructor opens the file?

I need a custom locale (well, a custom codecvt facet). Normally, I would
construct the stream, call imbue(), and then call open().

But if the only way to open a Unicode-named file is by opening it in the
constructor, I need to imbue() my locale after the open has happened.
Is that legal?
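
For what it's worth, my reading of the standard's description of basic_filebuf::imbue is that it only restricts switching away from a state-dependent encoding once the file is no longer positioned at its beginning, so imbuing right after the open and before any read should be fine. A small sketch of that ordering, using the ordinary char* constructor so it stays portable; open_then_imbue is a hypothetical helper and loc stands in for the locale carrying the custom codecvt facet:

```cpp
#include <cassert>
#include <fstream>
#include <locale>

// Hypothetical helper: open first, then install the locale before any I/O.
inline std::ifstream open_then_imbue(const char* name, const std::locale& loc)
{
    assert(name != nullptr);
    // The constructor opens the file.
    std::ifstream stm(name, std::ios::in | std::ios::binary);
    // Nothing has been read yet, so the file is still at its beginning.
    stm.imbue(loc);
    return stm;
}
```

The same order would apply to a stream built from the FILE* returned by _wfopen(): construct, imbue(), and only then start reading.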
Nov 16 '05 #9
"Charles F McDevitt" <Ch************@m-s-n.com> wrote:
[...]
It's only fixed 16-bit UTF-16 internal to your Java code.
If it really is UTF-16, it's all right.
I was told it only takes Unicode < 2^16
(as Windows does).
[...]


Schobi

Nov 16 '05 #10
