jonathanmcdougall@DELyahoo.ca (02 May 2005 06:56,
news:<uqide.6254$SI2.577229@wagner.videotron.net>) wrote:
> I started using boost's filesystem library a
> couple of days ago. In its FAQ, it states
>
> "Wide-character names would provide an illusion of
> portability where portability does not in fact
> exist. Behavior would be completely different on
> operating systems (Windows, for example) that
> support wide-character names, than on systems
I think you overlooked a detail here: what makes this portability an
"illusion" is how differently file systems define their file-name rules.
It doesn't mean that using wide-character strings in C++ is not portable,
only that using Unicode file names with the various native file systems is
not.
(That said, boost::filesystem recently started an "i18n" branch, and
accessing files by wide-character names seems to be on the menu, so someone
has probably taken the time to make that work on the most common platforms.)
> 2. Is it a good idea to let the user choose
> between Unicode and ASCII in a library in a
> transparent way (such as Microsoft's -A and -W
> versions of all functions)?
I think it is.
POSIX systems let the user choose a locale (by setting $LANG, or
sub-variables such as LC_CTYPE).
You can (hope to) get the user's environment locale with portable C++:
std::locale userLocale("");
but then there's no easy, portable way to know whether this locale uses
UTF-8 as its character encoding. (On POSIX systems you can try to detect
whether "UTF-8" occurs in the userLocale.name() string.)
In basic situations you should *not* need to know; just:
.. use wide characters in your program, and wide streams;
.. imbue the user's locale on all the wide streams you use, and let them
handle the conversions.
[ In fact, wcout may not behave like the other wide streams: I found that
imbuing a locale on wcout was ignored, and that setting the global locale:
std::locale::global(userLocale);
before using wcout was the only way to make the locale have any effect on
wcout. ]
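A minimal sketch of the suggestions above, assuming a C++11 compiler (the lambda would not compile on 2005-era compilers). The names user_locale_or_classic, name_suggests_utf8 and print_with_user_locale are hypothetical, and the UTF-8 check is only the name-matching heuristic described above, not a reliable test.

```cpp
#include <algorithm>
#include <cctype>
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

// Returns the user's environment locale, std::locale(""), or the classic
// "C" locale if the environment names a locale the implementation does not
// support (the std::locale constructor then throws std::runtime_error).
std::locale user_locale_or_classic() {
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Heuristic only: on POSIX systems the locale name usually carries the
// charset ("en_US.UTF-8", "de_DE.utf8"); the standard guarantees nothing.
bool name_suggests_utf8(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name.find("utf-8") != std::string::npos ||
           name.find("utf8") != std::string::npos;
}

// Makes the user's locale global (for a named locale this also sets the
// C library locale, as if by setlocale) and imbues it on std::wcout.
void print_with_user_locale(const std::wstring& text) {
    const std::locale loc = user_locale_or_classic();
    std::locale::global(loc);
    std::wcout.imbue(loc);
    std::wcout << text << L'\n';
}
```

The rough check from the post then reads name_suggests_utf8(user_locale_or_classic().name()). The fallback matters because a $LANG naming a locale that is not installed makes std::locale("") throw.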
> 3. What is the best way to convert wide strings to
> and from narrow strings? System-dependent
> functions? A simple loop converting char's to
> wchar_t's?
I think the expected way is to let the wide streams handle the conversions.
They use their locale's codecvt facet to convert the internal char_type
sequences to the external char encoding.
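If you have to widen/narrow characters yourself, you can use the widen()
and narrow() members of a locale's ctype<wchar_t> facet; they map one char
to one wchar_t, so they only suit single-byte text.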
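A sketch of per-character widening and narrowing through a locale's ctype<wchar_t> facet. The helper names widen_each and narrow_each, and the '?' fallback for characters with no narrow equivalent, are illustrative assumptions.

```cpp
#include <locale>
#include <string>

// Widens each char independently with ctype<wchar_t>::widen. A UTF-8
// multibyte sequence would be widened byte by byte, which is wrong: use
// this only for single-byte text such as the basic character set.
std::wstring widen_each(const std::string& s, const std::locale& loc) {
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t>>(loc);
    std::wstring out;
    out.reserve(s.size());
    for (char c : s) out.push_back(ct.widen(c));
    return out;
}

// Narrows each wchar_t with ctype<wchar_t>::narrow; characters that have
// no single-char equivalent in this locale become `fallback`.
std::string narrow_each(const std::wstring& w, const std::locale& loc,
                        char fallback = '?') {
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t>>(loc);
    std::string out;
    out.reserve(w.size());
    for (wchar_t c : w) out.push_back(ct.narrow(c, fallback));
    return out;
}
```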
Everything boils down to using the "right" locale for your situation.
(Note that boost, and other portable libraries, provide UTF-8 codecvt facets
that convert between wchar_t values holding Unicode code points and UTF-8
encoded char sequences.)
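To show what such a UTF-8 conversion does, here is a hand-written encoder (not boost's code) that turns Unicode code points into UTF-8 byte sequences. It assumes each wchar_t holds one code point, which is true on most Unix systems but not on Windows, where wchar_t holds UTF-16; the names append_utf8 and to_utf8 are hypothetical.

```cpp
#include <cassert>
#include <string>

// Appends the UTF-8 encoding (1 to 4 bytes) of one Unicode code point.
void append_utf8(std::string& out, char32_t cp) {
    // Only Unicode scalar values can be encoded: no surrogates, max U+10FFFF.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Encodes a wide string, assuming each wchar_t is one code point.
std::string to_utf8(const std::wstring& w) {
    std::string out;
    for (wchar_t c : w) append_utf8(out, static_cast<char32_t>(c));
    return out;
}
```

For example, to_utf8(L"\u00e9") yields the two bytes 0xC3 0xA9.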
> 4. Will C++0x provide more means for using wide
> and narrow strings, such as conversions and
> transparency (converting "strings" into L"strings"
> automatically, for example, providing standard
> macros such as UNICODE)
That is already handled by the current standard, but the conversion is not
canonical: different locales can mean different conversions, so it depends
on the locale.
A locale's codecvt<wchar_t, char, mbstate_t> facet serves that purpose.
For more details on locales, see Appendix D of Stroustrup's "The C++
Programming Language":
http://www.research.att.com/~bs/3rd_loc0.html
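A sketch of converting whole strings in both directions through a locale's codecvt<wchar_t, char, mbstate_t> facet, using its in() and out() members. The helper names to_wide and from_wide, the 256-element scratch buffer and the choice to throw std::range_error on failure are assumptions for illustration; stateful (shift) encodings would also need unshift(), which this sketch omits.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

using WideCvt = std::codecvt<wchar_t, char, std::mbstate_t>;
constexpr std::size_t kBuf = 256;  // scratch space per conversion step

// External (narrow) text -> wchar_t, using the same facet a wide stream
// imbued with `loc` would use.
std::wstring to_wide(const std::string& in, const std::locale& loc) {
    const WideCvt& cvt = std::use_facet<WideCvt>(loc);
    std::mbstate_t state = std::mbstate_t();
    wchar_t buf[kBuf];
    std::wstring out;
    const char* from = in.data();
    const char* const end = from + in.size();
    while (from != end) {
        const char* from_next = from;
        wchar_t* to_next = buf;
        const std::codecvt_base::result r =
            cvt.in(state, from, end, from_next, buf, buf + kBuf, to_next);
        out.append(buf, to_next);  // keep whatever was converted so far
        if (r == std::codecvt_base::error || r == std::codecvt_base::noconv)
            throw std::range_error("to_wide: invalid input for this locale");
        if (from_next == from)  // no progress: input ends mid-sequence
            throw std::range_error("to_wide: incomplete multibyte sequence");
        from = from_next;       // "partial" with progress: go round again
    }
    return out;
}

// wchar_t -> external (narrow) text, using the same facet.
std::string from_wide(const std::wstring& in, const std::locale& loc) {
    const WideCvt& cvt = std::use_facet<WideCvt>(loc);
    std::mbstate_t state = std::mbstate_t();
    char buf[kBuf];
    std::string out;
    const wchar_t* from = in.data();
    const wchar_t* const end = from + in.size();
    while (from != end) {
        const wchar_t* from_next = from;
        char* to_next = buf;
        const std::codecvt_base::result r =
            cvt.out(state, from, end, from_next, buf, buf + kBuf, to_next);
        out.append(buf, to_next);
        if (r == std::codecvt_base::error || r == std::codecvt_base::noconv)
            throw std::range_error("from_wide: character not representable");
        if (from_next == from)
            throw std::range_error("from_wide: conversion made no progress");
        from = from_next;
    }
    return out;
}
```

Passing the user's locale (or one carrying a third-party UTF-8 facet) instead of the classic one changes the external encoding without touching the calling code, which is the point made above: the conversion is a property of the locale.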
> 5. Are wide characters meant to be used with
> Unicode or are they provided for an
> implementation-defined use?
Almost everything is implementation-defined when it comes to locales and
wide characters.
The values in wchar_t are, most of the time, Unicode (UTF-32) code points
(on Windows, where wchar_t is 16 bits, they are UTF-16 code units), but
check your compiler's documentation if you have to rely on it. For instance,
gcc 3.4 lets you change that with the command-line option
-fwide-exec-charset, and defaults to UTF-32 where wchar_t is 32 bits.
The way I see it, you have three options:
1. use the compiler's native encoding of wide characters, along with the
native locales, and let your compiler's library do its work. In this case
you don't care what values those wchar_t hold, as long as they match what
the locales expect (and they should!).
2. enforce your own wide-character encoding (on a type of at least 32 bits)
and your own conversions (with a third-party facet or set of functions),
without ever using the compiler's native locales and wide I/O features.
3. if you want to mix native facilities with third-party tools: set up the
proper native-to-UTF-32 conversion (e.g. write a header that tests
compiler-specific and standard-library-specific macros and performs the
proper conversion, or aborts, or whatever; in most cases the proper
conversion is to leave the wchar_t values untouched), and apply that
conversion between native calls and third-party UTF-32 calls.
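A minimal sketch of the header suggested in option 3, assuming C++11's std::u32string stands in for the third-party UTF-32 string type. It covers only two cases: Windows, where wchar_t holds UTF-16 and surrogate pairs must be joined, and implementations defining __STDC_ISO_10646__, which signals that wchar_t values are ISO 10646 (Unicode) code points. The function name native_to_utf32 is hypothetical, and any other platform deliberately stops the build.

```cpp
#include <cwchar>  // a C library header; some libcs define __STDC_ISO_10646__ via their headers
#include <string>

#if defined(_WIN32)
// Windows: wchar_t is 16 bits and holds UTF-16, so join surrogate pairs.
inline std::u32string native_to_utf32(const std::wstring& w) {
    std::u32string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        char32_t c = static_cast<char16_t>(w[i]);
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < w.size()) {
            const char32_t lo = static_cast<char16_t>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        out.push_back(c);  // an unpaired surrogate is passed through as-is
    }
    return out;
}
#elif defined(__STDC_ISO_10646__)
// wchar_t values are already code points: keep them untouched.
inline std::u32string native_to_utf32(const std::wstring& w) {
    return std::u32string(w.begin(), w.end());
}
#else
#error "wchar_t encoding unknown on this platform: add a conversion here"
#endif
```

The reverse direction (UTF-32 back to native wchar_t) would follow the same pattern, splitting code points above U+FFFF into surrogate pairs on Windows.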
--
Samuel