
std::wstringbuf and imbue to convert from utf-8 to wchar_t?

Hi,

I have an API that returns UTF-8 encoded strings. I have a UTF-8 codecvt
facet available to do the conversion from UTF-8 to the wchar_t encoding
defined by the platform. I have no trouble converting when a UTF-8
encoded string comes from a file: I just create a std::wifstream and
imbue it with a locale that uses the UTF-8 facet for
std::locale::ctype. Then I just use operator>> to get a wstring properly
decoded from UTF-8. I thought I could do something similar with
std::wstringstream or std::wstringbuf, but I am having a hard time with it.

I imagine that if a std::wstringstream is imbued with UTF-8, it stores
an array of char (not wchar_t) encoded as UTF-8. I can push wide strings
into it or get them from it as I like, and the result is kept encoded
as UTF-8 in some internal buffer.

What I now need is to be able to supply my own buffer, prefilled with
the UTF-8 data I have, to act as the internal UTF-8 encoded buffer
of the std::wstringbuf, and then call operator>>(..., std::wstring &)
to get the wide-string representation converted from UTF-8 to the
proper wide encoding. While I am at it, I would also like to know the
reverse: how to get at this internal UTF-8 encoded buffer (so I can push
wstrings into it as I like and get a "char *" encoded in UTF-8).

Sample code (how I would imagine it):
char name[] = "Boris Du" "\xc5\xa1" "ek"; // my name - Boris Dušek
std::wstringstream conv;
// pubsetbuf only accepts "wchar_t *", not "char *"
conv.rdbuf()->pubsetcharbuf(name, 11);
std::wstring wname;
conv >> wname; // now my name should be properly decoded from UTF-8

Thanks for any suggestions,
Boris
Nov 2 '08 #1
4 Replies


> Sample code (how I would imagine it):
> char name[] = "Boris Du" "\xc5\xa1" "ek"; // my name - Boris Dušek
> std::wstringstream conv;
> // pubsetbuf only accepts "wchar_t *", not "char *"
> conv.rdbuf()->pubsetcharbuf(name, 11);
> std::wstring wname;
> conv >> wname; // now my name should be properly decoded from UTF-8

Please imagine that I imbued conv with a UTF-8 locale; I forgot to put
that into the above code.
Nov 2 '08 #2

Sam
Boris Dušek writes:
>> Sample code (how I would imagine it):
>> char name[] = "Boris Du" "\xc5\xa1" "ek"; // my name - Boris Dušek
>> std::wstringstream conv;
>> // pubsetbuf only accepts "wchar_t *", not "char *"
>> conv.rdbuf()->pubsetcharbuf(name, 11);
>> std::wstring wname;
>> conv >> wname; // now my name should be properly decoded from UTF-8
>
> Please imagine that I imbued conv with a UTF-8 locale; I forgot to put
> that into the above code.

You need to instantiate a std::locale("en_US.utf-8"). Then, invoke
std::use_facet< std::codecvt<wchar_t, char, std::mbstate_t> >(locale)
to obtain a reference to a std::codecvt<wchar_t, char, std::mbstate_t>
object. Then, use the object's in() and out() methods to convert
between UTF-8 encoded chars and wchar_t.

Yes, this is rather convoluted. I don't understand why doing these kinds
of things has to be so complicated, but that's how it is.
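
The in() call described above can be sketched roughly as follows. This is only an illustration, not code from the thread: decode_utf8 and make_utf8_locale are hypothetical names, and because the named locale "en_US.utf-8" is not installed on every system, the helper builds a locale from std::codecvt_utf8 instead (an assumption; that facet is deprecated since C++17 but still widely shipped). A locale created with std::locale("en_US.utf-8") would be passed to decode_utf8 in the same way.

```cpp
#include <cassert>
#include <codecvt>    // std::codecvt_utf8 (assumption: stands in for a named UTF-8 locale)
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: a locale carrying a UTF-8 codecvt facet that does not
// depend on which named locales are installed on the machine.
std::locale make_utf8_locale()
{
    return std::locale(std::locale::classic(), new std::codecvt_utf8<wchar_t>);
}

// Hypothetical helper: decode UTF-8 bytes into a wide string using the
// codecvt facet found in loc.
std::wstring decode_utf8(const std::string& bytes, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    if (bytes.empty())
        return std::wstring();

    // A UTF-8 sequence never yields more wchar_t units than it has bytes,
    // so one output slot per input byte is always enough.
    std::wstring wide(bytes.size(), L'\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from = bytes.data();
    const char* from_end = from + bytes.size();
    const char* from_next = from;
    wchar_t* to = &wide[0];
    wchar_t* to_next = to;

    const std::codecvt_base::result r =
        cvt.in(state, from, from_end, from_next, to, to + wide.size(), to_next);
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("invalid or truncated UTF-8 input");

    assert(from_next == from_end);  // ok means every input byte was consumed
    wide.resize(to_next - to);      // drop the unused tail of the buffer
    return wide;
}
```

Calling decode_utf8("Boris Du" "\xc5\xa1" "ek", make_utf8_locale()) should give an 11-character wide string from the 12 input bytes. The reverse direction works the same way with out(), with the source and destination roles swapped.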

Nov 2 '08 #3

On Nov 2, 8:27 pm, Boris Dušek <boris.du...@gmail.com> wrote:
> I have an API that returns UTF-8 encoded strings. I have a
> UTF-8 codecvt facet available to do the conversion from UTF-8 to
> the wchar_t encoding defined by the platform. I have no trouble
> converting when a UTF-8 encoded string comes from a file: I
> just create a std::wifstream and imbue it with a locale that
> uses the UTF-8 facet for std::locale::ctype. Then I just use
> operator>> to get a wstring properly decoded from UTF-8. I
> thought I could do something similar with
> std::wstringstream or std::wstringbuf, but I am having a hard
> time with it.

It won't work, because wstringbuf doesn't take input or generate
output in the form of char's. wstringbuf uses a wstring. The
code translation in wfilebuf takes place in the wfilebuf, not in
any of the base classes, and it takes place because all file IO
in C++ involves char's; it's there to allow you to transfer
char's to and from the disk, while only seeing wchar_t at the
interface with the class.
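
To illustrate where the translation happens, here is a minimal sketch of the file case, assuming std::codecvt_utf8 as the UTF-8 facet (the thread does not say which facet was used; this one is deprecated since C++17 but still widely shipped). write_bytes and read_utf8_line are hypothetical helper names.

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8 (assumption: the UTF-8 facet in use)
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write raw bytes to a file. A narrow stream performs
// no code conversion, so the bytes land on disk unchanged.
void write_bytes(const char* path, const std::string& bytes)
{
    assert(path != 0);
    std::ofstream out(path, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Hypothetical helper: read the first line of a UTF-8 file as a wide string.
// The wfilebuf inside the wifstream reads chars from the file and uses the
// imbued codecvt facet to turn them into wchar_t.
std::wstring read_utf8_line(const char* path)
{
    assert(path != 0);
    std::wifstream in;
    // Imbue before opening so that decoding applies from the first byte.
    in.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8<wchar_t>));
    in.open(path, std::ios::binary);
    std::wstring line;
    std::getline(in, line);
    return line;
}
```

For example, after write_bytes("utf8_demo.txt", "Boris Du" "\xc5\xa1" "ek\n"), read_utf8_line("utf8_demo.txt") should return the 11-character wide string. The file name is only a placeholder.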
> I imagine the situation that if a std::wstringstream is imbued
> with UTF-8, then it stores an array of char (not wchar_t)
> which is encoded with UTF-8. I can push to it or get from it
> wide strings as I like, and the result is encoded in UTF-8 in
> some internal buffer.
>
> What I now need is to be able to supply my UTF-8 buffer
> prefilled with the values I need in UTF-8 to act as the
> internal UTF-8 encoded buffer for the std::wstringbuf, and then
> call operator>>(..., std::wstring &), to get the wide-string
> representation converted from the UTF-8 to the proper wide
> encoding. Also while I am at it, I would like to know the
> reverse - how to get this internal UTF-8 encoded buffer (so I
> can push wstrings into it as I like and get a "char *" encoded
> in UTF-8).
>
> Sample code (how I would imagine it):
> char name[] = "Boris Du" "\xc5\xa1" "ek"; // my name - Boris Dušek
> std::wstringstream conv;
> // pubsetbuf only accepts "wchar_t *", not "char *"
> conv.rdbuf()->pubsetcharbuf(name, 11);

There is no pubsetcharbuf function. It's the str() function
you'd be interested in. But in any case, the character type of
a wstringbuf is always wchar_t; the class does not support
conversion to any other basic type. (That is, in a way, the
price we pay for it being a template.)
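
A small sketch of what str() does on a wide string stream, to make the point concrete. first_word is a hypothetical helper name, and the input must already be a wide string, because nothing in the class decodes bytes.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: extract the first whitespace-delimited word from text
// that is already wide.
std::wstring first_word(const std::wstring& text)
{
    std::wstringstream conv;
    conv.str(text);              // str() takes a std::wstring, never a char*
    assert(conv.str() == text);  // stored verbatim: no code conversion happens
    std::wstring word;
    conv >> word;                // stops at the first whitespace character
    return word;
}
```

Note that operator>> stops at whitespace, so first_word(L"Boris Du\u0161ek") yields only L"Boris"; std::getline would be needed to read the whole name.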

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Nov 3 '08 #4

Thanks to both of you. I now see why filebuf is special. I found a
wbuffer_convert template at Dinkumware's site, and also found that it
will be in C++0x. I also discovered Boost.Iostreams' code_converter (I
am already using Boost.Iostreams in my project, so using it is easy).
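
For reference, a minimal sketch of the wbuffer_convert approach as it was later standardized in C++11 (std::wbuffer_convert and std::wstring_convert in <locale>, both deprecated since C++17). It assumes std::codecvt_utf8 as the facet; the helper names are hypothetical, and the Dinkumware and Boost.Iostreams variants mentioned above may differ in detail.

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8 (assumption: the UTF-8 facet in use)
#include <istream>
#include <locale>    // std::wbuffer_convert, std::wstring_convert
#include <sstream>
#include <string>

// Hypothetical helper: read one line of UTF-8 bytes through a wide stream.
// The narrow stringbuf holds the bytes; wbuffer_convert decodes on the fly.
std::wstring utf8_to_wide(const std::string& utf8)
{
    std::stringbuf bytes(utf8, std::ios_base::in);
    std::wbuffer_convert<std::codecvt_utf8<wchar_t> > conv(&bytes);
    std::wistream in(&conv);
    std::wstring line;
    std::getline(in, line);   // operator>> would stop at the first space
    return line;
}

// Hypothetical helper: the reverse direction, wide string to UTF-8 bytes.
std::string wide_to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t> > conv;
    return conv.to_bytes(wide);
}

// Round trip with the name from the original question.
void round_trip_demo()
{
    const std::string name = "Boris Du" "\xc5\xa1" "ek";  // 12 bytes of UTF-8
    const std::wstring wname = utf8_to_wide(name);
    assert(wname.size() == 11);                           // 11 characters
    assert(wide_to_utf8(wname) == name);
}
```

The narrow buffer is declared before the converter so that it outlives it; wbuffer_convert does not take ownership of the streambuf it wraps.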
Nov 3 '08 #5
