std::wstringbuf and imbue to convert from utf-8 to wchar_t?

Hi,

I have an API that returns UTF-8 encoded strings. I have a UTF-8 codecvt
facet available to do the conversion from UTF-8 to the wchar_t encoding
defined by the platform. I have no trouble converting when a UTF-8
encoded string comes from a file: I just create a std::wifstream and
imbue it with a locale that uses the UTF-8 facet for
std::locale::ctype. Then I just use operator>> to get a wstring properly
decoded from UTF-8. I thought I could do something similar with
std::wstringstream or std::wstringbuf, but I am having a hard time with it.

I imagine that if a std::wstringstream is imbued with UTF-8, then it
stores an array of char (not wchar_t) encoded in UTF-8. I can push wide
strings into it or get them out of it as I like, and the result is kept
UTF-8 encoded in some internal buffer.

What I need now is to supply my own buffer, prefilled with the UTF-8
data, to act as that internal UTF-8 encoded buffer for the
std::wstringbuf, and then call operator>>(..., std::wstring &) to get
the wide-string representation converted from UTF-8 to the proper wide
encoding. While I am at it, I would also like to know the reverse: how
to get at this internal UTF-8 encoded buffer (so I can push wstrings
into it as I like and get a "char *" encoded in UTF-8).

Sample code (how I would imagine it):
char name[] = "Boris Du" "\xc5\xa1" "ek"; // my name - Boris Dušek
std::wstringstream conv;
conv.rdbuf()->pubsetcharbuf(name, 11); // pubsetbuf only accepts "wchar_t *", not "char *"
std::wstring wname;
conv >> wname; // now my name should be properly decoded from UTF-8

Thanks for any suggestions,
Boris
Nov 2 '08 #1
Please imagine that I imbued conv with a UTF-8 locale; I forgot to put
that into the above code.
Nov 2 '08 #2
Sam
Boris Dušek writes:
> Please imagine that I imbued conv with a UTF-8 locale; I forgot to put
> that into the above code.
You need to construct a std::locale("en_US.utf-8") (the exact locale
name depends on the platform). Then, invoke
std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t> >(locale) to
obtain a reference to a std::codecvt<wchar_t, char, std::mbstate_t>
object. Then, use the object's in() and out() methods to convert
between UTF-8 encoded chars and wchar_t.

Yes, this is rather convoluted. I don't understand why doing these kinds
of things has to be so complicated, but that's how it is.

Nov 2 '08 #3
On Nov 2, 8:27 pm, Boris Dušek <boris.du...@gmail.com> wrote:
> I have an API that returns UTF-8 encoded strings. I have a
> utf8 codecvt facet available to do the conversion from UTF-8 to
> wchar_t encoding defined by the platform. I have no trouble
> converting when a UTF-8 encoded string comes from file - I
> just create a std::wifstream and imbue it with a locale that
> uses the utf-8 facet for std::locale::ctype. Then I just use
> operator>> to get wstring properly decoded from UTF-8. I
> thought I could create something similar for
> std::wstringstream or std::wstringbuf, but I have a hard time
> with it.
It won't work, because wstringbuf doesn't take input or generate
output in the form of char's. wstringbuf uses a wstring. The
code translation in wfilebuf takes place in the wfilebuf, not in
any of the base classes, and it takes place because all file IO
in C++ involves char's; it's there to allow you to transfer
char's to and from the disk, while only seeing wchar_t at the
interface with the class.
> I imagine the situation that if a std::wstringstream is imbued
> with UTF-8, then it stored an array of char (not wchar_t)
> which is encoded with UTF-8. I can push to it or get from it
> wide string like I like, and the result is encoded in UTF-8 in
> some internal buffer.
> What I now need is to be able to supply my UTF-8 buffer
> prefilled with the values I need in UTF-8 to act as the
> internal UTF-8 encoded buffer for the std::wstringbuf, and then
> call operator>>(..., std::wstring &), to get the wide-string
> representation converted from the UTF-8 to the proper wide
> encoding. Also while I am at it, I would like to know the
> reverse - how to get this internal UTF-8 encoded buffer (so I
> can push wstrings into it as I like and get a "char *" encoded
> in UTF-8).
> Sample code (how I would imagine it):
> char name[] = "Boris Du" "\xc5\xa1" "ek"; // my name - Boris Dušek
> std::wstringstream conv;
> conv.rdbuf()->pubsetcharbuf(name, 11); // pubsetbuf only accepts "wchar_t *", not "char *"
There is no pubsetcharbuf function. It's the str() function
you'd be interested in. But in all cases, the character type of
a wstringbuf is always wchar_t; the class does not support
conversion to any other basic type. (That is, in a way, the
price we pay for it being a template.)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Nov 3 '08 #4
Thanks to both of you. I now see why filebuf is special. I found a
wbuffer_convert template at Dinkumware's site, and also found that it
will be in C++0x. I also discovered Boost.Iostreams' code_converter (I
am already using Boost.Iostreams in my project, so using it is easy).
Nov 3 '08 #5
