std::wstringbuf and imbue to convert from utf-8 to wchar_t?

Hi,

I have an API that returns UTF-8 encoded strings. I have a UTF-8 codecvt
facet available to do the conversion from UTF-8 to the wchar_t encoding
defined by the platform. I have no trouble converting when a UTF-8
encoded string comes from a file - I just create a std::wifstream and
imbue it with a locale that uses the UTF-8 facet for
std::locale::ctype. Then I just use operator>> to get a wstring properly
decoded from UTF-8. I thought I could create something similar for
std::wstringstream or std::wstringbuf, but I have a hard time with it.

I imagine that if a std::wstringstream is imbued with
UTF-8, then it stores an array of char (not wchar_t) which is encoded
in UTF-8. I can push wide strings to it or get them from it as I like,
and the result is encoded in UTF-8 in some internal buffer.

What I now need is to be able to supply my UTF-8 buffer, prefilled with
the values I need in UTF-8, to act as the internal UTF-8 encoded buffer
for the std::wstringbuf, and then call operator>>(..., std::wstring &)
to get the wide-string representation converted from UTF-8 to the
proper wide encoding. While I am at it, I would also like to know the
reverse - how to get at this internal UTF-8 encoded buffer (so I can push
wstrings into it as I like and get a "char *" encoded in UTF-8).

Sample code (how I would imagine it):
char name[] = "Boris Du" "\xc5\xa1" "ek"; // my name - Boris Dušek
std::wstringstream conv;
conv.rdbuf()->pubsetcharbuf(name, 11); // pubsetbuf only accepts
                                       // "wchar_t *", not "char *"
std::wstring wname;
conv >> wname; // now my name should be properly decoded from UTF-8

Thanks for any suggestions,
Boris
Nov 2 '08 #1
> Sample code (how I would imagine it):
> char name[] = "Boris Du" "\xc5\xa1" "ek"; // my name - Boris Dušek
> std::wstringstream conv;
> conv.rdbuf()->pubsetcharbuf(name, 11); // pubsetbuf only accepts
>                                        // "wchar_t *", not "char *"
> std::wstring wname;
> conv >> wname; // now my name should be properly decoded from UTF-8
Please imagine that I imbued conv with a UTF-8 locale; I forgot to put
that into the above code.
Nov 2 '08 #2
Sam
Boris Dušek writes:
> Sample code (how I would imagine it):
> char name[] = "Boris Du" "\xc5\xa1" "ek"; // my name - Boris Dušek
> std::wstringstream conv;
> conv.rdbuf()->pubsetcharbuf(name, 11); // pubsetbuf only accepts
>                                        // "wchar_t *", not "char *"
> std::wstring wname;
> conv >> wname; // now my name should be properly decoded from UTF-8
> Please imagine that I imbued conv with a UTF-8 locale; I forgot to put
> that into the above code.
You need to instantiate a std::locale("en_US.utf-8"). Then, invoke
std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t> >(locale) to
obtain a reference to a std::codecvt<wchar_t, char, std::mbstate_t>
object. Then, use the object's in() and out() methods to convert between
UTF-8 encoded chars and wchar_t.

Yes, this is rather convoluted. I don't understand why doing these kinds
of things has to be so complicated, but that's how it is.
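
The in() call described above could be sketched roughly as follows. This is an illustration only, not code from the thread: the helper name decode is made up here, and whether a locale called "en_US.utf-8" is installed depends on the platform.

```cpp
#include <cstddef>
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a multibyte string to wchar_t using the
// codecvt facet found in `loc`.
std::wstring decode(const std::string& bytes, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& cvt = std::use_facet<Facet>(loc);

    if (bytes.empty())
        return std::wstring();

    // Every decoded wchar_t consumes at least one input byte, so one
    // wchar_t of room per byte is enough.
    std::wstring wide(bytes.size(), L'\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = 0;
    wchar_t* to_next = 0;

    std::codecvt_base::result r = cvt.in(
        state,
        bytes.data(), bytes.data() + bytes.size(), from_next,
        &wide[0], &wide[0] + wide.size(), to_next);

    // Treat anything other than a complete conversion as a failure
    // (malformed or truncated input, for example).
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("conversion failed");

    wide.resize(static_cast<std::size_t>(to_next - &wide[0]));
    return wide;
}
```

With a UTF-8 locale available, decode(name, std::locale("en_US.utf-8")) should give the wide form of the name from the original post. Note that the std::locale constructor throws std::runtime_error if the named locale is not installed. The reverse direction uses out() with the argument roles swapped.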

Nov 2 '08 #3
On Nov 2, 8:27 pm, Boris Dušek <boris.du...@gmail.com> wrote:
> I have an API that returns UTF-8 encoded strings. I have a
> UTF-8 codecvt facet available to do the conversion from UTF-8 to
> wchar_t encoding defined by the platform. I have no trouble
> converting when a UTF-8 encoded string comes from file - I
> just create a std::wifstream and imbue it with a locale that
> uses the UTF-8 facet for std::locale::ctype. Then I just use
> operator>> to get wstring properly decoded from UTF-8. I
> thought I could create something similar for
> std::wstringstream or std::wstringbuf, but I have a hard time
> with it.
It won't work, because wstringbuf doesn't take input or generate
output in the form of char's. wstringbuf uses a wstring. The
code translation in wfilebuf takes place in the wfilebuf, not in
any of the base classes, and it takes place because all file IO
in C++ involves char's; it's there to allow you to transfer
char's to and from the disk, while only seeing wchar_t at the
interface with the class.
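
For comparison, the file-based setup that does work can be sketched as below. This is only a sketch under assumptions: it uses std::codecvt_utf8 from <codecvt> (added in C++11, deprecated in C++17) as a stand-in for the poster's own UTF-8 facet, and the function name read_utf8_line is hypothetical.

```cpp
#include <codecvt>
#include <fstream>
#include <ios>
#include <locale>
#include <string>

// Hypothetical helper: read the first line of a UTF-8 encoded file as a
// wide string. The conversion happens inside the wfilebuf, driven by the
// codecvt facet of the imbued locale.
std::wstring read_utf8_line(const char* path)
{
    std::wifstream in;
    // Imbue before opening so the facet is in place from the first byte.
    // The locale takes ownership of the facet object.
    in.imbue(std::locale(std::locale::classic(),
                         new std::codecvt_utf8<wchar_t>));
    in.open(path, std::ios::in | std::ios::binary);

    std::wstring line;
    std::getline(in, line);
    return line;
}
```

The chars only ever exist on the file side here; everything visible through the stream interface is already wchar_t, which is why the same trick has no equivalent in wstringbuf.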
> I imagine the situation that if a std::wstringstream is imbued
> with UTF-8, then it stored an array of char (not wchar_t)
> which is encoded with UTF-8. I can push to it or get from it
> wide string like I like, and the result is encoded in UTF-8 in
> some internal buffer.
> What I now need is to be able to supply my UTF-8 buffer
> prefilled with the values I need in UTF-8 to act as the
> internal UTF-8 encoded buffer for the std::wstringbuf, and then
> call operator>>(..., std::wstring &), to get the wide-string
> representation converted from the UTF-8 to the proper wide
> encoding. Also while I am at it, I would like to know the
> reverse - how to get this internal UTF-8 encoded buffer (so I
> can push wstrings into it as I like and get a "char *" encoded
> in UTF-8).
> Sample code (how I would imagine it):
> char name[] = "Boris Du" "\xc5\xa1" "ek"; // my name - Boris Dušek
> std::wstringstream conv;
> conv.rdbuf()->pubsetcharbuf(name, 11); // pubsetbuf only accepts
>                                        // "wchar_t *", not "char *"
There is no pubsetcharbuf function. It's the str() function
you'd be interested in. But in any case, the character type of
a wstringbuf is always wchar_t; the class does not support
conversion to any other basic type. (That is, in a way, the
price we pay for it being a template.)
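
A small sketch of what str() offers, to make the limitation concrete. The function name first_word is made up for this illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: load a wide string into a wstringbuf via str()
// and read the first whitespace-delimited word back out.
std::wstring first_word(const std::wstring& text)
{
    std::wstringbuf buf;
    buf.str(text);            // str() accepts a std::wstring, not a char buffer
    std::wistream in(&buf);

    std::wstring word;
    in >> word;               // no code conversion takes place here
    return word;
}
```

Both ends of the buffer deal in wchar_t, so there is no place where UTF-8 encoded chars could be handed in or taken out.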

--
James Kanze (GABI Software)
Nov 3 '08 #4
Thanks to both of you. I now see why filebuf is special. I found a
wbuffer_convert template at Dinkumware's site, and also found that it
will be in C++0x. I also discovered Boost.Iostreams' code_converter (I
am already using Boost.Iostreams in my project, so using it is easy).
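
For readers arriving later: wbuffer_convert did become std::wbuffer_convert in C++11 (header <locale>), though it was deprecated again in C++17. A minimal sketch of both directions asked about in the original post, assuming std::codecvt_utf8 as the facet; the two function names are invented for this example.

```cpp
#include <codecvt>
#include <istream>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: treat a UTF-8 char buffer as the byte source
// behind a wide stream and read one line from it.
std::wstring utf8_to_wide_line(const std::string& utf8)
{
    std::stringbuf bytes(utf8);   // narrow buffer holding the raw UTF-8
    std::wbuffer_convert<std::codecvt_utf8<wchar_t> > conv(&bytes);
    std::wistream in(&conv);

    std::wstring line;
    std::getline(in, line);
    return line;
}

// Hypothetical helper for the reverse direction: push a wide string in
// and collect the UTF-8 bytes from the underlying narrow buffer.
std::string wide_to_utf8(const std::wstring& wide)
{
    std::stringbuf bytes;
    std::wbuffer_convert<std::codecvt_utf8<wchar_t> > conv(&bytes);
    std::wostream out(&conv);

    out << wide << std::flush;    // flush so the converted bytes reach `bytes`
    return bytes.str();
}
```

Unlike wstringbuf, the adaptor sits on top of a char-based streambuf, which is the piece the imagined pubsetcharbuf call was missing.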
Nov 3 '08 #5
