On Mar 25, 4:43 pm, inte...@interec.net wrote:
Is it possible, and OK, to use boost::utf8_codecvt_facet to write a
function that converts UTF-16 wchar_t to UTF-8 char and vice versa?
boost::utf8_codecvt_facet: http://www.boost.org/libs/serialization/doc/codecvt.html
How do I code the following functions:
string toUTF8(const wstring sUTF16); // converts a UTF-16 wstring into a UTF-8 string
wstring toUTF16(const string sUTF8); // converts a UTF-8 string into a UTF-16 wstring
thanks
I don't know about using the Boost library to do this, but I've
written versions of these functions myself. The trick is to iterate
through the strings one UTF-32 character (code point) at a time and
then re-encode each one in the other format. You *must* go through
UTF-32 or you'll produce incorrect encodings. For example, you must
not encode the two halves of a UTF-16 surrogate pair as two separate
UTF-8 sequences; the pair has to be combined into a single UTF-32
character first and that character encoded as one UTF-8 sequence.
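To make the surrogate pair point concrete, here is a small sketch of the
combining step. The function name combineSurrogates is made up for this
example, and it assumes the caller has already checked that the two units
really are a high and a low surrogate.

```cpp
#include <cstdint>

// Hypothetical helper: combine a UTF-16 surrogate pair into the code point
// it represents.
// Assumed precondition (not checked here): hi is in [0xD800, 0xDBFF] and
// lo is in [0xDC00, 0xDFFF].
std::uint32_t combineSurrogates(std::uint16_t hi, std::uint16_t lo) {
    return 0x10000u
         + ((static_cast<std::uint32_t>(hi) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(lo) - 0xDC00u);
}

// Worked example for U+1F600, stored in UTF-16 as D83D DE00:
//   right: combine to 0x1F600, then encode once   -> F0 9F 98 80
//   wrong: encode each surrogate on its own       -> ED A0 BD ED B8 80
// The second byte sequence is not well-formed UTF-8.
```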
One way to do this is to write the following bits: iterators that take
a UTF-8 or UTF-16 sequence and step through it one UTF-32 character at
a time. Dereferencing the iterator gives the current UTF-32 character.
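A rough sketch of the decoding half follows. Instead of full iterator
classes it uses two plain functions that read one UTF-32 character and
advance an index, which is the same idea in less code. The names
nextFromUtf8 and nextFromUtf16 are invented for this example, the wstring is
assumed to hold one UTF-16 code unit per wchar_t, and returning U+FFFD for
malformed input is just one possible policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Read one code point from UTF-8 starting at s[i] and advance i past it.
// Assumed precondition: i < s.size(). Malformed input yields U+FFFD.
std::uint32_t nextFromUtf8(const std::string& s, std::size_t& i) {
    const std::uint32_t kReplacement = 0xFFFD;
    const unsigned char b0 = static_cast<unsigned char>(s[i++]);
    if (b0 < 0x80) return b0;  // single-byte (ASCII) case

    int extra = 0;             // number of continuation bytes expected
    std::uint32_t cp = 0;      // code point being assembled
    std::uint32_t minValue = 0;  // smallest value allowed for this length
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; minValue = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; minValue = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; minValue = 0x10000; }
    else return kReplacement;  // stray continuation byte or invalid lead byte

    for (int k = 0; k < extra; ++k) {
        if (i >= s.size()) return kReplacement;  // truncated sequence
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return kReplacement;  // leave b for the next call
        cp = (cp << 6) | (b & 0x3F);
        ++i;
    }
    // Reject overlong forms, values past U+10FFFF, and encoded surrogates.
    if (cp < minValue || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return kReplacement;
    return cp;
}

// Read one code point from UTF-16 starting at s[i] and advance i past it.
// Assumed precondition: i < s.size(). Unpaired surrogates yield U+FFFD.
std::uint32_t nextFromUtf16(const std::wstring& s, std::size_t& i) {
    const std::uint32_t u = static_cast<std::uint32_t>(s[i++]);
    if (u >= 0xD800 && u <= 0xDBFF) {  // high surrogate: needs a low one next
        if (i < s.size()) {
            const std::uint32_t lo = static_cast<std::uint32_t>(s[i]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                ++i;
                return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            }
        }
        return 0xFFFD;
    }
    if (u >= 0xDC00 && u <= 0xDFFF) return 0xFFFD;  // low surrogate on its own
    return u;
}
```

Calling either function in a loop while i < s.size() walks the whole string
one UTF-32 character at a time.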
Then you want functions for converting a single UTF-32 character to
either UTF-8 (up to four bytes) or UTF-16 (up to two 16-bit code units).
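The encoding half might look something like this. The names appendUtf8 and
appendUtf16 are placeholders, the wstring is again assumed to hold one
UTF-16 code unit per wchar_t, and substituting U+FFFD for values that are
not valid code points is a choice made for the sketch.

```cpp
#include <cstdint>
#include <string>

// Append one code point to a UTF-8 string (1 to 4 bytes).
// Surrogates and values above U+10FFFF are replaced by U+FFFD.
void appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Append one code point to a UTF-16 string (1 or 2 code units).
// Surrogates and values above U+10FFFF are replaced by U+FFFD.
void appendUtf16(std::wstring& out, std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
    if (cp < 0x10000) {
        out.push_back(static_cast<wchar_t>(cp));
    } else {
        cp -= 0x10000;  // 20 bits left, split 10/10 across the pair
        out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
    }
}
```

Appending to an output string rather than returning a small buffer keeps
the callers simple: the conversion loop only has to call one of these per
decoded character.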
With those building blocks the conversion functions are fairly
straightforward to write. I think you may find codecvt much harder to
drive from your own code, so unless somebody has already done it,
it'll probably be easier to write these functions yourself.
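Put together, the two functions from the question could be sketched roughly
as below. This is one possible hand-written version, not a Boost-based one.
It assumes the wstring holds one UTF-16 code unit per wchar_t (as on
Windows), takes its arguments by const reference rather than by value, and
replaces malformed input with U+FFFD instead of reporting an error.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// UTF-16 (one code unit per wchar_t, assumed) to UTF-8.
std::string toUTF8(const std::wstring& utf16) {
    std::string out;
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(utf16[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            const std::uint32_t lo = (i + 1 < utf16.size())
                ? static_cast<std::uint32_t>(utf16[i + 1]) : 0;
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;  // unpaired high surrogate
            }
        } else if ((cp >= 0xDC00 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;      // unpaired low surrogate or out-of-range value
        }
        // Re-encode the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// UTF-8 to UTF-16 (one code unit per wchar_t, assumed).
std::wstring toUTF16(const std::string& utf8) {
    std::wstring out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i++]);
        int extra = 0;                           // continuation bytes expected
        std::uint32_t cp = 0xFFFD, minValue = 0; // invalid lead bytes stay U+FFFD
        if (b0 < 0x80)                { cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; minValue = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; minValue = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; minValue = 0x10000; }
        bool ok = true;
        for (int k = 0; k < extra; ++k) {
            if (i >= utf8.size() ||
                (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
                ok = false;  // truncated or broken sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i++]) & 0x3F);
        }
        // Reject overlong forms, out-of-range values, and encoded surrogates.
        if (!ok || cp < minValue || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;
        // Re-encode the code point as 1 or 2 UTF-16 code units.
        if (cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On platforms where wchar_t is 32 bits wide, a wide string literal will not
contain surrogate pairs, so test input for toUTF8 is best built from
explicit code unit values.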
K