convert raw bytes to Unicode strings

brad
#1: Jul 21 '08
Does standard C++ have any methods to do this? I'd like to convert raw
bytes to utf-8. Thanks for any tips.

Victor Bazarov
#2: Jul 21 '08

brad wrote:
> Does standard C++ have any methods to do this? I'd like to convert raw
> bytes to utf-8. Thanks for any tips.

What is the difference between "raw bytes" and "utf-8"?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

brad
#3: Jul 21 '08

Victor Bazarov wrote:
> What is the difference between "raw bytes" and "utf-8"?
>
> V

Raw bytes are not character streams. They do not conform to the concept
of a char. grep a binary file for a string, then grep a text file for a
string to gain a better understanding of this difference.

Pascal J. Bourguignon
#4: Jul 21 '08

brad <byte8bits@gmail.com> writes:
> Victor Bazarov wrote:
>> What is the difference between "raw bytes" and "utf-8"?
>> V
>
> raw bytes are not character streams. They do not conform to the
> concept of a char. grep a binary file for a string, then grep a text
> file for a string to gain a better understanding of this difference.

But when you take a string containing characters, and you encode it
into a sequence of UTF-8 bytes, you don't get a string, but a sequence
of bytes.

What is the difference between these bytes and your "raw" bytes?
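
One way to make that question concrete: a buffer counts as UTF-8 only in the sense that its bytes follow UTF-8's layout rules. The sketch below checks those rules. The name `is_valid_utf8` is made up for illustration (it is not a standard function), and the code assumes 8-bit chars.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// UTF-16 surrogates, and values above U+10FFFF.
bool is_valid_utf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;     // continuation bytes expected after the lead
        unsigned long cp;      // code point being assembled
        unsigned long min_cp;  // smallest code point allowed at this length
        if (lead < 0x80)                { extra = 0; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;                    // not a legal lead byte

        if (n - i - 1 < extra) return false;  // sequence cut short

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // must look like 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += extra + 1;
    }
    return true;
}
```

Plain ASCII text passes this check, and so does any correctly encoded UTF-8 text; an arbitrary binary blob usually fails it within the first few bytes.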

Do you know what UTF-8 is? (Read at least the Wikipedia article about it.)


Anyway, there's no standard C++ function to do what you want. You
could use an external library like libiconv, or just write the UTF-8
encoding/decoding algorithm in C++ yourself.
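
A minimal sketch of the hand-written route, for the encoding direction only. It assumes the "raw bytes" are ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point; the thread never says what the source bytes actually are, so that is purely an assumption. The names `append_utf8` and `latin1_to_utf8` are made up for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
// Throws if `cp` is a surrogate or lies beyond U+10FFFF.
void append_utf8(std::string& out, unsigned long cp) {
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        throw std::invalid_argument("not a Unicode scalar value");
    }
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx + 3 continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Assumption: every input byte is a Latin-1 character, so its code point
// is simply the byte's unsigned value.
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        append_utf8(out, static_cast<unsigned char>(bytes[i]));
    }
    return out;
}
```

If the input is in some other encoding (a Windows code page, Shift-JIS, and so on), the byte-to-code-point step needs a mapping table, which is the part a library such as libiconv supplies.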

--
__Pascal Bourguignon__

Victor Bazarov
#5: Jul 21 '08

brad wrote:
> Victor Bazarov wrote:
>> What is the difference between "raw bytes" and "utf-8"?
>>
>> V
>
> raw bytes are not character streams. They do not conform to the concept
> of a char. grep a binary file for a string, then grep a text file for a
> string to gain a better understanding of this difference.

In C++ a byte is a char. The type 'char' is an integral type "large
enough to store any member of the implementation's basic character set".
There is no separate "concept of a char" from that, at least in C++.
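
A small illustration of that point, as a sketch only. The `static_assert` lines need C++11 or later, the helper name `hex_dump` is made up, and the hex formatting assumes 8-bit chars. A `std::string` stores whatever byte values it is given, including zero bytes and bytes that are not printable text.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// By definition sizeof(char) is 1: a char is exactly one byte in C++.
static_assert(sizeof(char) == 1, "a char is one byte");
// A byte holds at least 8 bits (more on some unusual platforms).
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: render each byte of `bytes` as two hex digits,
// separated by spaces. Assumes CHAR_BIT == 8.
std::string hex_dump(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0F];  // low nibble
    }
    return out;
}
```

Note that a string holding a zero byte has to be built with an explicit length, for example `std::string("\x00\xff\x41", 3)`, because the `const char*` constructor stops at the first zero.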

C++ has no specific provisions for UTF-8. There is the class 'codecvt'
(actually a class template), that the Standard says "is for use when
converting from one codeset to another". Perhaps you should look into
that...
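
A sketch of what using that facet can look like. The helper name `widen_with_codecvt` is made up. Which byte encoding `std::codecvt<wchar_t, char, std::mbstate_t>` understands depends on the locale it is taken from: the classic "C" locale used by default here can only be relied on for the basic character set, and decoding UTF-8 input this way would need a UTF-8 locale, whose name is platform-specific and is not assumed here.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert bytes to wide characters using the codecvt
// facet of `loc`. The byte encoding is whatever that locale defines.
std::wstring widen_with_codecvt(const std::string& bytes,
                                const std::locale& loc = std::locale::classic()) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& facet = std::use_facet<Facet>(loc);

    // One wide character per input byte is always enough room.
    std::wstring out(bytes.size(), L'\0');
    if (bytes.empty()) return out;

    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    const char* from_next = 0;
    wchar_t* to_next = 0;
    const std::codecvt_base::result r =
        facet.in(state,
                 bytes.data(), bytes.data() + bytes.size(), from_next,
                 &out[0], &out[0] + out.size(), to_next);

    if (r == std::codecvt_base::noconv) {
        // The facet reports that no conversion is needed: widen byte by byte.
        for (std::size_t i = 0; i < bytes.size(); ++i) {
            out[i] = static_cast<wchar_t>(static_cast<unsigned char>(bytes[i]));
        }
        return out;
    }
    if (r != std::codecvt_base::ok) {
        // error: invalid input for this locale; partial: input ended mid-character.
        throw std::runtime_error("codecvt could not convert the input");
    }
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Called as `widen_with_codecvt("abc")`, this yields the wide string `L"abc"`. Passing a different `std::locale` as the second argument changes how multibyte input is interpreted.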

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask