convert raw bytes to Unicode strings

  #1  
Old July 21st, 2008, 02:25 PM
brad
Guest
 
Posts: n/a
Does standard C++ have any methods to do this? I'd like to convert raw
bytes to utf-8. Thanks for any tips.
  #2  
Old July 21st, 2008, 02:55 PM
Victor Bazarov
Guest
 
Posts: n/a

re: convert raw bytes to Unicode strings


brad wrote:
Quote:
Does standard C++ have any methods to do this? I'd like to convert raw
bytes to utf-8. Thanks for any tips.
What is the difference between "raw bytes" and "utf-8"?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
  #3  
Old July 21st, 2008, 02:55 PM
brad
Guest
 
Posts: n/a

re: convert raw bytes to Unicode strings


Victor Bazarov wrote:
Quote:
What is the difference between "raw bytes" and "utf-8"?
V

raw bytes are not character streams. They do not conform to the concept
of a char. grep a binary file for a string, then grep a text file for a
string to gain a better understanding of this difference.
  #4  
Old July 21st, 2008, 03:15 PM
Pascal J. Bourguignon
Guest
 
Posts: n/a

re: convert raw bytes to Unicode strings


brad <byte8bits@gmail.com> writes:
Quote:
Victor Bazarov wrote:
Quote:
What is the difference between "raw bytes" and "utf-8"?
V

raw bytes are not character streams. They do not conform to the
concept of a char. grep a binary file for a string, then grep a text
file for a string to gain a better understanding of this difference.
But when you take a string containing characters, and you encode it
into a sequence of UTF-8 bytes, you don't get a string, but a sequence
of bytes.

What is the difference between these bytes and your "raw" bytes?

Do you know what UTF-8 is? (Read at least the Wikipedia article about it.)


Anyway, there's no standard C++ function to do what you want. You
could use an external library like libiconv, or just write the UTF-8
encoding/decoding algorithm in C++ yourself.
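
For the write-it-yourself route, the encoding half is short. What follows is only a rough sketch, not a vetted implementation. The names append_utf8 and latin1_to_utf8 are made up for illustration, and the second function assumes the "raw bytes" are really ISO-8859-1 (Latin-1) text, which the original question never states. Any other source encoding needs its own byte-to-code-point mapping.

```cpp
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
// Returns false (and appends nothing) for surrogates (U+D800..U+DFFF)
// and for values above U+10FFFF, which are not valid scalar values.
bool append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {           // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        return false;
    }
    return true;
}

// ASSUMPTION: every input byte is one ISO-8859-1 character, so the byte
// value is already the Unicode code point (U+0000..U+00FF).
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        append_utf8(out, b);               // cannot fail for values below 0x100
    }
    return out;
}
```

Bytes below 0x80 pass through unchanged, so plain ASCII input comes out identical. Only the bytes 0x80..0xFF grow to two bytes each.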

--
__Pascal Bourguignon__
  #5  
Old July 21st, 2008, 03:15 PM
Victor Bazarov
Guest
 
Posts: n/a

re: convert raw bytes to Unicode strings


brad wrote:
Quote:
Victor Bazarov wrote:
Quote:
What is the difference between "raw bytes" and "utf-8"?
V

raw bytes are not character streams. They do not conform to the concept
of a char. grep a binary file for a string, then grep a text file for a
string to gain a better understanding of this difference.
In C++ a byte is a char. The type 'char' is an integral type "large
enough to store any member of the implementation's basic character set".
There is no separate "concept of a char" from that, at least in C++.

C++ has no specific provisions for UTF-8. There is the class 'codecvt'
(actually a class template), that the Standard says "is for use when
converting from one codeset to another". Perhaps you should look into
that...
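
As a rough illustration of how the 'codecvt' facet is driven, here is a minimal sketch. The helper name widen_with_codecvt is hypothetical. Which byte encoding the facet assumes depends entirely on the locale passed in, and the Standard does not promise that any locale means UTF-8. With the classic "C" locale, only plain ASCII input can be relied on.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Convert a byte string to a wide string using the codecvt facet of `loc`.
// The interpretation of the bytes is whatever that locale defines.
std::wstring widen_with_codecvt(const std::string& bytes, const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& facet = std::use_facet<Facet>(loc);

    std::wstring out(bytes.size(), L'\0');   // never more wide chars than bytes
    if (bytes.empty()) return out;

    std::mbstate_t state = std::mbstate_t(); // initial conversion state
    const char* from = bytes.data();
    const char* from_end = from + bytes.size();
    const char* from_next = from;
    wchar_t* to = &out[0];
    wchar_t* to_next = to;

    std::codecvt_base::result r =
        facet.in(state, from, from_end, from_next, to, to + out.size(), to_next);

    if (r == std::codecvt_base::noconv) {
        // The facet reports that no conversion is needed: copy byte values.
        for (std::size_t i = 0; i < bytes.size(); ++i)
            out[i] = static_cast<unsigned char>(bytes[i]);
        return out;
    }
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("codecvt: invalid or truncated input");

    out.resize(static_cast<std::size_t>(to_next - to));
    return out;
}
```

A call such as widen_with_codecvt(data, std::locale::classic()) exercises it. Getting actual UTF-8 decoding this way requires a locale whose codecvt facet uses UTF-8, and whether one exists (and what it is called) is platform-specific.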

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask