Unicode in C++

Hi,

I've known C/C++ for years, but only ever used ascii strings. I have a
client who wants to know how gcc handles unicode. I've found the functions
utf8_mbtowc, utf8_mbstowcs, utf8_wctomb and utf8_wcstombs, but I'm
wondering if there are any other libraries or functions which can do things
like handle different kinds of encodings?

Thanks
Michael Davis

Jul 23 '05 #1
Michael Davis wrote:
> Hi,
>
> I've known C/C++ for years, but only ever used ascii strings. I have a
> client who wants to know how gcc handles unicode. I've found the functions
> utf8_mbtowc, utf8_mbstowcs, utf8_wctomb and utf8_wcstombs, but I'm
> wondering if there are any other libraries or functions which can do
> things like handle different kinds of encodings?


There is iconv.
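
For reference, a minimal sketch of what calling iconv from C++ might look like. It assumes the POSIX iconv interface from <iconv.h> (part of glibc on Linux; other systems may need a separate libiconv). The helper name convert_encoding is only illustrative, and which encoding names are accepted depends on the installation.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert `input` from encoding `from` to encoding `to`.
// Sketch only: it does not flush shift state, so it suits stateless
// encodings such as UTF-8 or ISO-8859-1.
std::string convert_encoding(const std::string& input,
                             const char* from, const char* to)
{
    iconv_t cd = iconv_open(to, from);   // note the argument order: to, from
    if (cd == (iconv_t)(-1))
        throw std::runtime_error("iconv_open: conversion not supported");

    std::string output;
    char buffer[256];
    char* in = const_cast<char*>(input.data());
    std::size_t in_left = input.size();

    while (in_left > 0) {
        char* out = buffer;
        std::size_t out_left = sizeof buffer;
        std::size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        int err = errno;
        // Keep whatever was converted before the call returned.
        output.append(buffer, sizeof buffer - out_left);
        // E2BIG only means the buffer filled up; anything else is fatal.
        if (rc == (std::size_t)(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv: invalid or incomplete input");
        }
    }
    iconv_close(cd);
    return output;
}
```

Calling convert_encoding(text, "UTF-8", "ISO-8859-1") would then turn UTF-8 input into Latin-1, provided every character in the input exists in Latin-1.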
Jul 23 '05 #2
Rolf Magnus wrote:
> Michael Davis wrote:
>> Hi,
>>
>> I've known C/C++ for years, but only ever used ascii strings. I have a
>> client who wants to know how gcc handles unicode. I've found the
>> functions utf8_mbtowc, utf8_mbstowcs, utf8_wctomb and utf8_wcstombs, but
>> I'm wondering if there are any other libraries or functions which can do
>> things like handle different kinds of encodings?
>
> There is iconv.


Thanks!
md

Jul 23 '05 #3
A proper std:: way is using wchar_t, wstring types - can handle
Unicode strings.
(fstream -> wfstream, ostream -> wostream, istream -> wistream, etc)
To display characters properly (in a window or on a console) or to save
them in a file, you have to use one of the locales (regional settings)
that are available on your computer.

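E.g. to find the name of an available locale:

#include <iostream>
#include <locale>
#include <stdexcept>

int main()
{
    try
    {
        // throws std::runtime_error if the named locale is not installed
        std::locale AvailLocale("german");
        std::cout << AvailLocale.name() << std::endl;
    }
    catch (const std::runtime_error& e)
    {
        std::cout << e.what() << std::endl;
    }
}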

You should get something like this:
German_Germany.1252 (in Windows)
de_DE.iso8859-1 (in Unix/Linux)
See
http://cvs.sourceforge.net/viewcvs.p...as?rev=1.1.1.3
for a more detailed list.
To save a pure Unicode string to a file you need to extend the standard
library streams, see
http://www.codeproject.com/vcpp/stl/...asp?print=true
or use the C-like way (fwrite), but that is not a common way of doing it
and it is platform dependent.

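Use the available locales, e.g. (inside a function):

// Windows locale name; throws std::runtime_error if it is not available
std::locale Ger("German_Germany.1252");
std::wcout.imbue(Ger); // attach locale to stream
std::wstring ws(L"A German text...");
std::wcout << ws << std::endl;
// to get the current locale of a stream use:
std::locale CurrentLocale = std::wcout.getloc();

It is good to use a text editor that can display and manage the
encodings these locales use.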

Also visit
http://www.langer.camelot.de/Article...tion/I18N.html

Jul 23 '05 #4
el****@gmail.com wrote:
> A proper std:: way is using wchar_t, wstring types - can handle
> Unicode strings.
> (fstream -> wfstream, ostream -> wostream, istream -> wistream, etc)


By 'Unicode' you mean UTF-16, right?

Jul 23 '05 #5
Rapscallion wrote:
> el****@gmail.com wrote:
>> A proper std:: way is using wchar_t, wstring types - can handle
>> Unicode strings.
>> (fstream -> wfstream, ostream -> wostream, istream -> wistream, etc)
>
> By 'Unicode' you mean UTF-16, right?

Not necessarily. While Windows equates UNICODE with UTF-16, many
of the UNIX implementations use a 32-bit wchar_t for UNICODE.

Unfortunately, while the various W-versions of the functions can
support wide char (presumably some UNICODE version) strings, most
of the major C++ interfaces don't support them. The assumption of
the standardizers is that there is some multibyte char type that you
can use for the system interfaces. It's really stupid and causes a pain
in the butt on systems that really don't have that mapping (like
Windows).
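
To illustrate the multibyte assumption described above: interfaces such as the std::fstream constructors take char-based names, so a wide string has to be narrowed first. A minimal sketch, assuming the conversion goes through the current C locale with std::wcsrtombs; to_multibyte is a made-up helper name, not a standard function.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: narrow a wide string to the multibyte encoding of
// the current C locale. Throws if a character has no representation there.
std::string to_multibyte(const std::wstring& wide)
{
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wide.c_str();

    // First pass with a null destination: only counts the bytes needed.
    std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in this locale");

    std::string narrow(len, '\0');
    if (len > 0) {
        // Second pass: perform the actual conversion into the buffer.
        src = wide.c_str();
        state = std::mbstate_t();
        std::wcsrtombs(&narrow[0], &src, len, &state);
    }
    return narrow;
}
```

The result depends entirely on the locale in effect (see std::setlocale), which is exactly the mapping that some systems do not really have.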
Jul 23 '05 #6
Rapscallion wrote:
> el****@gmail.com wrote:
>> A proper std:: way is using wchar_t, wstring types - can handle
>> Unicode strings.
>> (fstream -> wfstream, ostream -> wostream, istream -> wistream, etc)
>
> By 'Unicode' you mean UTF-16, right?


By 'Unicode' he should mean wide characters of an unspecified encoding. On
my compiler, it's definitely not UTF-16, because wchar_t is 32 bits.
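
A quick way to check this on any given compiler, as a small sketch (the two function names are made up for the example):

```cpp
#include <climits>
#include <cstddef>
#include <cwchar>

// Number of bits in wchar_t on the compiler being used.
constexpr std::size_t wchar_bits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}

// True when a single wchar_t can hold any Unicode code point
// (the highest one is U+10FFFF).
constexpr bool wchar_holds_any_code_point()
{
    return WCHAR_MAX >= 0x10FFFF;
}
```

Printing wchar_bits() typically gives 32 with gcc on Linux and 16 with Windows compilers, but that is implementation-defined and worth checking rather than assuming.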

Jul 23 '05 #7
Unicode is a very large character set in which each character has its
own index (its code point). There are many thousands of characters in
this set. Unicode is a standard, not a character encoding. There is
also a standard named ISO 10646, whose original design had room for
about two billion characters. The two standards are kept in step: a
character has the same index in ISO 10646 as in Unicode. The advantage
of Unicode or ISO 10646 is that they cover almost every character you
would ever need.

Non-wide characters, represented with char:
Many charsets (ISO 8859-1, ISO 8859-2, ...) contain at most 256
characters, which means that such a small set cannot cover every
language. But many applications are not able to manage Unicode at this
time, so use one of the encodings/character representations available
in your OS:

standardized charsets ISO 8859...
or windows-125X ...
or Mac x-mac-ce ... etc.
or UTF-8.

UTF? Yes, but apart from UTF-8 it is represented with wide chars.
UTF-8 is a way of writing a character to a file: ASCII characters are
represented with one byte and all other characters with two to four
bytes.
example: 11000011-10101101 (U+00ED)

UTF-16: Most characters are represented with two bytes. Some two-byte
values have a special meaning: they are reserved (surrogates) and are
used in pairs, so that the remaining characters take four bytes.
example: 11101101-00000000 (U+00ED again, low byte first)
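
The two byte patterns above can be reproduced with a small sketch like the following. The function names are made up for the example, and the code assumes its input is a valid code point (no checks for surrogates or values above U+10FFFF).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encode one code point as UTF-8 (1 to 4 bytes).
std::string encode_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {                       // ASCII: one byte, unchanged
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // three bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // four bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one code point as UTF-16 code units (1 or 2 units of 16 bits).
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp)
{
    std::vector<std::uint16_t> out;
    if (cp < 0x10000) {                    // fits in a single unit
        out.push_back(static_cast<std::uint16_t>(cp));
    } else {                               // split into a surrogate pair
        cp -= 0x10000;
        out.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}
```

encode_utf8(0xED) yields the two bytes 0xC3 0xAD shown in the UTF-8 example. encode_utf16(0xED) yields the single unit 0x00ED; whether it lands in a file as ED 00 or 00 ED depends on the byte order chosen when writing.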

To represent as many languages as possible use wchar_t (one
character) and wstring (string). These types are __usually__ 4 bytes
per character, which is enough to cover every character in the Unicode
standard, but wchar_t can also be 2 bytes. The w means wide
characters. To use them you have to use the streams for wide
characters. Please see std::locale, std::locale::facet. When
using w-objects you have to be sure about your current
encoding/charset.

Usually we express text in programs with chars (and we can be happy
enough with chars), but sometimes we want to use a very different
language that is not covered by the available encoding (with 256
characters: windows-125X, ISO 8859-...). We can handle the text inside
the program as Unicode (and we can be happy as well), but in C++ we
usually write it to a file in one of the (non-Unicode) encodings
available in our OS, because writing Unicode directly is not possible
with plain std::. One way is
http://www.codeproject.com/vcpp/stl/...asp?print=true
another way is using the C function fwrite:

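wchar_t myWString[] = L"Some strange characters.";
// myFile is a FILE* opened in binary mode ("wb");
// the count excludes the terminating L'\0'
fwrite(myWString, sizeof(wchar_t), sizeof(myWString)/sizeof(wchar_t) - 1,
myFile );

but it is not portable: the bytes written depend on the size and byte
order of wchar_t on the platform.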
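
One way around the portability problem is to fix the unit size and byte order yourself before calling fwrite. A rough sketch, assuming each wchar_t holds one code point (true where wchar_t is 32 bits, not where it is 16 bits); to_be32_bytes is a made-up name:

```cpp
#include <string>

// Hypothetical helper: serialise a wide string as fixed 4-byte units,
// most significant byte first, whatever the host's wchar_t size and
// byte order. With one code point per wchar_t this is UTF-32BE.
std::string to_be32_bytes(const std::wstring& ws)
{
    std::string out;
    for (wchar_t wc : ws) {
        unsigned long v = static_cast<unsigned long>(wc);
        out += static_cast<char>((v >> 24) & 0xFF);
        out += static_cast<char>((v >> 16) & 0xFF);
        out += static_cast<char>((v >> 8) & 0xFF);
        out += static_cast<char>(v & 0xFF);
    }
    return out;
}
```

The resulting bytes can then be written with fwrite(bytes.data(), 1, bytes.size(), myFile) and will be the same on every platform that satisfies the assumption.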

Jul 23 '05 #8
