C++0x two Unicode proposals. A correction one and a different one

Based on a discussion about Unicode in clc++ inside a discussion thread
with subject "next ISO C++ standard", and the data provided in
http://en.wikipedia.org/wiki/C%2B%2B0x , and with the design ideals:

1. To provide Unicode support in C++0x always and explicitly.
2. To provide support to all Unicode sets out there.

I think the implementation of these as:

a) char, char16_t and char32_t types.
b) built-in Unicode literals.

should become:

I) Library, implementation defined types like utf8_char, utf16_char, and
utf32_char, leaving alone and not polluting the existing built in types
like char for now and in the future.

II) Leave b) as it is.

In this way, the built-in types are not polluted with an additional,
ever-growing list of UTFs, while in the future the old ones can easily
be deprecated/obsoleted in the library. The pollution from an ever-growing
list of UTF characters and literals will be minimal.

Also I think this UTF implementation change will cause minimal change in
the existing C++0x.

---------------------------------------------------------------------------
My second thought on this is that Unicode support should also become
optional. This will further decrease pollution of built-in types and
string literals. An implementation should be able to choose whether it
will support Unicode and, if so, which one.
Jan 17 '08 #1
Ioannis Vranos wrote:
> Based on a discussion about Unicode in clc++ inside a discussion thread
> with subject "next ISO C++ standard", and the data provided in
> http://en.wikipedia.org/wiki/C%2B%2B0x , and with the design ideals:
>
> 1. To provide Unicode support in C++0x always and explicitly.
> 2. To provide support to all Unicode sets out there.
>
> I think the implementation of these as:
>
> a) char, char16_t and char32_t types.
> b) built-in Unicode literals.
>
> should become:
>
> I) Library, implementation defined types like utf8_char, utf16_char, and
> utf32_char, leaving alone and not polluting the existing built in types
> like char for now and in the future.

The problem is that if the library does something like this:

typedef uint32_t char32_t;

then when I write

char32_t c = L'a';
cout << c;

It will output c as "97", not "a", because char32_t is then just another
name for uint32_t, and the overloads of operator<< can't tell the typedef
apart from the integer type.
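
The effect can be reproduced with a short sketch. The names lib_char32, streamed and check_typedef_output are made up for illustration (char32_t itself is a keyword in compilers that implement the later standard, so it cannot be re-declared), and the value 97 assumes an ASCII-compatible character set:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical library typedef, standing in for "typedef uint32_t char32_t;".
typedef std::uint32_t lib_char32;

// A typedef is only an alias: the compiler sees exactly the same type.
static_assert(std::is_same<lib_char32, std::uint32_t>::value,
              "a typedef does not create a distinct type");

// Streams a value the way "cout << value" would and returns the text produced.
template <typename T>
std::string streamed(T value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

void check_typedef_output() {
    lib_char32 c = L'a';
    // The integer overload of operator<< is chosen, so the number is printed.
    assert(streamed(c) == "97");
    // A real character type selects the character overload instead.
    assert(streamed('a') == "a");
}
```

Calling check_typedef_output() passes silently, which confirms that the typedef'd value is formatted as a number rather than as a letter.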

The library could implement a char32_t like

class char32_t {
uint32_t impl;
....
};

but that has its own problems. It all works OK if these are built-in types.

> II) Leave b) as it is.

So if I write a UTF-16 literal using the built-in literal syntax, what
is its type? It has to be a built-in type, not a library type.

Phil.
Jan 17 '08 #2
Phil Endecott wrote:
> Ioannis Vranos wrote:
>> Based on a discussion about Unicode in clc++ inside a discussion thread
>> with subject "next ISO C++ standard", and the data provided in
>> http://en.wikipedia.org/wiki/C%2B%2B0x , and with the design ideals:
>>
>> 1. To provide Unicode support in C++0x always and explicitly.
>> 2. To provide support to all Unicode sets out there.
>>
>> I think the implementation of these as:
>>
>> a) char, char16_t and char32_t types.
>> b) built-in Unicode literals.
>>
>> should become:
>>
>> I) Library, implementation defined types like utf8_char, utf16_char, and
>> utf32_char, leaving alone and not polluting the existing built in types
>> like char for now and in the future.

> The problem is that if the library does something like this:
>
> typedef uint32_t char32_t;
>
> then when I write
>
> char32_t c = L'a';
> cout << c;
>
> It will output c as "97", not "a", because char32_t is then just another
> name for uint32_t, and the overloads of operator<< can't tell the typedef
> apart from the integer type.

Well, then the library should not do that typedef and operator<< of cout
should be implemented to work with the provided character type.
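
A minimal sketch of that idea, assuming a hypothetical library type called utf32_char (the name comes from this thread, not from any standard library) and assuming the stream is meant to carry UTF-8. Because the type is distinct, its own operator<< is selected instead of the integer one:

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>

// Hypothetical library character type: a distinct type, not a typedef.
struct utf32_char {
    std::uint32_t value;  // one Unicode code point
};

// Writes the code point as UTF-8. For brevity the sketch does not reject
// surrogates or values above 0x10FFFF.
std::ostream& operator<<(std::ostream& os, utf32_char c) {
    const std::uint32_t v = c.value;
    if (v < 0x80) {
        os.put(static_cast<char>(v));
    } else if (v < 0x800) {
        os.put(static_cast<char>(0xC0 | (v >> 6)));
        os.put(static_cast<char>(0x80 | (v & 0x3F)));
    } else if (v < 0x10000) {
        os.put(static_cast<char>(0xE0 | (v >> 12)));
        os.put(static_cast<char>(0x80 | ((v >> 6) & 0x3F)));
        os.put(static_cast<char>(0x80 | (v & 0x3F)));
    } else {
        os.put(static_cast<char>(0xF0 | (v >> 18)));
        os.put(static_cast<char>(0x80 | ((v >> 12) & 0x3F)));
        os.put(static_cast<char>(0x80 | ((v >> 6) & 0x3F)));
        os.put(static_cast<char>(0x80 | (v & 0x3F)));
    }
    return os;
}

void check_utf32_char_output() {
    std::ostringstream out;
    out << utf32_char{0x61};  // U+0061, the letter a
    assert(out.str() == "a");
}
```

This only covers output. It says nothing about what type a built-in literal would have, which is the question raised further down.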

> The library could implement a char32_t like
>
> class char32_t {
> uint32_t impl;
> ....
> };
>
> but that has its own problems. It all works OK if these are built-in
> types.

If your above type suggestion cannot be implemented, why not focus on
providing language tools that make it possible instead?
>> II) Leave b) as it is.
>
> So if I write a UTF-16 literal using the built-in literal syntax, what
> is its type? It has to be a built-in type, not a library type.

It can be a library type. AFAIK a built-in type can also look like a
library type, if it is hidden when the equivalent header is not #included.
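
For reference, the standard eventually published as C++11 settled this question by making char16_t and char32_t distinct built-in types and giving the u and U literals those types. The checks below compile on a C++11 or later compiler; the helper code_units and the function check_literals are made up for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Character literals with the u / U prefixes have built-in types.
static_assert(std::is_same<decltype(u'a'), char16_t>::value, "u'a' is a char16_t");
static_assert(std::is_same<decltype(U'a'), char32_t>::value, "U'a' is a char32_t");

// A UTF-16 string literal is an array of const char16_t (with a terminator).
static_assert(std::is_same<decltype(u"ab"), const char16_t (&)[3]>::value,
              "u\"ab\" is an array of three const char16_t");

// char32_t is a distinct type, not an alias for an integer typedef.
static_assert(!std::is_same<char32_t, std::uint_least32_t>::value,
              "char32_t is not a typedef");

// Counts the code units in a UTF-16 literal, excluding the terminator.
template <std::size_t N>
constexpr std::size_t code_units(const char16_t (&)[N]) {
    return N - 1;
}

void check_literals() {
    assert(code_units(u"ab") == 2);
}
```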

In any case, the main point of my "correction" proposal is that the C++
built-in types should not be tied to a specific character encoding system.

Consider the possibility that, after some years, a new character system
that does not exist today becomes the dominant one, while the C++ built-in
types are tied to Unicode.

With any specific character system provided as a library extension
(implementation-defined type), C++ will have the flexibility to adapt to
new character systems that emerge in the future without messing
with its built-in types.

In the same way that math-specific types should not become built-in in C++
but be provided as library extensions, I think the same should happen with
character systems, regular expressions etc.

So as another example, although probably not needed in standard C++,
let's consider adding EBCDIC support explicitly as a library extension.

Something like:

#include <whatever>

// ...
std::ebcdic_char *p= EB"This is a text";
std::ebcdic_char c= EB'c';

This style can work for whatever character type system. UTF8, UTF16,
UTF32 whatever.
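
Something close to this later became expressible through C++11 user-defined literals, although those use a suffix rather than an EB prefix. The sketch below is only an illustration: ebcdic_char, to_ebcdic and the _eb suffix are hypothetical names, and only lowercase Latin letters are mapped (using the common EBCDIC layout, as in code page 037).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical library character type; not part of any standard library.
struct ebcdic_char {
    std::uint8_t code;
};

// Maps lowercase Latin letters to their EBCDIC codes. Anything else becomes
// 0x6F, the EBCDIC question mark, to keep the sketch short.
inline ebcdic_char to_ebcdic(char c) {
    if (c >= 'a' && c <= 'i') return {static_cast<std::uint8_t>(0x81 + (c - 'a'))};
    if (c >= 'j' && c <= 'r') return {static_cast<std::uint8_t>(0x91 + (c - 'j'))};
    if (c >= 's' && c <= 'z') return {static_cast<std::uint8_t>(0xA2 + (c - 's'))};
    return {0x6F};
}

// Library-defined literal suffix: 'c'_eb yields an ebcdic_char.
inline ebcdic_char operator""_eb(char c) {
    return to_ebcdic(c);
}

void check_ebcdic_literal() {
    ebcdic_char c = 'c'_eb;
    assert(c.code == 0x83);  // EBCDIC code for the letter c
}
```

The literal's type here is a library type, but the mechanism that makes it possible (the literal operator) is a core-language feature.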

I think tying any specific character system to the built-in types is a
Java-style approach (like C#/.NET etc.), suited to something that is a whole
framework and not a programming language alone, and that can be changed at will.

Apart from this, I also think that wchar_t should be the largest
character type a specific compiler provides, so for example if a
compiler provides UTF32 as its largest character type, for this compiler
wchar_t should be equivalent to the UTF32 character type of this
compiler.
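
Whether a given compiler already behaves this way can be checked at compile time. The standard does not require it: wchar_t is typically 16 bits wide on Windows and 32 bits wide on Linux. A small sketch, with wchar_covers_unicode, fits_in_wchar and check_wchar as made-up names:

```cpp
#include <cassert>
#include <limits>

// True when every Unicode code point (up to U+10FFFF) fits in one wchar_t.
constexpr bool wchar_covers_unicode =
    static_cast<unsigned long>(std::numeric_limits<wchar_t>::max()) >= 0x10FFFFul;

// Reports whether a single code point is representable as one wchar_t value.
inline bool fits_in_wchar(char32_t code_point) {
    return static_cast<unsigned long>(code_point) <=
           static_cast<unsigned long>(std::numeric_limits<wchar_t>::max());
}

void check_wchar() {
    assert(fits_in_wchar(U'a'));  // ASCII fits in any wchar_t
    assert(fits_in_wchar(0x10FFFF) == wchar_covers_unicode);
}
```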
Jan 17 '08 #3
