
C++0x: two Unicode proposals, a correction and a different one

Ioannis Vranos
Based on a discussion about Unicode in clc++ in a thread titled "next
ISO C++ standard", the information provided at
http://en.wikipedia.org/wiki/C%2B%2B0x , and the design ideals:

1. To provide Unicode support in C++0x always and explicitly.
2. To provide support for all the Unicode encoding forms out there.

I think the implementation of these as:

a) char, char16_t and char32_t types.
b) built-in Unicode literals.

should become:

I) Library-level, implementation-defined types such as utf8_char,
utf16_char, and utf32_char, leaving the existing built-in types such as
char alone and unpolluted, now and in the future.

II) Leave b) as it is.

In this way the built-in types are not polluted with an ever-growing
list of UTFs, while in the future the obsolete ones can easily be
deprecated in the library. The pollution from an ever-growing list of
UTF character types and literals will be minimal.

Also, I think this change to the UTF implementation would require only
minimal changes to the existing C++0x draft.

---------------------------------------------------------------------------
My second thought is that Unicode support should also become optional.
This would further reduce the pollution of built-in types and string
literals. An implementation should be able to choose whether it will
support Unicode, and which encoding forms.
Jan 17 '08 #1


Phil Endecott
Ioannis Vranos wrote:
> Based on a discussion about Unicode in clc++ in a thread titled "next
> ISO C++ standard", the information provided at
> http://en.wikipedia.org/wiki/C%2B%2B0x , and the design ideals:
>
> 1. To provide Unicode support in C++0x always and explicitly.
> 2. To provide support for all the Unicode encoding forms out there.
>
> I think the implementation of these as:
>
> a) char, char16_t and char32_t types.
> b) built-in Unicode literals.
>
> should become:
>
> I) Library-level, implementation-defined types such as utf8_char,
> utf16_char, and utf32_char, leaving the existing built-in types such as
> char alone and unpolluted, now and in the future.

The problem is that if the library does something like this:

typedef uint32_t char32_t;

then when I write

char32_t c = L'a';
cout << c;

it will output c as "97" (the numeric value of 'a'), not 'a', because the
overloading of operator<< cannot detect the typedef: a typedef creates an
alias, not a new type.
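
To make the failure concrete, here is a minimal, self-contained version
(utf32_char is a made-up name standing in for the library typedef):

#include <cstdint>
#include <iostream>

// A typedef creates an alias, not a new type, so overload resolution
// for operator<< sees a plain uint32_t.
typedef std::uint32_t utf32_char;

int main()
{
    utf32_char c = L'a';      // the wide literal converts to an integer
    std::cout << c << '\n';   // prints "97" (the code of 'a'), not 'a'
}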

The library could implement a char32_t like

class char32_t {
    uint32_t impl;
    // ...
};

but that has its own problems. It all works OK if these are built-in types.

> II) Leave b) as it is.

So if I write a UTF-16 literal using the built-in literal syntax, what
is its type? It has to be a built-in type, not a library type.
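
For reference, the C++0x draft does give these literals built-in types,
for example:

// In the draft, a u"..." literal has the built-in type
// "array of const char16_t", which decays to a pointer:
const char16_t* s = u"text";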
Phil.
Jan 17 '08 #2

Ioannis Vranos
Phil Endecott wrote:
> Ioannis Vranos wrote:
>> Based on a discussion about Unicode in clc++ in a thread titled "next
>> ISO C++ standard", the information provided at
>> http://en.wikipedia.org/wiki/C%2B%2B0x , and the design ideals:
>>
>> 1. To provide Unicode support in C++0x always and explicitly.
>> 2. To provide support for all the Unicode encoding forms out there.
>>
>> I think the implementation of these as:
>>
>> a) char, char16_t and char32_t types.
>> b) built-in Unicode literals.
>>
>> should become:
>>
>> I) Library-level, implementation-defined types such as utf8_char,
>> utf16_char, and utf32_char, leaving the existing built-in types such as
>> char alone and unpolluted, now and in the future.

> The problem is that if the library does something like this:
>
> typedef uint32_t char32_t;
>
> then when I write
>
> char32_t c = L'a';
> cout << c;
>
> it will output c as "97" (the numeric value of 'a'), not 'a', because the
> overloading of operator<< cannot detect the typedef: a typedef creates an
> alias, not a new type.

Well, then the library should not do that typedef, and cout's operator<<
should be implemented to work with the provided character type.
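
As a rough illustration of what I mean (a sketch only; the utf32_char
name and the naive narrowing in operator<< are hypothetical):

#include <cstdint>
#include <iostream>

// A distinct class type, unlike a typedef, can be targeted by its own
// operator<< overload.
class utf32_char {
    std::uint32_t code;
public:
    utf32_char(std::uint32_t cp) : code(cp) {}
    std::uint32_t value() const { return code; }
};

std::ostream& operator<<(std::ostream& os, utf32_char c)
{
    // Naive narrowing for demonstration; a real library would transcode
    // to the stream's encoding.
    return os << static_cast<char>(c.value());
}

int main()
{
    utf32_char c = 'a';       // 'a' converts to uint32_t 97
    std::cout << c << '\n';   // prints 'a', not "97"
}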

> The library could implement a char32_t like
>
> class char32_t {
>     uint32_t impl;
>     // ...
> };
>
> but that has its own problems. It all works OK if these are built-in
> types.

If your type suggestion above cannot be implemented, why not focus on
providing language tools that make it possible instead?
>> II) Leave b) as it is.

> So if I write a UTF-16 literal using the built-in literal syntax, what
> is its type? It has to be a built-in type, not a library type.

It can be a library type. AFAIK a built-in type can also look like a
library type if it is hidden whenever the corresponding header is not
#included, much as wchar_t works in C, where it is a typedef exposed
through <stddef.h>.

In any case, the main point of my "correction" proposal is that the C++
built-in types should not be tied to a specific character encoding system.

Consider the possibility that, some years from now, a new character
system that does not exist today becomes the dominant one, while the
C++ built-in types remain tied to Unicode.

With any specific character system provided as a library extension (an
implementation-defined type), C++ would have the flexibility to adapt to
new character systems that emerge in the future, without messing with
its built-in types.

In the same way that math-specific types should not become built-in in
C++ but rather library extensions, I think the same should happen with
character systems, regular expressions, etc.

So, as another example, although probably not needed in standard C++,
let's consider adding EBCDIC support explicitly as a library extension.

Something like:

#include <whatever>

// ...

std::ebcdic_char *p = EB"This is a text";
std::ebcdic_char c = EB'c';

This style can work for whatever character type system: UTF-8, UTF-16,
UTF-32, whatever.
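
As a sketch of the kind of language tool that could make this possible,
user-defined literal suffixes (which were on the table for C++0x) could
stand in for a built-in EB prefix. Everything here is hypothetical: the
ebcdic_string type, the _eb suffix, and the pass-through "transcoding":

#include <cstddef>
#include <string>

// Hypothetical library string type holding EBCDIC-encoded bytes.
struct ebcdic_string {
    std::string bytes;
};

// A user-defined literal suffix in place of a built-in EB prefix.
// A real implementation would transcode from the source character set
// to EBCDIC; this sketch just copies the bytes.
ebcdic_string operator"" _eb(const char* s, std::size_t n)
{
    return ebcdic_string{ std::string(s, n) };
}

int main()
{
    ebcdic_string text = "This is a text"_eb;
    (void)text;   // suppress unused-variable warning
}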

I think tying any specific character system to the built-in types is a
Java-style approach (like C#/.NET etc.), which suits a whole framework
rather than a programming language alone, since a framework can be
changed at will.

Apart from this, I also think that wchar_t should be the largest
character type a specific compiler provides. So, for example, if a
compiler provides UTF-32 as its largest character type, then for this
compiler wchar_t should be equivalent to its UTF-32 character type.
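
Expressed as a compile-time check (a sketch, assuming UTF-32 is the
largest type the compiler provides, and using C++0x's static_assert):

#include <climits>

// The proposed invariant: wchar_t is wide enough for the largest
// character type the implementation supports (UTF-32 assumed here).
static_assert(sizeof(wchar_t) * CHAR_BIT >= 32,
              "wchar_t should cover the largest supported character type");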
Jan 17 '08 #3

This discussion thread is closed

Replies have been disabled for this discussion.