Phil Endecott wrote:
Ioannis Vranos wrote:
>Based on a discussion about Unicode in clc++ inside a discussion thread
with subject "next ISO C++ standard", and the data provided in
http://en.wikipedia.org/wiki/C%2B%2B0x , and with the design ideals:
1. To provide Unicode support in C++0x always and explicitly.
2. To provide support for all Unicode encodings out there.
I think the implementation of these as:
a) char, char16_t and char32_t types.
b) built-in Unicode literals.
should become:
I) Library-provided, implementation-defined types like utf8_char,
utf16_char, and utf32_char, leaving the existing built-in types like
char alone and unpolluted, now and in the future.
The problem is that if the library does something like this:
typedef uint32_t char32_t;
then when I write
char32_t c = L'a';
cout << c;
It will output c as "97" (the code point of 'a'), not as the character
'a', because overload resolution for operator<< can't see through the
typedef.
Well, then the library should not do that typedef and operator<< of cout
should be implemented to work with the provided character type.
The library could implement a char32_t like
class char32_t {
uint32_t impl;
....
};
but that has its own problems. It all works OK if these are built-in
types.
If your above type suggestion is not possible to implement, why not
focus on providing language tools that make it possible instead?
>
>II) Leave b) as it is.
So if I write a UTF-16 literal using the built-in literal syntax, what
is its type? It has to be a built-in type, not a library type.
It can be a library type. AFAIK a built-in type can also be made to
look like a library type, if it is hidden unless the corresponding
header is #included.
In any case, the main point of my "correction" proposal is that the C++
built-in types should not be tied to a specific character encoding system.
Consider the possibility that, some years from now, a currently
non-existent character system becomes the dominant one while the C++
built-in types are tied to Unicode.
With any specific character system provided as a library extension
(an implementation-defined type), C++ will have the flexibility to
adapt to new character systems that emerge in the future without
touching its built-in types.
In the same way that math-specific types should not become built-in in
C++ but should remain library extensions, I think the same should happen
with character systems, regular expressions, etc.
So as another example, although probably not needed in standard C++,
let's consider adding EBCDIC support explicitly as a library extension.
Something like:
#include <whatever>
// ...
std::ebcdic_char *p = EB"This is a text";
std::ebcdic_char c = EB'c';
This style can work for any character type system: UTF-8, UTF-16,
UTF-32, whatever.
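The EB prefix above is invented syntax; a library-only approximation
that works in today's C++ is an explicit conversion function from the
ordinary execution charset (ebcdic_char, to_ebcdic, and the handful of
table entries below are illustrative assumptions, not a proposed API):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical library type for EBCDIC code units.
typedef std::uint8_t ebcdic_char;

// Tiny ASCII-to-EBCDIC table (code page 037 values), a few
// entries for illustration; real tables cover the full charset.
ebcdic_char to_ebcdic(char c) {
    switch (c) {
        case 'A': return 0xC1;
        case 'a': return 0x81;
        case 'b': return 0x82;
        case ' ': return 0x40;
        default:  return 0x3F;  // EBCDIC SUB for unmapped input
    }
}

std::vector<ebcdic_char> to_ebcdic(const std::string& s) {
    std::vector<ebcdic_char> out;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        out.push_back(to_ebcdic(s[i]));
    return out;
}
```

A core-language EB literal would do this conversion at compile time;
the library form defers it to run time but needs no language change.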
I think tying any specific character system to the built-in types is a
Java-style approach (like C#/.NET etc.), which suits a whole framework
that can be changed at will, not a programming language on its own.
Apart from this, I also think that wchar_t should be the largest
character type a specific compiler provides; so, for example, if a
compiler provides UTF-32 as its largest character type, then for this
compiler wchar_t should be equivalent to that compiler's UTF-32
character type.