Phil Endecott wrote:
Quote:
Ioannis Vranos wrote: Quote:
>Based on a discussion about Unicode in clc++ inside a discussion thread
>with subject "next ISO C++ standard", and the data provided in
>http://en.wikipedia.org/wiki/C%2B%2B0x , and with the design ideals:
>>
>1. To provide Unicode support in C++0x always and explicitly.
>2. To provide support to all Unicode sets out there.
>>
>>
>I think the implementation of these as:
>>
>a) char, char16_t and char32_t types.
>b) built-in Unicode literals.
>>
>should become:
>>
>I) Library, implementation defined types like utf8_char, utf16_char, and
>utf32_char, leaving alone and not polluting the existing built in types
>like char for now and in the future.
| >
The problem is that if the library does something like this:
>
typedef uint32_t char32_t;
>
then when I write
>
char32_t c = L'a';
cout << c;
>
It will output c as "97", not "a", because the overloading of operator<<
can't detect the typedef.
|
Well, then the library should not do that typedef and operator<< of cout
should be implemented to work with the provided character type.
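To make the typedef problem quoted above concrete, here is a minimal sketch. The alias name my_char32 and the helper to_text are made up for this illustration (char32_t itself later became a keyword, so it cannot be reused as a typedef name), and the value 97 assumes an ASCII-compatible execution character set.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical library-style alias. A typedef does not create a new type,
// so overload resolution sees only std::uint32_t.
typedef std::uint32_t my_char32;

// Helper (made up for this sketch): formats a value through operator<<
// and returns the resulting text.
template <typename T>
std::string to_text(T value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Small self-check of the behaviour discussed in the thread.
void demo_typedef_output() {
    my_char32 c = L'a';
    assert(to_text(c) == "97");   // picked the integer overload
    assert(to_text('a') == "a");  // a real character type prints as text
}
```

Since my_char32 and std::uint32_t are the same type, no separate operator<< overload can be written for the alias alone.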
Quote:
The library could implement a char32_t like
>
class char32_t {
uint32_t impl;
....
};
>
but that has its own problems. It all works OK if these are built-in
types.
|
If the type you suggest above cannot be implemented today, why not
focus on providing language tools that make it possible instead?
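As a rough sketch of the class-based alternative discussed above, a distinct character type can carry its own operator<<. The name utf32_char follows the naming used earlier in the thread, but the layout and the choice to write UTF-8 to the stream are assumptions made for this example, not a proposed library design.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical library character type: a distinct class, so overload
// resolution can tell it apart from std::uint32_t.
struct utf32_char {
    std::uint32_t value;
};

// Writes the code point as UTF-8 (an assumption about the desired output
// encoding). Surrogates and values above U+10FFFF are not rejected here.
std::ostream& operator<<(std::ostream& out, utf32_char c) {
    const std::uint32_t v = c.value;
    if (v < 0x80) {
        out.put(static_cast<char>(v));
    } else if (v < 0x800) {
        out.put(static_cast<char>(0xC0 | (v >> 6)));
        out.put(static_cast<char>(0x80 | (v & 0x3F)));
    } else if (v < 0x10000) {
        out.put(static_cast<char>(0xE0 | (v >> 12)));
        out.put(static_cast<char>(0x80 | ((v >> 6) & 0x3F)));
        out.put(static_cast<char>(0x80 | (v & 0x3F)));
    } else {
        out.put(static_cast<char>(0xF0 | (v >> 18)));
        out.put(static_cast<char>(0x80 | ((v >> 12) & 0x3F)));
        out.put(static_cast<char>(0x80 | ((v >> 6) & 0x3F)));
        out.put(static_cast<char>(0x80 | (v & 0x3F)));
    }
    return out;
}

// Small self-check: 'a' stays one byte, U+20AC becomes three bytes.
void demo_utf32_output() {
    std::ostringstream out;
    out << utf32_char{0x61} << utf32_char{0x20AC};
    assert(out.str() == "a\xE2\x82\xAC");
}
```

The output side works with an ordinary class. What a plain library type cannot provide by itself is a literal of that type, which is the point raised in the next quote.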
Quote:
>>
So if I write a UTF-16 literal using the built-in literal syntax, what
is its type? It has to be a built-in type, not a library type.
|
It can be a library type. AFAIK a built-in type can also look like a
library type, if it is hidden when the equivalent header is not #included.
In any case, the main point of my "correction" proposal is that the C++
built-in types should not be tied to a specific character encoding system.
Consider what happens if, some years from now, a character system that
does not exist today becomes the dominant one while the C++ built-in
types are tied to Unicode.
With each specific character system provided as a library extension
(implementation-defined type), C++ keeps the flexibility to adapt to
new character systems that emerge in the future without touching
its built-in types.
Just as math-specific types should be library extensions rather than
built-in C++ types, I think the same should apply to character
systems, regular expressions, etc.
So as another example, although probably not needed in standard C++,
let's consider adding EBCDIC support explicitly as a library extension.
Something like:
#include <whatever>
// ...
const std::ebcdic_char *p = EB"This is a text";
std::ebcdic_char c = EB'c';
This style can work for any character system: UTF-8, UTF-16, UTF-32,
whatever.
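For illustration only, something close to this can be sketched with the user-defined literals that C++11 later added. This swaps the EB prefix above for a suffix, _eb, because only suffixes can be user-defined. Everything here is an assumption of the sketch: the type lives outside namespace std, the string form returns a std::vector instead of a pointer, the mapping covers only letters, digits and space (EBCDIC code page 037 values), and the source characters are assumed to be ASCII.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical library type holding one EBCDIC code unit.
struct ebcdic_char {
    unsigned char code;
};

// Maps an ASCII letter, digit or space to its EBCDIC (code page 037) value.
// Anything else becomes 0x3F, the EBCDIC substitute character.
inline ebcdic_char to_ebcdic(char c) {
    int r = 0x3F;
    if (c >= 'a' && c <= 'i') r = 0x81 + (c - 'a');
    else if (c >= 'j' && c <= 'r') r = 0x91 + (c - 'j');
    else if (c >= 's' && c <= 'z') r = 0xA2 + (c - 's');
    else if (c >= 'A' && c <= 'I') r = 0xC1 + (c - 'A');
    else if (c >= 'J' && c <= 'R') r = 0xD1 + (c - 'J');
    else if (c >= 'S' && c <= 'Z') r = 0xE2 + (c - 'S');
    else if (c >= '0' && c <= '9') r = 0xF0 + (c - '0');
    else if (c == ' ') r = 0x40;
    return ebcdic_char{static_cast<unsigned char>(r)};
}

// Character literal: 'c'_eb
inline ebcdic_char operator""_eb(char c) {
    return to_ebcdic(c);
}

// String literal: "text"_eb (returns a vector, an assumption of this sketch)
inline std::vector<ebcdic_char> operator""_eb(const char* s, std::size_t n) {
    std::vector<ebcdic_char> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        result.push_back(to_ebcdic(s[i]));
    }
    return result;
}

// Small self-check: 'c' is 0x83 in EBCDIC.
inline void demo_ebcdic_literal() {
    ebcdic_char c = 'c'_eb;
    assert(c.code == 0x83);
}
```

With this in place, `auto text = "This is a text"_eb;` yields 14 EBCDIC code units, and no built-in type is tied to the encoding.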
I think tying any specific character system to the built-in types is the
Java-style approach (like C#/.NET etc.), where there is a whole framework
and not a programming language alone, and it can be changed at will.
Apart from this, I also think that wchar_t should be the largest
character type a specific compiler provides. For example, if a
compiler provides UTF-32 as its largest character type, then on that
compiler wchar_t should be equivalent to its UTF-32 character type.