"Should I prepare for Plane 18+ Unicode in my C++ program immediately?"
Many (but not all) of the most advanced countries in the world seem to program using either mostly ASCII or mostly the Basic Multilingual Plane (BMP) [Plane 0]. Plane 0 is 0000–FFFF.
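The BMP is only one plane (65,536 code points) out of the 17 planes (U+0000 to U+10FFFF, 1,114,112 code points) that Unicode currently makes available. Comparing the two is like comparing a gnat to an elephant.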
That Unicode range is expanding and it has been since 1991. If your software is not ready for it, and if you want to sell world-wide, then do some adjusting now. You can start here [X].
And don't tell me that using the back-door riddled/anti-security Visual Studio or .NET will make you a Unicode-capable programmer. You have to learn for yourself.
I am new at C++. It seems that many of the old-timers in C++ missed this need. I asked Unicode-related questions on other sites and got told off and ridiculed. That did not happen here, so I am sharing with you here.
A little bit of background:
Reference: "Biangbiang noodles" [X]. The character is 𰻞
A quote from that page:
"The character's traditional and simplified forms were added to Unicode version 13.0 in March 2020 in the CJK Unified Ideographs Extension G block of the newly-allocated Tertiary Ideographic Plane. The corresponding Unicode characters are:
Traditional: U+30EDE 𰻞
Simplified: U+30EDD 𰻝"
I am working on making my C++11 program preemptively future tolerant by parsing text via UTF-8 in binary and have found that this one new character can be handled.
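To make the byte-level view concrete, here is a minimal sketch of how that one code point maps to octets. The helper names `encode4` and `plane_of` are my own for illustration, not from any library, and `encode4` assumes the input is already known to need the 4-octet form.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: encodes a code point that needs the 4-octet form
// (0x10000..0x1FFFFF) as 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
std::array<unsigned char, 4> encode4(std::uint32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x1FFFFF);
    return {{
        static_cast<unsigned char>(0xF0 | (cp >> 18)),          // top 3 bits
        static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F)), // next 6 bits
        static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)),  // next 6 bits
        static_cast<unsigned char>(0x80 | (cp & 0x3F))          // low 6 bits
    }};
}

// Hypothetical helper: each plane holds 0x10000 code points,
// so the plane number is the code point shifted right by 16 bits.
std::uint32_t plane_of(std::uint32_t cp) {
    return cp >> 16;
}
```

With these, `encode4(0x30EDE)` gives the octets F0 B0 BB 9E, and `plane_of(0x30EDE)` gives 3, which matches the Tertiary Ideographic Plane mentioned in the quote.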
When I put it into my code via Code::Blocks 17.12 I get "⿺*辶⿳穴⿰月⿰⿲⿱幺長⿱言馬⿱幺長刂心"
That is for one Unicode symbol. All of that.
"Complicated" does not seem to stop them. I think that they have included a symbol that is an entire sentence combined. If they are doing that, then it looks to me like 17 planes are not enough for them.
I can handle that in UTF-8.
My parsing of UTF-8 allows up to and including 6 Octets in case they go past 17 planes.
I think that yours should also.
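As a sketch of what "allowing up to 6 octets" can look like at the parsing level, the function below reads the sequence length from the lead octet. The name `utf8_sequence_length` is hypothetical, and the 5 and 6 octet cases follow the older RFC 2279 layout, which current UTF-8 (RFC 3629) no longer permits, so treat this as an assumption about a future-tolerant parser rather than standard behavior.

```cpp
#include <cassert>
#include <cstddef>

// Returns the total sequence length (1..6) announced by a lead octet,
// or 0 if the octet cannot start a sequence (a continuation octet
// 10xxxxxx, or 0xFE / 0xFF, which no UTF-8 layout uses).
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;  // 0xxxxxxx
    if (lead < 0xC0) return 0;  // 10xxxxxx: continuation, not a lead
    if (lead < 0xE0) return 2;  // 110xxxxx
    if (lead < 0xF0) return 3;  // 1110xxxx
    if (lead < 0xF8) return 4;  // 11110xxx
    if (lead < 0xFC) return 5;  // 111110xx  (RFC 2279 only)
    if (lead < 0xFE) return 6;  // 1111110x  (RFC 2279 only)
    return 0;                   // 0xFE, 0xFF
}

// Usage sketch: a few spot checks against the lead-octet patterns.
inline void utf8_length_checks() {
    assert(utf8_sequence_length('A') == 1);
    assert(utf8_sequence_length(0xF0) == 4);  // lead octet of U+30EDE
    assert(utf8_sequence_length(0xFD) == 6);
    assert(utf8_sequence_length(0xBB) == 0);  // continuation octet
}
```

A parser built on this still has to verify that every following octet matches 10xxxxxx and to decide what to do with overlong forms. This sketch leaves both out.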
The Unicode Consortium moves on adding more and more. They expanded their limits before. They might go past planes 0 to 16 with planes 17+. Your world wide selling software better be ready.
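As many may have suspected, UTF-8 as originally designed can encode far beyond the current 17 Planes of Unicode, although the current definition (RFC 3629) restricts it to 4 octets to match the U+10FFFF limit.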
I have been instructed that this is not a valid way to think or program, but I would rather be future ready. If paying attention to the Unicode Consortium's activities suggests it, then maybe I should be ready for it.
Consider the potential 6 byte capacity of UTF-8:
// In UTF-8 (original 1 to 6 octet layout, RFC 2279)
// Char. number range   Octets  UTF-8 octet sequence
// (hexadecimal)                (binary)
// Hex Min   Hex Max            Byte 1   Byte 2   Byte 3   Byte 4   Byte 5   Byte 6
// 00000000  0000007F   1       0xxxxxxx
// 00000080  000007FF   2       110xxxxx 10xxxxxx
// 00000800  0000FFFF   3       1110xxxx 10xxxxxx 10xxxxxx
// 00010000  001FFFFF   4       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
// 00200000  03FFFFFF   5       111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
// 04000000  7FFFFFFF   6       1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
// The letter x indicates bits available for encoding bits of the character number.
// Unicode currently stops at 0010FFFF (the end of Plane 16), inside the 4 octet row.
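The table above can be turned into an encoder fairly directly. This is a minimal sketch, not a definitive implementation: the name `encode_utf8_extended` is hypothetical, and it deliberately produces the 5 and 6 octet forms that current UTF-8 (RFC 3629) forbids, so a strict decoder will reject its output for anything above U+10FFFF. It also does not reject surrogate values (D800 to DFFF).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encodes cp using the 1..6 octet layout from the table (the obsolete
// RFC 2279 form). Returns an empty string for values above 0x7FFFFFFF,
// which do not fit in the 31 bits that layout provides.
std::string encode_utf8_extended(std::uint32_t cp) {
    if (cp > 0x7FFFFFFFu) return std::string();
    if (cp < 0x80) return std::string(1, static_cast<char>(cp));

    // Pick the shortest length whose payload bits can hold cp.
    int len = cp < 0x800     ? 2
            : cp < 0x10000   ? 3
            : cp < 0x200000  ? 4
            : cp < 0x4000000 ? 5
                             : 6;
    assert(len >= 2 && len <= 6);

    // Lead-octet prefixes for lengths 2..6:
    // 110xxxxx, 1110xxxx, 11110xxx, 111110xx, 1111110x.
    static const unsigned char prefix[] = {0xC0, 0xE0, 0xF0, 0xF8, 0xFC};

    std::string out(static_cast<std::size_t>(len), '\0');
    // Fill continuation octets from the end, 6 bits at a time.
    for (int i = len - 1; i > 0; --i) {
        out[static_cast<std::size_t>(i)] = static_cast<char>(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    // Whatever bits remain go into the lead octet.
    out[0] = static_cast<char>(prefix[len - 2] | cp);
    return out;
}
```

For example, `encode_utf8_extended(0x30EDE)` yields the 4 octets F0 B0 BB 9E, while the largest value the layout allows, 0x7FFFFFFF, yields the 6 octets FD BF BF BF BF BF.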
But the Unicode Consortium has increased its listings often.
That symbol for some kind of noodles seems to me to be like my writing an entire sentence and calling that sentence a symbol. I can accept that logic. Thus, a potential for more than 17 Planes of Unicode.
Returning to my question:
"Should I prepare for Plane 18+ Unicode in my C++ program immediately?"
I think yes.
I think that the military should get ready for it immediately and without delay. I think that commercial and industrial businesses should get ready for it soon. I think that game writers should program in capacity for Planes above 16 now.
I am thus doing that now.
You might want to go back to your code, and if you are limiting your UTF-8 to fewer than 6 bytes per parse/edit/etc., then expand its capacity to work with a potential 6 bytes.
It would be nice if there was a strictly "Unicode topic" on this site that was for all programming languages as they dealt with Unicode concerns.
Valid constructive comments please.