Bytes | Developer Community

Should I prepare for Plane 18+ Unicode in my C++ program immediately?

SwissProgrammer
They did it again. The Unicode Consortium has once again leaped forward in the complexity of what it lists.


"Should I prepare for Plane 18+ Unicode in my C++ program immediately?"


Many (but not all) of the most advanced countries in the world seem to program using either mostly ASCII or mostly the Basic Multilingual Plane (BMP) [Plane 0]. Plane 0 is U+0000 to U+FFFF.

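Comparing the BMP to the full Unicode range available today is like comparing a gnat to an elephant: the BMP holds 65,536 code points, while the 17 planes (0 through 16) together hold 17 × 65,536 = 1,114,112.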
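To make the plane arithmetic concrete, here is a minimal C++11 sketch. The names `plane_of`, `in_bmp` and the constants are my own illustrative helpers, not from any library; the only assumption is the standard layout where each plane is 0x10000 code points.

```cpp
#include <cstdint>

// Each Unicode plane spans 0x10000 (65,536) code points.
constexpr std::uint32_t kPlaneSize = 0x10000;

// Planes 0 through 16 are defined today: 17 * 65,536 = 1,114,112 code points.
constexpr std::uint32_t kPlaneCount = 17;
constexpr std::uint32_t kCodespaceSize = kPlaneCount * kPlaneSize;

// The plane number is everything above the low 16 bits of the code point.
constexpr std::uint32_t plane_of(std::uint32_t cp) { return cp >> 16; }

// Plane 0 (U+0000..U+FFFF) is the Basic Multilingual Plane.
constexpr bool in_bmp(std::uint32_t cp) { return plane_of(cp) == 0; }
```

For example, `plane_of(0x30EDE)` is 3 (the Tertiary Ideographic Plane), so `in_bmp(0x30EDE)` is false, and anything that only handles the BMP cannot represent that character.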


The set of assigned Unicode characters has been growing since 1991. If your software is not ready for it, and if you want to sell worldwide, then do some adjusting now. You can start here [X].
And don't tell me that using the back-door-riddled, anti-security Visual Studio or .NET will make you a Unicode-capable programmer. You have to learn it for yourself.
I am new to C++. It seems that many of the old-timers in C++ missed this need. I asked Unicode-related questions on other sites and was told off and ridiculed. That did not happen here, so I am sharing this with you here.



A little bit of background:
Reference: "Biangbiang noodles" [X].

A quote from that page:

"The character's traditional and simplified forms were added to Unicode version 13.0 in March 2020 in the CJK Unified Ideographs Extension G block of the newly-allocated Tertiary Ideographic Plane. The corresponding Unicode characters are:

Traditional: U+30EDE 𰻞
Simplified: U+30EDD 𰻝"

I am working on making my C++11 program preemptively future-tolerant by parsing UTF-8 text at the byte level, and I have found that this one new character can be handled.
The character is 𰻞
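When I put it into my code via Code::Blocks 17.12, I get "⿺*辶⿳穴⿰月⿰⿲⿱幺長⿱言馬⿱幺長刂心"

All of that describes one Unicode symbol. (The ⿺, ⿳, ⿰, ⿲ and ⿱ marks are Ideographic Description Characters, for example ⿰ is U+2FF0; the string spells out how the glyph is built from its parts, while the character itself is still a single code point, U+30EDE.)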
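One way to confirm that it is still a single character: in UTF-8, U+30EDE is the four octets F0 B0 BB 9E, and counting the bytes that are not continuation bytes (10xxxxxx) gives exactly one code point. A minimal C++11 sketch, assuming the input is already valid UTF-8; `count_code_points` and `decode_first` are hypothetical helper names, not a library API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Counts code points in a UTF-8 string by counting every byte that is not a
// continuation byte (continuation bytes look like 10xxxxxx).
// Assumes valid UTF-8; it does not detect malformed sequences.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}

// Decodes the first code point of a valid UTF-8 string (1 to 4 octets).
std::uint32_t decode_first(const std::string& utf8) {
    assert(!utf8.empty());  // precondition: at least one byte
    unsigned char lead = static_cast<unsigned char>(utf8[0]);
    std::size_t length;
    std::uint32_t cp;
    if (lead < 0x80)      { length = 1; cp = lead; }
    else if (lead < 0xE0) { length = 2; cp = lead & 0x1F; }
    else if (lead < 0xF0) { length = 3; cp = lead & 0x0F; }
    else                  { length = 4; cp = lead & 0x07; }
    assert(utf8.size() >= length);  // precondition: sequence is complete
    for (std::size_t i = 1; i < length; ++i) {
        // Each continuation byte contributes its low 6 bits.
        cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
    }
    return cp;
}
```

With `const std::string biang = "\xF0\xB0\xBB\x9E";`, `biang.size()` is 4 but `count_code_points(biang)` is 1 and `decode_first(biang)` is 0x30EDE.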

"Complicated" does not seem to stop them. I think that they have included a symbol that is an entire sentence combined. If they are doing that, then it looks to me like 17 planes are not enough for them.



I can handle that in UTF-8.

My UTF-8 parser accepts sequences of up to and including 6 octets, in case they go past 17 planes.

I think that yours should also.
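For a parser that accepts up to 6 octets, the lead byte alone tells you how long the sequence is. This is a minimal C++11 sketch of that one step; `sequence_length` is a hypothetical name, and returning 0 for bytes that cannot start a sequence is my own convention.

```cpp
#include <cstddef>

// Returns the total number of octets in a UTF-8 sequence, judged only from
// its lead byte, using the original 6-octet layout (RFC 2279).
// Returns 0 for bytes that cannot start a sequence: continuation bytes
// (10xxxxxx) and 0xFE/0xFF, which never appear in UTF-8.
constexpr std::size_t sequence_length(unsigned char lead) {
    return lead < 0x80 ? 1    // 0xxxxxxx
         : lead < 0xC0 ? 0    // 10xxxxxx: continuation byte
         : lead < 0xE0 ? 2    // 110xxxxx
         : lead < 0xF0 ? 3    // 1110xxxx
         : lead < 0xF8 ? 4    // 11110xxx
         : lead < 0xFC ? 5    // 111110xx
         : lead < 0xFE ? 6    // 1111110x
         : 0;                 // 0xFE, 0xFF
}
```

A strict RFC 3629 parser would instead reject lead bytes 0xC0, 0xC1 and 0xF5 through 0xFF; the 5- and 6-octet branches are exactly the part that current UTF-8 no longer allows.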


The Unicode Consortium keeps adding more and more. They have expanded their limits before. They might go past planes 0 to 16 with planes 17+. If your software sells worldwide, it had better be ready.




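As many may have suspected, UTF-8 as originally designed (RFC 2279) can represent values far beyond the current 17 planes of Unicode, up to 0x7FFFFFFF. Since RFC 3629 (2003), standard UTF-8 is limited to U+10FFFF, which needs at most 4 octets.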

Consider the potential 6-byte capacity of UTF-8:

I have been instructed that this is not a valid way to think or program, but I would rather be future ready. If paying attention to the Unicode Consortium's activities suggests it, then maybe I should be ready for it.



//    In UTF-8 (original 6-octet layout, RFC 2279)
//    Char. number range         UTF-8 octet sequence
//    (hexadecimal)              (binary)
//    Hex Min    Hex Max    Octets   Byte 1    Byte 2    Byte 3    Byte 4    Byte 5    Byte 6
//    00000000   0000007F   1        0xxxxxxx
//    00000080   000007FF   2        110xxxxx  10xxxxxx
//    00000800   0000FFFF   3        1110xxxx  10xxxxxx  10xxxxxx
//    00010000   001FFFFF   4        11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
//    00200000   03FFFFFF   5        111110xx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
//    04000000   7FFFFFFF   6        1111110x  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
//    The letter x indicates bits available for encoding bits of the character number.
//    Current UTF-8 (RFC 3629) stops at 0010FFFF, so it uses only the first 4 rows.
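The table translates directly into an encoder. This is a minimal C++11 sketch of the original RFC 2279 scheme, so it will happily produce 5- and 6-octet sequences that current (RFC 3629) decoders reject. `encode_utf8_legacy` is a hypothetical name, and it does not reject surrogates.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encodes one character number using the 6-octet UTF-8 table (RFC 2279).
// Values above 0x10FFFF produce 5- or 6-octet sequences that are NOT valid
// under today's UTF-8 (RFC 3629); surrogates are not rejected either.
std::string encode_utf8_legacy(std::uint32_t cp) {
    assert(cp <= 0x7FFFFFFF);  // the 6-octet form carries at most 31 bits
    std::string out;
    if (cp < 0x80) {           // 1 octet: 0xxxxxxx
        out += static_cast<char>(cp);
        return out;
    }
    int octets;
    unsigned char lead_mark;   // the fixed high bits of the lead byte
    if (cp < 0x800)          { octets = 2; lead_mark = 0xC0; }
    else if (cp < 0x10000)   { octets = 3; lead_mark = 0xE0; }
    else if (cp < 0x200000)  { octets = 4; lead_mark = 0xF0; }
    else if (cp < 0x4000000) { octets = 5; lead_mark = 0xF8; }
    else                     { octets = 6; lead_mark = 0xFC; }
    out.resize(octets);
    // Fill continuation bytes from the end, 6 bits at a time (10xxxxxx).
    for (int i = octets - 1; i > 0; --i) {
        out[i] = static_cast<char>(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    // Whatever bits remain go into the lead byte after its length marker.
    out[0] = static_cast<char>(lead_mark | cp);
    return out;
}
```

For example, `encode_utf8_legacy(0x30EDE)` yields F0 B0 BB 9E, and the largest value, 0x7FFFFFFF, yields FD BF BF BF BF BF.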
That is far beyond the current Unicode listings.

But the Unicode Consortium has increased its listings often.

That symbol for some kind of noodles seems to me like writing an entire sentence and calling that sentence a symbol. I can accept that logic. Thus, there is a potential for more than 17 planes of Unicode.

Returning to my question:
"Should I prepare for Plane 18+ Unicode in my C++ program immediately?"

I think yes.

I think that the military should get ready for it immediately. I think that commercial and industrial businesses should get ready for it soon. I think that game writers should build in capacity for planes above 16 now.

I am thus doing that now.


You might want to go back to your code, and if you are limiting your UTF-8 handling to fewer than 6 bytes per parse/edit/etc., then expand its capacity to work with a potential 6 bytes.
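If you want that extra capacity without giving up strict checking by default, one option is to make the octet limit a parameter: 4 for today's UTF-8 (RFC 3629) and 6 for the original layout. A minimal C++11 sketch; `decode_utf8` and its `max_octets` parameter are hypothetical. It checks lead bytes, continuation bytes, overlong forms and, when `max_octets` is 4, the U+10FFFF ceiling, but it does not reject surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes a whole UTF-8 string into code points, accepting sequences of up to
// max_octets bytes (4 = current UTF-8, 6 = original RFC 2279 layout).
// Returns false on malformed input. Surrogates are not rejected.
bool decode_utf8(const std::string& in, std::vector<std::uint32_t>& out,
                 std::size_t max_octets) {
    assert(max_octets == 4 || max_octets == 6);
    // Smallest value that legitimately needs n octets; anything smaller
    // encoded with n octets is an "overlong" form and is rejected.
    static const std::uint32_t kMinForLength[7] =
        {0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000};
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t n;
        std::uint32_t cp;
        if (lead < 0x80)      { n = 1; cp = lead; }
        else if (lead < 0xC0) { return false; }           // stray continuation
        else if (lead < 0xE0) { n = 2; cp = lead & 0x1F; }
        else if (lead < 0xF0) { n = 3; cp = lead & 0x0F; }
        else if (lead < 0xF8) { n = 4; cp = lead & 0x07; }
        else if (lead < 0xFC) { n = 5; cp = lead & 0x03; }
        else if (lead < 0xFE) { n = 6; cp = lead & 0x01; }
        else                  { return false; }           // 0xFE, 0xFF
        if (n > max_octets || i + n > in.size()) return false;
        for (std::size_t k = 1; k < n; ++k) {
            unsigned char byte = static_cast<unsigned char>(in[i + k]);
            if ((byte & 0xC0) != 0x80) return false;      // must be 10xxxxxx
            cp = (cp << 6) | (byte & 0x3F);
        }
        if (n > 1 && cp < kMinForLength[n]) return false; // overlong form
        if (max_octets == 4 && cp > 0x10FFFF) return false;
        out.push_back(cp);
        i += n;
    }
    return true;
}
```

With `max_octets` set to 4, the 6-octet sequence FD BF BF BF BF BF is rejected; with 6 it decodes to 0x7FFFFFFF. Keeping the limit explicit means the "future-ready" mode is a deliberate choice rather than something that silently accepts input other UTF-8 software would treat as invalid.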


It would be nice if there were a dedicated "Unicode" topic on this site, covering how all programming languages deal with Unicode concerns.


Valid constructive comments please.
3 Weeks Ago #1