Should I prepare for Plane 18+ Unicode in my C++ program immediately?

SwissProgrammer
220 New Member
They did it again. The Unicode Consortium has once more leaped forward in the complexity of its listings.


"Should I prepare for Plane 18+ Unicode in my C++ program immediately?"


Many (but not all) of the most advanced countries in the world seem to program using either mostly ASCII or mostly the Basic Multilingual Plane (BMP), which is Plane 0 and covers U+0000–U+FFFF.
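To make the plane idea concrete, here is a minimal sketch of how a plane number falls out of a code point. The function names (`plane_of`, `in_current_unicode_range`, `plane_examples`) are my own, not from any library; the only assumption is that every plane holds 0x10000 code points.

```cpp
#include <cassert>
#include <cstdint>

// Each plane holds 0x10000 (65,536) code points, so the plane number
// is whatever sits above the low 16 bits of the code point.
std::uint32_t plane_of(std::uint32_t code_point) {
    return code_point >> 16;
}

// True when the code point lies in the range Unicode defines today
// (planes 0 through 16, that is U+0000 to U+10FFFF).
bool in_current_unicode_range(std::uint32_t code_point) {
    return code_point <= 0x10FFFF;
}

// Hypothetical usage: call this from main() to check the examples.
void plane_examples() {
    assert(plane_of(0x0041) == 0);     // 'A' is in the BMP (Plane 0)
    assert(plane_of(0xFFFF) == 0);     // last code point of the BMP
    assert(plane_of(0x10FFFF) == 16);  // last code point of the last current plane
    assert(!in_current_unicode_range(0x110000)); // would be the first code point of a Plane 17
}
```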

Comparing the BMP to the Unicode range available today is like comparing a gnat to an elephant: the BMP is just one of 17 planes, 65,536 of 1,114,112 code points.


The set of characters assigned within that range keeps expanding, as it has since 1991. If your software is not ready for it, and if you want to sell worldwide, then do some adjusting now. You can start here [X].
And don't tell me that using the back-door riddled/anti-security Visual Studio or .NET will make you a Unicode-capable programmer. You have to learn for yourself.
I am new at C++. It seems that many of the old-timers in C++ missed this need. I asked Unicode-related questions on other sites and got told off and ridiculed. That did not happen here, so I am sharing with you here.



A little bit of background:
Reference: "Biangbiang noodles" [X].

A quote from that page:

"The character's traditional and simplified forms were added to Unicode version 13.0 in March 2020 in the CJK Unified Ideographs Extension G block of the newly-allocated Tertiary Ideographic Plane. The corresponding Unicode characters are:

Traditional: U+30EDE 𰻞
Simplified: U+30EDD 𰻝"

I am working on making my C++11 program future-tolerant by parsing text as raw UTF-8 bytes, and I have found that it can handle this one new character.
The character is 𰻞
When I put it into my code via Code::Blocks 17.12, I get "⿺*辶⿳穴⿰月⿰⿲⿱幺長⿱言馬⿱幺長刂心"

That is for one Unicode symbol. All of that.

"Complicated" does not seem to stop them. I think that they have included a symbol that is an entire sentence combined. If they are doing that, then it looks to me like 17 planes are not enough for them.



I can handle that in UTF-8.
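For anyone who wants to check this themselves, here is a rough sketch of an encoder for the range Unicode uses today (1 to 4 octets). The name `encode_utf8` and the choice to return an empty string on bad input are my own assumptions, not anything standard. With it, U+30EDE comes out as the four bytes F0 B0 BB 9E.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one code point as UTF-8 (1 to 4 octets).
// Returns an empty string for surrogates and for values above U+10FFFF.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    } else if (cp <= 0x7FF) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;          // surrogates are not characters
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical usage: call this from main() to check the examples.
void encode_examples() {
    assert(encode_utf8(0x30EDE) == "\xF0\xB0\xBB\x9E");  // traditional form
    assert(encode_utf8(0x30EDD) == "\xF0\xB0\xBB\x9D");  // simplified form
    assert(encode_utf8(0x110000).empty());               // beyond Plane 16
}
```

So the new character needs no more than the 4 octets that any current UTF-8 code already has to support.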

My parsing of UTF-8 allows up to and including 6 octets in case they go past 17 planes.

I think that yours should too.


The Unicode Consortium keeps adding more and more. They have expanded their limits before. They might go past planes 0 to 16 into planes 17 and up. If you sell your software worldwide, it had better be ready.




As many may have suspected, UTF-8 can handle far beyond the current 17 Planes of Unicode.
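To put rough numbers on that, here is a small sketch that counts how many code points and planes each sequence length could reach. It assumes the older six-octet UTF-8 layout (payload of 7, 11, 16, 21, 26 and 31 bits for 1 to 6 octets); the function names are mine.

```cpp
#include <cassert>
#include <cstdint>

// Payload bits carried by a sequence of 1 to 6 octets in the
// six-octet UTF-8 layout. Returns 0 for any other length.
int payload_bits(int octets) {
    static const int bits[] = {7, 11, 16, 21, 26, 31};
    return (octets >= 1 && octets <= 6) ? bits[octets - 1] : 0;
}

// Largest value that fits in a sequence of the given length.
std::uint32_t max_code_point(int octets) {
    const int b = payload_bits(octets);
    return b == 0 ? 0 : (static_cast<std::uint32_t>(1) << b) - 1;
}

// Number of whole 65,536-code-point planes addressable with that length.
std::uint32_t planes_reachable(int octets) {
    const int b = payload_bits(octets);
    return b < 16 ? 0 : static_cast<std::uint32_t>(1) << (b - 16);
}

// Hypothetical usage: call this from main() to check the examples.
void capacity_examples() {
    assert(max_code_point(4) == 0x1FFFFF);     // bit pattern limit; Unicode stops at 0x10FFFF
    assert(planes_reachable(4) == 32);         // Unicode currently uses 17 of these
    assert(max_code_point(6) == 0x7FFFFFFF);
    assert(planes_reachable(6) == 32768);
}
```

Even the 4-octet form has room for 32 planes, so the bit layout itself would allow planes 17 through 31 without a fifth byte.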

I have been told that this is not a valid way to think or program, but I would rather be future-ready. If paying attention to the Unicode Consortium's activities suggests it, then maybe I should be ready for it.

Consider the potential 6-byte capacity of UTF-8:



//    In UTF-8
//    Char. number range                 UTF-8 octet sequence
//    (hexadecimal)                      (binary)
//    Hex Min    Hex Max       Octets    Byte 1    Byte 2    Byte 3    Byte 4    Byte 5    Byte 6
//    00000000   0000007F      1         0xxxxxxx
//    00000080   000007FF      2         110xxxxx  10xxxxxx
//    00000800   0000FFFF      3         1110xxxx  10xxxxxx  10xxxxxx
//    00010000   0010FFFF      4         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
//    00200000   03FFFFFF      5         111110xx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
//    04000000   7FFFFFFF      6         1111110x  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
//    The letter x indicates bits available for encoding bits of the character number.
//    The 4-octet pattern can hold values up to 001FFFFF; 0010FFFF is the current Unicode maximum.
That is far beyond the current Unicode listings.
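Here is a minimal sketch of a decoder that follows the six-row table above. All names (`Decoded`, `sequence_length`, `decode_one`) are my own. Be aware that the current UTF-8 definition (RFC 3629) stops at 4 octets and U+10FFFF, so the 5 and 6 octet forms here follow the older RFC 2279 layout, and a strict decoder would reject them. Treat this as a tolerant parser, not a conforming one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// One decoded sequence. length == 0 signals malformed input.
struct Decoded {
    std::uint32_t code_point;
    std::size_t length;
};

// Sequence length implied by the lead byte (0 if it cannot start a sequence).
std::size_t sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;  // 0xxxxxxx
    if (lead < 0xC0) return 0;  // 10xxxxxx is a continuation byte
    if (lead < 0xE0) return 2;  // 110xxxxx
    if (lead < 0xF0) return 3;  // 1110xxxx
    if (lead < 0xF8) return 4;  // 11110xxx
    if (lead < 0xFC) return 5;  // 111110xx (not valid in RFC 3629)
    if (lead < 0xFE) return 6;  // 1111110x (not valid in RFC 3629)
    return 0;                   // 0xFE and 0xFF never appear
}

// Decode the sequence starting at s[pos].
Decoded decode_one(const std::string& s, std::size_t pos) {
    // Smallest value each length may encode; anything lower is an overlong form.
    static const std::uint32_t min_value[] =
        {0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000};
    const Decoded bad = {0, 0};
    if (pos >= s.size()) return bad;
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    const std::size_t len = sequence_length(lead);
    if (len == 0 || len > s.size() - pos) return bad;  // bad lead or truncated
    // The lead byte keeps 7 bits for length 1, otherwise (7 - len) bits.
    std::uint32_t cp = (len == 1) ? lead : (lead & (0x7Fu >> len));
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[pos + i]);
        if ((c & 0xC0) != 0x80) return bad;            // not a continuation byte
        cp = (cp << 6) | (c & 0x3Fu);
    }
    if (cp < min_value[len]) return bad;               // overlong encoding
    const Decoded ok = {cp, len};
    return ok;
}

// Hypothetical usage: call this from main() to check the examples.
void decode_examples() {
    const Decoded biang = decode_one("\xF0\xB0\xBB\x9E", 0);
    assert(biang.code_point == 0x30EDE && biang.length == 4);
    const Decoded top = decode_one("\xFD\xBF\xBF\xBF\xBF\xBF", 0);
    assert(top.code_point == 0x7FFFFFFF && top.length == 6);
    assert(decode_one("\xC0\x80", 0).length == 0);     // overlong, rejected
}
```

A caller that wants to stay strict for today's data can still compare `code_point` against 0x10FFFF after decoding and flag anything larger.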

But the Unicode Consortium has increased its listings often.

That symbol for some kind of noodles seems to me like writing an entire sentence and calling that sentence a symbol. I can accept that logic. Thus, there is potential for more than 17 planes of Unicode.

Returning to my question:
"Should I prepare for Plane 18+ Unicode in my C++ program immediately?"

I think yes.

I think that the military should get ready for it immediately. I think that commercial and industrial businesses should get ready for it soon. I think that game writers should build in capacity for planes above 16 now.

I am thus doing that now.


You might want to go back to your code: if you limit UTF-8 sequences to fewer than 6 bytes when you parse, edit, and so on, then expand its capacity to work with a potential 6 bytes.


It would be nice if this site had a dedicated "Unicode" topic covering Unicode concerns across all programming languages.


Valid, constructive comments, please.
Dec 30 '20 #1