Should I prepare for Plane 18+ Unicode in my C++ program immediately?


They did it again. The Unicode Consortium has once more leaped ahead in complicating its character listings.

"Should I prepare for Plane 18+ Unicode in my C++ program immediately?"

Many (but not all) of the most advanced countries in the world seem to program using either mostly ASCII or mostly the Basic Multilingual Plane (BMP) [Plane 0]. Plane 0 covers U+0000–U+FFFF.

Comparing the BMP with even the currently available Unicode range (17 planes, U+0000–U+10FFFF) is like comparing a gnat to an elephant.

That Unicode range is expanding and it has been since 1991. If your software is not ready for it, and if you want to sell world-wide, then do some adjusting now. You can start here [X].

And don't tell me that using the back-door riddled/anti-security Visual Studio or .net will make you a Unicode capable programmer. You have to learn for yourself.

I am new at C++. It seems that many of the old timers in C++ missed this need. I asked Unicode related questions on other sites and got told off and ridiculed. That did not happen here, so I am sharing with you here.

A little bit of background:

Reference: "Biangbiang noodles" [X].

A quote from that page:

"The character's traditional and simplified forms were added to Unicode version 13.0 in March 2020 in the CJK Unified Ideographs Extension G block of the newly-allocated Tertiary Ideographic Plane. The corresponding Unicode characters are:

Traditional: U+30EDE 𰻞
Simplified: U+30EDD 𰻝"

I am working on making my C++11 program future-tolerant by parsing text as raw UTF-8 bytes, and I have found that this new character can be handled.

The character is 𰻞
When I put it into my code via Code::Blocks 17.12 I get "⿺*辶⿳穴⿰月⿰⿲⿱幺長⿱言馬⿱幺長刂心"

That is for one Unicode symbol. All of that.

"Complicated" does not seem to stop them. I think that they have included a symbol that is an entire sentence combined. If they are doing that, then it looks to me like 17 planes are not enough for them.

I can handle that in UTF-8.
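To make that concrete, here is a minimal sketch (not my actual parser) of how one code point such as U+30EDE maps to a plane number and to four UTF-8 octets. The names `plane_of` and `encode_4_octets` are made up for this illustration, and there is no range checking.

```cpp
#include <array>
#include <cstdint>

// Plane number of a code point: each plane holds 0x10000 code points.
inline std::uint32_t plane_of(std::uint32_t cp) {
    return cp >> 16;
}

// Encode a code point that needs four octets (0x10000..0x1FFFFF) using the
// pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
// Illustration only: the caller must pass a value in that range.
inline std::array<std::uint8_t, 4> encode_4_octets(std::uint32_t cp) {
    std::array<std::uint8_t, 4> out = {{
        static_cast<std::uint8_t>(0xF0u | ((cp >> 18) & 0x07u)),  // top 3 bits
        static_cast<std::uint8_t>(0x80u | ((cp >> 12) & 0x3Fu)),  // next 6 bits
        static_cast<std::uint8_t>(0x80u | ((cp >> 6) & 0x3Fu)),   // next 6 bits
        static_cast<std::uint8_t>(0x80u | (cp & 0x3Fu))           // low 6 bits
    }};
    return out;
}
```

For U+30EDE, `plane_of(0x30EDE)` gives 3 (the Tertiary Ideographic Plane) and `encode_4_octets(0x30EDE)` gives the octets F0 B0 BB 9E.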

My parsing of UTF-8 allows up to and including 6 Octets in case they go past 17 planes.

I think that yours should also.

The Unicode Consortium keeps adding more and more. They expanded their limits before. They might go past planes 0 to 16 and add planes 17+. If you sell your software worldwide, it had better be ready.

As many may have suspected, UTF-8 can handle far beyond the current 17 Planes of Unicode.
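The arithmetic behind that statement can be sketched as follows. The constant and function names are mine, chosen only for this example. The payload bit counts (7, 11, 16, 21, 26, 31) are those of the original 1 to 6 octet UTF-8 scheme.

```cpp
#include <cstdint>

// Code points per plane, and the number of planes Unicode currently defines.
const std::uint64_t kCodePointsPerPlane = 0x10000;  // 65,536
const std::uint64_t kCurrentPlanes      = 17;       // planes 0..16

// Current Unicode code space: 17 * 65,536 = 1,114,112 (U+0000..U+10FFFF).
inline std::uint64_t current_code_space() {
    return kCurrentPlanes * kCodePointsPerPlane;
}

// Payload bits carried by an n-octet sequence in the 1..6 octet scheme.
// Returns 0 for any other octet count.
inline unsigned payload_bits(unsigned octets) {
    static const unsigned bits[] = {0, 7, 11, 16, 21, 26, 31};
    return (octets >= 1 && octets <= 6) ? bits[octets] : 0;
}

// Number of whole 65,536-code-point planes addressable with n octets.
inline std::uint64_t planes_addressable(unsigned octets) {
    return (std::uint64_t(1) << payload_bits(octets)) / kCodePointsPerPlane;
}
```

By this count the 4-octet form alone could already address 32 planes, and the 6-octet form could address 32,768 planes, against the 17 in use today.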

I have been instructed that this is not a valid way to think or program, but I would rather be future ready. If the Unicode Consortium's activities suggest it, then maybe I should be ready for it.

Consider the potential 6-byte capacity of UTF-8:


//    In UTF-8
//    Char. number range                 UTF-8 octet sequence
//    (hexadecimal)                      (binary)
//    Hex Min    Hex Max       Octets    Byte 1    Byte 2    Byte 3    Byte 4    Byte 5    Byte 6
//    00000000   0000007F      1         0xxxxxxx
//    00000080   000007FF      2         110xxxxx  10xxxxxx
//    00000800   0000FFFF      3         1110xxxx  10xxxxxx  10xxxxxx
//    00010000   001FFFFF      4         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
//    00200000   03FFFFFF      5         111110xx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
//    04000000   7FFFFFFF      6         1111110x  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
//    The letter x indicates bits available for encoding bits of the character number.
//    Unicode currently stops at 0010FFFF, which falls inside the 4-octet row.

That is far beyond the current Unicode listings.

But the Unicode Consortium has increased its listings often.
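The table above can be turned into a decoder roughly like this. It is a sketch under stated assumptions, not a standard-conforming decoder. The 5- and 6-octet forms come from the older UTF-8 definition (RFC 2279), while the current definition (RFC 3629) stops at four octets and U+10FFFF, so accepting longer forms is a deliberate choice for forward tolerance. The function names are hypothetical, and surrogate values are not rejected.

```cpp
#include <cstddef>
#include <cstdint>

// Sequence length implied by a lead byte under the 1..6 octet table,
// or 0 if the byte cannot start a sequence.
inline int utf8_sequence_length(std::uint8_t lead) {
    if (lead < 0x80) return 1;  // 0xxxxxxx
    if (lead < 0xC0) return 0;  // 10xxxxxx is a continuation byte
    if (lead < 0xE0) return 2;  // 110xxxxx
    if (lead < 0xF0) return 3;  // 1110xxxx
    if (lead < 0xF8) return 4;  // 11110xxx
    if (lead < 0xFC) return 5;  // 111110xx (beyond current UTF-8)
    if (lead < 0xFE) return 6;  // 1111110x (beyond current UTF-8)
    return 0;                   // 0xFE and 0xFF are never lead bytes
}

// Decode one sequence starting at data[0]. On success, stores the value in
// cp and returns the number of octets consumed. Returns 0 on empty or
// truncated input, a bad continuation byte, or an overlong form.
inline int utf8_decode_one(const std::uint8_t* data, std::size_t size,
                           std::uint32_t& cp) {
    if (size == 0) return 0;
    const int len = utf8_sequence_length(data[0]);
    if (len == 0 || static_cast<std::size_t>(len) > size) return 0;
    if (len == 1) { cp = data[0]; return 1; }

    // Keep only the payload bits of the lead byte.
    std::uint32_t value = data[0] & (0xFFu >> (len + 1));
    for (int i = 1; i < len; ++i) {
        if ((data[i] & 0xC0) != 0x80) return 0;  // must be 10xxxxxx
        value = (value << 6) | (data[i] & 0x3Fu);
    }

    // Reject overlong forms: the value must actually need len octets.
    static const std::uint32_t min_value[] =
        {0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000};
    if (value < min_value[len]) return 0;

    cp = value;
    return len;
}
```

Fed the octets F0 B0 BB 9E, `utf8_decode_one` returns 4 and sets `cp` to 0x30EDE. A caller that wants to stay within today's Unicode can additionally reject any result above 0x10FFFF.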

That symbol for some kind of noodles seems to me to be like my writing an entire sentence and calling that sentence a symbol. I can accept that logic. If symbols like that keep being added, there is a potential for more than 17 planes of Unicode.

Returning to my question:
"Should I prepare for Plane 18+ Unicode in my C++ program immediately?"

I think yes.

I think that the military should get ready for it immediately and without delay. I think that commercial and industrial businesses should get ready for it soon. I think that game writers should program in capacity for Planes above 16 now.

I am thus doing that now.

You might want to go back to your code and, if you are limiting your UTF-8 handling to fewer than 6 bytes per sequence when parsing, editing, etc., expand its capacity to work with a potential 6 bytes.
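For the encoding side, a widened encoder might look something like the sketch below. The name `utf8_encode_extended` is hypothetical. Anything above U+10FFFF that it produces is outside what current UTF-8 permits, so other software may well reject it. The sketch only shows that the bit layout extends naturally to 6 octets.

```cpp
#include <cstdint>
#include <string>

// Append the encoding of cp to out, using 1..6 octets as needed.
// Returns the number of octets written, or 0 if cp does not fit in 31 bits.
inline int utf8_encode_extended(std::uint32_t cp, std::string& out) {
    int len;
    if      (cp < 0x80u)         len = 1;
    else if (cp < 0x800u)        len = 2;
    else if (cp < 0x10000u)      len = 3;
    else if (cp < 0x200000u)     len = 4;
    else if (cp < 0x4000000u)    len = 5;  // beyond current UTF-8
    else if (cp <= 0x7FFFFFFFu)  len = 6;  // beyond current UTF-8
    else return 0;

    if (len == 1) {
        out.push_back(static_cast<char>(cp));
        return 1;
    }

    // Lead byte: len one-bits, a zero bit, then the top payload bits.
    const std::uint32_t lead_marker = (0xFF00u >> len) & 0xFFu;
    out.push_back(static_cast<char>(lead_marker | (cp >> (6 * (len - 1)))));

    // Continuation bytes: 10xxxxxx, six payload bits each, high bits first.
    for (int i = len - 2; i >= 0; --i) {
        out.push_back(static_cast<char>(0x80u | ((cp >> (6 * i)) & 0x3Fu)));
    }
    return len;
}
```

Calling it with 0x30EDE appends the four octets F0 B0 BB 9E. Calling it with 0x7FFFFFFF appends the six octets FD BF BF BF BF BF.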

It would be nice if there was a strictly "Unicode topic" on this site that was for all programming languages as they dealt with Unicode concerns.

Valid constructive comments please.

Dec 30 '20 #1
