What are the Minimum requirements for all 17 planes of Unicode in C++?

SwissProgrammer
213 128KB


In YOUR EXPERIENCE !
(Sometimes official descriptions have not been accurate. I want experienced answers.)

C++0 to C++20; Which version is the minimum that can work with ALL of the 17 planes without requiring a work-around or a third party dll?
I am not interested in Visual Studio or .net . I used these in the past and I am aware that they are powerful but I specifically do not want them now. Just C++.
In case you might not know what Unicode "planes" are, see https://en.wikipedia.org/wiki/Plane_%28Unicode%29


I am currently only using plane 0 and C++11. I want to be able to use all of the planes 0-16. I want to know the minimum requirements in C++.


Thank you.
Oct 27 '20 #1

✓ answered by SioSio

Correspondence between UTF-8 byte counts and code point ranges.
The number of bytes used to encode a character in UTF-8 can be derived as follows.

For 1 byte, the bit pattern is 0xxx xxxx.
There are 7 payload bits, so the maximum value is 0111 1111 = 0x7F.
The range covered is 0x0000 to 0x007F (the ASCII range).
For 2 bytes, the bit pattern is 110x xxxx 10xx xxxx.
There are 5 + 6 = 11 payload bits, so the maximum value is 0111 1111 1111 = 0x7FF.
The range covered is 0x0080 to 0x07FF.
For 3 bytes, the bit pattern is 1110 xxxx 10xx xxxx 10xx xxxx.
There are 4 + 6 + 6 = 16 payload bits (exactly 2 bytes' worth), so the maximum value is 0xFFFF.
The range covered is the rest of the UCS-2 range, 0x0800 to 0xFFFF.
For 4 bytes, the bit pattern is 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx.
4 bytes are needed only for the code points that UTF-16 encodes as surrogate pairs, that is planes 0x01 to 0x10 (U+10000 to U+10FFFF).

In other words, the bit pattern of the first byte tells you how many continuation bytes follow, so a UTF-8 string can be split character by character.
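
A minimal sketch of that splitting step, assuming the input is already well-formed UTF-8 (no real validation is attempted). The names utf8_sequence_length and split_utf8 are made up for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Number of bytes in the UTF-8 sequence that starts with `lead`,
// read from the bit pattern of the first byte.
// Returns 0 for a continuation byte (10xx xxxx) or any byte that
// matches none of the four lead-byte patterns.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1; // 0xxx xxxx
    if ((lead & 0xE0) == 0xC0) return 2; // 110x xxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110 xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 1111 0xxx
    return 0;
}

// Split a UTF-8 string into one std::string per encoded character.
// Assumption: the input is well-formed; an unexpected or truncated
// sequence simply ends the loop.
std::vector<std::string> split_utf8(const std::string &s)
{
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < s.size())
    {
        const std::size_t len =
            utf8_sequence_length(static_cast<unsigned char>(s[i]));
        if (len == 0 || i + len > s.size()) break;
        out.push_back(s.substr(i, len));
        i += len;
    }
    return out;
}
```

For example, calling split_utf8 on the six bytes E5 8A 9E E6 B3 95 (the UTF-8 form of 办法) returns two pieces of three bytes each.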

Banfa
9,064 Expert Mod 8TB
I am partly replying because I'd like to know the answer if someone else replies and partly because I'm not sure you are asking the right question.

Unicode support is less a matter of programming language or language version than of the character encoding that the execution environment supports.

The execution environment and the source code (or build environment) can use different character sets, although they often match when you build and run on the same platform.

So take this example program

#include <iostream>
using namespace std;

int main()
{
    // Narrow character literal, as in the original post; the value d184
    // printed below comes from it. The wide form would be L'\u0444'.
    wchar_t c = '\u0444';

    cout << "cout: ф" << endl;
    cout << "cout: " << u8"\u0444" << endl;
    cout << "cout: " << c << endl;

    wcout << "wcout: ф" << endl << flush;
    wcout << "wcout: " << u8"\u0444" << endl << flush;
    wcout << "wcout: " << c << endl << flush;

    return 0;
}
Compiled as C++14 and run in PowerShell, I get the following output:

cout: 
cout: 
cout: d184
wcout: 

That is because PowerShell is not using a Unicode-capable encoding by default. Run this command in PowerShell:

$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding

This tells PowerShell to use UTF-8. Re-run the same program without recompiling and you get this output:

cout: ф
cout: ф
cout: d184
wcout:

Recompile the program as C++98 and you get this output:

cout: ф
cout: ф
cout: 53636
wcout:

The only thing that has changed is that the wchar_t variable is displayed in decimal instead of hexadecimal.

Support for Unicode is depressingly non-standard across platforms so it is hard to write portable code using Unicode.

P.S. I have no idea why only 1 of the 3 wcout lines is producing output in all cases.
Oct 27 '20 #2
Banfa
9,064 Expert Mod 8TB
Sorry, that post didn't even try to answer the question. The point I was trying to make was that, leaving aside your problem of how to put plane 1+ Unicode characters into standard C++ code, outputting them requires a system that understands them.

Actually using them in code is complicated by standard C++ only really having support for UTF8 (and in theory that only came in with C++11), so outputting a character from plane 1+ becomes rather painful. Take character U+1FA0F (a rotated black chess king from plane 1): even if the system understands this plane and can display it, which isn't a given, in standard C++ your only option would be to use the UTF8 encoding, which looks something like u8"\xF0\x9F\xA8\x8F". That has to be looked up by hand and is a pain to type, and because I haven't got an environment that knows how to interpret it I don't even know if it is correct.
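
The byte sequence above can be checked without looking it up by hand. Below is a minimal sketch of a code point to UTF-8 encoder that uses only the standard library; encode_utf8 is a name chosen for this illustration, and the code is written with C++11 in mind. Standard C++ also accepts universal character names such as \U0001FA0F inside u8 string literals (C++11 onwards), which avoids the manual lookup.

```cpp
#include <string>

// Encode one Unicode code point (U+0000..U+10FFFF) as UTF-8.
// Returns an empty string for values above U+10FFFF and for
// surrogate code points (U+D800..U+DFFF), which cannot be encoded.
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return out;

    if (cp <= 0x7F) {
        // 0xxx xxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 110x xxxx 10xx xxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        // 1110 xxxx 10xx xxxx 10xx xxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx (planes 1 to 16)
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, encode_utf8(0x1FA0F) produces the four bytes F0 9F A8 8F, which matches the literal quoted above, and encode_utf8(0x0444) produces D1 84.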

I realise this post also doesn't answer the question, (still hoping someone else can) but at least it doesn't answer the question as opposed to not answering a different question.
Oct 27 '20 #3
SwissProgrammer
213 128KB
Banfa,

I did not want to lead someone with an answer, but your answer says close to what I have found.
UTF-8, in my opinion, is the most universal of the UTF options. I have not found any limit to the expandability of the UTF-8 encoding.
UTF-8, if my memory is correct, was what I was using back in Windows 2000 and (I think) in Windows NT. But, I was not programming in C++ at that time. Therefore, I thought to ask the local C++ experts here.

Before the Unicode consortium expanded their published scope past plane 0, I used Unicode a lot. Currently, as I transition into C++11, my coding time is greatly limited by my struggling through the learning curve. I was wanting someone with experience in the planes above 0 to speak to the issues encountered.

Your response, though you might not think it was so appropriate, I enjoyed.

Thank you.



One, but not the only, goal that I have with C++11 and Unicode is to be able to have a text box in which someone pastes a Unicode character or sentence and my program automatically shows in another text box the Unicode representation of that input:

Example input 办法 .

My program would show the UTF-8 encoding for that and maybe even split it apart into the two words that it contains 办 (ban) , and 法 (fa), each with their own UTF8 encoding.

But, first I wanted to know more about what I am dealing with in C++. Am I using at least the C++ version that can give me that response? Am I using the version that can give me the correct response for every plane?

So, I started with the simplest of the questions: Am I using the minimum version of C++ that can do all of this.

Later, I might have struggled to get the example to work, being confident that the C++ version was capable of doing the job.

Again, Thank you Banfa.

.
Oct 27 '20 #4
dev7060
516 Expert 512MB
Banfa said: "P.S. I have no idea why only 1 of the 3 wcout lines is producing output in all cases."
#include <iostream>
using namespace std;

int main() {
  wchar_t c = '\u0444';
  wcout << "wcout: ф" << endl << flush;
  // Once a write fails, the stream stays in a failed state until clear() is called
  if (wcout.fail()) {
    cout << "\nwide to narrow conversion didn't succeed; Unicode is not representable in the codepage";
    cout << endl;
    wcout << "\nThis won't get printed. Other wcouts don't have any effect at this point";
    wcout.clear();
  }
  wcout << "wcout: " << u8"\u0444" << endl << flush;
  if (wcout.fail()) {
    cout << "\nattempt #2 didn't succeed";
    cout << endl;
    wcout << "not shown on the console";
    wcout.clear();
    wcout << "hello user\n";
  }
  wcout << "wcout: " << c << endl << flush;
  wcout << "not available on the console as well ";
  return 0;
}
Also, from the cplusplus.com reference:
"A program should not mix output operations on wcout with output operations on cout (or with other narrow-oriented output operations on stdout): Once an output operation has been performed on either, the standard output stream acquires an orientation (either narrow or wide) that can only be safely changed by calling freopen on stdout."
https://www.cplusplus.com/reference/iostream/wcout/

SwissProgrammer said: "In YOUR EXPERIENCE! (Sometimes official descriptions have not been accurate. I want experienced answers.)"
Disclaimer: you specifically asked for an experienced answer, but I'm a student and not experienced at all when it comes to professional development. The below is just how I view it with my understanding.

My understanding of Unicode is that it isn't tied to a programming language, as Banfa pointed out. Every system or environment has its own way of dealing with it, has its own character set, and uses workarounds to stay compatible with others when exchanging data. It depends on how the encoding is done: which code points are used, how many bytes per character, which code points combine to represent a new character, what byte order (endianness), and so on. The mapping is implementation-dependent.

Here's a char: 🮕 (it is not showing up on my screen; I just copied a random one off Wikipedia)
In the JS console,
console.log("🮕".length)
shows the output 2.
In PHP,
echo strlen("🮕");
shows the output 4.

One solution is to use a fixed-length encoding everywhere, like UTF-32, which uses 4 bytes per code point. The con is that it's space-inefficient: a 5-byte ASCII string in UTF-8 takes 20 bytes in UTF-32, which becomes a real cost at larger scale, because every ASCII character carries leading zero bytes that consume memory for no reason. Variable-length encodings like UTF-8 or UTF-16 use only as many bytes as each character needs.
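
The different lengths reported by JS and PHP, and the space trade-off, follow from how many code units each encoding needs per code point. A rough sketch (the function and struct names are invented for this example, and the inputs are assumed to be valid code points):

```cpp
#include <cstddef>
#include <string>

// Bytes needed to store one code point in UTF-8.
std::size_t utf8_bytes(char32_t cp)
{
    if (cp <= 0x7F)   return 1;
    if (cp <= 0x7FF)  return 2;
    if (cp <= 0xFFFF) return 3;
    return 4;
}

// 16-bit code units needed in UTF-16:
// 1 inside plane 0, 2 (a surrogate pair) above it.
std::size_t utf16_units(char32_t cp)
{
    return cp <= 0xFFFF ? 1 : 2;
}

// Total storage in bytes for a sequence of code points in each encoding.
struct EncodedSizes { std::size_t utf8, utf16, utf32; };

EncodedSizes encoded_sizes(const std::u32string &text)
{
    EncodedSizes s = {0, 0, 0};
    for (char32_t cp : text)
    {
        s.utf8  += utf8_bytes(cp);
        s.utf16 += 2 * utf16_units(cp); // each UTF-16 code unit is 2 bytes
        s.utf32 += 4;                   // UTF-32 is always 4 bytes
    }
    return s;
}
```

For a plane 1 character, utf16_units gives 2 (JavaScript's length counts UTF-16 code units) and utf8_bytes gives 4 (PHP's strlen counts bytes). The five ASCII characters of U"hello" take 5, 10, and 20 bytes in UTF-8, UTF-16, and UTF-32 respectively.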

Let's say you write a program in an IDE and convert between char* and wchar_t* back and forth between function calls. The libraries would process them (or not?) using their implemented mappings, but the output on the terminal may come out garbled, because the terminal follows the encoding and mappings of the OS. Whatever byte sequence our binary sends for display may not exist in the character map of the OS, so no meaningful output is produced. Windows uses UTF-16 internally, hence Windows apps use the same. If you run the same code on Unix or Linux, the output may be different (UTF-8).

For uniformity, I guess the engine, language, OS, environment, third-party compilers, linkers, IDE, libraries, dependencies, databases, binaries, web connections, etc. (whatever interacts with your encoded data along the way) would all have to agree on a common set of rules for representing characters, which is probably a hypothetical ideal. For example, to communicate over a network you want compact data so the packets travel fast, hence you would choose a variable-size encoding. Java, on the other hand, stores strings as UTF-16 internally, while UTF-16 is rarely used on websites because it is incompatible with ASCII. Workarounds seem to be the only way to build the bridge, and for developers that means trial and error if you don't know how a system does its encoding. For example, the Java docs state clearly:

The Java programming language is based on the Unicode character set, and several libraries implement the Unicode standard. Unicode is an international character set standard which supports all of the major scripts of the world, as well as common technical symbols. The original Unicode specification defined characters as fixed-width 16-bit entities, but the Unicode standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF. An encoding defined by the standard, UTF-16, allows to represent all Unicode code points using one or two 16-bit units.
The primitive data type char in the Java programming language is an unsigned 16-bit integer that can represent a Unicode code point in the range U+0000 to U+FFFF, or the code units of UTF-16. The various types and classes in the Java platform that represent character sequences - char[], implementations of java.lang.CharSequence (such as the String class), and implementations of java.text.CharacterIterator - are UTF-16 sequences. Most Java source code is written in ASCII, a 7-bit character encoding, or ISO-8859-1, an 8-bit character encoding, but is translated into UTF-16 before processing.
Ref: https://www.oracle.com/technical-res...lementary.html
https://docs.oracle.com/javase/8/doc.../overview.html
I've used java references just for the demonstration of a system.

SwissProgrammer said: "One, but not the only, goal that I have with C++11 and Unicode is to be able to have a text box in which someone pastes a Unicode character or sentence and my program automatically shows in another text box the Unicode representation of that input:"
Here's my guess: whatever GUI library you're using is probably making calls to the WinAPI behind the scenes and using the OS's layout and character set to display everything. When you paste something into the text box, if the OS's character set can't map it, you probably won't see it properly in the text box in the first place. You may be able to pass the character to the library's event handler functions, and the background processing may interpret everything correctly (or not), but you still depend on the OS to see the output. In other words, you need an external environment to interact with your app anyway, just as you need a third-party compiler (MinGW, GCC, etc.) to process the source and produce the binaries. That's where workarounds come into play: you need a way to make the text displayable, using third-party libs or your own logic if you can figure out what is happening behind the scenes.
Oct 28 '20 #5
SwissProgrammer
213 128KB
dev7060,

You pointed directly to the issues. I agree.

Maybe if I get help one tiny step at a time.

I do not yet know how to make a Unicode capable, drag-n-drop text box in C++11.

I would like to be able to input to, and to read from, that text box with both wchar_t* and TCHAR*.

Help.

4 hours later (I told you I am new at this) I have the following.

#define _UNICODE
#define UNICODE

#include <windows.h>

#include <iostream>
#include <string>
using namespace std;

#define MAX_LOADSTRING 100

HWND Handle_Main_Window = NULL;

void create_controls( const HWND hwnd );

LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);

wchar_t g_szClassName[] = L"myWindowClass";

int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
    LPSTR lpCmdLine, int nCmdShow)
{
    WNDCLASSEX wc;
    MSG Msg;

    wc.cbSize        = sizeof(WNDCLASSEX);
    wc.style         = 0;
    wc.lpfnWndProc   = WndProc;
    wc.cbClsExtra    = 0;
    wc.cbWndExtra    = 0;
    wc.hInstance     = hInstance;
    wc.hIcon         = LoadIcon(nullptr, IDI_APPLICATION);
    wc.hCursor       = LoadCursor(nullptr, IDC_ARROW);
    wc.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
    wc.lpszMenuName  = nullptr;
    wc.lpszClassName = g_szClassName;
    wc.hIconSm       = LoadIcon(nullptr, IDI_APPLICATION);

    if(!RegisterClassEx(&wc))
    {
        MessageBox(nullptr, L"Window Registration Failed!", L"Error!",
            MB_ICONEXCLAMATION | MB_OK);
        return 0;
    }

    Handle_Main_Window = CreateWindowEx(
        WS_EX_CLIENTEDGE,
        g_szClassName,
        L"Title",
        WS_OVERLAPPEDWINDOW,
        CW_USEDEFAULT,
        CW_USEDEFAULT,
        500,
        500,
        nullptr,
        nullptr,
        hInstance,
        nullptr);

    if(Handle_Main_Window == NULL)
    {
        MessageBox(nullptr, L"Window Creation Failed!", L"Error!",
            MB_ICONEXCLAMATION | MB_OK);
        return 0;
    }

    ShowWindow(Handle_Main_Window, nCmdShow);
    UpdateWindow(Handle_Main_Window);

    // Standard message loop; ends when WM_QUIT is received
    while(GetMessage(&Msg, nullptr, 0, 0) > 0)
    {
        TranslateMessage(&Msg);
        DispatchMessage(&Msg);
    }
    return (int)Msg.wParam;
}


LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg)
    {
        case WM_CREATE:
            create_controls( hWnd );
            break;

        case WM_COMMAND:
            // LOWORD(wParam) is the ID given to the control in create_controls
            switch(LOWORD(wParam)) {
                case 1:{
                    ::MessageBox( hWnd, L"PUSH BUTTON 1 was clicked", L"message from PUSH BUTTON 1", MB_SETFOREGROUND );
                    break;
                }

                case 2:{
                    // Read the INPUT TEXT WINDOW (ID 3) and show its contents
                    const HWND text_box = GetDlgItem( hWnd, 3 );
                    const int n = GetWindowTextLength( text_box );
                    wstring text( n + 1, L'#' );
                    if( n > 0 )
                    {
                        GetWindowText( text_box, &text[0], (int)text.length() );
                    }
                    text.resize( n );
                    ::MessageBox(hWnd, text.c_str(), L"The INPUT TEXT WINDOW", MB_SETFOREGROUND );
                    break;
                }

                case 3:{
                    break;
                }

                case 4:{
                    break;
                }

                case 5:{
                    // Note: ID 5 is the SAVE BUTTON itself; the input box is ID 3
                    const HWND text_box = GetDlgItem( hWnd, 5 );
                    const int n = GetWindowTextLength( text_box );
                    wstring text( n + 1, L'#' );
                    if( n > 0 )
                    {
                        GetWindowText( text_box, &text[0], (int)text.length() );
                    }
                    text.resize( n );
                    ::MessageBox(hWnd, L"SAVE BUTTON was clicked", L"message from SAVE BUTTON", MB_SETFOREGROUND );
                    break;
                }

                default:{
                }
            }
            break;

        case WM_CLOSE:{
            DestroyWindow(hWnd);
            break;
        }

        case WM_DESTROY:{
            PostQuitMessage(0);
            break;
        }

        default:{
            return DefWindowProc(hWnd, msg, wParam, lParam);
        }
    }
    return 0;
}

// Child controls: IDs 1 (button), 3 (input edit), 5 (save button), 4 (output edit)
void create_controls( const HWND hwnd )
{
    CreateWindow( L"BUTTON",
        L"PUSH BUTTON 1",
        WS_VISIBLE | WS_CHILD | WS_BORDER,
        10,10,
        130,20,
        hwnd, (HMENU) 1, GetModuleHandle( nullptr ), nullptr
        );

    CreateWindow( L"EDIT",
        L"INPUT TEXT WINDOW",
        WS_VISIBLE | WS_CHILD | WS_BORDER,
        10,50,
        200,25,
        hwnd, (HMENU) 3, GetModuleHandle( nullptr ), nullptr
        );

    CreateWindow( L"BUTTON",
        L"SAVE BUTTON",
        WS_VISIBLE | WS_CHILD | WS_BORDER,
        10,80,
        110,20,
        hwnd, (HMENU) 5, GetModuleHandle( nullptr ), nullptr
        );

    CreateWindow( L"EDIT",
        L"OUTPUT TEXT WINDOW",
        WS_VISIBLE | WS_CHILD | WS_BORDER,
        10,130,
        300,300,
        hwnd, (HMENU) 4, GetModuleHandle( nullptr ), nullptr
        );
}

I can paste "Example input 办法 ." into the INPUT BOX, but what do I do with it next? I want to be able to click the button below that and see the UTF8 representation in the bottom box.




Help please.

Banfa said: "Support for Unicode is not a programming language or version matter but rather it is related to the execution environment supported character encoding."
I agree.
This is a start to having my program to be able to adapt to that.
I am trying to get to the final answer of my original question in this post. One step at a time.

Thank you.
Attached Images
File Type: jpg Unicode_UTF8.jpg (39.4 KB, 47 views)
Oct 28 '20 #6
SioSio
264 256MB
I did some research.
UTF-8: To stay compatible with ASCII, the ASCII range is encoded in 1 byte and everything else in 2 to 4 bytes (the original design allowed up to 6 bytes, but the current definition stops at 4). A 4-byte sequence can express up to 21 bits (0x1FFFFF), but values beyond plane 16 (larger than U+10FFFF) are outside the Unicode range and are not accepted.
UTF-16, UTF-32: Unlike UTF-8, they are not ASCII compatible.
Therefore, the way to meet the requirement in post #1 is to look for a version of C++ that supports UTF-8.
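
To illustrate the range limit, here is a rough sketch of a decoder for a single UTF-8 sequence that rejects values above U+10FFFF, as well as overlong forms and surrogates. decode_utf8 and its parameters are names made up for this example, not a standard API.

```cpp
#include <cstddef>
#include <string>

// Decode the UTF-8 sequence starting at s[pos].
// On success stores the code point in `cp`, advances `pos` past the
// sequence and returns true. Returns false on malformed input.
bool decode_utf8(const std::string &s, std::size_t &pos, char32_t &cp)
{
    if (pos >= s.size()) return false;
    const unsigned char lead = static_cast<unsigned char>(s[pos]);

    std::size_t len;
    char32_t value;
    char32_t min_value; // smallest code point that really needs `len` bytes
    if      (lead < 0x80)           { len = 1; value = lead;        min_value = 0x0; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; value = lead & 0x1F; min_value = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; value = lead & 0x0F; min_value = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; value = lead & 0x07; min_value = 0x10000; }
    else return false; // continuation byte, or an old-style 5/6-byte lead

    if (len > s.size() - pos) return false; // truncated sequence

    for (std::size_t i = 1; i < len; ++i)
    {
        const unsigned char c = static_cast<unsigned char>(s[pos + i]);
        if ((c & 0xC0) != 0x80) return false; // not a continuation byte
        value = (value << 6) | (c & 0x3F);
    }

    if (value < min_value) return false;                  // overlong encoding
    if (value > 0x10FFFF) return false;                   // beyond plane 16
    if (value >= 0xD800 && value <= 0xDFFF) return false; // UTF-16 surrogate

    cp = value;
    pos += len;
    return true;
}
```

The bytes F4 8F BF BF decode to U+10FFFF, the last code point of plane 16, while F4 90 80 80 would decode to 0x110000 and is rejected.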

Support status of UTF-8 depending on the version of C++

C++17 can process UTF-8 data as "char" data. This lets you use std::regex, std::fstream, std::cout, etc. without loss.
C++20 added char8_t and std::u8string for UTF-8. However, UTF-8 I/O is still not supported because there is no std::u8fstream. Therefore, we need a way to convert between UTF-8 and the execution character set.
Oct 29 '20 #7
SioSio
264 256MB
I forgot to write.
In the C++11 Standard Library, the string and integer conversion functions and the I/O functions do not support UTF-8. Therefore, the data needs to be converted to the system multibyte character encoding.
Oct 29 '20 #8
Banfa
9,064 Expert Mod 8TB
Looks like you are using the WIN32 API. The Windows GUI natively uses UTF16, I believe, and the WIN32 API has wide char and multibyte versions of many functions, signified by a W or A suffix.

It also has a set of helper functions (see "Unicode and Character Set Functions" in the Windows documentation), and I think the one you are interested in is WideCharToMultiByte.

I know there are Googleable examples out there.
Oct 29 '20 #9
dev7060
516 Expert 512MB
SwissProgrammer said: "I can paste 'Example input 办法 .' into the INPUT BOX, but what do I do with it next? I want to be able to click the button below that and see the UTF8 representation in the bottom box."
Like this?
case 5: {
  // Copy the text of the input box (ID 3) into the output box (ID 4)
  HWND InputTextBox = GetDlgItem(hWnd, 3);
  const int n = GetWindowTextLength(InputTextBox);
  wstring text(n + 1, L'#');
  if (n > 0) {
    GetWindowText(InputTextBox, &text[0], (int)text.length());
  }
  text.resize(n);  // drop the extra slot reserved for the terminator
  SetDlgItemText(hWnd, 4, text.c_str());
  break;
}



If you're using UTF-8 characters inside Code::Blocks:
Settings -> Editor -> Encoding -> change it from 'default' to UTF-8

Code::Blocks is smart enough to change the encoding automatically to prevent losing data, but it does that only temporarily, each time you click Build.

Cygwin environment can be used for CLI testing. It supports UTF-8. https://www.cygwin.com/

Attached Images
File Type: png dev7060.png (12.8 KB, 517 views)
File Type: jpg dev7060_2.jpg (93.9 KB, 549 views)
Oct 29 '20 #10
Banfa
9,064 Expert Mod 8TB
How's it going?

You probably need a couple of helper functions

Convert a wide character string to a multibyte string, aka UTF16 to UTF8
// Convert a wide Unicode string to a UTF8 string (needs <string> and <vector>)
std::string utf8_encode(const std::wstring &wstr)
{
    if (wstr.empty())
    {
        return std::string();
    }

    // First call: ask how many bytes the UTF8 form needs
    int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), NULL, 0, NULL, NULL);
    if (size_needed <= 0)
    {
        return std::string();
    }

    // Second call: do the conversion. With an explicit input length the
    // output is not '\0' terminated, so build the string from the byte count.
    std::vector<char> buffer(size_needed);
    WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), buffer.data(), size_needed, NULL, NULL);

    return std::string(buffer.data(), buffer.size());
}
Get the byte values of the multibyte character string
std::wstring utf8_byte_values(const std::string &str)
{
    if (str.empty())
    {
        return std::wstring();
    }

    bool first = true;
    std::wstringstream out;

    for(auto iter = str.begin(); iter != str.end(); ++iter)
    {
        if (first)
        {
            first = false;
        }
        else
        {
            out << L" ";
        }

        unsigned int value = ((unsigned)*iter) & 0xFF;
        out << L"0x" << std::hex << std::setw(2) << std::setfill(L'0') << value;
    }

    return out.str();
}
Then your Save button code could look something like
        case BTN_SAVE:
        {
            const HWND in_box = GetDlgItem( hWnd, EDT_INPUT_TEXT );
            const int n = GetWindowTextLength( in_box  );
            if( n > 0 )
            {
                wchar_t text[n+1]; // +1 for terminator
                GetWindowText( in_box, text, n+1 );
                string utf8 = utf8_encode(wstring(text));
                // utf8_byte_values returns a std::wstring, so with UNICODE
                // defined this calls the wide version (SetDlgItemTextW)
                SetDlgItemText( hWnd, EDT_OUTPUT_TEXT, utf8_byte_values(utf8).c_str());
            }
//            text.resize( n );
            break;
        }
Note I defined symbols for all your dialog item ids to aid readability.

More importantly, note that the WIN32 API is a C API and expects C style strings, that is, '\0' terminated. This does not always play nicely with C++ strings: before C++11 a std::string or std::wstring was not guaranteed to be stored contiguously or '\0' terminated, which makes passing &text[0] to a WIN32 API function a risky business unless you are sure of your standard and have sized the buffer for the terminator. Instead, if the WIN32 API function accepts a constant pointer prefer text.c_str(), or if it expects a non-constant pointer use a standard C array and convert to a std::(w)string later.

Of course I have been slightly naughty in my code and used variable length arrays, which are a C rather than a C++ feature, but my GNU compiler lets me get away with that with a warning :D
Oct 30 '20 #11
SwissProgrammer
213 128KB
SioSio, Thank you.

I feel like I should parse the input into ASCII and non-ASCII first.

Then, I should parse the non-ASCII incoming text and characters and test each as to how well they work in UTF-8 first, then in UTF-16 (to see if it is larger than U + 10FFFF). Compare the results. Thus at least finding out if I am receiving input that is in plane 0 or plane 1+.

Then respond into the second text box with the resultant U.
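
One possible way to make the plane 0 versus plane 1+ distinction, sketched here with the portable std::u16string rather than the Windows wchar_t type: walk the UTF-16 code units, combine surrogate pairs into code points, and take the plane from the bits above the low 16. Note that anything above U+FFFF is already outside plane 0; U+10FFFF is the upper limit of plane 16. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode UTF-16 code units into code points.
// Assumption: unpaired surrogates are passed through unchanged
// rather than reported as errors.
std::vector<char32_t> utf16_to_code_points(const std::u16string &s)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        const char32_t unit = s[i];
        const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        if (is_high && i + 1 < s.size())
        {
            const char32_t next = s[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF)
            {
                // Surrogate pair: 10 bits from each half, offset by 0x10000
                out.push_back(0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
                ++i; // the low surrogate has been consumed
                continue;
            }
        }
        out.push_back(unit);
    }
    return out;
}

// Plane number (0..16) of a code point: the bits above the low 16.
unsigned plane_of(char32_t cp)
{
    return static_cast<unsigned>(cp >> 16);
}
```

Since wchar_t holds UTF-16 code units on Windows, the same loop could in principle be applied to the text read from the edit control: 办 (U+529E) reports plane 0, while the pair D83E DE0F decodes to U+1FA0F and reports plane 1.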



Separately:
You said, "In the C++11 Standard Library, UTF-8 is not supported for string and integer conversion functions, and I/O functions. Therefore, it needs to be converted to the system multibyte character code."
In my CODE::BLOCKS 17.12 Settings/Editor/General settings/Encoding settings I have been using UTF-8 with the following choices chosen:
/ "As default encoding (bypassing C::B's auto-detection)"
/ "If conversion fails using the settings above, try system local settings".

But, I am concerned about system local settings on a user's computer that is different from my tested systems. Maybe then I should just catch any errors of such and deal with that separately.

I think that this is correct. What do you think? How would you handle this?

Thank you.
Oct 30 '20 #12
SwissProgrammer
213 128KB
Banfa, Thank you.

When I started learning C++11 I used WideCharToMultiByte and MultiByteToWideChar.

They seemed to work. But I read, maybe 2 or 3 places, that these should be avoided. I should have asked here at that time, but I did not. Since I see you using them, I shall use them with more confidence.


You used:
std::wstringstream out;
For that I got
error: aggregate 'std::wstringstream out' has incomplete type and cannot be defined

For future readers:
I added
#include <sstream>
which fixed that.

You used:
out << L"0x" << std::hex << std::setw(2) << std::setfill(L'0') << value;
For that I got
error: 'setw' is not a member of 'std'

For future readers:
I added
#include <iomanip>
which fixes that.

You used:
const HWND in_box = GetDlgItem( hWnd, EDT_INPUT_TEXT );
which I changed to:
const HWND in_box = GetDlgItem(hWnd, 3);
I like the EDT_INPUT_TEXT but I am not certain how to get my CreateWindow to use that. So, I used 3 instead.

You used:
SetDlgItemText( hWnd, EDT_OUTPUT_TEXT, utf8_byte_values(utf8).c_str());
which I changed to
SetDlgItemText(hWnd, 4, utf8_byte_values(utf8).c_str());
Again, I like the way that you did it, but I am having difficulty getting your line to work.

I am not certain what the
//            text.resize( n );
is. But thank you.


It works. Thank you. I am getting closer to being able to test on different platforms in different versions of C++.

For 办
I get 0xe5 0x8a 0x9e

Getting closer.

Lots of times I have wanted to see an update of the progress of code changes that other people were working on.

For future readers here is what currently works for me:
  1. #define _UNICODE
  2. #define UNICODE
  3.  
  4. #include <windows.h>
  5.  
  6. #include <iostream>
  7. #include <sstream>      // for std::wstringstream
  8. #include <iomanip>      // for std::setw
  9. #include <string>
  10. using namespace std;
  11.  
  12. #define MAX_LOADSTRING 100
  13.  
  14. HWND Handle_Main_Window = NULL;
  15.  
  16. #include <windows.h>
  17.  
  18. void create_controls( const HWND hwnd );
  19.  
  20.  
  21. LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);
  22.  
  23. wchar_t g_szClassName[] = L"myWindowClass";
  24.  
  25. // Previous declarations
  26.     std::string utf8_encode(const std::wstring &wstr);
  27.     std::wstring utf8_byte_values(const std::string &str);
  28.  
  29.  
  30.     // Convert a wide Unicode string to an UTF8 string
  31.     std::string utf8_encode(const std::wstring &wstr)
  32.     {
  33.         if (wstr.empty())
  34.         {
  35.             return std::string();
  36.         }
  37.  
  38.         int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), nullptr, 0, nullptr, nullptr);
  39.  
  40.         char buffer[size_needed+1];
  41.         WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), buffer, size_needed+1, nullptr, nullptr);
  42.  
  43.         std::string strTo( buffer );
  44.         return strTo;
  45.     }
  46.  
  47.  
  48.  
  49.     std::wstring utf8_byte_values(const std::string &str)
  50.     {
  51.         if (str.empty())
  52.         {
  53.             return std::wstring();
  54.         }
  55.  
  56.         bool first = true;
  57.         std::wstringstream out;
  58.         // error: aggregate 'std::wstringstream out' has incomplete type and cannot be defined
  59.  
  60.         // I found this in <iosfwd>
  61.         // Class for @c wchar_t mixed input and output memory streams.
  62.         //   typedef basic_stringstream<wchar_t>     wstringstream;
  63.         // Is that something from Visual Studio or maybe a later version of Code:Blocks?
  64.  
  65.         for(auto iter = str.begin(); iter != str.end(); ++iter)
  66.         {
  67.             if (first)
  68.             {
  69.                 first = false;
  70.             }
  71.             else
  72.             {
  73.                 out << L" ";
  74.             }
  75.  
  76.             unsigned int value = ((unsigned)*iter) & 0xFF;
  77.             out << L"0x" << std::hex << std::setw(2) << std::setfill(L'0') << value;
  78.         }
  79.  
  80.         return out.str();
  81.     }
  82.  
  83.  
  84.  
  85. int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
  86.     LPSTR lpCmdLine, int nCmdShow)
  87. {
  88.     WNDCLASSEX wc;
  89.     MSG Msg;
  90.  
  91.     wc.cbSize        = sizeof(WNDCLASSEX);
  92.     wc.style         = 0;
  93.     wc.lpfnWndProc   = WndProc;
  94.     wc.cbClsExtra    = 0;
  95.     wc.cbWndExtra    = 0;
  96.     wc.hInstance     = hInstance;
  97.     wc.hIcon         = LoadIcon(nullptr, IDI_APPLICATION);
  98.     wc.hCursor       = LoadCursor(nullptr, IDC_ARROW);
  99.     wc.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
  100.     wc.lpszMenuName  = nullptr;
  101.     wc.lpszClassName = g_szClassName;
  102.     wc.hIconSm       = LoadIcon(nullptr, IDI_APPLICATION);
  103.  
  104.     if(!RegisterClassEx(&wc))
  105.     {
  106.         MessageBox(nullptr, L"Window Registration Failed!", L"Error!",
  107.             MB_ICONEXCLAMATION | MB_OK);
  108.         return 0;
  109.     }
  110.  
  111.     Handle_Main_Window = CreateWindowEx(
  112.         WS_EX_CLIENTEDGE,
  113.         g_szClassName,
  114.         L"Title",
  115.         WS_OVERLAPPEDWINDOW,
  116.         CW_USEDEFAULT,
  117.         CW_USEDEFAULT,
  118.         500,
  119.         500,
  120.         nullptr,
  121.         nullptr,
  122.         hInstance,
  123.         nullptr);
  124.  
  125.     if(Handle_Main_Window == NULL)
  126.     {
  127.         MessageBox(nullptr, L"Window Creation Failed!", L"Error!",
  128.             MB_ICONEXCLAMATION | MB_OK);
  129.         return 0;
  130.     }
  131.  
  132.     ShowWindow(Handle_Main_Window, nCmdShow);
  133.     UpdateWindow(Handle_Main_Window);
  134.  
  135.     while(GetMessage(&Msg, nullptr, 0, 0) > 0)
  136.     {
  137.         TranslateMessage(&Msg);
  138.         DispatchMessage(&Msg);
  139.     }
  140.     return Msg.wParam;
  141. }
  142.  
  143.  
  144. LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
  145.     {
  146.         switch (msg)
  147.         {
  148.  
  149.             case WM_CREATE:
  150.                 create_controls( hWnd );
  151.                 break;
  152.  
  153.             case WM_COMMAND:
  154.                 switch(LOWORD(wParam)) {
  155.                 case 1:{
  156.                         ::MessageBox( hWnd, L"PUSH BUTTON 1 was clicked", L"message from PUSH BUTTON 1", MB_SETFOREGROUND );
  157.                         break;
  158.                     }
  159.  
  160.                 case 2:{
  161.                         const HWND text_box = GetDlgItem( hWnd, 3 );
  162.                         const int n = GetWindowTextLength( text_box );
  163.                         wstring text( n + 1, L'#' );
  164.                         if( n > 0 )
  165.                             {
  166.                                 GetWindowText( text_box, &text[0], text.length() );
  167.                             }
  168.                         text.resize( n );
  169.                         ::MessageBox(hWnd, text.c_str(), L"The INPUT TEXT WINDOW", MB_SETFOREGROUND );
  170.                         break;
  171.                     }
  172.  
  173.                 case 3:{
  174.                         break;
  175.                     }
  176.  
  177.                 case 4:{
  178.                         break;
  179.                     }
  180.  
  181.                case 5:  //BTN_SAVE:
  182.                 {
  183. //                    const HWND in_box = GetDlgItem( hWnd, EDT_INPUT_TEXT );
  184.                     const HWND in_box = GetDlgItem(hWnd, 3);
  185.                     const int n = GetWindowTextLength( in_box  );
  186.                     if( n > 0 )
  187.                     {
  188.                         wchar_t text[n+1]; // +1 for terminator
  189.                         GetWindowText( in_box, text, n+1 );
  190.                         string utf8 = utf8_encode(wstring(text));
  191.                         // Force calling of ASCII/UTF8 version
  192. //                        SetDlgItemText( hWnd, EDT_OUTPUT_TEXT, utf8_byte_values(utf8).c_str());
  193.                         SetDlgItemText(hWnd, 4, utf8_byte_values(utf8).c_str());
  194.  
  195.                     }
  196.         //            text.resize( n );
  197.                     break;
  198.                 }
  199.  
  200.                 default:{
  201.                     }
  202.             }
  203.             break;
  204.  
  205.             case WM_CLOSE:{
  206.                     DestroyWindow(hWnd);
  207.                     break;
  208.                 }
  209.  
  210.             case WM_DESTROY:{
  211.                     PostQuitMessage(0);
  212.                 }
  213.  
  214.             default:{
  215.                     return DefWindowProc(hWnd, msg, wParam, lParam);
  216.                 }
  217.         }
  218.         return FALSE;
  219.     }
  220.  
  221. void create_controls( const HWND hwnd )
  222.     {
  223.  
  224.         CreateWindow( L"BUTTON",
  225.             L"PUSH BUTTON 1",
  226.             WS_VISIBLE | WS_CHILD | WS_BORDER,
  227.             10,10,
  228.             130,20,
  229.             hwnd, (HMENU) 1, GetModuleHandle( nullptr ), nullptr
  230.             )  ;
  231.  
  232.         CreateWindow( L"EDIT",
  233.             L"办",
  234.             WS_VISIBLE | WS_CHILD | WS_BORDER,
  235.             10,50,
  236.             200,25,
  237.             hwnd, (HMENU) 3, GetModuleHandle( nullptr ), nullptr
  238.             );
  239.  
  240.         CreateWindow( L"BUTTON",
  241.             L"SAVE BUTTON",
  242.             WS_VISIBLE | WS_CHILD | WS_BORDER,
  243.             10,80,
  244.             110,20,
  245.             hwnd, (HMENU) 5, GetModuleHandle( nullptr ), nullptr
  246.             );
  247.  
  248.         CreateWindow( L"EDIT",
  249.             L"OUTPUT TEXT WINDOW",
  250.             WS_VISIBLE | WS_CHILD | WS_BORDER,
  251.             10,130,
  252.             300,300,
  253.             hwnd, (HMENU) 4, GetModuleHandle( nullptr ), nullptr
  254.             );
  255.     }
  256.  
  257.  

Thank you.
Oct 30 '20 #13
Banfa
9,064 Expert Mod 8TB
You used:

SetDlgItemText( hWnd, EDT_OUTPUT_TEXT, utf8_byte_values(utf8).c_str());

which I changed to

SetDlgItemText(hWnd, 4, utf8_byte_values(utf8).c_str());

Again, I like the way that you did it, but I am having difficulty getting your line to work.

Put

#define EDT_OUTPUT_TEXT 4

at the top of the file.

If you have to use a number more than once, it is a magic number. Magic numbers are very poor practice, and you remove them by assigning them to a symbol. In C++, a const variable should actually be preferred for this type of constant:

const int EDT_OUTPUT_TEXT = 4;

But this is WIN32, which I used with C, so I reverted to #define.
Oct 31 '20 #14
SioSio
264 256MB
Tip 1.
An example of determining whether a character string contains non-alphanumeric symbols.

#include <iostream>
#include <regex>
#include <string>

/**
 * @brief Determine whether a string consists only of ASCII letters, digits, and symbols.
 *
 * @return true: only ASCII alphanumerics and symbols / false: contains other characters
 */
bool IsAlphabetNumericSymbol(const std::string& src)
{
    // "\\[" puts an escaped '[' into the regex; a lone "\[" is not a valid C++ string escape.
    std::regex pattern("^[a-zA-Z0-9!-/:-@\\[-`{-~]+$");
    return std::regex_match(src, pattern);
}

int main()
{
    // Only ASCII alphanumerics and symbols
    std::cout << IsAlphabetNumericSymbol("abc012@") << std::endl;

    // Contains other characters
    std::cout << IsAlphabetNumericSymbol("1漢字A") << std::endl;
    return 0;
}

Tip 2.
"123漢字ABC" shown in UTF-16 is 16 bytes.
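
The 16-byte figure can be checked with a small sketch. byte_size is a hypothetical helper name, not something from this thread, and the string is written with escapes so that the source file encoding does not matter:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: storage size in bytes of the characters in a string,
// for any character type (char, char16_t, char32_t, ...).
template <typename CharT>
std::size_t byte_size(const std::basic_string<CharT>& text)
{
    return text.size() * sizeof(CharT);
}
```

With this, byte_size(std::u16string(u"123\u6F22\u5B57" u"ABC")) gives 8 code units of 2 bytes each, while the same text as UTF-8 in a std::string takes 12 bytes.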

Tip 3.
Mutual conversion UTF-8 <=> UTF-16
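
#include <codecvt>   // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>    // std::wstring_convert
#include <string>

// codecvt_utf8_utf16 produces surrogate pairs for planes 1 to 16.
// codecvt_utf8<wchar_t> with a 16-bit wchar_t is limited to UCS-2 (plane 0).
inline std::wstring convertUtf8ToUtf16(char const* iString)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
    return converter.from_bytes(iString);
}

inline std::string convertUtf16ToUtf8(wchar_t const* iString)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
    return converter.to_bytes(iString);
}
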
Referenced URL.
https://docs.microsoft.com/en-us/arc...and-win32-apis

I hope you find this information helpful.
Nov 2 '20 #15
SwissProgrammer
213 128KB
I used

::MessageBox(hWnd, L"SAVE BUTTON was clicked", L"message from SAVE BUTTON", MB_SETFOREGROUND );

I did not need the scope resolution operator :: before MessageBox. I think it is sometimes used in Visual Studio. I have cleaned that out.

MessageBox(hWnd, L"SAVE BUTTON was clicked", L"message from SAVE BUTTON", MB_SETFOREGROUND );


I have been reading that

using namespace std;

adds a huge amount of code to a program. I am trying to avoid that by using

using std::string;
using std::wstring;
using std::wstringstream;
using std::hex;
using std::setw;
using std::setfill;

I am open to comments on that.



I worked on those "magic numbers" and I think that I fixed them.

const int PUSH_BUTTON_1     = 1;
const int IN_put_text_box   = 2;
const int SAVE_button       = 3;
const int OUT_put_text_box  = 4;

That took a lot longer than I had expected.



For
办   办
I get
0xe5 0x8a 0x9e 0x20 0x20 0x20 0xe5 0x8a 0x9e
I know that
办
is
0xe5 0x8a 0x9e
and the blanks are
0x20
but how do I separate it out automatically?
A simple response like
0xe5 0x8a 0x9e, 0x20, 0x20, 0x20, 0xe5 0x8a 0x9e
would tell me at least where the separation logic is. Then I could go forward and work with each. Someone please?

Here is what I have so far:

#define _UNICODE
#define UNICODE

#include <windows.h>

#include <iostream>
#include <sstream>      // for std::wstringstream
#include <iomanip>      // for std::setw
#include <string>

//using namespace std;  // do not need all of this namespace for this small program.

// Shortened version of namespace std;
    using std::string;
    using std::wstring;
    using std::wstringstream;
    using std::hex;
    using std::setw;
    using std::setfill;

#define MAX_LOADSTRING 100

HWND Handle_Main_Window = NULL;

wchar_t g_szClassName[] = L"myWindowClass";

const int PUSH_BUTTON_1     = 1;
const int IN_put_text_box   = 2;
const int SAVE_button       = 3;
const int OUT_put_text_box  = 4;


// Previous declarations
    void create_controls( const HWND hwnd );
    LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);
    string utf8_encode(const wstring &wstr);
    wstring utf8_byte_values(const string &str);


// Convert a wide Unicode (UTF-16) string to a UTF-8 string
string utf8_encode(const wstring &wstr)
    {
        if (wstr.empty())
            {
                return string();
            }

        // First call: ask how many bytes the UTF-8 form needs.
        int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), nullptr, 0, nullptr, nullptr);

        // Second call: convert straight into the string's own buffer.
        // No terminator is written when an explicit length is passed, so none is relied on.
        string strTo(size_needed, '\0');
        WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), &strTo[0], size_needed, nullptr, nullptr);
        return strTo;
    }



// Format every byte of a UTF-8 string as "0x.." separated by spaces
wstring utf8_byte_values(const string &str)
    {
        if (str.empty())
            {
                return wstring();
            }

        bool first = true;
        wstringstream out;

        for(auto iter = str.begin(); iter != str.end(); ++iter)
            {
                if (first)
                    {
                        first = false;
                    }
                else
                    {
                        out << L" ";
                    }

                unsigned int value = ((unsigned)*iter) & 0xFF;
                out << L"0x" << hex << setw(2) << setfill(L'0') << value;
            }

        return out.str();
    }



int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
    LPSTR lpCmdLine, int nCmdShow)
    {
        WNDCLASSEX wc;
        MSG Msg;

        wc.cbSize        = sizeof(WNDCLASSEX);
        wc.style         = 0;
        wc.lpfnWndProc   = WndProc;
        wc.cbClsExtra    = 0;
        wc.cbWndExtra    = 0;
        wc.hInstance     = hInstance;
        wc.hIcon         = LoadIcon(nullptr, IDI_APPLICATION);
        wc.hCursor       = LoadCursor(nullptr, IDC_ARROW);
        wc.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
        wc.lpszMenuName  = nullptr;
        wc.lpszClassName = g_szClassName;
        wc.hIconSm       = LoadIcon(nullptr, IDI_APPLICATION);

        if(!RegisterClassEx(&wc))
            {
                MessageBox(nullptr, L"Window Registration Failed!", L"Error!", MB_ICONEXCLAMATION | MB_OK);
                return 0;
            }

        Handle_Main_Window = CreateWindowEx(
            WS_EX_CLIENTEDGE,
            g_szClassName,
            L"Title",
            WS_OVERLAPPEDWINDOW,
            CW_USEDEFAULT,
            CW_USEDEFAULT,
            630,
            470,
            nullptr,
            nullptr,
            hInstance,
            nullptr);

        if(Handle_Main_Window == NULL)
            {
                MessageBox(nullptr, L"Window Creation Failed!", L"Error!", MB_ICONEXCLAMATION | MB_OK);
                return 0;
            }

        ShowWindow(Handle_Main_Window, nCmdShow);
        UpdateWindow(Handle_Main_Window);

        while(GetMessage(&Msg, nullptr, 0, 0) > 0)
            {
                TranslateMessage(&Msg);
                DispatchMessage(&Msg);
            }
        return (int)Msg.wParam;
    }


LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        switch (msg)
            {

                case WM_CREATE:
                    {
                        create_controls( hWnd );
                        break;
                    }

                case WM_COMMAND:
                    {
                        switch(LOWORD(wParam))
                            {

                                case PUSH_BUTTON_1:
                                    {
                                        MessageBox( hWnd, L"PUSH BUTTON 1 was clicked", L"message from PUSH BUTTON 1", MB_SETFOREGROUND );
                                        break;
                                    }

                                case IN_put_text_box:
                                    {
                                        const HWND in_box = GetDlgItem(hWnd, IN_put_text_box);
                                        const int n = GetWindowTextLength( in_box  );
                                        if( n > 0 )
                                            {
                                                wstring text( n + 1, L'\0' ); // +1 for terminator
                                                text.resize( GetWindowText( in_box, &text[0], n + 1 ) );
                                                string utf8 = utf8_encode(text);
                                                // UNICODE is defined, so this is the wide (W) version
                                                SetDlgItemText(hWnd, OUT_put_text_box, utf8_byte_values(utf8).c_str());
                                            }
                                        else    // n == 0: nothing to convert, clear the output
                                            {
                                                SetDlgItemText(hWnd, OUT_put_text_box, L"");
                                            }
                                        break;
                                    }

                                case SAVE_button:  //BTN_SAVE:
                                    {
                                        const HWND in_box = GetDlgItem(hWnd, IN_put_text_box);
                                        const int n = GetWindowTextLength( in_box  );
                                        if( n > 0 )
                                            {
                                                wstring text( n + 1, L'\0' ); // +1 for terminator
                                                text.resize( GetWindowText( in_box, &text[0], n + 1 ) );
                                                string utf8 = utf8_encode(text);
                                                // UNICODE is defined, so this is the wide (W) version
                                                SetDlgItemText(hWnd, OUT_put_text_box, utf8_byte_values(utf8).c_str());
                                            }
                                        else    // n == 0: nothing to convert, clear the output
                                            {
                                                SetDlgItemText(hWnd, OUT_put_text_box, L"");
                                            }
                                        break;
                                    }

                                default:
                                    {
                                        break;
                                    }
                        }

                        break;
                    }

                case WM_CLOSE:
                    {
                        DestroyWindow(hWnd);
                        break;
                    }

                case WM_DESTROY:
                    {
                        PostQuitMessage(0);
                        break;
                    }

                default:
                    {
                        return DefWindowProc(hWnd, msg, wParam, lParam);
                    }
            }

        return FALSE;
    }

void create_controls( const HWND hwnd )
    {

        CreateWindowW
            (
                L"BUTTON",
                L"PUSH BUTTON 1",
                WS_VISIBLE | WS_CHILD | WS_BORDER,
                10,10,
                130,20,
                hwnd,
                (HMENU) PUSH_BUTTON_1,
                GetModuleHandle( nullptr ),
                nullptr
            );

        CreateWindowW
            (
                L"EDIT",
                L"办   办",
                WS_VISIBLE | WS_CHILD | WS_BORDER,
                10,50,
                200,25,
                hwnd,
                (HMENU) IN_put_text_box,
                GetModuleHandle( nullptr ),
                nullptr
            );

        CreateWindowW
            (
                L"BUTTON",
                L"SAVE BUTTON",
                WS_VISIBLE | WS_CHILD | WS_BORDER,
                10,80,
                110,20,
                hwnd,
                (HMENU) SAVE_button,
                GetModuleHandle( nullptr ),
                nullptr
            );

        CreateWindowW
            (
                L"EDIT",
                L"OUTPUT TEXT WINDOW",
                WS_VISIBLE | WS_CHILD | WS_BORDER,
                10,130,
                600,300,
                hwnd,
                (HMENU) OUT_put_text_box,
                GetModuleHandle( nullptr ),
                nullptr
            );
    }

Thank you.
Nov 13 '20 #16
Banfa
9,064 Expert Mod 8TB
Mostly the scope resolution operator :: is unnecessary but it is there for the odd occasion where there is a clash between a class member name and a top level symbol name, i.e. suppose you are trying to call ::MessageBox from within a class with a member called MessageBox.
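
A minimal sketch of that clash, kept free of windows.h so it builds anywhere. The free function show and the struct Dialog are made-up stand-ins for MessageBox and a class with a member of the same name:

```cpp
#include <string>

// Hypothetical stand-in for a global API function such as MessageBox.
std::string show(const std::string& text) { return "global: " + text; }

struct Dialog {
    // A member with the same name hides the global function inside the class.
    std::string show(const std::string& text) { return "member: " + text; }

    std::string call_member() { return show("hi"); }   // unqualified: finds Dialog::show
    std::string call_global() { return ::show("hi"); } // leading :: forces the global one
};
```

Outside a class like this, the plain name and the :: form refer to the same function, which is why removing the :: changed nothing in the posted program.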

Writing using <symbol> rather than using namespace <NamespaceName> is exactly what we do in production code at work, for the very same reason: to reduce the number of symbols being imported into the global namespace. I'd definitely recommend it, although you can end up with a large section of using declarations at the top of your source files.

Once you get used to using constants for magic numbers from project start it becomes easier. Make sure you decide on a naming convention for these constants and stick to it; all caps with underscores is a common standard.

Maybe try treating the input string 1 character at a time discarding the space characters or try using std::string::find_first_of to locate spaces in either the input or output.
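
One way to get the comma-separated form asked for earlier in the thread is to walk the UTF-8 bytes one at a time and start a new group whenever a byte is not a continuation byte (10xx xxxx). This is only a sketch, assuming well-formed UTF-8 input; grouped_hex is a hypothetical name, and it returns a narrow std::string rather than the std::wstring used in the posted program:

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: hex dump of UTF-8 bytes, with ", " between characters
// and " " between the bytes that belong to one character.
std::string grouped_hex(const std::string& utf8)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned int byte = static_cast<unsigned char>(utf8[i]);
        const bool continuation = (byte & 0xC0) == 0x80; // 10xx xxxx
        if (i != 0) {
            out << (continuation ? " " : ", ");
        }
        out << "0x" << std::setw(2) << byte;
    }
    return out.str();
}
```

For the bytes of the two 办 characters with three spaces between them, this yields 0xe5 0x8a 0x9e, 0x20, 0x20, 0x20, 0xe5 0x8a 0x9e.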
Nov 14 '20 #17
SwissProgrammer
213 128KB
Banfa,

Update: I have been looking at std::string::find_first_of, but it does not seem to work with Unicode. And, I get messages of Visual Studio problems with similar attempts. I am trying to not use VS in any way, so I am still working on this. Thank you. I might eventually get the ability to parse or split Unicode strings into each single or combined character. Thanks for now.
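
For what it is worth, find_first_of exists on every std::basic_string, so it can search a std::u32string where each element is one whole code point. The sketch below is an assumption about what "splitting on spaces" should mean here (runs of spaces produce no empty pieces), and split_on_spaces is a hypothetical name:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split UTF-32 text on U+0020 spaces.
// Empty pieces between consecutive spaces are dropped.
std::vector<std::u32string> split_on_spaces(const std::u32string& text)
{
    std::vector<std::u32string> words;
    std::size_t start = 0;
    while (start < text.size()) {
        const std::size_t found = text.find_first_of(U" ", start);
        const std::size_t stop = (found == std::u32string::npos) ? text.size() : found;
        if (stop > start) {
            words.push_back(text.substr(start, stop - start));
        }
        start = stop + 1; // continue after the space (or past the end)
    }
    return words;
}
```

On a plain std::string holding UTF-8, the same call searches bytes, which still works for an ASCII space because no multi-byte sequence contains the byte 0x20.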
Nov 18 '20 #18
SwissProgrammer
213 128KB
SioSio,

You said, "In the C++11 Standard Library, UTF-8 is not supported for string and integer conversion functions, and I/O functions. Therefore, it needs to be converted to the system multibyte character code."

I have read that std::wstring is better for parsing than multibyte. I have also read that multibyte is more useful. Which direction should I study to be able to parse the following into each individual characters?

For now I am just trying to do the test with Unicode plane 0.


If someone pastes or places into my text box the following:
123漢字ABC
I want the second text box to show the following:
Full Sentence in UTF-8
\x31\x32\x33\xe6\xbc\xa2\xe5\xad\x97\x41\x42\x43


Individual Single Characters in UTF-8

1 = \x31
2 = \x32
3 = \x33
漢 = \xe6\xbc\xa2
字 = \xe5\xad\x97
A = \x41
B = \x42
C = \x43

All that in a single text box showing all those lines.

Should I work at doing this with multibyte or with std::wstring?

I almost got it to work a few times, but I am not certain what I did.

I thought to split the input sentence into individual characters in the following area, but I lost it. Maybe later I can show you what I did if I get it close again.


                                case SAVE_button:  //BTN_SAVE:
                                    {
                                        const HWND in_box = GetDlgItem(hWnd, IN_put_text_box);
                                        const int n = GetWindowTextLength( in_box  );
                                        if( n > 0 )
                                            {
                                                wstring text( n + 1, L'\0' ); // +1 for terminator
                                                text.resize( GetWindowText( in_box, &text[0], n + 1 ) );
                                                string utf8 = utf8_encode(text);
                                                // UNICODE is defined, so this is the wide (W) version
                                                SetDlgItemText(hWnd, OUT_put_text_box, utf8_byte_values(utf8).c_str());
                                            }
                                        else    // n == 0: nothing to convert, clear the output
                                            {
                                                SetDlgItemText(hWnd, OUT_put_text_box, L"");
                                            }
                                        break;
                                    }

Or should I use UTF-16 or UTF-32?

I also found, "The better way is to use std::u16string (std::basic_string<char16_t>) and std::u32string (std::basic_string<char32_t>). They'll work regardless of system and encoding of the source file" [here].
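
A small sketch of the difference between the two, under the assumption that the goal is to count whole characters. In a std::u32string every element is one code point from any of the 17 planes; in a std::u16string a character outside plane 0 occupies two elements (a surrogate pair). count_code_points is a hypothetical helper, and U+1F600 is just an example plane-1 character chosen for illustration:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in UTF-16 text.
// Every code point contributes exactly one unit that is NOT a low surrogate
// (0xDC00 to 0xDFFF), so counting those units counts the code points.
std::size_t count_code_points(const std::u16string& text)
{
    std::size_t count = 0;
    for (char16_t unit : text) {
        const bool low_surrogate = unit >= 0xDC00 && unit <= 0xDFFF;
        if (!low_surrogate) {
            ++count;
        }
    }
    return count;
}
```

With this, u"\U0001F600" has size() 2 but one code point, while U"\U0001F600" as a std::u32string has size() 1 directly.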

I have been struggling with this for a while, and there is lots of advice for and against lots of stuff, most of which I have not gotten to work except temporarily and now I know that something did work, but I do not remember what did.

Thank you.
Nov 25 '20 #19
SioSio
264 256MB
Correspondence between UTF-8 bytes and code point range.
The number of bytes when a character is encoded in UTF-8 can be derived as follows.

For 1 byte, the bit pattern is 0xxx xxxx.
The number of valid bits is 7, the maximum value is 0111 1111 = 0x7F.
The range to be represented is 0x0000 to 0x007F (within ASCII range)
For 2 bytes, the bit pattern is 110x xxxx 10xx xxxx.
The effective number of bits is 5 + 6 = 11, the maximum value is 0111 1111 1111 = 0x7FF.
The range to represent is 0x0080 ~ 0x07FF.
For 3 bytes, the bit pattern is 1110 xxxx 10xx xxxx 10xx xxxx.
The number of valid bits is 4 + 6 + 6 = 16 (exactly 2 bytes), and the maximum value is 0xFFFF.
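The range to be represented is 0x0800 to 0xFFFF, the rest of the UCS-2 range (plane 0).
For 4 bytes, the bit pattern is 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx.
The number of valid bits is 3 + 6 + 6 + 6 = 21, and the range to be represented is 0x10000 to 0x10FFFF. These are planes 01 to 10 (hex), the part that UTF-16 encodes with surrogate pairs.

In other words, the bit pattern of the first byte tells you how many bytes the character occupies, so the string can be divided character by character by using this.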
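
The bit patterns above can be turned into a lookup on the first byte. This is a minimal sketch; utf8_sequence_length is a hypothetical name, and returning 0 for a byte that cannot start a character is a choice made here for illustration:

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with `lead`, or 0 if `lead` cannot start a sequence.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)           return 1; // 0xxx xxxx
    if ((lead & 0xE0) == 0xC0) return 2; // 110x xxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110 xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 1111 0xxx
    return 0;                            // 10xx xxxx (continuation) or invalid
}
```

For example, 0x31 ('1') gives 1, 0xE6 (the first byte of 漢) gives 3, and 0xBC (a continuation byte) gives 0.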
Nov 26 '20 #20
SwissProgrammer
213 128KB
I am a beginner at C++ and this is where I am in this current process. Maybe other beginners can learn from this.


HELPFUL COMMENTS on the following ARE APPRECIATED !



I will now try to find the following
0xxx
110x
1110
In analyzing equivalents of the string "123漢字ABC"

Using online converters [with reference links];

Unicode text = UTF-8[1] = Decimal[1] = Binary[2]

I added the Decimal equivalents so that I could have that option.



1 = \x31 = 00049 = 00110001

2 = \x32 = 00050 = 00110010

3 = \x33 = 00051 = 00110011

漢 = \xe6\xbc\xa2 = 28450 = 11100110 10111100 10100010

字 = \xe5\xad\x97 = 23383 = 11100101 10101101 10010111

A = \x41 = 00065 = 01000001

B = \x42 = 00066 = 01000010

C = \x43 = 00067 = 01000011


If I start with the Unicode string "123漢字ABC" as it appears in binary form on [2] using full length bytes and placing a space after every byte.

I get the following:
With spacing

00110001 00110010 00110011 11100110 10111100 10100010
11100101 10101101 10010111 01000001 01000010 01000011
or
Without spacing

001100010011001000110011111001101011110010100010111001011010110110010111010000010100001001000011

If I split that as SioSio says:

I first look for 0xxx or 110x or 1110 and set those as separate. I am not certain how that works with Unicode planes above 0 (1 to 16), but this is where I am now.

00110001 starts with 0xxx,

00110010 starts with 0xxx,

00110011 starts with 0xxx,

11100110 starts with 1110,

10111100 does not start with any of those so it is a continuation of the previous,

10100010 does not start with any of those so it is a continuation of the previous,

Thus: 11100110 10111100 10100010 are together.

11100101 starts with 1110,

10101101 does not start with any of those so it is a continuation of the previous,

10010111 does not start with any of those so it is a continuation of the previous,

Thus: 11100101 10101101 10010111 are together.

01000001 starts with 0xxx,

01000010 starts with 0xxx,

01000011 starts with 0xxx,


Result (without spaces):
00110001
00110010
00110011
111001101011110010100010
111001011010110110010111
01000001
01000010
01000011
Should I convert this to UTF-8 or just use it in binary form at this time before I convert it to readable characters?

On the link [3] they go from binary to text characters. Maybe I could use C++ to go directly from binary to text characters. (?)



If the previous is correct, I now have a process to put into C++11 code.
  1. Convert text (in the textbox) to binary.
  2. Split the binary into (parts or sections or some other phrase) with spaces.
  3. Search for 0xxx, and 110x, and 1110 in that binary to get the starting of each (what that thing is called).
  4. Save those [things] to an array of them for later use.
  5. Convert each of those [things] to readable Unicode characters.
  6. Save those Unicode characters to an array of them for later use.
  7. Parse the text, change the text, edit the text, by using the arrays.
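
For later readers, steps 3 to 6 can be sketched in C++11 as below. This is only a minimal sketch: the function name decode_utf8 is made up for illustration, it assumes the std::string already holds well-formed UTF-8, and it does not diagnose truncated or invalid input. It also handles the 4-byte lead byte (11110xxx) that planes 1 to 16 need.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns one code point per UTF-8 sequence in `s`.
// Assumes `s` is well-formed UTF-8; malformed input is not diagnosed.
std::vector<char32_t> decode_utf8(const std::string& s)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)      { len = 1; cp = lead; }          // 0xxxxxxx
        else if (lead < 0xE0) { len = 2; cp = lead & 0x1F; }   // 110xxxxx
        else if (lead < 0xF0) { len = 3; cp = lead & 0x0F; }   // 1110xxxx
        else                  { len = 4; cp = lead & 0x07; }   // 11110xxx
        // Each continuation byte (10xxxxxx) contributes six more bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Calling decode_utf8 on the UTF-8 bytes of "123漢字ABC" gives eight values, one per character, which can then be stored in an array and edited character by character.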

Comments PLEASE !

Am I doing this right?


Thank you.
Nov 26 '20 #21
SioSio
264 256MB
Hi.
It doesn't have to be that complicated.

The code to split the UTF-8 string character by character is as follows.

    #include <iostream>
    #include <string>

    void print_char(std::string);

    int main(void)
    {
        using namespace std;
        string str = u8"123漢字ABC";
        print_char(str);
        return 0;
    }

    // Prints each UTF-8 character of str on its own line.
    void print_char(std::string str)
    {
        using namespace std;
        int pos;
        unsigned char ch;
        int char_size;
        for (pos = 0; pos < str.size(); pos += char_size) {
            ch = str[pos];
            if (ch < 0x80) {        // 0xxxxxxx: 1 byte
                char_size = 1;
            }
            else if (ch < 0xE0) {   // 110xxxxx: 2 bytes
                char_size = 2;
            }
            else if (ch < 0xF0) {   // 1110xxxx: 3 bytes
                char_size = 3;
            }
            else {                  // 11110xxx: 4 bytes (planes 1 to 16)
                char_size = 4;
            }
            cout << str.substr(pos, char_size) << '\n';
        }
    }

Nov 27 '20 #22
SwissProgrammer
213 128KB
SioSio,

        for (pos = 0; pos < str.size(); pos += char_size) {
I get
warning: comparison between signed and unsigned integer expressions [-Wsign-compare]|

and

I tried
    void print_char(std::string str)
        {
            using namespace std;
            unsigned int pos;
            unsigned char ch;
            int char_size;
            for (pos = 0; pos < str.size(); pos += char_size) {
I get
Process terminated with status 0 (0 minute(s), 0 second(s))

and

I tried
    {
            using namespace std;
            int pos;
            unsigned char ch;
            unsigned int char_size;
            for (pos = 0; pos < str.size(); pos += char_size) {
I get
Process terminated with status 0 (0 minute(s), 0 second(s))

and

I tried
    {
            using namespace std;
            int pos;
            char ch;
            int char_size;
            for (pos = 0; pos < str.size(); pos += char_size) {
I get
warning: comparison between signed and unsigned integer expressions [-Wsign-compare]|
With more warnings.

It looks impressive, but why will it not let me make these adjustments to work?

I am working on this.

Thank you.
Nov 27 '20 #23
SioSio
264 256MB
I think that warning disappears with the following modifications.

    unsigned int pos;
    unsigned char ch;
    unsigned int char_size;
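
The warning comes from comparing a signed int with str.size(), which returns an unsigned type (std::size_t). As a sketch of the same loop with std::size_t for both the position and the length, here is a variant that collects the pieces instead of printing them. The name split_utf8 is made up for this example, and like the original it assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant of print_char: returns the pieces instead of printing.
std::vector<std::string> split_utf8(const std::string& str)
{
    std::vector<std::string> pieces;
    std::size_t char_size = 0;
    // pos and str.size() are both std::size_t, so -Wsign-compare stays quiet.
    for (std::size_t pos = 0; pos < str.size(); pos += char_size) {
        const unsigned char ch = static_cast<unsigned char>(str[pos]);
        if (ch < 0x80)      char_size = 1;   // 0xxxxxxx
        else if (ch < 0xE0) char_size = 2;   // 110xxxxx
        else if (ch < 0xF0) char_size = 3;   // 1110xxxx
        else                char_size = 4;   // 11110xxx
        pieces.push_back(str.substr(pos, char_size));
    }
    return pieces;
}
```
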
std::wstring_convert is available in C++11 (it was later deprecated in C++17).
Here is how to convert a std::wstring to a UTF-8 byte string with std::wstring_convert and display it with std::cout.

    #include <iostream>
    #include <string>
    #include <codecvt>

    int main(void)
    {
        using namespace std;
        wstring_convert<codecvt_utf8<wchar_t>, wchar_t> cv;
        wstring wstr = L"123漢字ABC";

        // Convert one wide character at a time and print its UTF-8 bytes.
        for (auto v : wstr) {
            cout << cv.to_bytes(v) << '\n';
        }
        return 0;
    }

Nov 27 '20 #24
SwissProgrammer
213 128KB
For
        wstring_convert<codecvt_utf8<wchar_t>, wchar_t> cv;
I get: wstring_convert was not declared in this scope

I found class wstring_convert in locale_conv.h .
When I added the String conversions from that header file, it gave more errors.

I added #include <locale>
and that fixed that part.
Nov 27 '20 #25
SwissProgrammer
213 128KB
OK I got it to work as a console application.

The 漢字 do not show up right, but they are separated out like the rest as they should be. I think that I am beginning to understand what you are doing.

    #include <iostream>
    #include <string>
    #include <codecvt>
    #include <locale>

    #include <conio.h>
    #include <stdio.h>

    // Previous declarations
        void PressAKeyToContinue();


    int main(void)
        {
            using namespace std;
            wstring_convert<codecvt_utf8<wchar_t>, wchar_t> cv;
            wstring wstr = L"123漢字ABC";

            for (auto v : wstr)
                {
                    cout << cv.to_bytes(v) << '\n';
                    // .to_bytes  Reference: https://en.cppreference.com/w/cpp/locale/wstring_convert/to_bytes
                }

            PressAKeyToContinue();
            return 0;
        }


    void PressAKeyToContinue()
        {
            int c;
            printf( "\nPress a key to continue..." );
            c = getch();
            if (c == 0 || c == 224) getch();
        }

Thank you.
Nov 27 '20 #26
SwissProgrammer
213 128KB
I think that the code is probably working correctly, but my CLI might be the problem and not showing the correct response.


I am working on putting this into the previous code with the textboxes, etc. I will try to get back to you on that if I get it to work.


Thank you.
Nov 28 '20 #27
SwissProgrammer
213 128KB
Notes for future readers relating to reports on codecvt_utf8 and surrogate pairs.

I have been studying codecvt_utf8 from

    wstring_convert<codecvt_utf8<wchar_t>, wchar_t> cv;

I have read that codecvt_utf8 is reported to be buggy.

"Microsoft's implementation of std::codecvt_utf8 appears to successfully convert any UTF-16 code unit into UTF-8, including surrogate pairs. This is a bug, as surrogates are not encodable." There seem to be others with this opinion.

I tried

    wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> cv;

That worked fine as a replacement. If I understand it correctly, it assumes that wchar_t holds UTF-16 [X] for the conversion, which is what I have read Microsoft Windows uses internally for all Unicode. Later I read suggestions that this is also buggy.

Why the reports of bugginess? I found the answer:

After further research I found that this bugginess is reported to apply only to surrogates in narrow characters. It is not reported for wide characters such as wchar_t, which I am using.

I am considering both codecvt_utf8 and codecvt_utf8_utf16 safe to use.
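
For readers who have not met surrogate pairs: where wchar_t is 16 bits wide (as on Windows), a character from planes 1 to 16 does not fit in one wchar_t and is stored as two UTF-16 code units, a high and a low surrogate. As described in the C++ standard library documentation, codecvt_utf8 with a 16-bit element type converts UCS-2 (each unit on its own), while codecvt_utf8_utf16 treats the input as UTF-16 and joins the pair first, so the second is the one to prefer for characters above plane 0. The arithmetic behind a pair can be sketched as follows; the function names are made up for illustration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helpers illustrating UTF-16 surrogate arithmetic.

// Split a code point in U+10000..U+10FFFF into (high, low) surrogates.
std::pair<char16_t, char16_t> to_surrogates(char32_t cp)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);     // planes 1 to 16 only
    const char32_t v = cp - 0x10000;             // 20-bit value
    const char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));    // top 10 bits
    const char16_t low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // bottom 10 bits
    return std::make_pair(high, low);
}

// Combine a (high, low) surrogate pair back into one code point.
char32_t from_surrogates(char16_t high, char16_t low)
{
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}
```

For example, U+10452 (a Plane 1 character) splits into the pair D801 DC52, and the last code point U+10FFFF splits into DBFF DFFF.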

The use of either of these did not cause my CLI to show the correct Unicode characters. So, I researched and found, "Windows 95/98/Me: ExtTextOutW is supported by the Microsoft Layer for Unicode." That meant, to me, that something should work. For CLI, I do not know yet. I found on an Oracle page [X] "James Kass's Code2001 font" which I downloaded from a link on here to Code2000 font by James Kass (CODE2001.TTF). I unzipped it and placed it into the Windows/fonts directory, where it was automatically installed by XP. Now, a test page shows up correctly in my Firefox 43.0b9 browser.
Nov 28 '20 #28
SwissProgrammer
213 128KB
How do I completely delete this one message?
Dec 1 '20 #29
SwissProgrammer
213 128KB
"Use the report button (the triangle icon at the top of each post) and detail the reason why you think the post should be deleted and a moderator will look into the matter for you."

I do not see any triangle icon at the top or bottom of my post.

If that should mean the "report abuse" button or icon below the message, then would that go as a record of the poster being reported as abusive?

I would like to be able to make a reply, then after posting the reply, if I find that it was wrong, to be able to (as the poster) delete or remove the reply. I tried to edit it down to zero text, but I get an error message.

OK, I will try that.

It says, "Note: This is ONLY to be used to report spam, advertising messages, and problematic (harassment, fighting, or rude) posts." I can not un-post?
Dec 1 '20 #31
SioSio
264 256MB
"the triangle icon at the top of each post"
Certainly, I can't find it.
There are cases where I want to delete post, such as when I have posted wrong information. I also want to know how to delete post.
Dec 1 '20 #32
SwissProgrammer
213 128KB
I have fought with this .to_bytes for a week to convert it from working with the CLI to working with my GUI.


I think that I should replace in


                                    case IN_put_text_box:
                                        {
                                            const HWND in_box = GetDlgItem(hWnd, IN_put_text_box);
                                            const int n = GetWindowTextLength( in_box  );
                                            if( n > 0 )
                                                {
                                                    wchar_t text[n+1]; // +1 for terminator
                                                    GetWindowText( in_box, text, n+1 );
                                                    string utf8 = utf8_encode(wstring(text));
                                                    // UNICODE is defined, so this is SetDlgItemTextW; it shows the UTF-8 bytes as hex text
                                                    SetDlgItemText(hWnd, OUT_put_text_box, utf8_byte_values(utf8).c_str());

                                                }
                                            else    //if( n = 0 )
                                                {
                                                    wchar_t text[n+1]; // +1 for terminator
                                                    GetWindowText( in_box, text, n+1 );
                                                    string utf8 = utf8_encode(wstring(text));
                                                    // UNICODE is defined, so this is SetDlgItemTextW; it shows the UTF-8 bytes as hex text
                                                    SetDlgItemText(hWnd, OUT_put_text_box, utf8_byte_values(utf8).c_str());
                                                }
                                            break;

of where I was so far:


    #define _UNICODE
    #define UNICODE

    #include <windows.h>

    #include <iostream>
    #include <sstream>      // for std::wstringstream
    #include <iomanip>      // for std::setw
    #include <string>

    //using namespace std;  // do not need all of this namespace for this small program.

    // Shortened version of namespace std;
        using std::string;
        using std::wstring;
        using std::wstringstream;
        using std::hex;
        using std::setw;
        using std::setfill;

    #define MAX_LOADSTRING 100

    HWND Handle_Main_Window = NULL;

    wchar_t g_szClassName[] = L"myWindowClass";

    const int PUSH_BUTTON_1     = 1;
    const int IN_put_text_box   = 2;
    const int SAVE_button       = 3;
    const int OUT_put_text_box  = 4;


    // Previous declarations
        void create_controls( const HWND hwnd );
        LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);
        string utf8_encode(const wstring &wstr);
        wstring utf8_byte_values(const string &str);


    // Convert a wide Unicode string to an UTF8 string
    string utf8_encode(const wstring &wstr)
        {
            if (wstr.empty())
                {
                    return string();
                }

            // First call: ask how many UTF-8 bytes are needed.
            int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), nullptr, 0, nullptr, nullptr);

            // Second call: write straight into the string. With an explicit input
            // length no terminating null is written, so a raw char buffer would
            // not be null-terminated.
            string strTo(size_needed, '\0');
            WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), &strTo[0], size_needed, nullptr, nullptr);

            return strTo;
        }



    // Format every byte of str as "0x.." hex text, separated by spaces
    wstring utf8_byte_values(const string &str)
        {
            if (str.empty())
                {
                    return wstring();
                }

            bool first = true;
            wstringstream out;

            for(auto iter = str.begin(); iter != str.end(); ++iter)
                {
                    if (first)
                        {
                            first = false;
                        }
                    else
                        {
                            out << L" ";
                        }

                    unsigned int value = ((unsigned)*iter) & 0xFF;
                    out << L"0x" << hex << setw(2) << setfill(L'0') << value;
                }

            return out.str();
        }



    int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
        LPSTR lpCmdLine, int nCmdShow)
        {
            WNDCLASSEX wc;
            MSG Msg;

            wc.cbSize        = sizeof(WNDCLASSEX);
            wc.style         = 0;
            wc.lpfnWndProc   = WndProc;
            wc.cbClsExtra    = 0;
            wc.cbWndExtra    = 0;
            wc.hInstance     = hInstance;
            wc.hIcon         = LoadIcon(nullptr, IDI_APPLICATION);
            wc.hCursor       = LoadCursor(nullptr, IDC_ARROW);
            wc.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
            wc.lpszMenuName  = nullptr;
            wc.lpszClassName = g_szClassName;
            wc.hIconSm       = LoadIcon(nullptr, IDI_APPLICATION);

            if(!RegisterClassEx(&wc))
                {
                    MessageBox(nullptr, L"Window Registration Failed!", L"Error!", MB_ICONEXCLAMATION | MB_OK);
                    return 0;
                }

            Handle_Main_Window = CreateWindowEx(
                WS_EX_CLIENTEDGE,
                g_szClassName,
                L"Title",
                WS_OVERLAPPEDWINDOW,
                CW_USEDEFAULT,
                CW_USEDEFAULT,
                630,
                470,
                nullptr,
                nullptr,
                hInstance,
                nullptr);

            if(Handle_Main_Window == NULL)
                {
                    MessageBox(nullptr, L"Window Creation Failed!", L"Error!", MB_ICONEXCLAMATION | MB_OK);
                    return 0;
                }

            ShowWindow(Handle_Main_Window, nCmdShow);
            UpdateWindow(Handle_Main_Window);

            while(GetMessage(&Msg, nullptr, 0, 0) > 0)
                {
                    TranslateMessage(&Msg);
                    DispatchMessage(&Msg);
                }
            return Msg.wParam;
        }


    LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
        {
            switch (msg)
                {

                    case WM_CREATE:
                        {
                            create_controls( hWnd );
                            break;
                        }

                    case WM_COMMAND:
                        {
                            switch(LOWORD(wParam))
                                {

                                    case PUSH_BUTTON_1:
                                        {
                                            MessageBox( hWnd, L"PUSH BUTTON 1 was clicked", L"message from PUSH BUTTON 1", MB_SETFOREGROUND );
                                            break;
                                        }

                                    case IN_put_text_box:
                                        {
                                            const HWND in_box = GetDlgItem(hWnd, IN_put_text_box);
                                            const int n = GetWindowTextLength( in_box  );
                                            if( n > 0 )
                                                {
                                                    wchar_t text[n+1]; // +1 for terminator
                                                    GetWindowText( in_box, text, n+1 );
                                                    string utf8 = utf8_encode(wstring(text));
                                                    // UNICODE is defined, so this is SetDlgItemTextW; it shows the UTF-8 bytes as hex text
                                                    SetDlgItemText(hWnd, OUT_put_text_box, utf8_byte_values(utf8).c_str());

                                                }
                                            else    //if( n = 0 )
                                                {
                                                    wchar_t text[n+1]; // +1 for terminator
                                                    GetWindowText( in_box, text, n+1 );
                                                    string utf8 = utf8_encode(wstring(text));
                                                    // UNICODE is defined, so this is SetDlgItemTextW; it shows the UTF-8 bytes as hex text
                                                    SetDlgItemText(hWnd, OUT_put_text_box, utf8_byte_values(utf8).c_str());
                                                }
                                            break;
                                        }

                                    case SAVE_button:  //BTN_SAVE:
                                        {
                                            const HWND in_box = GetDlgItem(hWnd, IN_put_text_box);
                                            const int n = GetWindowTextLength( in_box  );
                                            if( n > 0 )
                                                {
                                                    wchar_t text[n+1]; // +1 for terminator
                                                    GetWindowText( in_box, text, n+1 );
                                                    string utf8 = utf8_encode(wstring(text));
                                                    // UNICODE is defined, so this is SetDlgItemTextW; it shows the UTF-8 bytes as hex text
                                                    SetDlgItemText(hWnd, OUT_put_text_box, utf8_byte_values(utf8).c_str());
                                                }
                                            else    //if( n = 0 )
                                                {
                                                    wchar_t text[n+1]; // +1 for terminator
                                                    GetWindowText( in_box, text, n+1 );
                                                    string utf8 = utf8_encode(wstring(text));
                                                    // UNICODE is defined, so this is SetDlgItemTextW; it shows the UTF-8 bytes as hex text
                                                    SetDlgItemText(hWnd, OUT_put_text_box, utf8_byte_values(utf8).c_str());
                                                }
                                            break;
                                        }

                                    default:
                                        {
                                        }
                            }

                            break;
                        }

                    case WM_CLOSE:
                        {
                            DestroyWindow(hWnd);
                            break;
                        }

                    case WM_DESTROY:
                        {
                            PostQuitMessage(0);
                            break;  // without this, control fell through into default
                        }

                    default:
                        {
                            return DefWindowProc(hWnd, msg, wParam, lParam);
                        }
                }

            return FALSE;
        }

    void create_controls( const HWND hwnd )
        {

            CreateWindowW
                (
                    L"BUTTON",
                    L"PUSH BUTTON 1",
                    WS_VISIBLE | WS_CHILD | WS_BORDER,
                    10,10,
                    130,20,
                    hwnd,
                    (HMENU) PUSH_BUTTON_1,
                    GetModuleHandle( nullptr ),
                    nullptr
                );

            CreateWindowW
                (
                    L"EDIT",
                    L"办   办",
                    WS_VISIBLE | WS_CHILD | WS_BORDER,
                    10,50,
                    200,25,
                    hwnd,
                    (HMENU) IN_put_text_box,
                    GetModuleHandle( nullptr ),
                    nullptr
                );

            CreateWindowW
                (
                    L"BUTTON",
                    L"SAVE BUTTON",
                    WS_VISIBLE | WS_CHILD | WS_BORDER,
                    10,80,
                    110,20,
                    hwnd,
                    (HMENU) SAVE_button,
                    GetModuleHandle( nullptr ),
                    nullptr
                );

            CreateWindowW
                (
                    L"EDIT",
                    L"OUTPUT TEXT WINDOW",
                    WS_VISIBLE | WS_CHILD | WS_BORDER,
                    10,130,
                    600,300,
                    hwnd,
                    (HMENU) OUT_put_text_box,
                    GetModuleHandle( nullptr ),
                    nullptr
                );
        }

with this


                                    case IN_put_text_box:
                                        {
                                            const HWND in_box = GetDlgItem(hWnd, IN_put_text_box);
                                            const int n = GetWindowTextLength( in_box  );
                                            if( n > 0 )
                                                {
                                                    wchar_t text[n+1]; // +1 for terminator
                                                    GetWindowText( in_box, text, n+1 );
                                                    string utf8 = utf8_encode(wstring(text.to_bytes));   // CHANGED HERE
                                                    // Force calling of ASCII/UTF8 version
                                                    SetDlgItemText(hWnd, OUT_put_text_box, (utf8).c_str());   // CHANGED HERE

                                                }
                                            else    //if( n = 0 )
                                                {
                                                    wchar_t text[n+1]; // +1 for terminator
                                                    GetWindowText( in_box, text, n+1 );
                                                    string utf8 = utf8_encode(wstring(text));
                                                    // Force calling of ASCII/UTF8 version
                                                    SetDlgItemText(hWnd, OUT_put_text_box, utf8_byte_values(utf8).c_str());
                                                }
                                            break;
                                        }

I have tried to vary it. It is not working.

What is the logic of how to make this work?

I have been studying .to_bytes and most of what I get is for CLI (which I worked through). I am having difficulty with understanding how to make it work in GUI.

I get that .to_bytes gets each entire character, one at a time. Then what? Then why for that what.

Thank you if you show me how to make it work with GUI, but please tell me why it works that way. Other online explanations are not helping me enough.

Thank you.
Dec 3 '20 #33
SwissProgrammer
213 128KB
SioSio,

I tried using .to_bytes, but I am not certain that it is usable cross-platform (Windows, Mac, Unix). Is it?

Thank you.

You said,
"the number of continuous bytes can be known from the bit pattern of the first byte"

I might want to reference how I found more of the following and what I found later, and other people might want to read this years later to understand, so I will explain:

I tried the following with a Command Line Interface via Code::Blocks 17.12 on Windows XP Professional (service pack 2) 32-bit:
0xxx
110x
1110
1111
I tested 0xxx and 110x and 1110. I found them to work for me in Plane 0 Unicode.

In my own tests with more than 10 Unicode characters from Plane 1, I found that 1111 works.

I found this on a Microsoft page [X]:
All the following bytes start with the mark "10" and the xxx's denote the binary representation of the encoding within the given range.

Unicode Range        UTF-8 Encoded Bytes
0x0000-0x007F        0xxxxxxx
0x0080-0x07FF        110xxxxx 10xxxxxx
0x0800-0xFFFF        1110xxxx 10xxxxxx 10xxxxxx
0x10000-0x1FFFFF     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Similar here: [X], [X], [X], [X]

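Plane 16 (currently the highest) ends at 10FFFF. The four-byte form 11110xxx carries 21 payload bits, so it can hold values up to 1FFFFF, which is more than enough to reach 10FFFF. So these four lead-byte patterns cover all 17 planes, and I think that you have helped me greatly with this.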
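
For anyone who wants to check the table against a particular code point, here is a minimal sketch that encodes one code point to UTF-8 by hand and reports its plane. The names plane_of and encode_utf8 are made up for this example, and the sketch does not reject the surrogate values D800 to DFFF.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: plane number (0..16) of a code point.
unsigned plane_of(char32_t cp) { return static_cast<unsigned>(cp >> 16); }

// Hypothetical helper: encode one code point (<= 0x10FFFF) as UTF-8.
// Surrogate values 0xD800..0xDFFF are not rejected in this sketch.
std::string encode_utf8(char32_t cp)
{
    assert(cp <= 0x10FFFF);  // last code point of plane 16
    std::string out;
    if (cp < 0x80) {                 // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this, U+10FFFF comes out as the four bytes F4 8F BF BF and plane_of reports 16, so the lead byte of a real 4-byte sequence never goes above F4.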

Thank you SioSio.

Thank you everyone else that helped me to understand what Unicode is doing and what C and C++ are doing as it relates to Unicode.

Please remember that I am still a beginner at C++ and I am still struggling with understanding a lot, but I am not allowing that to stop me.

Here is where I am with this:


    #define _UNICODE
    #define UNICODE

    #include <windows.h>
    #include <commctrl.h>

    #include <iostream>
    #include <string>
    #include <bitset>
    using namespace std;

    string TextToBinaryString(string words) {
        string tempString = "";
        string splitIntoBytes = "Splitting [" + words + "] into Bytes\n";
        string binaryString = "";
        string stringByte;

        string combined_1;
        string combined_3;
        string combined_4;

        std::string Number1;
        std::string Number2;
        std::string Number3;
        std::string Number4;

        int i=1;
        string numToString = "numToString is Not Working";

        for (char& _char : words) {

            // From Plane 0 Unicode Check for 0xxx or 110x or 1110
            // From plane 1 Unicode Check for 1111                  Reference: http://www.i18nguy.com/unicode-plane1-utf8.html

            numToString = to_string (i);
            stringByte = bitset<8>(static_cast<unsigned char>(_char)).to_string();

            i = i +1;

            binaryString = binaryString + "\n\n" + numToString + " Testing       " +  stringByte + "\n";

            binaryString =binaryString + "     _char====" + _char + "\n";

            // The first four bits of the byte, as one-character strings
            Number1 = stringByte[0];
            Number2 = stringByte[1];
            Number3 = stringByte[2];
            Number4 = stringByte[3];

            // 0
            combined_1 = Number1;

            // 110 , 111
            combined_3 = Number1 + Number2 + Number3;

            // 1111 with verification of not being 1110
            combined_4 = Number1 + Number2 + Number3 + Number4;


            binaryString = binaryString + "     Number1=[" + Number1 + "]\n";

            binaryString = binaryString + "     Number2=[" + Number2 + "]\n";

            binaryString = binaryString + "     Number3=[" + Number3 + "]\n";

            binaryString = binaryString + "     Number4=[" + Number4 + "]\n\n";


            binaryString = binaryString + "     combined_1=[" + combined_1 + "]\n";

            binaryString = binaryString + "     combined_3=[" + combined_3 + "]\n";

            binaryString = binaryString + "     combined_4=[" + combined_4 + "]\n\n";


            if (combined_1 == "0")  // 0xxxxxxx: single-byte character
                {
                    // Start from a std::string: a bare literal + _char is pointer
                    // arithmetic on the literal, not concatenation.
                    tempString = string("\n      [") + _char ;
                    tempString = tempString  + "]=<Combined_1 PLANE 0>" + stringByte;
                    // With a bare literal on the left, joining the previous two lines gave this error:
                    // error: invalid operands of types 'const char*' and 'const char [23]' to binary 'operator+'|
                    // Please answer in separate question here https://bytes.com/topic/c/answers/974655-invalid-operands-const-char-binary-operator Thank you.


                    binaryString += tempString;
                    splitIntoBytes = splitIntoBytes + "\n Unicode Plane 0 [" + _char + "]=" + stringByte;
                }
            else
                {

                    if (combined_3 == "110")    // 110xxxxx: lead byte of a 2-byte sequence
                        {
                            tempString = string("\n      [") + _char ;
                            tempString = tempString  + "]=<Combined_3 PLANE 0>" + stringByte;

                            binaryString += tempString;
                            splitIntoBytes = splitIntoBytes + "\n Unicode Plane 0 [" + _char + "]=" + stringByte;
                        }
                    else
                        if (combined_4 == "1110")   // Verifying that there is a zero "0" past the 111.
                                                    // Lead byte of a 3-byte sequence, so it is still in Plane 0.
                            {
                                tempString = string("\n      [") + _char ;
                                tempString = tempString  + "]=<Combined_4 PLANE 0>" + stringByte;

                                binaryString += tempString;
                                splitIntoBytes = splitIntoBytes + "\n Unicode Plane 0 [" + _char + "]=" + stringByte;
                            }
                        else if (combined_4 == "1111")  // Verifying that there is a one "1" past the 111.
                                                        // Lead byte of a 4-byte sequence: planes 1 to 16
                                                        // (Plane 1 in these tests).
                            {
                                tempString = string("\n      [") + _char ;
                                tempString = tempString  + "]=<Combined_4 PLANE 1>" + stringByte;

                                binaryString += tempString;
                                splitIntoBytes = splitIntoBytes + "\n Unicode Plane 1 [" + _char + "]=" + stringByte;
                            }
                        else    // 10xxxxxx: continuation byte of the previous lead byte
                            {
                                tempString = string("\n      [") + _char ;
                                tempString = tempString  + "]=<EXTENDED>" + stringByte;

                                binaryString += tempString;
                                splitIntoBytes = splitIntoBytes + " " + _char + "=" + stringByte;
                            }

                }
            binaryString = binaryString + "\n";

        }

        // For Line by Line testing:
    //    binaryString = splitIntoBytes + "\n\n-Line by line testing is next:\n\n" + binaryString + "\n\n";

        // For quick results:
        binaryString = splitIntoBytes + "\n\n";

        return binaryString;
    }

    int main()
    {
    //    string testText = "123办456   办"; // From Plane 0 Unicode
        string testText = "a𐑒𐑦𐑙𐑛𐑳𐑥b" ; // From Plane 1 Unicode

        // I went to http://www.i18nguy.com/unicode-example-plane1.html and copied some of their characters
        // which are Plane 1 Unicode and then pasted them into the code here
        // and got a bunch of boxes
        // string testText = "a[][][][][][]b";
        //
        // And, when I compiled it sometimes I got warnings that the line did not do anything,
        // but I ran it and IT WORKED !
  151.     //
  152.     // Maybe, even though my laptop might not have the correct fonts to show the higher plane in Unicode characters,
  153.     // maybe C++11 via CODE::BLOCKS 17.12 accepted it and compiled it and ran it correctly.
  154.     //
  155.     // Wow. Thank you bytes.com
  156.  
  157.  
  158.     cout << "Convert [" << testText << "] to Binary\n\n";
  159.     cout << TextToBinaryString(testText) << "\n";
  160.  
  161.     return 0;
  162. }
  163.  
  164.  
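The lead-byte checks above ("110", "1110", "1111") can also be done with bit masks on the byte itself, and decoding the whole sequence gives the exact plane instead of just "past Plane 0". What follows is only a minimal sketch under some assumptions: the input is a UTF-8 encoded std::string, and the helper names sequenceLength and planeAt are made up for this example (they are not part of the code above). It does not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts with
// this lead byte, or 0 if the byte is a continuation byte or not a valid lead.
int sequenceLength(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1;   // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;   // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;   // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;   // 11110xxx
    return 0;                              // 10xxxxxx or 11111xxx
}

// Hypothetical helper: decodes the sequence starting at text[pos] and returns
// its Unicode plane (0 to 16), or -1 if the sequence is truncated or malformed.
int planeAt(const std::string& text, std::size_t pos)
{
    if (pos >= text.size()) return -1;

    unsigned char lead = static_cast<unsigned char>(text[pos]);
    int len = sequenceLength(lead);
    if (len == 0 || pos + static_cast<std::size_t>(len) > text.size()) return -1;

    // Payload bits carried by the lead byte, indexed by sequence length.
    static const unsigned char leadMask[] = {0x00, 0x7F, 0x1F, 0x0F, 0x07};
    std::uint32_t codePoint = lead & leadMask[len];

    for (int i = 1; i < len; ++i)
    {
        unsigned char b = static_cast<unsigned char>(text[pos + i]);
        if ((b & 0xC0) != 0x80) return -1;        // every following byte must be 10xxxxxx
        codePoint = (codePoint << 6) | (b & 0x3F); // append its 6 payload bits
    }

    if (codePoint > 0x10FFFF) return -1;           // beyond Plane 16
    return static_cast<int>(codePoint >> 16);      // each plane holds 0x10000 code points
}
```

To walk a whole string, call planeAt at position 0 and then advance by sequenceLength of the byte at that position (or by 1 when it returns 0). The plane is just the code point shifted right by 16 bits, which is why a 4-byte sequence can land anywhere from Plane 1 to Plane 16.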
Dec 12 '20 #34
