Bytes IT Community

What are the Minimum requirements for all 17 planes of Unicode in C++?

SwissProgrammer
100+
P: 127
What are the Minimum requirements for all 17 planes of Unicode in C++?


In YOUR EXPERIENCE !
(Sometimes official descriptions have not been accurate. I want experienced answers.)

C++0 to C++20: which version is the minimum that can work with ALL 17 planes without requiring a workaround or a third-party DLL?
I am not interested in Visual Studio or .NET. I have used them in the past and I know that they are powerful, but I specifically do not want them now. Just C++.
In case you might not know what Unicode "planes" are, see https://en.wikipedia.org/wiki/Plane_%28Unicode%29


I am currently only using plane 0 and C++11. I want to be able to use all of the planes 0-16. I want to know the minimum requirements in C++.


Thank you.
4 Weeks Ago #1
19 Replies

Banfa
Expert Mod 5K+
P: 8,996
I am partly replying because I'd like to know the answer if someone else replies and partly because I'm not sure you are asking the right question.

Support for Unicode is not a programming language or version matter but rather it is related to the execution environment supported character encoding.

Execution environment and source code (or build environment) character sets can be different, although they often aren't in the case of building and executing on the same platform.

So take this example program

    #include <iostream>
    using namespace std;

    int main()
    {
        // Narrow literal as originally posted; the wide form would be L'\u0444'.
        wchar_t c = '\u0444';

        cout << "cout: ф" << endl;
        cout << "cout: " << u8"\u0444" << endl;
        cout << "cout: " << c << endl;

        wcout << "wcout: ф" << endl << flush;
        wcout << "wcout: " << u8"\u0444" << endl << flush;
        wcout << "wcout: " << c << endl << flush;

        return 0;
    }
Compiled as C++14 and run in PowerShell, I get the following output:

    cout:
    cout:
    cout: d184
    wcout:

That is because PowerShell, by default, is not set up to output Unicode characters. Run this command in PowerShell

    $OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding

to tell it to use UTF8. Re-run the same program without recompiling and you get this output:

    cout: ф
    cout: ф
    cout: d184
    wcout:

Recompile the program as C++98 and you get this output:

    cout: ф
    cout: ф
    cout: 53636
    wcout:

The only thing that has changed is that the wchar_t variable is displayed in decimal instead of hexadecimal.

Support for Unicode is depressingly non-standard across platforms so it is hard to write portable code using Unicode.

P.S. I have no idea why only 1 of the 3 wcout lines is producing output in all cases.
4 Weeks Ago #2

Banfa
Expert Mod 5K+
P: 8,996
Sorry that post didn't even try to answer the question, the point I was trying to make was, ignoring your problem of how to put plane 1+ Unicode characters into standard C++ code, outputting them requires a system that understands them.

Actually using them in code is complicated by standard C++ only really having support for UTF8 (and in theory that only came in with C++11), so outputting a character from plane 1+ becomes rather painful. Take character U+1FA0F (Black King rotated, from plane 1). Even if the system understands this plane and can display it, which isn't a given, in standard C++ your only option would be to use UTF8 encoding, which looks something like u8"\xF0\x9F\xA8\x8F". That has to be looked up by hand and is a pain to type, and because I haven't got an environment that knows how to interpret it I don't even know if it is correct.
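The byte sequence quoted above can be checked without a lookup table. The following is a minimal sketch, assuming a hypothetical helper named encode_utf8 (it is not a standard library function), that applies the UTF-8 bit layout to a single code point. For U+1FA0F it produces F0 9F A8 8F, which matches the bytes given above.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Encode one Unicode code point (U+0000..U+10FFFF, surrogates excluded)
// as UTF-8 bytes. encode_utf8 is a hypothetical helper name.
std::string encode_utf8(std::uint32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");

    std::string out;
    if (cp < 0x80) {                        // 1 byte: ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {              // 3 bytes: rest of plane 0
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                // 4 bytes: planes 1 to 16
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

A C++11 compiler also accepts the universal character name directly, as in u8"\U0001FA0F", which yields the same four bytes without typing them by hand.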

I realise this post also doesn't answer the question, (still hoping someone else can) but at least it doesn't answer the question as opposed to not answering a different question.
4 Weeks Ago #3

SwissProgrammer
100+
P: 127
Banfa,

I did not want to lead someone with an answer, but your answer says close to what I have found.
UTF-8, in my opinion, is the most universal of the UTF options. I have not found any limit to the expandability of the UTF-8 encoding.
UTF-8, if my memory is correct, was what I was using back in Windows 2000 and (I think) in Windows NT. But, I was not programming in C++ at that time. Therefore, I thought to ask the local C++ experts here.

Before the Unicode consortium expanded their published scope past plane 0, I used Unicode a lot. Currently, as I transition into C++11, my coding time is greatly limited by my struggling through the learning curve. I was wanting someone with experience in the planes above 0 to speak to the issues encountered.

Your response, though you might not think it was so appropriate, I enjoyed.

Thank you.



One, but not the only, goal that I have with C++11 and Unicode is to be able to have a text box in which someone pastes a Unicode character or sentence and my program automatically shows in another text box the Unicode representation of that input:

Example input 办法 .

My program would show the UTF-8 encoding for that and maybe even split it apart into the two words that it contains 办 (ban) , and 法 (fa), each with their own UTF8 encoding.
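Splitting text into its characters, each with its own UTF-8 bytes, can be sketched in portable C++ as below. The function names split_utf8 and utf8_sequence_length are hypothetical, and the sketch assumes the input is already UTF-8; it only inspects lead bytes and does not validate continuation bytes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Length in bytes of the UTF-8 sequence that starts with lead byte b,
// or 0 if b cannot start a sequence (continuation or invalid byte).
std::size_t utf8_sequence_length(unsigned char b)
{
    if (b < 0x80) return 1;            // 0xxxxxxx
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Split UTF-8 text into one std::string per code point.
// Sketch only: continuation bytes are not validated, and splitting stops
// at the first malformed or truncated sequence.
std::vector<std::string> split_utf8(const std::string& text)
{
    std::vector<std::string> chars;
    std::size_t i = 0;
    while (i < text.size()) {
        const std::size_t len =
            utf8_sequence_length(static_cast<unsigned char>(text[i]));
        if (len == 0 || i + len > text.size())
            break;
        chars.push_back(text.substr(i, len));
        i += len;
    }
    return chars;
}
```

For the input 办法 this gives two elements: E5 8A 9E for 办 and E6 B3 95 for 法.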

But, first I wanted to know more about what I am dealing with in C++. Am I using at least the C++ version that can give me that response? Am I using the version that can give me the correct response for every plane?

So, I started with the simplest of the questions: Am I using the minimum version of C++ that can do all of this.

Later, I might have struggled to get the example to work, being confident that the C++ version was capable of doing the job.

Again, Thank you Banfa.

.
4 Weeks Ago #4

dev7060
Expert 100+
P: 333
Banfa said: "P.S. I have no idea why only 1 of the 3 wcout lines is producing output in all cases."
    #include <iostream>
    using namespace std;

    int main() {
      wchar_t c = '\u0444';
      wcout << "wcout: ф" << endl << flush;
      if (wcout.fail()) {
        cout << "\nwide to narrow conversion didn't succeed; Unicode is not representable in the codepage";
        cout << endl;
        wcout << "\nThis won't get printed. Other wcouts don't have any effect at this point";
        wcout.clear();
      }
      wcout << "wcout: " << u8"\u0444" << endl << flush;
      if (wcout.fail()) {
        cout << "\nattempt #2 didn't succeed";
        cout << endl;
        wcout << "not shown on the console";
        wcout.clear();
        wcout << "hello user\n";
      }
      wcout << "wcout: " << c << endl << flush;
      wcout << "not available on the console as well ";
      return 0;
    }

Also, from https://www.cplusplus.com/reference/iostream/wcout/ :
"A program should not mix output operations on wcout with output operations on cout (or with other narrow-oriented output operations on stdout): Once an output operation has been performed on either, the standard output stream acquires an orientation (either narrow or wide) that can only be safely changed by calling freopen on stdout."

SwissProgrammer said: "In YOUR EXPERIENCE ! (Sometimes official descriptions have not been accurate. I want experienced answers.)"
Disclaimer: As you specifically asked for an experienced answer, I'm a student and not experienced at all when it comes to professional development. The below is just how I view it with my understanding.

My understanding of Unicode is that it isn't concerned with a language, as pointed out by Banfa. Every system or environment has kind of its way of dealing with it, has its own character set, and uses workarounds to set up compatibilities with others for exchanging the data. It depends on how encoding is done; what code points are being used, how many bytes for a character, which two code points are combined to represent a new character, what byte order, endian system, etc. Mapping is implementation-dependent.

Here's a char: 🮕 (it is not showing up on my screen; I just copied a random one from Wikipedia)
In the JS console,

    console.log("🮕".length)

shows the output 2.
In PHP,

    echo strlen("🮕")

shows the output 4.

One solution is to use a fixed-length encoding across everything, like UTF-32, which uses 4 bytes per code point. The con is that it's space inefficient. Imagine a 5-byte character array in UTF-8 taking 20 bytes in the UTF-32 representation; literally a mess on a larger scale. ASCII text will have many leading 0s consuming memory for no reason. Variable-length encodings like UTF-8 or UTF-16 allocate bytes according to what each character needs.
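The differing lengths reported above come from counting code units in different encodings: the JS result counts UTF-16 code units and the PHP result counts UTF-8 bytes. A small C++11 sketch of the same effect follows; it uses U+1FA0F from earlier in the thread as a stand-in plane 1 character, and the constant names are hypothetical.

```cpp
#include <cstddef>

// The same plane 1 character, U+1FA0F, written as three kinds of literal.
// sizeof also counts the terminating null code unit, hence the "- 1".
constexpr std::size_t utf8_units  = sizeof(u8"\U0001FA0F") - 1;
constexpr std::size_t utf16_units = sizeof(u"\U0001FA0F") / sizeof(char16_t) - 1;
constexpr std::size_t utf32_units = sizeof(U"\U0001FA0F") / sizeof(char32_t) - 1;

static_assert(utf8_units  == 4, "UTF-8 needs four 1-byte code units");
static_assert(utf16_units == 2, "UTF-16 needs a surrogate pair");
static_assert(utf32_units == 1, "UTF-32 needs a single code unit");
```

The checks run at compile time, so a compiler that accepts the file has already confirmed the counts.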

Let's say you write a program in an IDE. You convert the encodings between char* and wchar_t* back and forth between function calls. The libraries would process them (or not?) using their implemented mappings, but the output produced on the terminal may be unpredictable because the terminal supports only the encodings and mappings of the OS. Whatever code our binary sends for display may not be available in the character map of the OS to produce a relevant output. Windows uses UTF-16 natively, hence Windows apps use the same. If you run the same code on Unix or Linux, the output may be different (in UTF-8).

For having uniformity; I guess the engine, language, os, environment, third party compilers, linkers, ide, libraries, dependencies, databases, binaries, web connections, etc. (whatever is interacting with your encoded data in between) all have to agree on a common set of rules to represent the chars; which would be a hypothetical concept (maybe). I mean, for example, to communicate over the networks, you'd need maximum compression for the fast travel of the packets, hence would choose a variable size encoding. And if you see Java, it stores data as UTF-16 internally and on the other hand, UTF-16 is not used in internet websites because it's incompatible with ASCII. Workarounds seem to be the only solution to build the bridge and for the devs; trial and error if you don't know how the encodings are being done in a system. For example, Java docs states clearly:

The Java programming language is based on the Unicode character set, and several libraries implement the Unicode standard. Unicode is an international character set standard which supports all of the major scripts of the world, as well as common technical symbols. The original Unicode specification defined characters as fixed-width 16-bit entities, but the Unicode standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF. An encoding defined by the standard, UTF-16, allows to represent all Unicode code points using one or two 16-bit units.
The primitive data type char in the Java programming language is an unsigned 16-bit integer that can represent a Unicode code point in the range U+0000 to U+FFFF, or the code units of UTF-16. The various types and classes in the Java platform that represent character sequences - char[], implementations of java.lang.CharSequence (such as the String class), and implementations of java.text.CharacterIterator - are UTF-16 sequences. Most Java source code is written in ASCII, a 7-bit character encoding, or ISO-8859-1, an 8-bit character encoding, but is translated into UTF-16 before processing.
Ref: https://www.oracle.com/technical-res...lementary.html
https://docs.oracle.com/javase/8/doc.../overview.html
I've used java references just for the demonstration of a system.

SwissProgrammer said: "One, but not the only, goal that I have with C++11 and Unicode is to be able to have a text box in which someone pastes a Unicode character or sentence and my program automatically shows in another text box the Unicode representation of that input:"
Here's my guess: Whatever GUI library you're using is probably making calls to the WinAPI behind the scenes and using the OS's layout and character set to display everything. When you paste something into the text box, if the OS's character set can't map it, you probably won't see it properly in the text box in the first place. You may be able to pass the character to the library's event handler functions, and the background processing may interpret everything correctly (or not), but you still depend on the OS to see the output. In other words, you need an external environment to interact with your app anyway, just as you need a third-party compiler (MinGW, GCC, etc.) to process the text and produce the binaries. That's where workarounds come into play. You need a way to make the text recognizable so that it displays properly, using third-party libs or your own logic if you can figure out how stuff is happening behind the scenes.
4 Weeks Ago #5

SwissProgrammer
100+
P: 127
dev7060,

You pointed directly to the issues. I agree.

Maybe if I get help one tiny step at a time.

I do not yet know how to make a Unicode capable, drag-n-drop text box in C++11.

I would like to be able to input to, and to read from, that text box with both wchar_t* and TCHAR*.

Help.

4 hours later (I told you I am new at this) I have the following.

    #define _UNICODE
    #define UNICODE

    #include <windows.h>

    #include <iostream>
    #include <string>
    using namespace std;

    #define MAX_LOADSTRING 100

    HWND Handle_Main_Window = NULL;

    void create_controls( const HWND hwnd );

    LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);

    wchar_t g_szClassName[] = L"myWindowClass";

    int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
        LPSTR lpCmdLine, int nCmdShow)
    {
        WNDCLASSEX wc;
        MSG Msg;

        wc.cbSize        = sizeof(WNDCLASSEX);
        wc.style         = 0;
        wc.lpfnWndProc   = WndProc;
        wc.cbClsExtra    = 0;
        wc.cbWndExtra    = 0;
        wc.hInstance     = hInstance;
        wc.hIcon         = LoadIcon(nullptr, IDI_APPLICATION);
        wc.hCursor       = LoadCursor(nullptr, IDC_ARROW);
        wc.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
        wc.lpszMenuName  = nullptr;
        wc.lpszClassName = g_szClassName;
        wc.hIconSm       = LoadIcon(nullptr, IDI_APPLICATION);

        if(!RegisterClassEx(&wc))
        {
            MessageBox(nullptr, L"Window Registration Failed!", L"Error!",
                MB_ICONEXCLAMATION | MB_OK);
            return 0;
        }

        Handle_Main_Window = CreateWindowEx(
            WS_EX_CLIENTEDGE,
            g_szClassName,
            L"Title",
            WS_OVERLAPPEDWINDOW,
            CW_USEDEFAULT,
            CW_USEDEFAULT,
            500,
            500,
            nullptr,
            nullptr,
            hInstance,
            nullptr);

        if(Handle_Main_Window == NULL)
        {
            MessageBox(nullptr, L"Window Creation Failed!", L"Error!",
                MB_ICONEXCLAMATION | MB_OK);
            return 0;
        }

        ShowWindow(Handle_Main_Window, nCmdShow);
        UpdateWindow(Handle_Main_Window);

        while(GetMessage(&Msg, nullptr, 0, 0) > 0)
        {
            TranslateMessage(&Msg);
            DispatchMessage(&Msg);
        }
        return (int)Msg.wParam;
    }


    LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        switch (msg)
        {
            case WM_CREATE:
                create_controls( hWnd );
                break;

            case WM_COMMAND:
                switch(LOWORD(wParam)) {
                case 1:{
                        ::MessageBox( hWnd, L"PUSH BUTTON 1 was clicked", L"message from PUSH BUTTON 1", MB_SETFOREGROUND );
                        break;
                    }

                case 2:{
                        // Read the INPUT TEXT WINDOW (id 3) and show its text
                        const HWND text_box = GetDlgItem( hWnd, 3 );
                        const int n = GetWindowTextLength( text_box );
                        wstring text( n + 1, L'#' );
                        if( n > 0 )
                            {
                                GetWindowText( text_box, &text[0], text.length() );
                            }
                        text.resize( n );
                        ::MessageBox(hWnd, text.c_str(), L"The INPUT TEXT WINDOW", MB_SETFOREGROUND );
                        break;
                    }

                case 3:{
                        break;
                    }

                case 4:{
                        break;
                    }

                case 5:{
                        // Note: id 5 is the SAVE BUTTON itself; the input box is id 3
                        const HWND text_box = GetDlgItem( hWnd, 5 );
                        const int n = GetWindowTextLength( text_box );
                        wstring text( n + 1, L'#' );
                        if( n > 0 )
                            {
                                GetWindowText( text_box, &text[0], text.length() );
                            }
                        text.resize( n );
                        ::MessageBox(hWnd, L"SAVE BUTTON was clicked", L"message from SAVE BUTTON", MB_SETFOREGROUND );
                        break;
                    }

                default:{
                    }
                }
                break;

            case WM_CLOSE:{
                    DestroyWindow(hWnd);
                    break;
                }

            case WM_DESTROY:{
                    PostQuitMessage(0);
                    break;  // was missing: fell through to DefWindowProc
                }

            default:{
                    return DefWindowProc(hWnd, msg, wParam, lParam);
                }
        }
        return 0;
    }

    void create_controls( const HWND hwnd )
    {
        CreateWindow( L"BUTTON",
            L"PUSH BUTTON 1",
            WS_VISIBLE | WS_CHILD | WS_BORDER,
            10,10,
            130,20,
            hwnd, (HMENU) 1, GetModuleHandle( nullptr ), nullptr
            );

        CreateWindow( L"EDIT",
            L"INPUT TEXT WINDOW",
            WS_VISIBLE | WS_CHILD | WS_BORDER,
            10,50,
            200,25,
            hwnd, (HMENU) 3, GetModuleHandle( nullptr ), nullptr
            );

        CreateWindow( L"BUTTON",
            L"SAVE BUTTON",
            WS_VISIBLE | WS_CHILD | WS_BORDER,
            10,80,
            110,20,
            hwnd, (HMENU) 5, GetModuleHandle( nullptr ), nullptr
            );

        CreateWindow( L"EDIT",
            L"OUTPUT TEXT WINDOW",
            WS_VISIBLE | WS_CHILD | WS_BORDER,
            10,130,
            300,300,
            hwnd, (HMENU) 4, GetModuleHandle( nullptr ), nullptr
            );
    }

I can paste "Example input 办法 ." into the INPUT BOX, but what do I do with it next? I want to be able to click the button below that and see the UTF8 representation in the bottom box.




Help please.

Banfa said: "Support for Unicode is not a programming language or version matter but rather it is related to the execution environment supported character encoding."
I agree.
This is a start to having my program to be able to adapt to that.
I am trying to get to the final answer of my original question in this post. One step at a time.

Thank you.
Attached Images
File Type: jpg Unicode_UTF8.jpg (39.4 KB, 12 views)
4 Weeks Ago #6

SioSio
100+
P: 200
I did some research.
UTF-8: To stay compatible with ASCII, the ASCII range is encoded with 1 byte and everything else with 2 to 4 bytes (the original design allowed sequences of up to 6 bytes, but RFC 3629 limits UTF-8 to 4). A 4-byte sequence can carry up to 21 bits (0x1FFFFF), but values beyond the 17 planes of Unicode (larger than U+10FFFF) are not accepted.
UTF-16, UTF-32: Unlike UTF-8, they are not ASCII compatible.
Therefore, the condition that meets the requirement of #1 is to look for a version of C++ that supports UTF-8.
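To make the relationship between planes and byte counts concrete, here is a minimal sketch with two hypothetical helpers, unicode_plane and utf8_length. It relies only on the facts that each plane holds 0x10000 code points and that the last valid code point is U+10FFFF.

```cpp
#include <cstdint>

// Plane number (0 to 16) of a code point: each plane holds 0x10000 code points.
// Returns -1 for values above U+10FFFF, which lie outside Unicode.
int unicode_plane(std::uint32_t cp)
{
    return cp <= 0x10FFFF ? static_cast<int>(cp >> 16) : -1;
}

// Number of bytes UTF-8 needs for a code point, or 0 if it is out of range.
// Sketch only: surrogate values (U+D800..U+DFFF) are not rejected here.
int utf8_length(std::uint32_t cp)
{
    if (cp < 0x80)      return 1;   // ASCII
    if (cp < 0x800)     return 2;
    if (cp < 0x10000)   return 3;   // rest of plane 0
    if (cp <= 0x10FFFF) return 4;   // planes 1 to 16
    return 0;
}
```

Every code point in planes 1 to 16 takes exactly 4 bytes, so the 4-byte form already covers all 17 planes.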

Support status of UTF-8 depending on the version of C++

C++17 can process UTF-8 data as "char" data. This allows you to use std::regex, std::fstream, std::cout, etc. without loss.
C++20 added char8_t and std::u8string for UTF-8. However, UTF-8 stream I/O is still not supported because there is no std::u8fstream. Therefore, we need a way to convert between UTF-8 and the execution character set.
4 Weeks Ago #7

SioSio
100+
P: 200
I forgot to write.
In the C++11 Standard Library, UTF-8 is not supported for string and integer conversion functions, and I/O functions. Therefore, it needs to be converted to the system multibyte character code.
3 Weeks Ago #8

Banfa
Expert Mod 5K+
P: 8,996
Looks like you are using the WIN32 API. The Windows GUI natively uses UTF16, I believe, and the WIN32 API has wide char and multibyte versions of many functions, signified by a suffix of W or A.

It also has a set of helper functions, Unicode and Character Set Functions, and I think the one you are interested in is WideCharToMultiByte.

I know there are Googleable examples out there.
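For readers who want to see what a UTF16 to UTF8 conversion involves, here is a portable sketch. It is not the Windows implementation and does not call WideCharToMultiByte; utf16_to_utf8 is a hypothetical name, and the choice to replace unpaired surrogates with U+FFFD is an assumption of this sketch. The part that matters for planes 1 to 16 is the surrogate pair step.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Convert UTF-16 to UTF-8, combining surrogate pairs into plane 1+ code points.
// Sketch only: an unpaired surrogate is replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                const std::uint32_t low = in[i + 1];
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;                                   // consumed the low surrogate
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     // stray low surrogate
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Given u"\u529E\u6CD5" (办法) it returns the six bytes E5 8A 9E E6 B3 95, and given u"\U0001FA0F" it returns F0 9F A8 8F.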
3 Weeks Ago #9

dev7060
Expert 100+
P: 333
SwissProgrammer said: "I can paste "Example input 办法 ." into the INPUT BOX, but what do I do with it next? I want to be able to click the button below that and see the UTF8 representation in the bottom box."
Like this?

    case 5: {
      HWND InputTextBox = GetDlgItem(hWnd, 3);
      const int n = GetWindowTextLength(InputTextBox);
      wstring text(n + 1, L'#');
      if (n > 0) {
        GetWindowText(InputTextBox, &text[0], text.length());
      }
      text.resize(n);  // drop the extra slot reserved for the terminator
      const wchar_t* wcs = text.c_str();
      SetDlgItemText(hWnd, 4, wcs);
      break;
    }




If you're using UTF-8 chars inside the Code::Blocks,
Settings -> Editor -> Encoding -> Change it from 'default' to UTF-8

Code::Blocks is smart enough to change the encoding automatically to prevent losing data. But it would do that temporarily for every time you click on build.

Cygwin environment can be used for CLI testing. It supports UTF-8. https://www.cygwin.com/

Attached Images
File Type: png dev7060.png (12.8 KB, 306 views)
File Type: jpg dev7060_2.jpg (93.9 KB, 312 views)
3 Weeks Ago #10

Banfa
Expert Mod 5K+
P: 8,996
How's it going?

You probably need a couple of helper functions

Convert wide character to multibyte character aka UTF16 to UTF8
    // Convert a wide Unicode string to an UTF8 string
    std::string utf8_encode(const std::wstring &wstr)
    {
        if (wstr.empty())
        {
            return std::string();
        }

        int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), NULL, 0, NULL, NULL);

        char buffer[size_needed+1];
        WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), buffer, size_needed+1, NULL, NULL);

        // With an explicit input length no terminator is written,
        // so build the string from the byte count instead.
        std::string strTo( buffer, size_needed );
        return strTo;
    }

Get the character values of the multibyte character string
    std::wstring utf8_byte_values(const std::string &str)
    {
        if (str.empty())
        {
            return std::wstring();
        }

        bool first = true;
        std::wstringstream out;

        for(auto iter = str.begin(); iter != str.end(); ++iter)
        {
            if (first)
            {
                first = false;
            }
            else
            {
                out << L" ";
            }

            unsigned int value = ((unsigned)*iter) & 0xFF;
            out << L"0x" << std::hex << std::setw(2) << std::setfill(L'0') << value;
        }

        return out.str();
    }

Then your Save button code could look something like

    case BTN_SAVE:
    {
        const HWND in_box = GetDlgItem( hWnd, EDT_INPUT_TEXT );
        const int n = GetWindowTextLength( in_box  );
        if( n > 0 )
        {
            wchar_t text[n+1]; // +1 for terminator
            GetWindowText( in_box, text, n+1 );
            string utf8 = utf8_encode(wstring(text));
            // utf8_byte_values returns a std::wstring, so with UNICODE defined
            // this resolves to the wide version, SetDlgItemTextW
            SetDlgItemText( hWnd, EDT_OUTPUT_TEXT, utf8_byte_values(utf8).c_str());
        }
    //            text.resize( n );
        break;
    }

Note I defined symbols for all your dialog item ids to aid readability.

More importantly, note that the WIN32 API is a C API and expects C style strings, that is, '\0' terminated. This does not play nicely with C++, particularly C++ strings, because they are not '\0' terminated, which makes passing &text[0] to a WIN32 API where text is a std::string or std::wstring a very risky business. Instead, if the WIN32 API accepts a constant pointer, prefer text.c_str(), or if the WIN32 API function expects a non-constant pointer, use a standard C array and convert to a std::(w)string later.

Of course I have been slightly naughty in my code and used variable length arrays, which are a C rather than a C++ feature, but my GNU compiler lets me get away with that with a warning :D
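For compilers that reject variable length arrays, a std::vector can serve as the writable buffer instead. The sketch below is an illustration only: fake_get_text is a hypothetical stand-in for a C style API such as GetWindowText, so that the pattern can be compiled and run without the WIN32 headers, and read_text is likewise a made-up name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a C API such as GetWindowText: copies up to
// capacity - 1 characters into out, adds the terminator, returns the count.
int fake_get_text(wchar_t* out, int capacity)
{
    if (capacity <= 0)
        return 0;
    const wchar_t source[] = L"hello";
    int n = 0;
    while (n < capacity - 1 && source[n] != L'\0') {
        out[n] = source[n];
        ++n;
    }
    out[n] = L'\0';
    return n;
}

// Read text through a std::vector buffer instead of a variable length array,
// then build the std::wstring from the number of characters actually copied.
std::wstring read_text(int length)
{
    std::vector<wchar_t> buffer(static_cast<std::size_t>(length) + 1);  // +1 for terminator
    const int copied = fake_get_text(buffer.data(), static_cast<int>(buffer.size()));
    return std::wstring(buffer.data(), static_cast<std::size_t>(copied));
}
```

Building the string from the returned count, rather than relying on the terminator, keeps the conversion correct even when the C function reports fewer characters than were requested.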
3 Weeks Ago #11

SwissProgrammer
100+
P: 127
SioSio, Thank you.

I feel like I should parse the input into ASCII and non-ASCII first.

Then, I should parse the non-ASCII incoming text and test how well each character works in UTF-8 first, then in UTF-16 (to see if it is larger than U+10FFFF). Compare the results, and thus at least find out whether I am receiving input that is in plane 0 or plane 1+.

Then respond into the second text box with the resultant U.



Separately:
You said, "In the C++11 Standard Library, UTF-8 is not supported for string and integer conversion functions, and I/O functions. Therefore, it needs to be converted to the system multibyte character code."
In my CODE::BLOCKS 17.12 Settings/Editor/General settings/Encoding settings I have been using UTF-8 with the following choices chosen:
/ "As default encoding (bypassing C::B's auto-detection)"
/ "If conversion fails using the settings above, try system local settings".

But, I am concerned about system local settings on a user's computer that is different from my tested systems. Maybe then I should just catch any errors of such and deal with that separately.

I think that this is correct. What do you think? How would you handle this?

Thank you.
3 Weeks Ago #12

SwissProgrammer
100+
P: 127
Banfa, Thank you.

When I started learning C++11 I used WideCharToMultiByte and MultiByteToWideChar.

They seemed to work. But I read, maybe 2 or 3 places, that these should be avoided. I should have asked here at that time, but I did not. Since I see you using them, I shall use them with more confidence.


You used:

    std::wstringstream out;

For that I got
error: aggregate 'std::wstringstream out' has incomplete type and cannot be defined

For future readers:
I added

    #include <sstream>

which fixed that.

You used:

    out << L"0x" << std::hex << std::setw(2) << std::setfill(L'0') << value;

For that I got
error: 'setw' is not a member of 'std'

For future readers:
I added

    #include <iomanip>

which fixes that.

You used:

    const HWND in_box = GetDlgItem( hWnd, EDT_INPUT_TEXT );

which I changed to:

    const HWND in_box = GetDlgItem(hWnd, 3);

I like the EDT_INPUT_TEXT but I am not certain how to get my CreateWindow to use that. So, I used 3 instead.

You used:

    SetDlgItemText( hWnd, EDT_OUTPUT_TEXT, utf8_byte_values(utf8).c_str());

which I changed to

    SetDlgItemText(hWnd, 4, utf8_byte_values(utf8).c_str());

Again, I like the way that you did it, but I am having difficulty getting your line to work.
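One way the named ids could be made to work is to define them once with the numbers already used in the CreateWindow calls. This is a sketch under that assumption: EDT_INPUT_TEXT, EDT_OUTPUT_TEXT and BTN_SAVE are the names from the Save button example, BTN_PUSH is a hypothetical name for the first button, and the values Banfa actually chose are not shown in the thread.

```cpp
// Control identifiers, using the numeric values already present in the
// CreateWindow calls. BTN_PUSH is a hypothetical name; the other three
// names come from the Save button example.
enum ControlId {
    BTN_PUSH        = 1,
    EDT_INPUT_TEXT  = 3,
    EDT_OUTPUT_TEXT = 4,
    BTN_SAVE        = 5
};
```

With this in place before WndProc, create_controls can pass (HMENU) EDT_INPUT_TEXT where it currently passes (HMENU) 3, and the case labels and GetDlgItem calls can use the same names.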

I am not certain what the

    //            text.resize( n );

is. But thank you.


It works. Thank you. I am getting closer to being able to test on different platforms in different versions of C++.

For 办
I get 0xe5 0x8a 0x9e

Getting closer.

Lots of times I have wanted to see an update of the progress of code changes that other people were working on.

For future readers here is what currently works for me:
    #define _UNICODE
    #define UNICODE

    #include <windows.h>

    #include <iostream>
    #include <sstream>      // for std::wstringstream
    #include <iomanip>      // for std::setw
    #include <string>
    using namespace std;

    #define MAX_LOADSTRING 100

    HWND Handle_Main_Window = NULL;

    void create_controls( const HWND hwnd );


    LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);

    wchar_t g_szClassName[] = L"myWindowClass";

    // Previous declarations
        std::string utf8_encode(const std::wstring &wstr);
        std::wstring utf8_byte_values(const std::string &str);


        // Convert a wide Unicode string to an UTF8 string
        std::string utf8_encode(const std::wstring &wstr)
        {
            if (wstr.empty())
            {
                return std::string();
            }

            int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), nullptr, 0, nullptr, nullptr);

            char buffer[size_needed+1];
            WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), buffer, size_needed+1, nullptr, nullptr);

            // With an explicit input length no terminator is written,
            // so build the string from the byte count instead.
            std::string strTo( buffer, size_needed );
            return strTo;
        }



        std::wstring utf8_byte_values(const std::string &str)
        {
            if (str.empty())
            {
                return std::wstring();
            }

            bool first = true;
            std::wstringstream out;
            // error: aggregate 'std::wstringstream out' has incomplete type and cannot be defined

            // I found this in <iosfwd>
            // Class for @c wchar_t mixed input and output memory streams.
            //   typedef basic_stringstream<wchar_t>     wstringstream;
            // Is that something from Visual Studio or maybe a later version of Code:Blocks?

            for(auto iter = str.begin(); iter != str.end(); ++iter)
            {
                if (first)
                {
                    first = false;
                }
                else
                {
                    out << L" ";
                }

                unsigned int value = ((unsigned)*iter) & 0xFF;
                out << L"0x" << std::hex << std::setw(2) << std::setfill(L'0') << value;
            }

            return out.str();
        }



    int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
        LPSTR lpCmdLine, int nCmdShow)
    {
        WNDCLASSEX wc;
        MSG Msg;

        wc.cbSize        = sizeof(WNDCLASSEX);
        wc.style         = 0;
        wc.lpfnWndProc   = WndProc;
        wc.cbClsExtra    = 0;
        wc.cbWndExtra    = 0;
        wc.hInstance     = hInstance;
        wc.hIcon         = LoadIcon(nullptr, IDI_APPLICATION);
        wc.hCursor       = LoadCursor(nullptr, IDC_ARROW);
        wc.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
        wc.lpszMenuName  = nullptr;
        wc.lpszClassName = g_szClassName;
        wc.hIconSm       = LoadIcon(nullptr, IDI_APPLICATION);

        if(!RegisterClassEx(&wc))
        {
            MessageBox(nullptr, L"Window Registration Failed!", L"Error!",
                MB_ICONEXCLAMATION | MB_OK);
            return 0;
        }

        Handle_Main_Window = CreateWindowEx(
            WS_EX_CLIENTEDGE,
            g_szClassName,
            L"Title",
            WS_OVERLAPPEDWINDOW,
            CW_USEDEFAULT,
            CW_USEDEFAULT,
            500,
            500,
            nullptr,
            nullptr,
            hInstance,
            nullptr);

        if(Handle_Main_Window == NULL)
        {
            MessageBox(nullptr, L"Window Creation Failed!", L"Error!",
                MB_ICONEXCLAMATION | MB_OK);
            return 0;
        }

        ShowWindow(Handle_Main_Window, nCmdShow);
  133.     UpdateWindow(Handle_Main_Window);
  134.  
  135.     while(GetMessage(&Msg, nullptr, 0, 0) > 0)
  136.     {
  137.         TranslateMessage(&Msg);
  138.         DispatchMessage(&Msg);
  139.     }
  140.     return Msg.wParam;
  141. }
  142.  
  143.  
  144. LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
  145.     {
  146.         switch (msg)
  147.         {
  148.  
  149.             case WM_CREATE:
  150.                 create_controls( hWnd );
  151.                 break;
  152.  
  153.             case WM_COMMAND:
  154.                 switch(LOWORD(wParam)) {
  155.                 case 1:{
  156.                         ::MessageBox( hWnd, L"PUSH BUTTON 1 was clicked", L"message from PUSH BUTTON 1", MB_SETFOREGROUND );
  157.                         break;
  158.                     }
  159.  
  160.                 case 2:{
  161.                         const HWND text_box = GetDlgItem( hWnd, 3 );
  162.                         const int n = GetWindowTextLength( text_box );
  163.                         wstring text( n + 1, L'#' );
  164.                         if( n > 0 )
  165.                             {
  166.                                 GetWindowText( text_box, &text[0], text.length() );
  167.                             }
  168.                         text.resize( n );
  169.                         ::MessageBox(hWnd, text.c_str(), L"The INPUT TEXT WINDOW", MB_SETFOREGROUND );
  170.                         break;
  171.                     }
  172.  
  173.                 case 3:{
  174.                         break;
  175.                     }
  176.  
  177.                 case 4:{
  178.                         break;
  179.                     }
  180.  
  181.                case 5:  //BTN_SAVE:
  182.                 {
  183. //                    const HWND in_box = GetDlgItem( hWnd, EDT_INPUT_TEXT );
  184.                     const HWND in_box = GetDlgItem(hWnd, 3);
  185.                     const int n = GetWindowTextLength( in_box  );
  186.                     if( n > 0 )
  187.                     {
  188.                         wchar_t text[n+1]; // +1 for terminator
  189.                         GetWindowText( in_box, text, n+1 );
  190.                         string utf8 = utf8_encode(wstring(text));
  191.                         // Force calling of ASCII/UTF8 version
  192. //                        SetDlgItemText( hWnd, EDT_OUTPUT_TEXT, utf8_byte_values(utf8).c_str());
  193.                         SetDlgItemText(hWnd, 4, utf8_byte_values(utf8).c_str());
  194.  
  195.                     }
  196.         //            text.resize( n );
  197.                     break;
  198.                 }
  199.  
  200.                 default:{
  201.                     }
  202.             }
  203.             break;
  204.  
  205.             case WM_CLOSE:{
  206.                     DestroyWindow(hWnd);
  207.                     break;
  208.                 }
  209.  
  210.             case WM_DESTROY:{
  211.                     PostQuitMessage(0);
  212.                 }
  213.  
  214.             default:{
  215.                     return DefWindowProc(hWnd, msg, wParam, lParam);
  216.                 }
  217.         }
  218.         return FALSE;
  219.     }
  220.  
  221. void create_controls( const HWND hwnd )
  222.     {
  223.  
  224.         CreateWindow( L"BUTTON",
  225.             L"PUSH BUTTON 1",
  226.             WS_VISIBLE | WS_CHILD | WS_BORDER,
  227.             10,10,
  228.             130,20,
  229.             hwnd, (HMENU) 1, GetModuleHandle( nullptr ), nullptr
  230.             )  ;
  231.  
  232.         CreateWindow( L"EDIT",
  233.             L"办",
  234.             WS_VISIBLE | WS_CHILD | WS_BORDER,
  235.             10,50,
  236.             200,25,
  237.             hwnd, (HMENU) 3, GetModuleHandle( nullptr ), nullptr
  238.             );
  239.  
  240.         CreateWindow( L"BUTTON",
  241.             L"SAVE BUTTON",
  242.             WS_VISIBLE | WS_CHILD | WS_BORDER,
  243.             10,80,
  244.             110,20,
  245.             hwnd, (HMENU) 5, GetModuleHandle( nullptr ), nullptr
  246.             );
  247.  
  248.         CreateWindow( L"EDIT",
  249.             L"OUTPUT TEXT WINDOW",
  250.             WS_VISIBLE | WS_CHILD | WS_BORDER,
  251.             10,130,
  252.             300,300,
  253.             hwnd, (HMENU) 4, GetModuleHandle( nullptr ), nullptr
  254.             );
  255.     }
  256.  
  257.  
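On the "incomplete type" error noted in the comments above: that message is what a compiler typically reports when only the forward declaration from <iosfwd> is visible. Including <sstream> supplies the full definition of std::wstringstream, and <iomanip> supplies std::setw and std::setfill. A minimal portable sketch (no Win32 calls; the name byte_values_hex is hypothetical and only mirrors utf8_byte_values):

```cpp
#include <iomanip>   // std::setw, std::setfill
#include <sstream>   // full definition of std::wstringstream
#include <string>

// Hypothetical helper mirroring utf8_byte_values from the post above:
// formats every byte of `bytes` as "0xNN", separated by single spaces.
std::wstring byte_values_hex(const std::string& bytes)
{
    std::wstringstream out;
    out << std::hex << std::setfill(L'0');
    bool first = true;
    for (unsigned char byte : bytes)
    {
        if (!first)
        {
            out << L' ';
        }
        first = false;
        // setw applies to the next insertion only, so it is repeated per byte.
        out << L"0x" << std::setw(2) << static_cast<unsigned int>(byte);
    }
    return out.str();
}
```

This is standard C++11 and does not depend on Visual Studio or on a particular Code::Blocks version.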

Thank you.
3 Weeks Ago #13

Banfa
Expert Mod 5K+
P: 8,996
You used:

    SetDlgItemText( hWnd, EDT_OUTPUT_TEXT, utf8_byte_values(utf8).c_str());
which I changed to

    SetDlgItemText(hWnd, 4, utf8_byte_values(utf8).c_str());
Again, I like the way that you did it, but I am having difficulty getting your line to work.
Put this at the top of the file:

    #define EDT_OUTPUT_TEXT 4

A number that you have to use more than once is a magic number. Magic numbers are poor practice; you remove them by giving the value a name. In C++ a const variable is preferred over a macro for this kind of constant:

    const int EDT_OUTPUT_TEXT = 4;

But this is Win32, which I used with C, so I reverted to #define.
3 Weeks Ago #14

100+
P: 200
Tip 1.
An example of determining whether a character string consists only of ASCII letters, digits, and symbols.

    #include <iostream>
    #include <regex>
    #include <string>

    /**
     * @brief Check whether a string consists only of ASCII letters, digits, and symbols.
     *
     * @return true: only ASCII letters, digits, and symbols /
     *         false: empty, or contains anything else (spaces, non-ASCII bytes)
     */
    bool IsAlphabetNumericSymbol(const std::string& src)
    {
        // Raw string literal, so that \[ reaches the regex engine unchanged
        // ("\[" in an ordinary literal is not a valid C++ escape sequence).
        static const std::regex pattern(R"(^[a-zA-Z0-9!-/:-@\[-`{-~]+$)");
        return std::regex_match(src, pattern);
    }

    int main()
    {
        // Only ASCII letters, digits, and symbols
        std::cout << IsAlphabetNumericSymbol("abc012@") << std::endl;

        // Contains characters outside that set
        std::cout << IsAlphabetNumericSymbol("1漢字A") << std::endl;
        return 0;
    }
Tip 2.
"123漢字ABC" encoded in UTF-16 occupies 16 bytes: 8 characters at 2 bytes each, not counting a byte order mark or a terminator.

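The 16-byte figure in Tip 2 can be checked with a short sketch. The function name utf16_byte_count is hypothetical, and the string is written with \u escapes so that the encoding of the source file does not matter:

```cpp
#include <cstddef>
#include <string>

// Number of bytes a UTF-16 string occupies, excluding any byte order mark
// or terminator. Each element of std::u16string is one UTF-16 code unit.
std::size_t utf16_byte_count(const std::u16string& text)
{
    return text.size() * sizeof(char16_t);
}
```

With the usual 2-byte char16_t, utf16_byte_count(u"123\u6F22\u5B57ABC") gives 16. A character outside plane 0, such as u"\U0001F600", takes two code units (a surrogate pair), so it gives 4.
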
Tip 3.
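Mutual conversion UTF-8 <=> UTF-16

    #include <codecvt>  // std::codecvt_utf8_utf16 (deprecated since C++17)
    #include <locale>   // std::wstring_convert
    #include <string>

    // codecvt_utf8_utf16 understands surrogate pairs, so characters from
    // planes 1-16 survive the conversion. codecvt_utf8<wchar_t> only covers
    // UCS-2 (plane 0) on platforms where wchar_t is 16 bits wide.
    inline std::wstring convertUtf8ToUtf16(char const* iString)
    {
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
        return converter.from_bytes(iString);
    }

    inline std::string convertUtf16ToUtf8(wchar_t const* iString)
    {
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
        return converter.to_bytes(iString);
    }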
Referenced URL.
https://docs.microsoft.com/en-us/arc...and-win32-apis
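Because std::wstring_convert and the <codecvt> facets are deprecated since C++17, the UTF-8 encoding step can also be written by hand. The following is only a sketch under that assumption; append_utf8 and to_utf8 are hypothetical names, and invalid input is replaced by U+FFFD as a design choice, not a requirement:

```cpp
#include <string>

// Append the UTF-8 encoding of one Unicode code point (U+0000 to U+10FFFF).
// Surrogate values and out-of-range input are replaced by U+FFFD.
void append_utf8(std::string& out, char32_t code_point)
{
    if (code_point > 0x10FFFF || (code_point >= 0xD800 && code_point <= 0xDFFF))
    {
        code_point = 0xFFFD; // replacement character
    }
    if (code_point < 0x80)
    {
        out += static_cast<char>(code_point);
    }
    else if (code_point < 0x800)
    {
        out += static_cast<char>(0xC0 | (code_point >> 6));
        out += static_cast<char>(0x80 | (code_point & 0x3F));
    }
    else if (code_point < 0x10000)
    {
        out += static_cast<char>(0xE0 | (code_point >> 12));
        out += static_cast<char>(0x80 | ((code_point >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (code_point & 0x3F));
    }
    else
    {
        // Four bytes: every code point in planes 1 to 16 lands here.
        out += static_cast<char>(0xF0 | (code_point >> 18));
        out += static_cast<char>(0x80 | ((code_point >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((code_point >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (code_point & 0x3F));
    }
}

// Encode a whole sequence of code points as UTF-8.
std::string to_utf8(const std::u32string& code_points)
{
    std::string out;
    for (char32_t code_point : code_points)
    {
        append_utf8(out, code_point);
    }
    return out;
}
```

For example, to_utf8(U"\u6F22") yields the three bytes e6 bc a2, and to_utf8(U"\U0001F600") yields the four bytes f0 9f 98 80.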

I hope you find this information helpful.
3 Weeks Ago #15

SwissProgrammer
100+
P: 127
I used

    ::MessageBox(hWnd, L"SAVE BUTTON was clicked", L"message from SAVE BUTTON", MB_SETFOREGROUND );

I did not need the scope resolution operator :: before MessageBox. I think it is sometimes used in Visual Studio. I have cleaned that out.

    MessageBox(hWnd, L"SAVE BUTTON was clicked", L"message from SAVE BUTTON", MB_SETFOREGROUND );


I have been reading that

    using namespace std;

adds a huge amount of code to a program. I am trying to avoid that by using

    using std::string;
    using std::wstring;
    using std::wstringstream;
    using std::hex;
    using std::setw;
    using std::setfill;

I am open to comments on that.



I worked on those "magic numbers" and I think that I fixed them.

    const int PUSH_BUTTON_1     = 1;
    const int IN_put_text_box   = 2;
    const int SAVE_button       = 3;
    const int OUT_put_text_box  = 4;

That took a lot longer than I had expected.



For
办   办
I get
0xe5 0x8a 0x9e 0x20 0x20 0x20 0xe5 0x8a 0x9e
I know that
办
is
0xe5 0x8a 0x9e
and the blanks are
0x20
but how do I separate it out automatically?
A simple response like
0xe5 0x8a 0x9e, 0x20, 0x20, 0x20, 0xe5 0x8a 0x9e
would tell me at least where the separation logic is. Then I could go forward and work with each. Someone please?
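
The separation logic asked for here lives in the first byte of each UTF-8 sequence: its high bits say how many bytes belong to the character. The sketch below groups bytes on that basis. The names utf8_sequence_length, split_utf8, and grouped_hex are hypothetical, the code does not validate continuation bytes, and it separates code points, so a base letter followed by a combining mark still comes out as two groups:

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Length in bytes of the UTF-8 sequence that starts with `lead`.
// Returns 1 for stray continuation or invalid bytes so the caller always advances.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)           return 1; // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx: planes 1 to 16
    return 1;
}

// Split UTF-8 text into one std::string per encoded code point.
std::vector<std::string> split_utf8(const std::string& text)
{
    std::vector<std::string> characters;
    std::size_t pos = 0;
    while (pos < text.size())
    {
        std::size_t len = utf8_sequence_length(static_cast<unsigned char>(text[pos]));
        if (len > text.size() - pos)
        {
            len = text.size() - pos; // truncated sequence at end of input
        }
        characters.push_back(text.substr(pos, len));
        pos += len;
    }
    return characters;
}

// Bytes of one character are separated by spaces, characters by ", ".
std::string grouped_hex(const std::string& text)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    bool first_character = true;
    for (const std::string& character : split_utf8(text))
    {
        if (!first_character)
        {
            out << ", ";
        }
        first_character = false;
        for (std::size_t i = 0; i < character.size(); ++i)
        {
            if (i != 0)
            {
                out << ' ';
            }
            out << "0x" << std::setw(2)
                << static_cast<unsigned int>(static_cast<unsigned char>(character[i]));
        }
    }
    return out.str();
}
```

Passing the UTF-8 bytes of "办   办" to grouped_hex gives 0xe5 0x8a 0x9e, 0x20, 0x20, 0x20, 0xe5 0x8a 0x9e.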

Here is what I have so far

    #define _UNICODE
    #define UNICODE

    #include <windows.h>

    #include <iomanip>      // for std::setw and std::setfill
    #include <sstream>      // for std::wstringstream
    #include <string>

    //using namespace std;  // do not need all of this namespace for this small program.

    // Shortened version of namespace std;
    using std::string;
    using std::wstring;
    using std::wstringstream;
    using std::hex;
    using std::setw;
    using std::setfill;

    HWND Handle_Main_Window = NULL;

    wchar_t g_szClassName[] = L"myWindowClass";

    const int PUSH_BUTTON_1     = 1;
    const int IN_put_text_box   = 2;
    const int SAVE_button       = 3;
    const int OUT_put_text_box  = 4;

    // Forward declarations
    void create_controls( const HWND hwnd );
    LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);
    string utf8_encode(const wstring &wstr);
    wstring utf8_byte_values(const string &str);

    // Convert a wide (UTF-16) string to a UTF-8 string
    string utf8_encode(const wstring &wstr)
    {
        if (wstr.empty())
        {
            return string();
        }

        const int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), nullptr, 0, nullptr, nullptr);
        if (size_needed <= 0)
        {
            return string();
        }

        // Let std::string own the buffer. With an explicit input length,
        // WideCharToMultiByte does not write a terminator, and standard C++
        // has no variable-length arrays.
        string strTo(size_needed, '\0');
        WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), &strTo[0], size_needed, nullptr, nullptr);
        return strTo;
    }

    // Format every byte of a UTF-8 string as "0xNN", separated by spaces
    wstring utf8_byte_values(const string &str)
    {
        wstringstream out;
        bool first = true;

        for (auto iter = str.begin(); iter != str.end(); ++iter)
        {
            if (first)
            {
                first = false;
            }
            else
            {
                out << L" ";
            }

            // Mask to 0..255 so that negative char values do not sign-extend
            unsigned int value = ((unsigned)*iter) & 0xFF;
            out << L"0x" << hex << setw(2) << setfill(L'0') << value;
        }

        return out.str();
    }

    int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
        LPSTR lpCmdLine, int nCmdShow)
    {
        WNDCLASSEX wc;
        MSG Msg;

        wc.cbSize        = sizeof(WNDCLASSEX);
        wc.style         = 0;
        wc.lpfnWndProc   = WndProc;
        wc.cbClsExtra    = 0;
        wc.cbWndExtra    = 0;
        wc.hInstance     = hInstance;
        wc.hIcon         = LoadIcon(nullptr, IDI_APPLICATION);
        wc.hCursor       = LoadCursor(nullptr, IDC_ARROW);
        wc.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
        wc.lpszMenuName  = nullptr;
        wc.lpszClassName = g_szClassName;
        wc.hIconSm       = LoadIcon(nullptr, IDI_APPLICATION);

        if (!RegisterClassEx(&wc))
        {
            MessageBox(nullptr, L"Window Registration Failed!", L"Error!", MB_ICONEXCLAMATION | MB_OK);
            return 0;
        }

        Handle_Main_Window = CreateWindowEx(
            WS_EX_CLIENTEDGE,
            g_szClassName,
            L"Title",
            WS_OVERLAPPEDWINDOW,
            CW_USEDEFAULT,
            CW_USEDEFAULT,
            630,
            470,
            nullptr,
            nullptr,
            hInstance,
            nullptr);

        if (Handle_Main_Window == NULL)
        {
            MessageBox(nullptr, L"Window Creation Failed!", L"Error!", MB_ICONEXCLAMATION | MB_OK);
            return 0;
        }

        ShowWindow(Handle_Main_Window, nCmdShow);
        UpdateWindow(Handle_Main_Window);

        while (GetMessage(&Msg, nullptr, 0, 0) > 0)
        {
            TranslateMessage(&Msg);
            DispatchMessage(&Msg);
        }
        return (int)Msg.wParam;
    }

    LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        switch (msg)
        {
            case WM_CREATE:
            {
                create_controls( hWnd );
                break;
            }

            case WM_COMMAND:
            {
                switch (LOWORD(wParam))
                {
                    case PUSH_BUTTON_1:
                    {
                        MessageBox( hWnd, L"PUSH BUTTON 1 was clicked", L"message from PUSH BUTTON 1", MB_SETFOREGROUND );
                        break;
                    }

                    // Both cases ran identical code, so they share one body.
                    case IN_put_text_box:
                    case SAVE_button:  //BTN_SAVE:
                    {
                        const HWND in_box = GetDlgItem(hWnd, IN_put_text_box);
                        const int n = GetWindowTextLength( in_box );
                        // The n > 0 and n == 0 branches were identical, so one path covers both.
                        wstring text( n + 1, L'\0' ); // +1 for terminator
                        const int copied = GetWindowText( in_box, &text[0], n + 1 );
                        text.resize( copied );
                        const string utf8 = utf8_encode(text);
                        // UNICODE is defined, so this resolves to SetDlgItemTextW
                        SetDlgItemText(hWnd, OUT_put_text_box, utf8_byte_values(utf8).c_str());
                        break;
                    }

                    default:
                    {
                        break;
                    }
                }
                break;
            }

            case WM_CLOSE:
            {
                DestroyWindow(hWnd);
                break;
            }

            case WM_DESTROY:
            {
                PostQuitMessage(0);
                break;  // was missing: execution fell through to default
            }

            default:
            {
                return DefWindowProc(hWnd, msg, wParam, lParam);
            }
        }

        return 0;
    }

    void create_controls( const HWND hwnd )
    {
        CreateWindowW(
            L"BUTTON",
            L"PUSH BUTTON 1",
            WS_VISIBLE | WS_CHILD | WS_BORDER,
            10,10,
            130,20,
            hwnd,
            (HMENU)(INT_PTR) PUSH_BUTTON_1,
            GetModuleHandle( nullptr ),
            nullptr
        );

        CreateWindowW(
            L"EDIT",
            L"办   办",
            WS_VISIBLE | WS_CHILD | WS_BORDER,
            10,50,
            200,25,
            hwnd,
            (HMENU)(INT_PTR) IN_put_text_box,
            GetModuleHandle( nullptr ),
            nullptr
        );

        CreateWindowW(
            L"BUTTON",
            L"SAVE BUTTON",
            WS_VISIBLE | WS_CHILD | WS_BORDER,
            10,80,
            110,20,
            hwnd,
            (HMENU)(INT_PTR) SAVE_button,
            GetModuleHandle( nullptr ),
            nullptr
        );

        CreateWindowW(
            L"EDIT",
            L"OUTPUT TEXT WINDOW",
            WS_VISIBLE | WS_CHILD | WS_BORDER,
            10,130,
            600,300,
            hwnd,
            (HMENU)(INT_PTR) OUT_put_text_box,
            GetModuleHandle( nullptr ),
            nullptr
        );
    }

Thank you.
1 Week Ago #16

Banfa
Expert Mod 5K+
P: 8,996
Mostly the scope resolution operator :: is unnecessary but it is there for the odd occasion where there is a clash between a class member name and a top level symbol name, i.e. suppose you are trying to call ::MessageBox from within a class with a member called MessageBox.
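
A portable sketch of that name clash, with a hypothetical show_message standing in for MessageBox and a hypothetical Dialog class (neither is part of Win32):

```cpp
#include <string>

// Hypothetical stand-in for a global API function such as MessageBox.
std::string show_message(const std::string& text)
{
    return "global: " + text;
}

class Dialog
{
public:
    // Member function with the same name as the global function.
    std::string show_message(const std::string& text) const
    {
        return "member: " + text;
    }

    std::string from_member() const
    {
        return show_message("hi");   // unqualified: class scope is searched first
    }

    std::string from_global() const
    {
        return ::show_message("hi"); // leading :: starts lookup at global scope
    }
};
```

Inside Dialog, the unqualified call reaches the member and the ::-qualified call reaches the global function. Outside any class with a clashing member, the two spellings call the same function.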

Using using <symbol> rather than using namespace <NamespaceName> is exactly what we do in production code at work, for the very same reason: to reduce the number of symbols being imported into the global namespace. I'd definitely recommend it, although you can end up with a large section of using declarations at the top of your source files.

Once you get used to using constants for magic numbers from project start it becomes easier. Make sure you decide on a naming convention for these constants and stick to it; all caps with underscores is a common standard.

Maybe try treating the input string 1 character at a time discarding the space characters or try using std::string::find_first_of to locate spaces in either the input or output.
1 Week Ago #17

SwissProgrammer
100+
P: 127
Banfa,

Update: I have been looking at std::string::find_first_of, but it does not seem to work with Unicode. I also get messages about Visual Studio problems with similar attempts. I am trying to not use VS in any way, so I am still working on this. Thank you. I might eventually get the ability to parse or split Unicode strings into each single or combined character. Thanks for now.
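
One relevant detail: std::string::find_first_of searches bytes, not characters. That is still safe for an ASCII delimiter in UTF-8 text, because bytes below 0x80 never occur inside a multi-byte UTF-8 sequence. A minimal sketch, assuming the input is valid UTF-8 held in a std::string (split_on_spaces is a hypothetical name):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split UTF-8 text on ASCII spaces, dropping empty pieces.
// A byte-wise search for ' ' cannot cut a multi-byte character in half.
std::vector<std::string> split_on_spaces(const std::string& utf8_text)
{
    std::vector<std::string> pieces;
    std::size_t start = 0;
    while (start < utf8_text.size())
    {
        std::size_t end = utf8_text.find_first_of(' ', start);
        if (end == std::string::npos)
        {
            end = utf8_text.size();
        }
        if (end > start)
        {
            pieces.push_back(utf8_text.substr(start, end - start));
        }
        start = end + 1;
    }
    return pieces;
}
```

Searching for a non-ASCII character this way would need the whole multi-byte sequence as the search string (for example with std::string::find), since find_first_of matches any single byte of its argument.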
1 Week Ago #18

P: 1
Support for Unicode is depressingly non-standard across platforms so it is hard to write portable code using Unicode.
2 Days Ago #19

SwissProgrammer
100+
P: 127
SioSio,

You said, "In the C++11 Standard Library, UTF-8 is not supported for string and integer conversion functions, and I/O functions. Therefore, it needs to be converted to the system multibyte character code."

I have read that std::wstring is better for parsing than multibyte. I have also read that multibyte is more useful. Which direction should I study to be able to parse the following into individual characters?

For now I am just trying to do the test with Unicode plane 0.


If someone pastes or places into my text box the following:
123漢字ABC
I want the second text box to show the following:
Full Sentence in UTF-8
\x31\x32\x33\xe6\xbc\xa2\xe5\xad\x97\x41\x42\x43


Individual Single Characters in UTF-8

1 = \x31
2 = \x32
3 = \x33
漢 = \xe6\xbc\xa2
字 = \xe5\xad\x97
A = \x41
B = \x42
C = \x43

All that in a single text box showing all those lines.

Should I work at doing this with multibyte or with std::wstring?

I almost got it to work a few times, but I am not certain what I did.

I thought to split the input sentence into individual characters in the following area, but I lost it. Maybe later I can show you what I did if I get it close again.


    case SAVE_button:  //BTN_SAVE:
    {
        const HWND in_box = GetDlgItem(hWnd, IN_put_text_box);
        const int n = GetWindowTextLength( in_box );
        // The n > 0 and n == 0 branches were identical, so one path covers both.
        wstring text( n + 1, L'\0' ); // +1 for terminator
        const int copied = GetWindowText( in_box, &text[0], n + 1 );
        text.resize( copied );
        const string utf8 = utf8_encode(text);
        // UNICODE is defined, so this resolves to SetDlgItemTextW
        SetDlgItemText(hWnd, OUT_put_text_box, utf8_byte_values(utf8).c_str());
        break;
    }

Or should I use UTF-16 or UTF-32?

I also found, "The better way is to use std::u16string (std::basic_string<char16_t>) and std::u32string (std::basic_string<char32_t>). They'll work regardless of system and encoding of the source file" [here].
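
To make the UTF-16 versus UTF-32 choice concrete: a std::u32string holds exactly one element per code point, while a std::u16string needs two elements (a surrogate pair) for anything in planes 1 to 16. The sketch below decodes one into the other. The name utf16_to_code_points is hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption about the desired error handling:

```cpp
#include <cstddef>
#include <string>

// Decode UTF-16 code units into Unicode code points.
// Unpaired surrogates are replaced by U+FFFD.
std::u32string utf16_to_code_points(const std::u16string& text)
{
    std::u32string code_points;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        const char32_t unit = text[i];
        const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        const bool is_low  = unit >= 0xDC00 && unit <= 0xDFFF;
        if (is_high && i + 1 < text.size()
            && text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF)
        {
            // High surrogate followed by low surrogate: planes 1 to 16.
            const char32_t low = text[i + 1];
            code_points.push_back(static_cast<char32_t>(
                0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)));
            ++i; // the low surrogate has been consumed
        }
        else if (is_high || is_low)
        {
            code_points.push_back(static_cast<char32_t>(0xFFFD));
        }
        else
        {
            code_points.push_back(unit); // plane 0, one code unit
        }
    }
    return code_points;
}
```

On Windows wchar_t holds UTF-16 code units, so the same loop applies to a std::wstring there. For "123漢字ABC" the result has 8 elements, one per character in the list above.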

I have been struggling with this for a while. There is a lot of advice for and against many approaches, and most of it I could only get to work temporarily. I know that something did work, but I do not remember what.

Thank you.
10 Hours Ago #20
