OK, here is a sample program. $LANG is set to "en_US.UTF-8" on my
console. When I invoke this program from the command line as:
$ ./a.out "आ१२३४५६"
argv[1] is set to an array of multibyte characters in UTF-8
format. Correct? Then, as a test, I convert argv[1] to a wstring as
follows:
wstring sSql = make_wide(argv[1]);
and finally I print it back to the console:
wcout << sSql << endl;
The problem is that nothing gets printed on the console. I
tried setting a locale, but that didn't help either. Could someone please
explain what's going on and how I can print a wstring here? The full
sample code is given below. My compiler is g++ 3.4.6; the OS is Red Hat
Linux 4ES.
If I invoke the same program with English characters, everything
works as expected:
$ ./a.out "select"
select
The problem occurs only when I pass multibyte characters as command-line
arguments.
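One way to check the assumption that argv[1] holds UTF-8 multibyte text is to compare its byte count with the number of characters the C library decodes from it. The sketch below is illustrative only. The helper name mb_counts is hypothetical. It relies on POSIX mbstowcs() accepting a null destination in order to count characters. It also assumes the program has already called setlocale(LC_ALL, "") so that LANG takes effect.

```cpp
#include <clocale>  // setlocale(), which must be called before mb_counts()
#include <cstddef>
#include <cstdlib>  // mbstowcs()
#include <cstring>  // strlen()

// Hypothetical helper: byte length vs. character count of a multibyte string.
struct MbCounts
{
    std::size_t bytes;  // number of chars (bytes) before the terminating NUL
    std::size_t chars;  // characters in the current LC_CTYPE encoding,
                        // or (size_t)-1 if s is invalid in that encoding
};

MbCounts mb_counts(const char* s)
{
    MbCounts c;
    c.bytes = std::strlen(s);
    // POSIX: with a null destination, mbstowcs() stores nothing and returns
    // the number of wide characters the full conversion would produce.
    c.chars = std::mbstowcs(NULL, s, 0);
    return c;
}
```

Under en_US.UTF-8, mb_counts("आ१२३४५६") should give 21 bytes and 7 characters, since each of these Devanagari characters takes three bytes in UTF-8.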
------------------------------------------------ SAMPLE PROGRAM ------------------------------------------------
#include <iostream>
#include <iomanip>
#include <locale>
#include <string>
using namespace std;

string make_narrow(const wstring& sArg);
wstring make_wide(const string& sArg);

// Narrows each wchar_t to a char with the ctype facet of the global
// locale; characters with no single-char equivalent become 0.
string make_narrow(const wstring& sArg)
{
    std::string result;
    std::locale loc;
    for (std::wstring::size_type i = 0; i < sArg.size(); ++i)
    {
        result += std::use_facet<std::ctype<wchar_t> >(loc).narrow(sArg[i], 0);
    }
    return result;
}

// Widens each char independently. This works byte by byte, so a
// multibyte UTF-8 sequence is NOT decoded into one wide character.
wstring make_wide(const string& sArg)
{
    std::wstring result;
    std::locale loc;
    for (std::string::size_type i = 0; i < sArg.size(); ++i)
    {
        result += std::use_facet<std::ctype<wchar_t> >(loc).widen(sArg[i]);
    }
    return result;
}

/**
 * main takes the command-line argument argv[1] as input and prints it
 * back to the console as a std::wstring.
 *
 * $LANG is set to "en_US.UTF-8" on the console.
 *
 * This works when argv[1] contains only English (ASCII) characters.
 * It does not work when argv[1] contains multibyte characters.
 */
int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        cerr << "usage: " << argv[0] << " <string>" << endl;
        return 1;
    }
    // Make en_US.UTF-8 the global C++ locale (because it is a named
    // locale, this also sets the C library locale) and imbue wcout.
    locale aLoc("en_US.UTF-8");
    locale::global(aLoc);
    wcout.imbue(aLoc);
    wstring sSql = make_wide(argv[1]);
    wcout << sSql << endl;
    return 0;
}
----------------------------------------------------------------------------------------------------------------
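In the sample program, make_wide() passes each byte of argv[1] separately to ctype<wchar_t>::widen(). As a result, the three-byte UTF-8 sequence for each Devanagari character is never decoded into a single wchar_t. The wide stream then most likely fails to convert the resulting values and stops printing.

The sketch below shows a locale-aware replacement based on std::mbstowcs(). It rests on two assumptions. First, it assumes glibc, where wchar_t holds Unicode code points. Second, it assumes a UTF-8 locale is already active in the C library. The name make_wide_mb is hypothetical.

```cpp
#include <clocale>  // setlocale() or a named std::locale::global() must run first
#include <cstddef>
#include <cstdlib>  // mbstowcs()
#include <string>
#include <vector>

// Hypothetical locale-aware replacement for make_wide(): decodes a
// multibyte string (UTF-8 under en_US.UTF-8) using the C library's
// current LC_CTYPE. Returns an empty wstring if s is not valid in
// that encoding.
std::wstring make_wide_mb(const std::string& s)
{
    // POSIX: with a null destination, mbstowcs() only counts the wide
    // characters needed, excluding the terminating null.
    std::size_t n = std::mbstowcs(NULL, s.c_str(), 0);
    if (n == static_cast<std::size_t>(-1))
        return std::wstring();

    std::vector<wchar_t> buf(n + 1);  // room for the terminating L'\0'
    std::mbstowcs(&buf[0], s.c_str(), n + 1);
    return std::wstring(&buf[0], n);
}
```

In the posted main(), calling make_wide_mb(argv[1]) in place of make_wide(argv[1]) should then work. Passing a named locale to locale::global() also sets the C library locale that mbstowcs() consults.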
On Mar 23, 6:46 pm, SasQ <s...@go2.pl> wrote:
> On Fri, 23 Mar 2007 14:54:52 -0700, interec wrote:
>
> > I am writing a C++ program on Red Hat Linux using
> > main(int argc, wchar_t *argv[]).
>
> The C++ Standard requires support for only these two signatures of main:
>
>   int main();
>   int main(int argc, char** argv); // or char* argv[], whatever
>
> An implementation may accept other signatures, but they are
> implementation-defined and not portable.
>
> > $LANG on console is set to "en_US.UTF-8".
>
> The console's character encoding doesn't matter here.
>
> > Q1. What is the encoding of the data that I get in argv[]?
>
> Whatever the command line uses: argv[] holds the raw bytes the
> shell passed in, so with a UTF-8 terminal they are UTF-8.
>
> > Q2. What is the encoding of string constants defined in
> > programs (for example L"--count")?
>
> The encoding of string literals isn't precisely defined: the
> execution character sets are implementation-defined, and in most
> cases literals are mapped from the source character set to the
> machine representation directly, without changes.
>
> > Q3. When I run the program as:
>
> > Why does (wcscmp(argv[1], L"--count") == 0) always
> > evaluate to false?
>
> Because L"--count" is made of wide characters, which are of type
> wchar_t and stored in more bytes than sizeof(char). With g++ on
> Linux, wchar_t is 32 bits and wide literals are UTF-32 (UTF-16 is
> typical on Windows, where wchar_t is 16 bits), so unlike your
> console they are not UTF-8. Declaring argv as wchar_t** doesn't
> change what the startup code actually passes to main: argv[1]
> still points to narrow UTF-8 bytes, so wcscmp() reads those bytes
> as if they were wchar_t values and never finds a match.
>
> > What is the workaround? How do I make it evaluate to true?
>
> Use the standard signature:
>
>   int main(int argc, char* argv[]);
>
> and interpret argv[1] as UTF-8. In UTF-8, ASCII characters are
> plain one-byte chars, while other characters are encoded as
> sequences of two to four bytes. If you are programming for a
> Unix-like system, you can use the iconv library to transcode
> strings between many character sets.
>
> > wstring mystring = argv[1];
> > Is mystring considered to be a UTF-8 string?
>
> No. It is considered to be a wide-character string.
> UTF-8 is not a wide-character string; it's a normal
> one-byte-character string in which some characters are
> encoded as multiple bytes. (With the standard char* argv[],
> that line doesn't even compile, because std::wstring cannot
> be constructed from a char*.)
>
> > Q5. In case I use main(int argc, char** argv),
> > what is the encoding of the characters in argv[]?
>
> The same as on the command line.
>
> > Q6. What is the difference between
> >
> >   main(int argc, wchar_t *argv[])
> >
> > and
> >
> >   main(int argc, char *argv[])?
>
> The former is not standard.
>
> > Is there any document that describes this?
>
> ISO/IEC 14882, Programming languages - C++ (the C++ Standard),
> section 3.6.1 "Main function" in the 1998 and 2003 editions.
>
> --
> SasQ
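SasQ's iconv suggestion could look roughly like the following sketch. It uses the POSIX iconv_open()/iconv()/iconv_close() calls as declared by glibc, where iconv() takes char** for the input buffer. The target encoding name "WCHAR_T" is a glibc extension, and the helper name utf8_to_wide is hypothetical. Unlike mbstowcs(), this conversion does not depend on the current locale.

```cpp
#include <iconv.h>  // iconv_open(), iconv(), iconv_close()
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: transcode a UTF-8 string to std::wstring with iconv.
// "WCHAR_T" names glibc's wchar_t encoding (UCS-4 on Linux); other iconv
// implementations may not accept it. Returns false if the input is not
// valid (or is truncated) UTF-8, or if the converter cannot be opened.
bool utf8_to_wide(const std::string& in, std::wstring& out)
{
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");  // (to, from)
    if (cd == (iconv_t)-1)
        return false;

    // Every UTF-8 byte yields at most one wchar_t, so this buffer is enough.
    std::vector<wchar_t> buf(in.size() + 1);
    char* src = const_cast<char*>(in.data());  // glibc's iconv() takes char**
    std::size_t src_left = in.size();
    char* dst = reinterpret_cast<char*>(&buf[0]);
    std::size_t dst_left = buf.size() * sizeof(wchar_t);

    std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (std::size_t)-1)
        return false;

    // Output is written in whole wchar_t units; count how many were used.
    std::size_t produced = buf.size() - dst_left / sizeof(wchar_t);
    out.assign(&buf[0], produced);
    return true;
}
```

Once argv[1] is converted this way, the Q3 test wcscmp(w.c_str(), L"--count") == 0 would behave as expected. For that particular test, strcmp(argv[1], "--count") == 0 on the unconverted argument works too, since the option name is plain ASCII.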