converting from windows wchar_t to linux wchar_t

Hello experts,
I am porting our server from Windows to Linux. Our
client runs only on Windows machines.
To avoid the wchar_t size problem (on Windows it is 2 bytes and on Linux
it is 4 bytes) we defined

#ifdef WIN32
#define t_wchar_t wchar_t
#else // LINUX
#define t_wchar_t short
#endif

On the server I get a buffer that contains a Windows t_wchar_t string,
something like

struct user_data
{
t_wchar_t name[32];
.....
.....
};

All the data transfer works fine as long as the server doesn't
care what's in the string.
My problem starts when I want to print some logs on the server
using the content of the buffer.

My question is: is there a simple way to convert a 2-byte wchar_t (Windows
version) to a 4-byte wchar_t (Linux version)?

Thanks
Aug 14 '08 #1
ya*****@gmail.com wrote:
> Hello experts,
> I am porting our server from Windows to Linux. Our
> client runs only on Windows machines.
> To avoid the wchar_t size problem (on Windows it is 2 bytes and on Linux
> it is 4 bytes) we defined
>
> #ifdef WIN32
> #define t_wchar_t wchar_t
> #else // LINUX
> #define t_wchar_t short
> #endif

You might be better off with a typedef, although it's not a very
significant difference. Also, for some reason I seem to remember that
wchar_t is an unsigned type. Since 'char' is often signed (though
different from 'signed char', of course), perhaps I remember incorrectly...

> On the server I get a buffer that contains a Windows t_wchar_t string,
> something like
>
> struct user_data
> {
> t_wchar_t name[32];
> .....
> .....
> };
>
> All the data transfer works fine as long as the server doesn't
> care what's in the string.
> My problem starts when I want to print some logs on the server
> using the content of the buffer.

What kind of "problem"?

> My question is: is there a simple way to convert a 2-byte wchar_t (Windows
> version) to a 4-byte wchar_t (Linux version)?

What's the problem? Can't you just copy (and it will expand the sign)?
If you have a buffer

wchar_t localname[32];

and you want to "convert"

t_wchar_t name[32];

to it, just use std::copy

std::copy(name, name + 32, localname);

Every element will be assigned.
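
One caveat with the copy above: if the 16-bit type is a signed short, units of 0x8000 and above are negative and stay negative when widened, so they no longer match the original character values. The following is a minimal sketch, assuming the wire format is plain UCS-2 (no surrogate pairs); the names wire_char, widen_name and through_signed_short are hypothetical and not part of the original code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical name for the 16-bit unit received from the Windows client.
// An unsigned type keeps units >= 0x8000 from being sign-extended.
typedef std::uint16_t wire_char;

// Widen n 16-bit units into the platform's wchar_t, one element per unit.
// This is only correct for text without surrogate pairs (plain UCS-2).
inline void widen_name(const wire_char* src, std::size_t n, wchar_t* dest)
{
    std::copy(src, src + n, dest);
}

// Shows what the same unit becomes when it passes through a signed short.
// The narrowing conversion is implementation-defined before C++20; on
// two's complement compilers 0x8000..0xFFFF become negative.
inline long through_signed_short(wire_char unit)
{
    short s = static_cast<short>(unit);
    return static_cast<long>(s); // widening keeps the negative value
}
```

With the unsigned element type, std::copy produces the expected code point for every BMP character; the helper only illustrates why short is the riskier choice.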

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Aug 14 '08 #2
ya*****@gmail.com wrote:
> My question is: is there a simple way to convert a 2-byte wchar_t (Windows
> version) to a 4-byte wchar_t (Linux version)?

I suggest using something like libiconv (http://en.wikipedia.org/wiki/Iconv)
to convert to a common character set on both sides.
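
For logging on the Linux side, one option along these lines is to convert the received buffer to UTF-8 with the POSIX iconv interface. This is only a sketch under assumptions: the client is assumed to send UTF-16 in little-endian byte order, unsigned short is assumed to be 16 bits, and the function name utf16le_to_utf8 is made up for the example.

```cpp
#include <iconv.h>
#include <cstddef>
#include <string>

// Convert `units` UTF-16LE code units to a UTF-8 std::string.
// Returns false if the converter is unavailable or the input is malformed.
// The buffer is handed to iconv as raw bytes, so it must hold the bytes
// exactly as they arrived from the (little-endian) Windows client.
inline bool utf16le_to_utf8(const unsigned short* src, std::size_t units,
                            std::string& out)
{
    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");
    if (cd == (iconv_t)-1)
        return false;

    // A 16-bit unit never needs more than 3 bytes of UTF-8.
    std::string buf(units * 3 + 1, '\0');
    char* in = reinterpret_cast<char*>(const_cast<unsigned short*>(src));
    std::size_t in_left = units * sizeof(unsigned short);
    char* outp = &buf[0];
    std::size_t out_left = buf.size();

    std::size_t rc = iconv(cd, &in, &in_left, &outp, &out_left);
    iconv_close(cd);
    if (rc == (std::size_t)-1)
        return false;

    buf.resize(buf.size() - out_left); // keep only the bytes actually written
    out.swap(buf);
    return true;
}
```

The resulting string can be written to a log file or a UTF-8 terminal with the ordinary narrow-character functions, which avoids wchar_t on the server altogether.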

Aug 14 '08 #3
On Aug 14, 5:30 pm, Victor Bazarov <v.Abaza...@comAcast.net> wrote:
> yaki...@gmail.com wrote:
> > Hello experts,
> > I am porting our server from Windows to Linux. Our
> > client runs only on Windows machines.
> > To avoid the wchar_t size problem (on Windows it is 2 bytes and on Linux
> > it is 4 bytes) we defined
> > #ifdef WIN32
> > #define t_wchar_t wchar_t
> > #else // LINUX
> > #define t_wchar_t short
> > #endif

> You might be better off with a typedef, although it's not a
> very significant difference.

It would be if the second were unsigned short. Something like
"t_wchar_t( something )" would be legal if it were a typedef,
not if it were a #define.

> Also, for some reason I seem to remember that wchar_t is an
> unsigned type. Since 'char' is often signed (though different
> from 'signed char', of course), perhaps I remember
> incorrectly...

Both are very implementation defined. In practice, you
generally shouldn't be using wchar_t in portable code:-(.
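
The typedef point can be illustrated with a short sketch. It assumes a C++11 compiler and picks std::uint16_t as the fixed-width type; the helper to_wire_unit is hypothetical.

```cpp
#include <cstdint>

// Hypothetical replacement for the macro in the original post: a typedef to a
// fixed-width unsigned type, identical on Windows and Linux.
typedef std::uint16_t t_wchar_t;

// With a typedef, the functional-cast syntax t_wchar_t(x) is valid. With
// "#define t_wchar_t unsigned short" it would expand to "unsigned short(x)",
// which does not compile because the type name is two tokens.
inline t_wchar_t to_wire_unit(long code)
{
    return t_wchar_t(code);
}

// Fails at compile time on a platform where the wire unit is not 2 bytes.
static_assert(sizeof(t_wchar_t) == 2, "wire units must be exactly 2 bytes");
```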

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Aug 14 '08 #4
> My question is: is there a simple way to convert a 2-byte wchar_t (Windows
> version) to a 4-byte wchar_t (Linux version)?

wchar_t is a particularly useless type: because it is implementation-defined, portable code has no assurance of which character encoding it uses or is capable of holding.

The next point is that *Unicode* characters are unsigned, so use an unsigned short for your UCS-2 / UTF-16 representation. http://en.wikipedia.org/wiki/UTF-16 has loads more information.

Finally, converting plain UCS-2 to UTF-32 is simple: pad out the data with a direct character-by-character copy:

typedef unsigned short ucs2char;
// unsigned int rather than unsigned long: long is 64 bits on 64-bit Linux.
typedef unsigned int utf32char;

void convert_ucs2_2_utf32(ucs2char const* src, utf32char* dest)
{
    // Copy each 16-bit unit into a wider element, including the terminating 0.
    do {
        *dest++ = *src;
    } while (*src++);
}

If you want to properly convert characters outside the Basic Multilingual Plane (the BMP covers the characters of most modern languages in use, European and Eastern), then you need to be aware of surrogate pairs. Unicode code points in the range U+D800-U+DFFF are not assigned to valid characters. UTF-16 uses this range to encode each code point above U+FFFF as a pair of 16-bit units, each of which carries 10 bits of the final code point.
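
As a worked example of the arithmetic, U+1F600 minus 0x10000 is 0xF600; its upper 10 bits are 0x3D and its lower 10 bits are 0x200, giving the pair 0xD83D 0xDE00. A small sketch of both directions follows; the struct and function names are invented for illustration.

```cpp
#include <cstdint>

struct surrogate_pair {
    std::uint16_t high; // 0xD800..0xDBFF
    std::uint16_t low;  // 0xDC00..0xDFFF
};

// Split a code point in the range U+10000..U+10FFFF into a surrogate pair.
inline surrogate_pair encode_pair(std::uint32_t cp)
{
    std::uint32_t v = cp - 0x10000; // 20 significant bits remain
    surrogate_pair p;
    p.high = static_cast<std::uint16_t>(0xD800 + (v >> 10));
    p.low = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));
    return p;
}

// Recombine a valid pair into the original code point.
inline std::uint32_t decode_pair(surrogate_pair p)
{
    return ((static_cast<std::uint32_t>(p.high) - 0xD800) << 10)
         + (static_cast<std::uint32_t>(p.low) - 0xDC00) + 0x10000;
}
```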

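So, something like this will do the translation of UTF-16 to UTF-32:

typedef unsigned short utf16char;
void convert_utf16_to_utf32(utf16char const* src, utf32char* dest)
{
    do {
        // A high surrogate (0xD800-0xDBFF) followed by a low surrogate
        // (0xDC00-0xDFFF) encodes one code point above U+FFFF.
        if ((src[0] & 0xFC00) == 0xD800 && (src[1] & 0xFC00) == 0xDC00) {
            *dest++ = ((utf32char)(src[0] & 0x03FF) << 10) + (src[1] & 0x03FF) + 0x10000;
            ++src; // skip the high surrogate; the loop test skips the low one
        } else
            *dest++ = *src;
    } while (*src++);
}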
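
Since the data arrives in a fixed-size field such as name[32], a length-bounded variant may be safer than relying on a terminator. The following is a sketch, not the poster's code: it assumes C++11, stops at the first 0 unit or after n units, and substitutes U+FFFD for unpaired surrogates. The function name utf16_to_utf32 is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode at most n UTF-16 code units into UTF-32 code points.
inline std::vector<std::uint32_t> utf16_to_utf32(const std::uint16_t* src,
                                                 std::size_t n)
{
    std::vector<std::uint32_t> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t u = src[i];
        if (u == 0)
            break; // terminator inside the fixed-size field
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n
            && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            // Valid pair: consume the low surrogate as well.
            std::uint32_t low = src[++i];
            out.push_back(((u - 0xD800) << 10) + (low - 0xDC00) + 0x10000);
        } else if (u >= 0xD800 && u <= 0xDFFF) {
            out.push_back(0xFFFD); // unpaired surrogate: replacement character
        } else {
            out.push_back(u); // ordinary BMP character
        }
    }
    return out;
}
```

Returning a vector avoids having to size the destination buffer in advance; the output never has more elements than the input has units.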

Aug 15 '08 #5
On Aug 15, 9:23 am, "Chris Becke" <chris.be...@gmail.com> wrote:
> > My question is: is there a simple way to convert a 2-byte wchar_t
> > (Windows version) to a 4-byte wchar_t (Linux version)?

> wchar_t is a particularly useless type: because it is
> implementation-defined, portable code has no assurance of
> which character encoding it uses or is capable of
> holding.

That's partially true of char as well; in addition, the
character encoding can depend on the source of the data. But at
least, char is guaranteed to be at least 8 bits, so you know
that it can hold all useful external encodings. (For better or
for worse, the external world is 8 bits, and any attempt to do
otherwise is bound to fail in the long run.)

> The next point is that *Unicode* characters are unsigned.

I'm not sure what that's supposed to mean. ALL character
encodings I've ever seen use only non-negative values: ASCII
doesn't define any negative encodings, nor do any of the ISO
8859 encodings. The fact that char can be (and often is) a
signed 8 bit value causes no end of problems because of this.
The character value isn't really signed or unsigned: it's just a
value (that happens never to be negative).

What is true is that the Unicode encoding formats UTF-16 and
UTF-8 require values in the range of 0-0xFFFF and 0-0xFF,
respectively, and that if your short is 16 bits or your char 8
(both relatively frequent cases), those values won't fit in the
corresponding signed types. (For historical reasons, we still
manage to make do putting UTF-8, and other 8 bit encodings, in
an 8 bit signed char. It's a hack, and it's not, at least in
theory, guaranteed to work, but in practice, it's often the
least bad choice available.)
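
A small sketch of the usual workaround when UTF-8 is stored in plain char: convert each byte to unsigned char before comparing it with values of 0x80 and above. The function name utf8_length is made up for the example, and the input is assumed to be well-formed, NUL-terminated UTF-8.

```cpp
#include <cstddef>

// Count code points in a NUL-terminated UTF-8 string by counting every byte
// that is not a continuation byte (0x80..0xBF).
inline std::size_t utf8_length(const char* s)
{
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        // Convert first: where plain char is signed, bytes >= 0x80 are
        // negative, so "*s < 0x80" would be true for every byte.
        unsigned char byte = static_cast<unsigned char>(*s);
        if (byte < 0x80 || byte >= 0xC0) // ASCII byte or lead byte
            ++count;
    }
    return count;
}
```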

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Aug 15 '08 #6
