converting from windows wchar_t to linux wchar_t

yakir22

Hello experts,
I am dealing now in porting our server from windows to linux. our
client is running only on windows machine.
to avoid the wchar_t size problem ( in windows its 2 bytes and linux
is 4 bytes ) we defined

#ifdef WIN32
#define t_wchar_t wchar_t
#else // LINUX
#define t_wchar_t short
#endif

on the server I get a buffer that contains windows t_wchar_t string.
something like

struct user_data
{
t_wchar_t name[32];
.....
.....
};

all the data transfer is working great as long as the server don't
care what's in the string
my problem start when I want to print out some logs on the server
using the content of the buffer.

my Q is : is there a simple way to convert a 2 bytes wchar_t (windows
version ) to 4 bytes wchar_t ( linux version ).

Thanks

Aug 14 '08 #1

Subscribe Post Reply

16924

Victor Bazarov

ya*****@gmail.com wrote:

Hello experts,
I am dealing now in porting our server from windows to linux. our
client is running only on windows machine.
to avoid the wchar_t size problem ( in windows its 2 bytes and linux
is 4 bytes ) we defined

#ifdef WIN32
#define t_wchar_t wchar_t
#else // LINUX
#define t_wchar_t short
#endif

You might be better off with a typedef, although it's not a very
significant difference. Also, for some reason I seem to remember that
wchar_t is an unsigned type. Since 'char' is often signed (though
different from 'singed char', of course), perhaps I remember incorrectly...

>
on the server I get a buffer that contains windows t_wchar_t string.
something like

struct user_data
{
t_wchar_t name[32];
.....
.....
};

all the data transfer is working great as long as the server don't
care what's in the string
my problem start when I want to print out some logs on the server
using the content of the buffer.

What kind of "problem"?

my Q is : is there a simple way to convert a 2 bytes wchar_t (windows
version ) to 4 bytes wchar_t ( linux version ).

What's the problem? Can't you just copy (and it will expand the sign)?
If you have a buffer

wchar_t localname[32];

and you want to "convert"

t_wchar_t name[32];

to it, just use std::copy

std::copy(name, name + 32, localname);

Every element will be assigned.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

Aug 14 '08 #2

Rolf Magnus

ya*****@gmail.com wrote:

my Q is : is there a simple way to convert a 2 bytes wchar_t (windows
version ) to 4 bytes wchar_t ( linux version ).

I suggest to use something like libiconv(http://en.wikipedia.org/wiki/Iconv)
to convert to a common character set on both sides.

Aug 14 '08 #3

James Kanze

On Aug 14, 5:30 pm, Victor Bazarov <v.Abaza...@comAcast.netwrote:

yaki...@gmail.com wrote:
Hello experts,
I am dealing now in porting our server from windows to linux. our
client is running only on windows machine.
to avoid the wchar_t size problem ( in windows its 2 bytes and linux
is 4 bytes ) we defined

#ifdef WIN32
#define t_wchar_t wchar_t
#else // LINUX
#define t_wchar_t short
#endif

You might be better off with a typedef, although it's not a
very significant difference.

I would be if the second were unsigned short. Something like
"t_wchar_t( something )" would be legal if it were a typedef,
not if it were a #define.

Also, for some reason I seem to remember that wchar_t is an
unsigned type. Since 'char' is often signed (though different
from 'singed char', of course), perhaps I remember
incorrectly...

Both are very implementation defined. In practice, you
generally shouldn't be using wchar_t in portable code:-(.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Aug 14 '08 #4

Chris Becke

my Q is : is there a simple way to convert a 2 bytes wchar_t (windows

version ) to 4 bytes wchar_t ( linux version ).

wchar_t is a particularly useless type : Because its implementation defined it doesn't have (in protable code) any kind of assurance of what type of character encoding it may be using or capable of using.

The next point is that *unicode* characters are unsigned. so use an unsigned short for your UCS-2 / UTF-16 representation. http://en.wikipedia.org/wiki/UTF-16 has loads more information.

Finally, conversion for simple UCS-2 to UTF-32 is simple... Simply pad out the data by doing a direct characterwise copy:

typedef ucs2char unsigned short;
typedef utf32char unsigned long;

void convert_ucs2_2_utf32(ucs2char const* src; utf32char* dest)
{
do {
*dest++ = *src;
} while(*src++);
}

If you want to properly convert characters outside the basic multilingual plane, and the B.M.P covers all displayable characters from all modern languages that are in use :- european and eastern - then you need to be aware of surrogate pairs: Unicode codepoints in the range U+D800-U+DFFF are not assigned to valid characters, this range is used by UTF-16 to encode pairs of UTF-16 character each of which encodes 10 bits of the final codepoint.

So, something like this will do the translation of UTF-16 to UTF-32

typedef utf16char unsigned short;
void convert_utf16_to_utf32(ucs2char const* src; utf32char* dest)
{
do {
if(*src & 0xD800 == 0xD800) {
*dest++ = (*src++ & 0x07ff) << 10 + (*src & 0x7ff) + 0x10000;
} else
*dest++ = *src;
} while(*src++);
}

Aug 15 '08 #5

James Kanze

On Aug 15, 9:23 am, "Chris Becke" <chris.be...@gmail.comwrote:

my Q is : is there a simple way to convert a 2 bytes wchar_t
(windows version ) to 4 bytes wchar_t ( linux version ).

wchar_t is a particularly useless type : Because its
implementation defined it doesn't have (in protable code) any
kind of assurance of what type of character encoding it may be
using or capable of using.

That's partially true of char as well; in addition, the
character encoding can depend on the source of the data. But at
least, char is guaranteed to be at least 8 bits, so you know
that it can hold all useful external encodings. (For better or
for worse, the external world is 8 bits, and any attempt to do
otherwise is bound to fail in the long run.)

The next point is that *unicode* characters are unsigned.

I'm not sure what that's supposed to mean. ALL character
encodings I've ever seen use only non-negative values: ASCII
doesn't define any negative encodings, nor do any of the ISO
8859 encodings. The fact that char can be (and often is) a
signed 8 bit value causes no end of problems because of this.
The character value isn't really signed or unsigned: it's just a
value (that happens never to be negative).

What is true is that the Unicode encoding formats UTF-16 and
UTF-8 require values in the range of 0-0xFFFF and 0-0xFF,
respectively, and that if you're short is 16 bits or your char 8
(both relatively frequent cases), those values won't fit in the
corresponding signed types. (For historical reasons, we still
manage to make do putting UTF-8, and other 8 bit encodings, in
an 8 bit signed char. It's a hack, and it's not, at least in
theory, guaranteed to work, but in practice, it's often the
least bad choice available.)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Aug 15 '08 #6

by: Keith MacDonald | last post by:

Hello, Is there a portable (at least for VC.Net and g++) method to convert text between wchar_t and char, using the standard library? I may have missed something obvious, but the section on...

C / C++

converting wide strings to long

by: Exits Funnel | last post by:

Hello, I've inherited a bunch of code which was written on windows and makes frequent calls to _wtol( ) which converts a 2 byte char array to a long integer. I'm pretty sure it is a Microsoft...

C / C++

converting application to UNICODE

by: Sonu | last post by:

Hello everyone and thanks in advance. I have a multilingual application which has been built in MFC VC++ 6.0 (non-Unicode). It support English German Hungarian so far, which has been fine. But...

.NET Framework

converting string:array to tchar** , wchar_t**

by: diDE | last post by:

I want to convert a managed string array f.e. array<string^>^ Texts; // Elements 0: "ABC", 1: "HJO" to a TCHAR** or wchar_t** any ideas?

.NET Framework

wchar_t

by: James Brown | last post by:

could someone please tell me when the wchar_t type was introduced into the C language (and with what version).....perhaps it was introduced as an extension by alot of compiler venders before it...

C / C++

Converting from UTF-16 to UTF-32

by: Jimmy Shaw | last post by:

Hi everybody, Is there any SIMPLE way to convert from UTF-16 to UTF-32? I may be mixed up, but is it possible that all UTF-16 "code points" that are 16 bits long appear just the same in UTF-32,...

C / C++

main(int argc, wchar_t *argv[]) on linux i386 (redhat)

by: interec | last post by:

Hi Folks, I am writing a c++ program on redhat linux using main(int argc, wchar_t *argv). $LANG on console is set to "en_US.UTF-8". g++ compiler version is 3.4.6. Q1. what is the encoding of...

C / C++

C# WEB:Trouble converting videos on virtual directories using Windows Media Encoder 9

by: clinnebur | last post by:

We have an ASP.NET web application (C#) that copies videos from a CCTV truck to a Linux server. What I am trying to do is convert the .AVI videos(which is how they are created on the truck) to .WMV...

.NET Framework

std::wstringbuf and imbue to convert from utf-8 to wchar_t?

by: =?ISO-8859-2?Q?Boris_Du=B9ek?= | last post by:

Hi, I have an API that returns UTF-8 encoded strings. I have a utf8 codevt facet available to do the conversion from UTF-8 to wchar_t encoding defined by the platform. I have no trouble...

C / C++

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

converting from windows wchar_t to linux wchar_t

Similar topics