Using wchar_t instead of char

Michael Brennan

I guess this question only applies to programming applications for UNIX,
Windows and similar. If one develops something for an embedded system
I can understand that wchar_t would be unnecessary.

I wonder if there is any point in using char over wchar_t? I don't see
much code using wchar_t when reading other people's code (but then I
haven't really looked much) or when following this newsgroup. To me it
sounds reasonable to make sure your program can handle multibyte
characters so that it can be used in as many places as possible.
Is there any reason I should not use wchar_t for all my future programs?

I am aware that on UNIX at least, if you use UTF-8, char works pretty
well. But if you use wchar_t you don't need to rely on UTF-8, which
makes the program more portable, correct?

(I of course do not mean just the type wchar_t, but all of the things
in wide character land)

Thanks

--
Michael Brennan

Jul 8 '08 #1

CBFalconer

Michael Brennan wrote:

> I guess this question only applies to programming applications for
> UNIX, Windows and similar. If one develops something for an
> embedded system I can understand that wchar_t would be unnecessary.
>
> I wonder if there is any point in using char over wchar_t? I don't
> see much code using wchar_t when reading other people's code (but
> then I haven't really looked much) or when following this newsgroup.
> To me it sounds reasonable to make sure your program can handle
> multibyte characters so that it can be used in as many places as
> possible. Is there any reason I should not use wchar_t for all my
> future programs?
>
> I am aware that on UNIX at least, if you use UTF-8, char works
> pretty well. But if you use wchar_t you don't need to rely on UTF-8,
> which makes the program more portable, correct?

I believe that wchar etc. are only available in C99. Using them
may seriously reduce your code portability.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Jul 8 '08 #2

viza

On Tue, 08 Jul 2008 21:12:54 +0000, Michael Brennan wrote:

> I wonder if there is any point in using char over wchar_t? I don't see
> much code using wchar_t when reading other people's code (but then I
> haven't really looked much) or when following this newsgroup. To me it
> sounds reasonable to make sure your program can handle multibyte
> characters so that it can be used in as many places as possible. Is
> there any reason I should not use wchar_t for all my future programs?
>
> I am aware that on UNIX at least, if you use UTF-8, char works pretty
> well. But if you use wchar_t you don't need to rely on UTF-8, which
> makes the program more portable, correct?

wchar_t is 32 bits on my system. That's a lot of space to use when I
only need 7. Also, not many widely distributed applications use
wchar_t; editors are just one example.

More fundamentally all sorts of I/O is done specifically in 8 bit bytes.
IP is 8 bit based, as are files under Linux and most other operating
systems. The problem is that it is very difficult to do a partial
changeover. Every application would spend half of its time and code
converting back and forth, and then what do you do when it doesn't go?
How long in wchar_t is a seven byte file? One, perhaps, but then you
have to add a whole load of error handling code to every part of the
program that interfaces with the char based world.
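
One way to read the "seven byte file" question is as a decoding problem: the bytes have to be converted to wide characters one at a time, and every call can fail. A minimal sketch of that boundary code, assuming the AMD1 function mbrtowc from <wchar.h> is available and that the bytes are in the current locale's encoding; count_wide_chars is a hypothetical helper, not something from this thread:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts the wide characters encoded in
 * buf[0..len) under the current locale. Returns the count, or -1 if
 * the bytes contain an invalid sequence or stop in mid-character. */
long count_wide_chars(const char *buf, size_t len)
{
    mbstate_t state;
    long count = 0;
    size_t pos = 0;

    assert(buf != NULL || len == 0);
    memset(&state, 0, sizeof state);   /* start in the initial state */

    while (pos < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf + pos, len - pos, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                 /* invalid or truncated input */
        if (n == 0)
            n = 1;                     /* null character, assumed to be one byte */
        pos += n;
        count++;
    }
    return count;
}
```

For seven bytes of plain ASCII, count_wide_chars(buf, 7) gives 7. For other encodings the answer depends on the locale, and a truncated final character is reported as an error rather than silently dropped.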

In C, memory is always dealt with in sizeof(char) units. Life might be
made easier for the C programmer in a UTF16/24/32 world by increasing
CHAR_BIT, but you still have the problems when you interface with the
rest of the world.

Jul 8 '08 #3

Ben Bacarisse

CBFalconer <cb********@yahoo.com> writes:

> Michael Brennan wrote:
>>
>> I guess this question only applies to programming applications for
>> UNIX, Windows and similar. If one develops something for an
>> embedded system I can understand that wchar_t would be unnecessary.
>>
>> I wonder if there is any point in using char over wchar_t? I don't
>> see much code using wchar_t when reading other people's code (but
>> then I haven't really looked much) or when following this newsgroup.
>> To me it sounds reasonable to make sure your program can handle
>> multibyte characters so that it can be used in as many places as
>> possible. Is there any reason I should not use wchar_t for all my
>> future programs?
>>
>> I am aware that on UNIX at least, if you use UTF-8, char works
>> pretty well. But if you use wchar_t you don't need to rely on UTF-8,
>> which makes the program more portable, correct?
>
> I believe that wchar etc. are only available in C99. Using them
> may seriously reduce your code portability.

I don't have a real copy of ISO C90 (ANSI C 89) so I am winging it a
bit, but I am pretty sure that wchar_t was in there. C95 added some
more related things (all of which ended up in C99) but using wchar_t
should be very portable indeed[1]. Do you have a reference to C90
without wchar_t? All I can cite is online versions of the ANSI
standard as a .txt file and the C90 rationale at:

http://www.lysator.liu.se/c/rat/title.html

As soon as anyone with a copy to hand tells me otherwise, I will
withdraw, but then again maybe someone will back me up.

--
Ben.

Jul 9 '08 #4

Ben Bacarisse

Michael Brennan <br************@gmail.com> writes:

> I guess this question only applies to programming applications for UNIX,
> Windows and similar. If one develops something for an embedded system
> I can understand that wchar_t would be unnecessary.

I'd be very surprised if this were true, but I do not know much about
embedded systems. My audio player seems to support all sorts of
characters.

> I wonder if there is any point in using char over wchar_t? I don't see
> much code using wchar_t when reading other people's code (but then I
> haven't really looked much) or when following this newsgroup. To me it
> sounds reasonable to make sure your program can handle multibyte
> characters so that it can be used in as many places as possible.
> Is there any reason I should not use wchar_t for all my future
> programs?

It is not a simple "use one or the other".

> I am aware that on UNIX at least, if you use UTF-8, char works pretty
> well.

Yes, but a truly portable program won't assume UTF-8. Even if you can
assume it, converting to wide characters helps when you are doing lots
of character counting operations. For example, finding the longest
match of a pattern is complex if you keep everything in a multi-byte
encoding like UTF-8.
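
To illustrate the counting point: in UTF-8 the number of characters in a string is not its byte length, so it has to be found by scanning. The sketch below assumes valid UTF-8 input and counts code points rather than user-perceived characters; utf8_length is a hypothetical name. With an array of wchar_t that holds one code point per element, the same count is a plain wcslen call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of code points in a NUL-terminated
 * string, assuming the bytes are valid UTF-8. Continuation bytes
 * have the bit pattern 10xxxxxx, so any other byte starts a new
 * character. */
size_t utf8_length(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;               /* lead byte or ASCII byte */
    }
    return count;
}
```

For example, utf8_length("abc") is 3, while the six bytes "\xE6\x97\xA5\xE6\x9C\xAC" (two kanji) give 2 even though strlen reports 6.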

> But if you use wchar_t you don't need to rely on UTF-8, which
> makes the program more portable, correct?

It is one of the components you need. Another is to use C's locale
support. How portable you can be depends on what systems you are
targeting since not all of the features of C99's wide character
support are available on all compiler/library combinations. In fact,
the maximally portable set of things you can do with a wchar_t (or an
array of them) is very small. Here I hope an expert steps in and gives
you real experience-based wisdom about portable use of wide-character
support.

> (I of course do not mean just the type wchar_t, but all of the things
> in wide character land)

--
Ben.

Jul 9 '08 #5

CBFalconer

Ben Bacarisse wrote:

> CBFalconer <cb********@yahoo.com> writes:
>> Michael Brennan wrote:
>>>
>>> I guess this question only applies to programming applications for
>>> UNIX, Windows and similar. If one develops something for an
>>> embedded system I can understand that wchar_t would be unnecessary.
>>>
>>> I wonder if there is any point in using char over wchar_t? I don't
>>> see much code using wchar_t when reading other people's code (but
>>> then I haven't really looked much) or when following this newsgroup.
>>> To me it sounds reasonable to make sure your program can handle
>>> multibyte characters so that it can be used in as many places as
>>> possible. Is there any reason I should not use wchar_t for all my
>>> future programs?
>>>
>>> I am aware that on UNIX at least, if you use UTF-8, char works
>>> pretty well. But if you use wchar_t you don't need to rely on UTF-8,
>>> which makes the program more portable, correct?
>>
>> I believe that wchar etc. are only available in C99. Using them
>> may seriously reduce your code portability.
>
> I don't have a real copy of ISO C90 (ANSI C 89) so I am winging it a
> bit, but I am pretty sure that wchar_t was in there. C95 added some
> more related things (all of which ended up in C99) but using wchar_t
> should be very portable indeed[1]. Do you have a reference to C90
> without wchar_t? All I can cite is online versions of the ANSI
> standard as a .txt file and the C90 rationale at:

I am basing it on this excerpt from the C99 standard (N869):

[#5] This edition replaces the previous edition, ISO/IEC
9899:1990, as amended and corrected by ISO/IEC
9899/COR1:1994, ISO/IEC 9899/COR2:1995, and ISO/IEC
9899/AMD1:1995. Major changes from the previous edition
include:

-- restricted character set support in <iso646.h>
(originally specified in AMD1)

-- wide-character library support in <wchar.h> and
<wctype.h> (originally specified in AMD1)

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Jul 9 '08 #6

Nick Bowler

On Tue, 08 Jul 2008 21:02:34 -0400, CBFalconer wrote:

> Ben Bacarisse wrote:
>> CBFalconer <cb********@yahoo.com> writes:
>>> Michael Brennan wrote:
>>> I believe that wchar etc. are only available in C99. Using them may
>>> seriously reduce your code portability.
>>
>> I don't have a real copy of ISO C90 (ANSI C 89) so I am winging it a
>> bit, but I am pretty sure that wchar_t was in there. C95 added some
>> more related things (all of which ended up in C99) but using wchar_t
>> should be very portable indeed[1]. Do you have a reference to C90
>> without wchar_t? All I can cite is online versions of the ANSI
>> standard as a .txt file and the C90 rationale at:
>
> I am basing it on this excerpt from the C99 standard (N869):
>
> [#5] This edition replaces the previous edition, ISO/IEC
> 9899:1990, as amended and corrected by ISO/IEC
> 9899/COR1:1994, ISO/IEC 9899/COR2:1995, and ISO/IEC
> 9899/AMD1:1995. Major changes from the previous edition include:
>
> -- restricted character set support in <iso646.h>
> (originally specified in AMD1)
>
> -- wide-character library support in <wchar.h> and
> <wctype.h> (originally specified in AMD1)

The headers specified in that excerpt and all functions declared within
are indeed new in AMD1/C99.

The type wchar_t (from <stddef.h>) was present in C90. Additionally, the
library functions mblen, mbtowc, wctomb, mbstowcs and wcstombs are
available from <stdlib.h>.

AMD1 is fairly widely implemented, anyway.
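
As a rough sketch of what the C90 subset already allows, the hypothetical helper below wraps mbstowcs from <stdlib.h> and needs nothing from <wchar.h>. It assumes the input is in the current locale's multibyte encoding; the name to_wide and its return convention are illustrative only:

```c
#include <assert.h>
#include <stddef.h>   /* wchar_t, size_t */
#include <stdlib.h>   /* mbstowcs, declared here since C90 */

/* Hypothetical helper: converts the multibyte string src into dst,
 * which has room for dst_len wide characters including the
 * terminator. Returns the number of wide characters stored, not
 * counting the terminator, or (size_t)-1 if src contains an invalid
 * sequence or dst is too small. */
size_t to_wide(wchar_t *dst, size_t dst_len, const char *src)
{
    size_t n;

    assert(dst != NULL && src != NULL && dst_len > 0);
    n = mbstowcs(dst, src, dst_len);
    if (n == (size_t)-1)
        return n;                  /* invalid multibyte sequence */
    if (n == dst_len)
        return (size_t)-1;         /* filled up, no terminator written */
    return n;
}
```

Without a call to setlocale the program runs in the "C" locale, where only the basic character set is guaranteed to convert; calling setlocale(LC_CTYPE, "") first lets the conversion follow the user's environment.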

Jul 9 '08 #7

Michael Brennan

On 2008-07-09, Ben Bacarisse <be********@bsb.me.uk> wrote:

> Michael Brennan <br************@gmail.com> writes:
>
>> I guess this question only applies to programming applications for UNIX,
>> Windows and similar. If one develops something for an embedded system
>> I can understand that wchar_t would be unnecessary.
>
> I'd be very surprised if this were true, but I do not know much about
> embedded systems. My audio player seems to support all sorts of
> characters.

My mistake, please ignore what I said about that.

>> I wonder if there is any point in using char over wchar_t? I don't see
>> much code using wchar_t when reading other people's code (but then I
>> haven't really looked much) or when following this newsgroup. To me it
>> sounds reasonable to make sure your program can handle multibyte
>> characters so that it can be used in as many places as possible.
>> Is there any reason I should not use wchar_t for all my future
>> programs?
>
> It is not a simple "use one or the other".

No, I understand now that it's more complicated, unfortunately.

>> I am aware that on UNIX at least, if you use UTF-8, char works pretty
>> well.
>
> Yes, but a truly portable program won't assume UTF-8. Even if you can
> assume it, converting to wide characters helps when you are doing lots
> of character counting operations. For example, finding the longest
> match of a pattern is complex if you keep everything in a multi-byte
> encoding like UTF-8.

>> But if you use wchar_t you don't need to rely on UTF-8, which
>> makes the program more portable, correct?
>
> It is one of the components you need. Another is to use C's locale
> support. How portable you can be depends on what systems you are
> targeting since not all of the features of C99's wide character
> support are available on all compiler/library combinations. In fact,
> the maximally portable set of things you can do with a wchar_t (or an
> array of them) is very small. Here I hope an expert steps in and gives
> you real experience-based wisdom about portable use of wide-character
> support.

This isn't easy: I would need to rely on C99 features and, according to
viza, programs will be inefficient. I always aim to write portable
programs, but I also need to be able to use CJK characters, so I'm not
really sure what to do here.

I currently have a program that reads names and birthdates from a file
and then does some calculations to show how many days are left until their
birthday and so on. It works well, but I also need to have names in
Japanese in the file. My options are UTF-8 or wchar_t. I have to give up
a lot of portability by choosing either of them. Any recommendation on
which to choose?
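
One property that may help when weighing the UTF-8 option: every byte of a multibyte UTF-8 sequence has its high bit set, so an ASCII separator can never appear inside a Japanese name, and the name can be carried around as an opaque char string. A small sketch, assuming a made-up record layout "name;YYYY-MM-DD" (the actual file format is not given in the thread); split_record is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for an assumed "name;YYYY-MM-DD" record.
 * Splits line in place at the first ';' and returns a pointer to
 * the date field, or NULL if there is no separator. The name stays
 * at the start of line as untouched UTF-8 bytes. */
char *split_record(char *line)
{
    char *sep;

    assert(line != NULL);
    sep = strchr(line, ';');       /* ';' is 0x3B, never part of a
                                      multibyte UTF-8 sequence */
    if (sep == NULL)
        return NULL;
    *sep = '\0';                   /* terminate the name field */
    return sep + 1;
}
```

The name field is never decoded here; it only needs interpreting if the program has to count, compare or truncate characters.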

--
Michael Brennan

Jul 9 '08 #8

viza

On Wed, 09 Jul 2008 11:19:57 +0000, Michael Brennan wrote:

> On 2008-07-09, Ben Bacarisse <be********@bsb.me.uk> wrote:
>> Michael Brennan <br************@gmail.com> writes:
>
> I currently have a program that reads names and birthdates from a file
> and then does some calculations to show how many days are left until
> their birthday and so on. It works well, but I also need to have names
> in Japanese in the file. My options are UTF-8 or wchar_t. I have to
> give up a lot of portability by choosing either of them. Any
> recommendation on which to choose?

What about UTF16 (probably as unsigned short)? It has the simplicity of
programming with fixed width characters and you will be able to find text
editors that can read and write the file more easily.

Just a thought. As you've realised there isn't a perfect solution.

viza

Jul 9 '08 #9

Rui Maciel

On Wed, 09 Jul 2008 11:39:08 +0000, viza wrote:

> What about UTF16 (probably as unsigned short)? It has the simplicity of
> programming with fixed width characters and you will be able to find
> text editors that can read and write the file more easily.

Isn't UTF16 a variable-length format?
Rui Maciel
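
UTF-16 is indeed variable-length: code points up to U+FFFF take one 16-bit unit, and anything above that takes two (a surrogate pair). A sketch of the encoding step, using unsigned short as suggested above and assuming it is at least 16 bits wide, which the standard guarantees; utf16_encode is a hypothetical name:

```c
#include <assert.h>

/* Hypothetical helper: encodes code point cp as UTF-16 code units
 * in out[0..1]. cp must be at most 0x10FFFF and must not itself be
 * a surrogate. Returns the number of units written: 1 or 2. */
int utf16_encode(unsigned long cp, unsigned short out[2])
{
    assert(cp <= 0x10FFFFUL);
    assert(cp < 0xD800UL || cp > 0xDFFFUL);

    if (cp < 0x10000UL) {
        out[0] = (unsigned short)cp;   /* fits in a single unit */
        return 1;
    }
    cp -= 0x10000UL;                   /* 20 bits remain */
    out[0] = (unsigned short)(0xD800UL + (cp >> 10));      /* high surrogate */
    out[1] = (unsigned short)(0xDC00UL + (cp & 0x3FFUL));  /* low surrogate */
    return 2;
}
```

For instance, U+65E5 fits in one unit, while U+20BB7 becomes the pair 0xD842 0xDFB7.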

Jul 9 '08 #10
