What to prefer - TCHAR arrays, std::string or std::wstring ?

Hi

While developing any software, a developer needs to think about its
possible enhancement for international usage and consider UNICODE.

I have read many nice articles/items in advanced C++ books (Effective
C++, More Effective C++, Exceptional C++, More Exceptional C++, C++
FAQs, Addison Wesley 2nd Edition)

Authors of these books have not considered UNICODE. So many of their
suggestions/guidelines confuse developers regarding what to use for
character-string members of class (considering exception-safety,
reusability and maintenance of code).

Many books have stated that:
Instead of using character arrays, always prefer using std::string.

My question is:

While developing generic Win32 app using C++ for Windows
(98/NT/2000/2003/XP), considering unicode for Windows NT/2000/2003/XP,
What to prefer - TCHAR arrays, std::string or std::wstring
for character-string members (name, address, city, state, country etc.)

of classes like Address, Customer, Vendor, Employee ?

What to prefer - TCHAR arrays, std::string or std::wstring ?

I truly appreciate any help or guideline.
Anand

Aug 2 '06 #1
ro************@ yahoo.com wrote:
My question is:

While developing generic Win32 app using C++ for Windows
(98/NT/2000/2003/XP), considering unicode for Windows NT/2000/2003/XP,
What to prefer - TCHAR arrays, std::string or std::wstring
for character-string members (name, address, city, state, country etc.)

of classes like Address, Customer, Vendor, Employee ?

What to prefer - TCHAR arrays, std::string or std::wstring ?

I truly appreciate any help or guideline.
Standard C++ does not know about the TCHAR type (I know what it
represents, but it is not a standard language feature), and formally
also does not know about Unicode (std::wstring isn't quite Unicode).
Handling Unicode can be a complex topic, and one in which I cannot claim
to be well versed.

Your question is probably better suited for a Windows newsgroup.

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Aug 2 '06 #2
rohitpatel9999 wrote:
While developing any software, a developer needs to think about its
possible enhancement for international usage and consider UNICODE.
Negative. Programmers must prepare for _anything_. The requirement for
Unicode may or may not come next.

Prepare for anything by writing copious unit tests, and by folding as much
duplication as possible. If you duplicate the word "the" in two strings,
fold them into one.

If you then need to localize, read this:

http://flea.sourceforge.net/TFUI_localization.doc

Then incrementally move your strings into a pluggable resource, and
incrementally widen or convert your string variables. "Incrementally" means
one at a time, passing all tests after each small edit.

The myth that some important decisions must be made early, to avoid the cost
of a late change, is a self-fulfilling prophecy of defeat.
Authors of these books have not considered UNICODE. So many of their
suggestions/guidelines confuse developers regarding what to use for
character-string members of class (considering exception-safety,
reusability and maintenance of code).
Right. They all use std::string, because many programmers learned C first,
where a character array is still the simplest and most robust way to
represent a fixed-length string. So std::string should be the default,
without a real reason to use anything else. Such a reason could then switch
you to TCHAR, or to std::wstring, or to something else.
My question is:

While developing generic Win32 app using C++ for Windows
(98/NT/2000/2003/XP), considering unicode for Windows NT/2000/2003/XP,
What to prefer - TCHAR arrays, std::string or std::wstring
for character-string members (name, address, city, state, country etc.)
Ask your "customer liaison", the person authorized to request features,
whether you should spend 9 days working on their next feature, or 18 days
working on that feature + internationalization.

If they need only English, then use std::string everywhere you possibly can,
and something like CString for the remainder.

When they schedule a port to another language, you obtain a glossary for
that language _first_. Then you refactor your code to use something like
std::basic_string<TCHAR>.

If you truly need TCHAR in its WCHAR mode, then you must configure your
tests to run (and pass) with the _UNICODE version of your binary. You should
always pass all such tests, each time you change anything. Otherwise you
might make an innocent change that works in one mode, but breaks in another.
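
A minimal sketch of the last two paragraphs, assuming a portable stand-in is acceptable for illustration: `tchar`, `TXT`, `greeting` and `greeting_works` are hypothetical names that mimic TCHAR and _T from <tchar.h>, so the snippet also builds outside Visual C++. The idea is to compile the same file once with UNICODE and _UNICODE defined and once without, and to run the same check in both builds.

```cpp
#include <string>

// Hypothetical stand-ins for TCHAR and _T() from <tchar.h>
#if defined(UNICODE) || defined(_UNICODE)
typedef wchar_t tchar;
#define TXT(s) L##s
#else
typedef char tchar;
#define TXT(s) s
#endif

// One string type that follows the build mode
typedef std::basic_string<tchar> tstring;

// Hypothetical function under test; identical source in both builds
tstring greeting(const tstring& name)
{
    return tstring(TXT("Hello, ")) + name;
}

// A check that has to pass in the narrow build and in the wide build
bool greeting_works()
{
    return greeting(TXT("Anand")) == TXT("Hello, Anand")
        && greeting(TXT("")).size() == 7;
}
```

Every literal goes through `TXT`, which is exactly the discipline TCHAR code demands; a single bare "..." literal would compile in one mode and fail in the other.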

Further, not all code-pages can use WCHAR or wchar_t. Spanish, for example,
is the same code-page as English. Greek is a different code-page, but it
still uses 8-bit bytes. So you should only enable the few features you need
to support another language, and not all those languages need Unicode. Some
versions of Chinese don't need it.

If you truly need "one binary that presents all languages, mixed together",
then you need Unicode. And if you need a rare language like Sanskrit or
Inuit, that has no independent 8-bit code-page, then you will need Unicode.
Otherwise you probably don't.

From here, you must read a book on internationalization. Yet you don't do
_any_ of that research until your business side has selected a target
language. Otherwise you will just be writing speculative features that
_might_ work with any language.

So default to std::string, and keep your programming velocity high. That
helps ensure that your clients will be _able_ to eventually target the
international markets...

--
Phlip
http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!
Aug 2 '06 #3
Thank you for helpful suggestions.
The suggestion of using std::basic_string<TCHAR> is also good.

Client is sure that they will need UNICODE for a few languages (e.g.
Japanese).
The client requirements document did specify making the C++ code generic
for UNICODE consideration (but it should not use the MFC-specific CString).

So (in Microsoft Visual C++)
an application build for Win98/ME will have _MBCS defined
an application build for Win2000/NT/2003/XP will have UNICODE and _UNICODE
defined.

Please guide me, (considering exception-safety, reusability and
maintenance of code).

What to prefer - TCHAR arrays, std::string or std::wstring ?

or Which of the following three classes is preferable ?

e.g.

#include <string>
#include <tchar.h>   /* _TCHAR, TCHAR (Visual C++) */

/* Option 1 */
class Address
{
    _TCHAR name[30];
    _TCHAR addressline1[30];
    _TCHAR addressline2[30];
    _TCHAR city[30];
};
/* Option 2 */
class Address
{
    std::basic_string<TCHAR> name;
    std::basic_string<TCHAR> addressline1;
    std::basic_string<TCHAR> addressline2;
    std::basic_string<TCHAR> city;
};
/* Option 3 */
#ifdef UNICODE
typedef std::wstring tstring;
#else
typedef std::string tstring;
#endif
class Address
{
    tstring name;
    tstring addressline1;
    tstring addressline2;
    tstring city;
};

Thanks again.
Anand (Rohit)

Aug 3 '06 #4

ro************@ yahoo.com wrote:
Hi

While developing any software, a developer needs to think about its
possible enhancement for international usage and consider UNICODE.

I have read many nice articles/items in advanced C++ books (Effective
C++, More Effective C++, Exceptional C++, More Exceptional C++, C++
FAQs, Addison Wesley 2nd Edition)

Authors of these books have not considered UNICODE. So many of their
suggestions/guidelines confuse developers regarding what to use for
character-string members of class (considering exception-safety,
reusability and maintenance of code).

Many books have stated that:
Instead of using character arrays, always prefer using std::string.

My question is:

While developing generic Win32 app using C++ for Windows
(98/NT/2000/2003/XP), considering unicode for Windows NT/2000/2003/XP,
What to prefer - TCHAR arrays, std::string or std::wstring
for character-string members (name, address, city, state, country etc.)

of classes like Address, Customer, Vendor, Employee ?

What to prefer - TCHAR arrays, std::string or std::wstring ?

I truly appreciate any help or guideline.
Anand
I don't use TCHAR as it's a horrid kludge and has problems of its own.
Although it pretends to support both wchar_t and char it's slightly
broken. The _T macro that may or may not put the L in front of string
literals is even more broken.

As you're developing on Windows then just use wchar_t (and tell MSVC to
define it as a base type, not a typedef to short). You will get exactly
zero benefit from trying to compile the same program with and without
Unicode support.

It is normally much better to just use Unicode internally and then
convert to eight bit in whatever localised form you need when you have
to do so. You will find that you have to do all of this anyway for any
non-trivial program.
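
A rough sketch of that "wide inside, eight bit at the boundary" approach, using UTF-8 as the eight-bit form. `append_utf8` and `to_utf8` are hypothetical helper names, not part of any library; on Windows the conversion would more likely go through the Win32 WideCharToMultiByte call, and a localised code page would need a different encoder. The sketch assumes well-formed input and does not validate it.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8 and append it to out.
void append_utf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical boundary helper: wide string in, UTF-8 bytes out.
// Combines UTF-16 surrogate pairs (16-bit wchar_t) and passes
// UTF-32 units (32-bit wchar_t) straight through.
std::string to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::wstring::size_type i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            unsigned long lo = static_cast<unsigned long>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;   // the low surrogate has been consumed
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The rest of the program keeps std::wstring members and calls something like `to_utf8(address.name)` only where bytes leave the process (files, sockets, narrow APIs).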
K

Aug 3 '06 #5
rohitpatel9999 wrote:
Client is sure that they will need UNICODE for a few languages (e.g.
Japanese).
There are requirements and then there are requirements.

I once ported an application to Greek. The original author had added lots of
calls to convert between code-pages. Then the program never converted to any
code pages - it all worked in Western Europe with just one code-page.

I had a lot of fun diagnosing and fixing each bug, the first time any of
these conversion functions ever got called. Oh, and I was implicitly blamed
for the slow velocity, not the original programmer.

So, has this client arranged to provide a real Japanese locale, with a
glossary, for you to port the app to _now_?

Without the critical step of actually using this speculative code, the
client will instead order you to waste time twice, now when you proactively
code for Unicode, and later when you actually provide a new locale.
The client requirements document did specify making the C++ code generic
for UNICODE consideration (but it should not use the MFC-specific CString).

So (in Microsoft Visual C++)
an application build for Win98/ME will have _MBCS defined
an application build for Win2000/NT/2003/XP will have UNICODE and _UNICODE
defined.

Please guide me, (considering exception-safety, reusability and
maintenance of code).
From here on, I can't. The question is now only on-topic for, roughly,
news:microsoft.public.vc.language, or possibly a localization forum
thereof. However, MBCS might provide for as much Japanese as UNICODE would.
You need to ask your client for a real Japanese locale, and then you need to
match your work to it. (And don't get me started about UCS.)

If they give you a glossary in the JIS201 code-page, then an 8-bit non-MBCS
would work for both the Win95s and the WinNTs. If you first enabled UNICODE,
and only then discover your glossary is in JIS201, then you would have
wasted that effort.

(You could use iconv to convert the glossary to UNICODE or back. The goal is
to match which code-page Japanese customers will accept. Has your client
actually researched this?)
What to prefer - TCHAR arrays, std::string or std::wstring ?
Joel Spolsky sez "there's no such thing as raw text". The rejoinder is that
wchar_t does not a localized application make.

If you need UNICODE, and if you truly need to pack all kinds of text into
any string, then you need a kind of UTF to encode it. UNICODE is a character
set, not an encoding. And if you can go with UTF-8, even on a Win95 machine,
then you don't need std::wstring.
_TCHAR name[30];
Never. The fixed-length string itself will cause untold horror.
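
One small illustration of that horror, as a sketch only: `FixedAddress`, `StringAddress` and `set_name` are hypothetical names, and plain char stands in for _TCHAR so the snippet builds anywhere. A fixed buffer either truncates long input silently or, if the copy is written carelessly, overruns; the string member simply grows.

```cpp
#include <cstring>
#include <string>

// Fixed-size member, as in Option 1 (char stands in for _TCHAR)
struct FixedAddress {
    char name[30];
};

// Dynamically sized member, as in Options 2 and 3
struct StringAddress {
    std::string name;
};

// Bounded copy into the fixed buffer: anything past 29 characters is lost
void set_name(FixedAddress& a, const char* s)
{
    std::strncpy(a.name, s, sizeof a.name - 1);
    a.name[sizeof a.name - 1] = '\0';   // strncpy does not terminate on overflow
}
```

Storing a 35-character surname through `set_name` keeps only the first 29 characters, while assigning the same text to `StringAddress::name` keeps all of it. In an MBCS build the cut can also land in the middle of a double-byte character.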
std::basic_string<TCHAR> name;
Only if you actually test both modes, as you program.

And please introduce a typedef:

typedef std::basic_string<TCHAR> tstring;
/* Option 3 */
#ifdef UNICODE
typedef std::wstring tstring;
This is a clumsy version of Option 2.

The next complaint is that neither wchar_t nor WCHAR is "UNICODE". Sometimes
they are UTF-16. (And on some compilers wchar_t is UTF-32.)
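
To make the UTF-16 versus UTF-32 point concrete, here is a small sketch; `units_for_clef` is a hypothetical name. It stores U+1D11E (a musical G clef, outside the Basic Multilingual Plane) in a std::wstring and reports how many wchar_t units that takes on the compiler at hand.

```cpp
#include <string>

// Returns the number of wchar_t units needed to hold U+1D11E here.
std::wstring::size_type units_for_clef()
{
    if (sizeof(wchar_t) >= 4) {
        // 32-bit wchar_t: one UTF-32 unit per code point
        return std::wstring(1, static_cast<wchar_t>(0x1D11E)).size();
    }
    // 16-bit wchar_t: the same character is a UTF-16 surrogate pair
    const wchar_t pair[] = { wchar_t(0xD834), wchar_t(0xDD1E), 0 };
    return std::wstring(pair).size();
}
```

So size() on a wide string counts code units, and whether that matches the number of characters depends on the platform.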

The more you seek a simple answer, the harder this problem will get. The
answer would be simple if you had enough evidence to back up your decision.
Always get as much evidence as possible - preferably from live deployed
code - before making hard and irreversible decisions. Your client clearly
has experience with source code that created problems when it localized.
They _cannot_ fix this by just guessing you will need the _UNICODE flag
turned on. You must work with them to either defer the requirement, and
write clean code, or promote the requirement, targeting a real release
candidate that a real international user will accept.

--
Phlip
http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!
Aug 3 '06 #6
Kirit Sælensminde wrote:
As you're developing on Windows then just use wchar_t (and tell MSVC to
define it as a base type, not a typedef to short). You will get exactly
zero benefit from trying to compile the same program with and without
Unicode support.
Except that turning on _UNICODE will automagically make the compiler and
program interpret your RC file in UTF-16 instead of a code-paged 8-bit
encoding.
It is normally much better to just use Unicode internally and then
convert to eight bit in whatever localised form you need when you have
to do so. You will find that you have to do all of this anyway for any
non-trivial program.
The OP also has the requirement to target the Win95s, which can't run in
Wide mode.

Aren't there strap-on DLL sets that provide a kind of Wide mode for the
Win95s? If so, the OP could deploy these with the application, build
everything for UNICODE, and safely neglect to enable any other code-pages.

--
Phlip
http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!
Aug 3 '06 #7
ro************@ yahoo.com wrote :
What to prefer - TCHAR arrays, std::string or std::wstring ?
Just make anything Unicode-aware without using any specific stupidity
from the win32 API.
However, if you rely heavily on that API it may be annoying to interface
with it if you don't follow its internationalization concepts.
But anyway if you rely that much on it you're coding something so
specific that you should ask in another group.

std::wstring will allow UCS-2 (on win32) and UCS-4 (on most unices).
You can use std::string for 'unsafe' utf-8, which is in most of the
cases enough.
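
A short sketch of why UTF-8 in std::string counts as 'unsafe': the container only knows about bytes. `utf8_length` is a hypothetical helper that counts code points by skipping continuation bytes, and it assumes the input is already valid UTF-8.

```cpp
#include <string>

// Count code points in a UTF-8 string: every byte that is not a
// continuation byte (bit pattern 10xxxxxx) starts a new code point.
std::string::size_type utf8_length(const std::string& s)
{
    std::string::size_type n = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For the two Japanese characters encoded as "\xE6\x97\xA5\xE6\x9C\xAC", size() reports 6 while `utf8_length` reports 2, so indexing, substr and truncation on the raw string can split a character in half.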

Or you could use ICU or glibmm for advanced Unicode support.
Aug 3 '06 #8

"Phlip" <ph******@yahoo.com> wrote in message
news:md****************@newssvr25.news.prodigy.net...
Kirit Sælensminde wrote:
>As you're developing on Windows then just use wchar_t (and tell MSVC to
define it as a base type, not a typedef to short). You will get exactly
zero benefit from trying to compile the same program with and without
Unicode support.

Except that turning on _UNICODE will automagically make the compiler
and program interpret your RC file in UTF-16 instead of a code-paged
8-bit encoding.
You can turn that option on as well, if it has any advantage. Using
wchar_t and std::wstring in your application makes it independent of
those settings.
>
>It is normally much better to just use Unicode internally and then
convert to eight bit in whatever localised form you need when you have
to do so. You will find that you have to do all of this anyway for any
non-trivial program.

The OP also has the requirement to target the Win95s, which can't
run in Wide mode.
Windows 95, 98, and NT are officially unsupported both as OSs and as
targets for the present compiler. All currently supported Windows
versions use wchar_t internally. New applications could do that as
well.

Using TCHAR to optionally compile a new application for a dead OS
doesn't seem very useful to me. :-)
>
Aren't there strap-on DLL sets that provide a kind of Wide mode for
the Win95s? If so, the OP could deploy these with the application,
build everything for UNICODE, and safely neglect to enable any other
code-pages.
Except that these are as dead as their OSs. Can't be distributed after
their end-of-life.
Bo Persson
Aug 3 '06 #9
Phlip wrote :
The OP also has the requirement to target the Win95s, which can't run in
Wide mode.
Actually, you can probably do it with MSLU (the Microsoft Layer for
Unicode on Windows 95, 98, and Me systems)

Aug 3 '06 #10
