Portable Code that supports Unicode

Tomás

Let's start off with:

class Nation {
public:
    virtual const char* GetName() const = 0;
};

class Norway : public Nation {
public:
    virtual const char* GetName() const
    {
        return "Norway";
    }
};

Let's say we want to give the name of the nation in the nation's official
language... and so we want to use the Unicode character set to achieve this.

How does one go about using Unicode in portable code? Something like the
following?:

typedef wchar_t UnicodeChar;

class Nation {
public:
    virtual const UnicodeChar* GetName() const = 0;
};

class Norway : public Nation {
public:
    virtual const UnicodeChar* GetName() const
    {
        return L"Norway"; // Note the preceding L
    }
};

Would you use "wchar_t", or would you use "unsigned short"? (Unicode is 16-bit).

Furthermore, how do you write your code in such a way that it can use either
normal characters or wide characters? Microsoft does it something like the
following (you define the UNICODE macro if you're using Unicode):

#ifdef UNICODE
typedef wchar_t Character;
#define StringLiteral(x) Lx
#else
typedef char Character;
#define StringLiteral(x) x
#endif

class Nation {
public:
    virtual const Character* GetName() const = 0;
};

class Norway : public Nation {
public:
    virtual const Character* GetName() const
    {
        return StringLiteral("Norway");
    }
};

What do you think of this? At the moment I'm writing code which I want to
support the normal character set and also Unicode... but I want to keep it
portable!

Any suggestions on how to go about this? Is the Microsoft way decent enough?

-Tomás

Feb 28 '06 #1


Ben Pope

Tomás wrote:

Let's start off with:

class Nation {
public:
    virtual const char* GetName() const = 0;
};

class Norway : public Nation {
public:
    virtual const char* GetName() const
    {
        return "Norway";
    }
};

Why are you using char* instead of std::basic_string<char_type>?

Let's say we want to give the name of the nation in the nation's official
language... and so we want to use the Unicode character set to achieve this.

WHICH Unicode "character set"? There are several, such as UTF-8,
UTF-16, UTF-32, UCS-2, UCS-4 as well as big and little endian versions.

How does one go about using Unicode in portable code? Something like the
following?:

Unicode is still not part of the standard, so it is not portable.

typedef wchar_t UnicodeChar;

class Nation {
public:
    virtual const UnicodeChar* GetName() const = 0;
};

class Norway : public Nation {
public:
    virtual const UnicodeChar* GetName() const
    {
        return L"Norway"; // Note the preceding L
    }
};

Would you use "wchar_t", or would you use "unsigned short"? (Unicode is 16-bit).

Not all Unicode is 16 bit, and not all 16 bit encodings are Unicode.
wchar_t is often not suitable for Unicode.

Until I was sure what I was doing, I would probably use:

class unicode_char {
    /* wrap wchar_t */
};

typedef std::basic_string<unicode_char> ustring;

Furthermore, how do you go about making your code in such a way that it can
use either normal characters or wide characters. Microsoft do it something
like the following: (You define the UNICODE macro if you're using Unicode)

#ifdef UNICODE
typedef wchar_t Character;
#define StringLiteral(x) Lx
#else
typedef char Character;
#define StringLiteral(x) x
#endif

That's ugly and is not a model to be copied. If you need Unicode
support, just support Unicode.

Anyway, this is merely a way of supporting wide and narrow characters,
not encodings.

class Nation {
public:
    virtual const Character* GetName() const = 0;
};

class Norway : public Nation {
public:
    virtual const Character* GetName() const
    {
        return StringLiteral("Norway");
    }
};

What do you think of this? At the moment I'm writing code which I want to
support the normal character set and also Unicode... but I want to keep it
portable!

Any suggestions on how to go about this? Is the Microsoft way decent enough?

I think you need to decide what exactly it is you are doing, and read up
on Unicode.

So far you have only demonstrated wide and narrow character support, and
nothing to do with encodings.

You need to decide on an internal representation, and then you need to
provide mappings to your OS of choice, probably through stream operators
and facets. I don't know what your definition of portable is.
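
One way to read "decide on an internal representation, then map at the edges" is to keep UTF-8 in a std::string internally and decode to code points only where they are needed. A minimal sketch, assuming a C++11 compiler (char32_t and std::u32string postdate this thread). utf8_to_utf32 is a hypothetical helper name, and the validation is deliberately incomplete: it catches truncated and malformed sequences but does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 text into UTF-32 code points.
// Returns false on a truncated sequence, a stray continuation byte,
// or an invalid lead byte. Overlong forms and surrogates are NOT rejected.
bool utf8_to_utf32(const std::string& in, std::u32string& out)
{
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else return false;              // stray continuation or invalid lead

        if (in.size() - i <= extra) return false;  // truncated sequence

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        out += cp;
        i += extra + 1;
    }
    return true;
}
```

Passing the result to an OS API would still need a second mapping (for example to UTF-16 on Windows), which is the part the standard library does not cover.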

Ben Pope
--
I'm not just a number. To many, I'm known as a string...

Feb 28 '06 #2

loufoque

Tomas wrote:

(Unicode is 16-bit).

Unicode is defined on 21 bits.
You can use various encodings to represent it, like UTF-8, UTF-16 or
UTF-32 alias UCS-4.
There is also UCS-2 that Microsoft uses, but it doesn't support the
whole Unicode range.
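
To make the difference between UCS-2 and UTF-16 concrete, here is a small sketch of UTF-16 encoding, assuming a C++11 compiler (char16_t and char32_t postdate this thread). append_utf16 is a hypothetical name. Code points above U+FFFF need two 16-bit units (a surrogate pair), which is exactly the part UCS-2 cannot express.

```cpp
#include <string>

// Append the UTF-16 encoding of one code point to `out`.
// Code points up to U+FFFF take one 16-bit unit; the rest take a
// surrogate pair. Returns false if `cp` is not a Unicode scalar value
// (above U+10FFFF, or inside the surrogate range U+D800..U+DFFF).
bool append_utf16(std::u16string& out, char32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        const char32_t v = cp - 0x10000;                     // 20 bits remain
        out += static_cast<char16_t>(0xD800 + (v >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
    }
    return true;
}
```

For instance, U+00D8 comes out as the single unit 0x00D8, while U+1D11E comes out as the pair 0xD834 0xDD1E.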

If you need something with Random Access, you can only take UCS-2 or UCS-4.
If you only need a Reversible Container, UTF-8 or UTF-16 will do.

Anyway you shouldn't use pointers for strings, but string objects.

std::wstring can be used for UCS-2 or UCS-4 depending on your system.
Be aware that in the standard, though, std::wstring wasn't made for
Unicode. You'd better use something dedicated IMO.
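
Whether wchar_t is wide enough can be checked at build time rather than assumed. A minimal sketch using only WCHAR_MAX from <cwchar>; the function name is hypothetical:

```cpp
#include <cwchar>   // WCHAR_MAX

// True when a single wchar_t can hold every Unicode code point
// (U+0000..U+10FFFF) on the platform this is compiled for.
// Typically false where wchar_t is 16 bits, true where it is 32 bits.
inline bool wchar_can_hold_any_code_point()
{
    return WCHAR_MAX >= 0x10FFFF;
}
```

Code that gets a different answer on different platforms is a hint that std::wstring alone does not pin down an encoding.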

I don't think the UNICODE macro of Microsoft is a good idea. That makes
libs compiled with Unicode support incompatible with the ones which
aren't, etc.
Just make your application Unicode-aware; compile flags that mess
everything up are useless.

I would advise using Glib::ustring from glibmm.
It contains some nice tools about general Unicode stuff too.

There is also ICU from IBM that you could check out.

Feb 28 '06 #3

Gianni Mariani

loufoque wrote:

Tomas wrote:
(Unicode is 16-bit).

Unicode is defined on 21 bits.
You can use various encodings to represent it, like UTF-8, UTF-16 or
UTF-32 alias UCS-4.
There is also UCS-2 that Microsoft uses, but it doesn't support the
whole Unicode range.

If you need something with Random Access, you can only take UCS-2 or UCS-4.
If you only need a Reversible Container, UTF-8 or UTF-16 will do.

What is "Reversible"? If UTF-16 is "reversible" then so must be UTF-32.

Anyway you shouldn't use pointers for strings, but string objects.

std::wstring can be used for UCS-2 or UCS-4 depending on your system.
Be aware that in the standard, though, std::wstring wasn't made for
Unicode. You'd better use something dedicated IMO.

I don't think the UNICODE macro of Microsoft is a good idea. That makes
libs compiled with Unicode support incompatible with the ones which
aren't, etc.
Just make your application Unicode-aware; compile flags that mess
everything up are useless.

I second that.

UTF-16 is also a big waste of time IMHO.

I would advise using Glib::ustring from glibmm.
It contains some nice tools about general Unicode stuff too.

There is also ICU from IBM that you could check out.

Feb 28 '06 #4

loufoque

Ben Pope wrote:

WHICH unicode "character set"? There are several, such as UTF-8,
UTF-16, UTF-32, UCS-2, UCS-4 as well as big and little endian versions.

I think those are character encodings, not character sets.

Character sets specify a table that maps characters to integers and
character encodings define ways to encode that integer in bytes.
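
As an illustration of that split, the sketch below takes the integer the character set assigns (the code point) and turns it into bytes under one particular encoding, UTF-8. It assumes a C++11 compiler for char32_t, and append_utf8 is a hypothetical name, not something from a library mentioned in this thread.

```cpp
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
// Returns false for values that are not Unicode scalar values
// (above U+10FFFF, or inside the surrogate range U+D800..U+DFFF).
bool append_utf8(std::string& out, char32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}
```

The same code point U+00E5 is the two bytes C3 A5 here, but would be a single 16-bit unit in UTF-16 and a single 32-bit unit in UTF-32: one character set, three encodings.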

Unicode would indeed be a character set.

It is actually rather confusing, because "charset" is often used to mean
"character encoding", owing to its usage in various protocols.

Unicode is still not part of the standard, so it is not portable.

Having a sequence of bytes in memory representing a character according
to a well defined encoding and character set is very portable.

Making the OS display the characters correctly is another thing.

Just because something isn't part of the standard doesn't mean it isn't
portable; one can write a portable std::string-like class rather easily.

Feb 28 '06 #5

Ben Pope

loufoque wrote:

Ben Pope wrote:
WHICH unicode "character set"? There are several, such as UTF-8,
UTF-16, UTF-32, UCS-2, UCS-4 as well as big and little endian versions.
I think those are character encodings, not character sets.

Character sets specify a table that maps characters to integers and
character encodings define ways to encode that integer in bytes.

Unicode would indeed be a character set.

It is actually rather confusing, because "charset" is often used to mean
"character encoding", owing to its usage in various protocols.

Yeah, sorry. I'm not helping the confusion. I actually started with
"charset" and expanded it as I scanned through for mistakes. D'oh!

Unicode is still not part of the standard, so it is not portable.

Having a sequence of bytes in memory representing a character according
to a well defined encoding and character set is very portable.

Of course, but there is no native support. In order to get full Unicode
support, you need a rather large library, or at least a decent framework
in which to stick encodings.

Making the OS display the characters correctly is another thing.

...that was my point.

Just because something isn't part of the standard doesn't mean it isn't
portable; one can write a portable std::string-like class rather easily.

Indeed, which is fine for internal use, it's the outside world which is
the problem. That's where standardisation (and support) needs to be.

Thanks for the clarifications.

Ben Pope
--
I'm not just a number. To many, I'm known as a string...

Feb 28 '06 #6

Tomás

I always try to keep my posts implementation independent... but anywho
here's what I'm doing:

(About to drift off-topic...)

I'm writing a Windows control that you can place on a dialog box. As some of
you may know, the earlier versions of Windows (95, 98, Me) all used ASCII
internally when dealing with strings. Characters were stored in 8 bits.

Now, all the Windows versions are using Unicode. My control will display
text, and so I want it to be able to display Unicode text. Unicode
characters are stored using 16 bits on Windows.

There are two flavours of each Windows function, the ASCII one and the Unicode
one, for instance:

SetWindowTextA ( ASCII version )
SetWindowTextW ( Unicode version )

A person can use my control by adding a header file and source file to their
project. Like this:

#include <control.hpp>
using namespace Control;

int main()
{
    PlaceCtrlOnDialog();
}

Anyway, the whole point is that while I want the control to support
Unicode, I also want it to support ASCII. I think the best way to do this is
to have a project-wide preprocessor macro such as UNICODE. Then, I could
have:

#ifdef UNICODE
typedef wchar_t Character;
#define StringLiteral(x) Lx
#else
typedef char Character;
#define StringLiteral(x) x
#endif

const Character* GetAuthorName()
{
    return StringLiteral("Tomás");
}

You may not think it's the most beautiful code, but it achieves its
objective.

Any thoughts?
-Tomás

Feb 28 '06 #7

loufoque

Gianni Mariani wrote:

What is "Reversible"? If UTF-16 is "reversible" then so must be UTF-32.

This is Standard C++ terminology.
A Reversible Container is a Forward Container whose iterators are
Bidirectional Iterators.
A Random Access Container is a Reversible Container whose iterator type
is a Random Access Iterator.

As you can see, UTF-32/UCS-4 being a possible implementation for a
Random Access Container, it is "reversible" too.

Feb 28 '06 #8

Gianni Mariani

loufoque wrote:

Gianni Mariani wrote:
What is "Reversible"? If UTF-16 is "reversible" then so must be UTF-32.

This is Standard C++ terminology.
A Reversible Container is a Forward Container whose iterators are
Bidirectional Iterators.
A Random Access Container is a Reversible Container whose iterator type
is a Random Access Iterator.

As you can see, UTF-32/UCS-4 being a possible implementation for a
Random Access Container, it is "reversible" too.

Ah. I thought you were referring to Unicode terminology.

The problem with UTF-8 and UTF-16 is that they're multibyte
(multi-value) in nature. Making a reversible iterator is non-trivial.
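
For UTF-8 specifically, the backward step such an iterator needs can be sketched as below. This assumes the string already holds valid UTF-8 and that the starting position sits on a code point boundary; both function names are hypothetical. A full bidirectional iterator would also have to decode the value and cope with malformed input.

```cpp
#include <cstddef>
#include <string>

// True if `byte` is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool is_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

// Given `pos` with 0 < pos <= s.size(), lying on a code point boundary
// of valid UTF-8 text, return the index where the previous code point
// starts. Works by stepping back over continuation bytes.
std::size_t prev_code_point(const std::string& s, std::size_t pos)
{
    do {
        --pos;
    } while (pos > 0 && is_continuation(static_cast<unsigned char>(s[pos])));
    return pos;
}
```

Walking "N\xC3\xA5!" backwards from the end visits indices 3, 1 and 0, so the two-byte character is stepped over as one unit.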

Then again, when you look at the requirements for Unicode's composing
characters, it's a problem as well, for any encoding.
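
The combining-character issue shows up even in UTF-32, where every element is a whole code point. A small sketch, assuming C++11 for U"" literals and std::u32string; count_non_combining is a hypothetical helper that only knows about the Combining Diacritical Marks block (U+0300..U+036F), which is a drastic simplification of what a real library such as ICU does.

```cpp
#include <cstddef>
#include <string>

// Count code points that are NOT in the Combining Diacritical Marks
// block (U+0300..U+036F). A rough stand-in for "characters as the user
// sees them"; real text segmentation needs the Unicode data tables.
std::size_t count_non_combining(const std::u32string& s)
{
    std::size_t n = 0;
    for (char32_t cp : s) {
        if (cp < 0x0300 || cp > 0x036F)
            ++n;
    }
    return n;
}
```

With this, U"\u00E9" (precomposed) has size 1 and U"e\u0301" (letter plus combining acute accent) has size 2, yet both count as one visible character, and the two strings do not compare equal without normalization.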

G

Feb 28 '06 #9

Tomás wrote:
<snip>

#ifdef UNICODE
typedef wchar_t Character;
#define StringLiteral(x) Lx

#define StringLiteral(x) L##x

<snip>
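
The correction matters because Lx is just the identifier Lx, whereas L##x pastes the L prefix onto the argument. A common refinement, sketched here with hypothetical macro names, adds a second level so that an argument which is itself a macro gets expanded before the paste:

```cpp
#include <string>

// Two-level widening macro: WIDEN expands its argument first,
// then WIDEN_IMPL pastes the L prefix onto the resulting literal.
#define WIDEN_IMPL(x) L##x
#define WIDEN(x) WIDEN_IMPL(x)

// Hypothetical narrow literal defined elsewhere in a project.
#define NATION_NAME "Norway"

// Expands to L"Norway". With a single-level L##x macro the paste
// would produce the identifier LNATION_NAME instead.
std::wstring GetNationName()
{
    return WIDEN(NATION_NAME);
}
```

For a plain literal argument, as in StringLiteral("Norway"), the single-level L##x form is already enough.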

--
TB @ SWEDEN

Feb 28 '06 #10
