Multibyte VS. Wide

Is it true that

Multibyte characters are: char arrays (which represent a string from
the basic character set). In this case wide characters are the way
of encoding characters from the extended character set.

or

Multibyte characters are: characters from the extended character set
which need more than one byte to encode. And in this case wide
characters are a subset of the multibyte character encoding.

Both ISO/IEC 9899:1999 and the libc info page (the GNU C Library
documentation) are a little bit vague in this area.

I tend to believe the second explanation but want to make sure.

Yazan jaber
Nov 13 '05 #1
In <a3**************************@posting.google.com> ya*********@yahoo.com (yazan jab) writes:
> Is it true that
>
> Multibyte characters are: char arrays (which represent a string from
> the basic character set). In this case wide characters are the way
> of encoding characters from the extended character set.
>
> or
>
> Multibyte characters are: characters from the extended character set
> which need more than one byte to encode. And in this case wide
> characters are a subset of the multibyte character encoding.


Neither is true, but the latter is closer to the truth. Its definition
of a multibyte character is correct, but wide characters are not a
subset of the multibyte character encoding. A wide character is a
fixed-size value wide enough to represent *every* character from the
extended character set.
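
To make the relationship concrete, here is a minimal sketch of converting a multibyte string into wide characters one character at a time with the standard mbrtowc(). The helper name mb_to_wide is made up for this example. It assumes the input is in the current locale's multibyte encoding, so for anything beyond ASCII the caller would need to have called setlocale(LC_CTYPE, "") first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte string (in the current
   locale's encoding) to wide characters. Returns the number of wide
   characters written, not counting the terminator, or (size_t)-1 on an
   invalid sequence or if out is too small. */
size_t mb_to_wide(const char *mb, wchar_t *out, size_t out_cap)
{
    assert(mb != NULL && out != NULL && out_cap > 0);

    mbstate_t state;
    memset(&state, 0, sizeof state);    /* initial conversion state */

    size_t remaining = strlen(mb);
    size_t count = 0;

    while (remaining > 0) {
        if (count + 1 >= out_cap)
            return (size_t)-1;          /* no room for char + terminator */

        /* Each call consumes one multibyte character (1 or more bytes)
           and produces exactly one wchar_t. */
        size_t used = mbrtowc(&out[count], mb, remaining, &state);
        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete sequence */
        if (used == 0)
            break;                      /* null character reached */

        mb += used;
        remaining -= used;
        count++;
    }
    out[count] = L'\0';
    return count;
}
```

Each multibyte character may occupy a different number of bytes on input, but always lands in exactly one wchar_t on output.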

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #2
ya*********@yahoo.com (yazan jab) wrote:
# Is it true that
#
# Multibyte characters are: char arrays (which represent a string from
# the basic character set). In this case wide characters are the way
# of encoding characters from the extended character set.

For something like Unicode, the character codes range from 0 to 65535 for the
most common characters, or from 0 to 0x10FFFF (about 1.1 million) for the
full set. A wide character is an integer sufficient to hold the character
code as a fixed-size unit: wchar_t, typically a 16- or 32-bit integer
depending on the platform. When you use wchar_t for these codes, you have the
same advantage that you have for ASCII and char: an n-character string
requires exactly n+1 storage units (n characters plus the terminating null),
provided every character fits in a single wchar_t.
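
A small sketch of that n+1 rule, assuming every character in the string fits in one wchar_t. The function name wide_storage_bytes is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Bytes needed to store a wide string including its terminating null:
   (n + 1) storage units of sizeof(wchar_t) bytes each, where n is the
   number of wide characters. */
size_t wide_storage_bytes(const wchar_t *ws)
{
    assert(ws != NULL);
    return (wcslen(ws) + 1) * sizeof(wchar_t);
}
```

For the five-character string L"gr\u00fc\u00dfe", wcslen() returns 5 and the function returns 6 * sizeof(wchar_t), whether or not the individual characters are ASCII.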

However, there are still many old and useful programs designed only for
char-width characters that cannot cope with wchar_t characters. Instead
of recoding and recompiling all that software, some clever and not so clever
ways have been invented to represent one large 16- or 32-bit character as a
sequence of one or more 8-bit bytes. UTF-8, for example, represents a 16-bit
Unicode character as 1 to 3 bytes (characters beyond the 16-bit range take
4 bytes). UTF-8 has the additional property that the ASCII subset of Unicode
is encoded as the exact same bytes as the ASCII codes, and that a multibyte
UTF-8 character does not include any bytes in the 0-127 range.
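
The byte layout described above can be sketched as a small encoder. This is an illustration rather than a library routine: the name utf8_encode and its signature are made up here, and it covers the present-day Unicode range up to U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp cannot be encoded
   (a surrogate value, or beyond U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                    /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x10000) {                 /* three bytes: rest of 16-bit range */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Every byte of a two-, three- or four-byte sequence has its top bit set, which is why such sequences never contain bytes in the 0-127 range.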

This means that when old ASCII software is given a multibyte encoding like
UTF-8, if it simply passes bytes 128-255 through unchanged, it is upgraded
without coding changes to handle Unicode as well.

The disadvantage of multibyte characters is that an n-character Unicode string
can take anywhere from n+1 through 3n+1 char storage units (4n+1 if characters
beyond the 16-bit range are included); you won't know without examining the
actual characters.
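
A minimal sketch of what "examining the actual characters" involves for UTF-8. The function names are invented for this example, and the input is assumed to be valid UTF-8 (no validation is done).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the characters (code points) in a UTF-8 string by counting every
   byte that is not a continuation byte. Continuation bytes have the bit
   pattern 10xxxxxx. Assumes s is valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Number of char storage units the string occupies, including the
   terminating null. */
size_t utf8_storage_units(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}
```

For the UTF-8 bytes of a five-character word with two non-ASCII letters, "gr\xc3\xbc\xc3\x9f" "e", the character count is 5 but the storage is 8 chars, somewhere between n+1 = 6 and 3n+1 = 16.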

--
Derk Gwen http://derkgwen.250free.com/html/index.html
Where do you get those wonderful toys?
Nov 13 '05 #3
On Thu, 06 Nov 2003 11:55:13 -0500, yazan jab wrote:
> Is it true that
>
> Multibyte characters are: char arrays (which represent a string from
> the basic character set). In this case wide characters are the way of
> encoding characters from the extended character set.
>
> or
>
> Multibyte characters are: characters from the extended character set
> which need more than one byte to encode. And in this case wide


It's important to distinguish between character sets (charsets) and
character encodings. They are two different things. A charset is a map
that defines which numeric value represents a particular character. A
character encoding defines how those numeric values are serialized into a
stream of bytes. For example, Unicode can be encoded as UTF-8, which is
space efficient for mostly-ASCII text and is byte-compatible with ASCII
(the first 256 Unicode values also match ISO-8859-1, although UTF-8
encodes values 128-255 as two bytes). Or it could be encoded as UCS4-LE,
which is not space efficient but can make heavy text processing easier.

Here's a nice link about programming with extended charsets although it
is a little UTF-8/*nix centric:

http://www.cl.cam.ac.uk/~mgk25/unicode.html

Mike
Nov 13 '05 #4
