Multibyte VS. Wide

Is it true that

Multibyte characters are: char arrays (which represent a string from
the basic character set). In this case wide characters are the way
of encoding characters from the extended character set.

or

Multibyte characters are: characters from the extended character set
which need more than one byte to encode. And in this case wide
characters are a subset of the multibyte character encoding.

Both ISO/IEC 9899:1999 and the libc info page (the GNU C Library
documentation) are a little vague in this area.

I tend to believe the second explanation but want to make sure.

Yazan jaber
Nov 13 '05 #1
In <a3**************************@posting.google.com > ya*********@yahoo.com (yazan jab) writes:
> Is it true that
>
> Multibyte characters are: char arrays (which represent a string from
> the basic character set). In this case wide characters are the way
> of encoding characters from the extended character set.
>
> or
>
> Multibyte characters are: characters from the extended character set
> which need more than one byte to encode. And in this case wide
> characters are a subset of the multibyte character encoding.


Neither is true, but the latter is closer to the truth. The definition
of the multibyte character is correct, but wide characters are not a
subset of the multibyte character encoding. They are wide enough to
represent *every* character from the extended character set.
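
One way to see the relationship is to ask the C library how many wide
characters a multibyte string converts to. The following is only a sketch:
wide_length is a hypothetical helper name, and the result depends on the
current locale (in the default "C" locale only the basic character set is
guaranteed to convert).

```c
#include <string.h>
#include <wchar.h>

/* Number of wide characters that the multibyte string s converts to in
 * the current locale, not counting the terminator. Returns (size_t)-1
 * if s contains an invalid multibyte sequence.
 * wide_length is a hypothetical name used for illustration only. */
size_t wide_length(const char *s)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    const char *p = s;
    /* With a null destination, mbsrtowcs only counts and stores nothing. */
    return mbsrtowcs(NULL, &p, 0, &state);
}
```

Each multibyte character, however many bytes it occupies, contributes
exactly one to the count, because it converts to a single wchar_t.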

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #2
ya*********@yahoo.com (yazan jab) wrote:
# Is it true that
#
# Multibyte characters are: char arrays (which represent a string from
# the basic character set). In this case wide characters are the way
# of encoding characters from the extended character set.

For something like Unicode, the character codes range from 0 to 65535 for
the 16-bit Basic Multilingual Plane, or up to 0x10FFFF once the supplementary
planes are included. A wide character is an integer large enough to hold the
character code as a fixed-size unit, either 16 or 32 bits (wchar_t is
typically the size of a short or a long). When you use wchar_t for these
codes, you have the same advantage that you have with ASCII and char: an
n-character string requires exactly n+1 storage units, counting the
terminator.
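
The n+1 rule can be checked with a small sketch. The helper names wide_units
and wide_bytes are made up for this example, and the rule assumes every
character fits in one wchar_t (with a 16-bit wchar_t, characters above 65535
would not).

```c
#include <stddef.h>
#include <wchar.h>

/* Storage units (wchar_t elements) needed to hold the wide string ws,
 * including the terminating L'\0'. Hypothetical helper for illustration. */
size_t wide_units(const wchar_t *ws)
{
    return wcslen(ws) + 1;
}

/* The same storage expressed in bytes; depends on sizeof(wchar_t),
 * which is commonly 2 or 4 depending on the platform. */
size_t wide_bytes(const wchar_t *ws)
{
    return wide_units(ws) * sizeof(wchar_t);
}
```

For example, wide_units(L"\u00e4bc") is 4: three characters plus the
terminator, even though the first character is outside ASCII.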

However, there are still many old and useful programs designed only for
char-width characters that cannot cope with wchar_t characters. Instead
of recoding and recompiling all that software, some clever and not so clever
ways have been invented to represent one large 16- or 32-bit character as a
sequence of one or more 8-bit bytes. UTF-8, for example, represents each
16-bit Unicode character as a multibyte sequence of 1 to 3 bytes (characters
beyond the 16-bit range take 4). UTF-8 has the additional property that the
ASCII subset of Unicode is encoded with exactly the same bytes as ASCII, and
that a multibyte UTF-8 sequence never includes a byte in the 0-127 range.
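
As a rough illustration of the byte layout, here is a minimal UTF-8 encoder.
It is a sketch, not a library routine: utf8_encode is a name invented for
this example, and it handles a single code point at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode the Unicode code point cp as UTF-8 into out, which must have room
 * for 4 bytes. Returns the number of bytes written, or 0 if cp is not a
 * valid code point (a surrogate, or above 0x10FFFF).
 * utf8_encode is a hypothetical name used for illustration only. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Every byte of a multi-byte sequence has its top bit set, which is why such
sequences never collide with the ASCII range.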

This means that when old ASCII software is given a multibyte encoding like
UTF-8, if it simply passes bytes 128-255 through unchanged, it is upgraded
without coding changes to being new Unicode software as well.

The disadvantage of multibyte characters is that an n-character Unicode string
can take anywhere from n+1 through 3n+1 char storage units (4n+1 if characters
outside the 16-bit range occur); you won't know without examining the actual
characters.
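
"Examining the actual characters" can be as simple as the sketch below, which
counts characters in a UTF-8 string by skipping continuation bytes. It
assumes the input is valid UTF-8, and utf8_count is a name made up for this
example.

```c
#include <stddef.h>

/* Count the characters (code points) in the NUL-terminated UTF-8 string s.
 * Assumes s is valid UTF-8; no validation is performed.
 * utf8_count is a hypothetical name used for illustration only. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* Continuation bytes look like 10xxxxxx; every other byte
         * starts a new character. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

Unlike wcslen on a wide string, the count here cannot be derived from the
byte length alone; the loop has to visit every byte.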

--
Derk Gwen http://derkgwen.250free.com/html/index.html
Where do you get those wonderful toys?
Nov 13 '05 #3
On Thu, 06 Nov 2003 11:55:13 -0500, yazan jab wrote:
> Is it true that
>
> Multibyte characters are: char arrays (which represent a string from
> the basic character set). In this case wide characters are the way of
> encoding characters from the extended character set.
>
> or
>
> Multibyte characters are: characters from the extended character set
> which need more than one byte to encode. And in this case wide
> characters are a subset of the multibyte character encoding.


It's important to distinguish between character sets (charsets) and
character encodings. They are two different things. A charset is a map
that defines which numeric value represents a particular character. A
character encoding defines how those numeric values are serialized into a
stream of bytes. For example, Unicode can be encoded as UTF-8, which is
space-efficient for mostly-ASCII text and byte-compatible with ASCII (though
not with ISO-8859-1, whose characters above 127 take two bytes in UTF-8).
Or it can be encoded as UCS-4LE, which is not space-efficient but can make
heavy text processing easier.
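
To make the distinction concrete, the sketch below takes one numeric value
from the charset and serializes it two ways, as UCS-4 little-endian and as
UCS-4 big-endian. The function names are invented for this example.

```c
#include <stdint.h>

/* Serialize one code point as UCS-4 little-endian: always 4 bytes,
 * least significant byte first. Hypothetical helper for illustration. */
void ucs4le_encode(uint32_t cp, unsigned char out[4])
{
    out[0] = (unsigned char)(cp & 0xFF);
    out[1] = (unsigned char)((cp >> 8) & 0xFF);
    out[2] = (unsigned char)((cp >> 16) & 0xFF);
    out[3] = (unsigned char)((cp >> 24) & 0xFF);
}

/* Serialize the same code point as UCS-4 big-endian: the same number,
 * but the bytes appear in the opposite order. */
void ucs4be_encode(uint32_t cp, unsigned char out[4])
{
    out[0] = (unsigned char)((cp >> 24) & 0xFF);
    out[1] = (unsigned char)((cp >> 16) & 0xFF);
    out[2] = (unsigned char)((cp >> 8) & 0xFF);
    out[3] = (unsigned char)(cp & 0xFF);
}
```

The euro sign is the number 0x20AC in the Unicode charset either way; only
the byte stream differs (AC 20 00 00 versus 00 00 20 AC).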

Here's a nice link about programming with extended charsets although it
is a little UTF-8/*nix centric:

http://www.cl.cam.ac.uk/~mgk25/unicode.html

Mike
Nov 13 '05 #4
