Multi-byte chars

Bill Cunningham

I've been reading the C standard online and I'm puzzled as to what multibyte
chars are. Wide chars I believe would be characters for languages such as
Cantonese or Japanese. I know the ASCII character set specifies that each
character such as 'b' or 'B' is an 8 bit character. So what's a multibyte
character?
Also how would you use the function parameter main (char argc, char
**argv) if that's correct?

Bill

Nov 13 '05 #1

Richard Heathfield

Bill Cunningham wrote:

> I've been reading the C standard online and I'm puzzled as to what
> multibyte chars are.

A multibyte character is a "sequence of one or more bytes representing a
member of the extended character set of either the source or the execution
environment", if I have the quote from 3.7.2 right.

> Wide chars I believe would be characters for
> languages such as Cantonese or Japanese.

C isn't as specific as that. See 3.7.3.

> I know the ASCII character set
> specifies that each character such as 'b' or 'B' is an 8 bit character.

7 bits, not 8. ASCII is a 7-bit code.

<snip>

--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton

Nov 13 '05 #2

lawrence.jones

Bill Cunningham <so**@some.net> wrote:

> I've been reading the C standard online and I'm puzzled as to what multibyte
> chars are. Wide chars I believe would be characters for languages such as
> Cantonese or Japanese. I know the ASCII character set specifies that each
> character such as 'b' or 'B' is an 8 bit character. So what's a multibyte
> character?

A single logical character that requires more than one byte to express.
For example, consider the UTF-8 encoding format for ISO 10646: normal
ASCII characters (between \x00 and \x7f) are encoded as a single byte
with the same value. Other characters are encoded as multiple bytes,
each of which has the top bit set; the first byte is in the range \xc0
to \xfd and indicates the number of bytes that follow, subsequent bytes
are in the range \x80 to \xbf. UTF-8 encoded characters can be any
length between one and six bytes. So 'A' is encoded as \x41 but '©'
(the copyright sign) is encoded as \xc2\xa9.
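The byte layout described above can be sketched in C. This is only an illustration, not code from the thread: the function name utf8_encode is hypothetical, and the sketch follows the current one-to-four-byte definition of UTF-8 (RFC 3629) rather than the older form of up to six bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode code point cp as UTF-8 into out, which must
   have room for at least 4 bytes. Returns the number of bytes written,
   or 0 if cp is not a valid Unicode scalar value. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                        /* ASCII: one byte, same value */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* lead byte 110xxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* lead byte 1110xxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* lead byte 11110xxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}
```

Calling utf8_encode(0x41, buf) writes the single byte \x41, and utf8_encode(0xA9, buf) writes \xc2\xa9, matching the two cases given above.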

Multibyte encodings can be very space efficient, but they are difficult
to process since different characters have different lengths. Wide
characters, on the other hand, are intended to be efficient for
processing, but not necessarily space efficient. Wide characters are
integers that are large enough so that every logical character can be
represented in just one wide character.
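The difference shows up as soon as you count characters: strlen counts bytes, so a multibyte string has to be decoded to find out how many characters it holds. A minimal sketch, assuming the string is encoded in the current locale's multibyte encoding; count_mb_chars is a hypothetical name, and mbrtowc comes from <wchar.h>.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters (not bytes) in the multibyte
   string s, interpreted in the current locale. Returns (size_t)-1 if s
   contains an invalid or incomplete sequence. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    size_t remaining;
    size_t count = 0;

    assert(s != NULL);
    memset(&state, 0, sizeof state);        /* initial conversion state */
    remaining = strlen(s);

    while (remaining > 0) {
        wchar_t wc;                         /* one wide char per character */
        size_t n = mbrtowc(&wc, s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;              /* invalid or truncated input */
        if (n == 0)
            break;                          /* decoded a null character */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

In the default "C" locale every character is one byte, so the count equals strlen. After setlocale(LC_ALL, "") in a UTF-8 locale, the two-byte string "\xc2\xa9" would be expected to count as one character.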

-Larry Jones

If I get a bad grade, it'll be YOUR fault for not doing the work for me!
-- Calvin

Nov 13 '05 #3

Jun Woong

<la************@eds.com> wrote in message news:nv**********@cvg-65-27-189-87.cinci.rr.com...

> Bill Cunningham <so**@some.net> wrote:
>
>> I've been reading the C standard online and I'm puzzled as to what multibyte
>> chars are. Wide chars I believe would be characters for languages such as
>> Cantonese or Japanese. I know the ASCII character set specifies that each
>> character such as 'b' or 'B' is an 8 bit character. So what's a multibyte
>> character?
>
> A single logical character that requires more than one byte to express.
> For example, consider the UTF-8 encoding format for ISO 10646: normal
> ASCII characters (between \x00 and \x7f) are encoded as a single byte
> with the same value.

My understanding is that the standard requires 'A' == L'A' by the fact
that the basic character set must be a subset of the extended
character set. Do this and what you mentioned above mean that a
character set whose code values differ from ASCII's can't be the basic
set on an implementation where code values of Unicode is used as those
of the extended set?
--
Jun, Woong (my******@hanmail.net)
Dept. of Physics, Univ. of Seoul

Nov 13 '05 #4

Dan Pop

In <be**********@news.hananet.net> "Jun Woong" <my******@hanmail.net> writes:

> <la************@eds.com> wrote in message news:nv**********@cvg-65-27-189-87.cinci.rr.com...
>
>> Bill Cunningham <so**@some.net> wrote:
>>
>>> I've been reading the C standard online and I'm puzzled as to what multibyte
>>> chars are. Wide chars I believe would be characters for languages such as
>>> Cantonese or Japanese. I know the ASCII character set specifies that each
>>> character such as 'b' or 'B' is an 8 bit character. So what's a multibyte
>>> character?
>>
>> A single logical character that requires more than one byte to express.
>> For example, consider the UTF-8 encoding format for ISO 10646: normal
>> ASCII characters (between \x00 and \x7f) are encoded as a single byte
>> with the same value.
>
> My understanding is that the standard requires 'A' == L'A' by the fact
> that the basic character set must be a subset of the extended
> character set.

Non sequitur. The fact that A belongs to the basic character set has
no relevance on the value of L'A', AFAICT. All the standard has to say
on the issue is:

    11 A wide character constant has type wchar_t, an integer type
       defined in the <stddef.h> header. The value of a wide character
       constant containing a single multibyte character that maps to
       a member of the extended execution character set is the wide
       character corresponding to that multibyte character, as defined
       by the mbtowc function, with an implementation-defined current
       locale.

> Do this and what you mentioned above mean that a
> character set whose code values differ from ASCII's can't be the basic
> set on an implementation where code values of Unicode is used as those
> of the extended set?

Nope, he was merely describing what happens on an implementation using
ASCII for normal characters and UCS for wide characters (therefore UTF-8
for multi-byte characters).

There is nothing preventing an implementation from using EBCDIC for
normal characters and UCS for wide characters, in which case it is foolish
to expect 'A' == L'A'.

Furthermore, there is nothing preventing an implementation from using
ASCII for normal characters and EBCDIC for wide characters (or vice
versa). The fact that C99 supports UCNs in source code means nothing WRT
the execution character set (whose extended version need not contain any
additional characters).
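When a program needs the wide counterpart of a narrow character, asking the library is safer than assuming the two values are equal. A minimal sketch using btowc from <wchar.h> (added by Amendment 1 to C90); the helper name narrow_equals_wide is hypothetical.

```c
#include <wchar.h>

/* Hypothetical helper: report whether c has the same numeric value as its
   wide-character counterpart in the current locale.
   Returns 1 if the values match, 0 if they differ, and -1 if c is not a
   valid single-byte character in the initial shift state. */
int narrow_equals_wide(char c)
{
    /* btowc expects the character as an unsigned char value */
    wint_t w = btowc((unsigned char)c);

    if (w == WEOF)
        return -1;
    return (wint_t)(unsigned char)c == w ? 1 : 0;
}
```

On an implementation that pairs ASCII with UCS the result for 'A' would be 1; on one that pairs EBCDIC with UCS, as described above, it would be 0.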

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 13 '05 #5

Jun Woong

"Dan Pop" <Da*****@cern.ch> wrote in message news:be**********@sunnews.cern.ch...

> In <be**********@news.hananet.net> "Jun Woong" <my******@hanmail.net> writes:
>
>> <la************@eds.com> wrote in message news:nv**********@cvg-65-27-189-87.cinci.rr.com...
>>
>>> Bill Cunningham <so**@some.net> wrote:
>>>
>>>> I've been reading the C standard online and I'm puzzled as to what multibyte
>>>> chars are. Wide chars I believe would be characters for languages such as
>>>> Cantonese or Japanese. I know the ASCII character set specifies that each
>>>> character such as 'b' or 'B' is an 8 bit character. So what's a multibyte
>>>> character?
>>>
>>> A single logical character that requires more than one byte to express.
>>> For example, consider the UTF-8 encoding format for ISO 10646: normal
>>> ASCII characters (between \x00 and \x7f) are encoded as a single byte
>>> with the same value.
>>
>> My understanding is that the standard requires 'A' == L'A' by the fact
>> that the basic character set must be a subset of the extended
>> character set.
>
> Non sequitur. The fact that A belongs to the basic character set has
> no relevance on the value of L'A', AFAICT. All the standard has to say
> on the issue is:
>
>     11 A wide character constant has type wchar_t, an integer type
>        defined in the <stddef.h> header. The value of a wide character
>        constant containing a single multibyte character that maps to
>        a member of the extended execution character set is the wide
>        character corresponding to that multibyte character, as defined
>        by the mbtowc function, with an implementation-defined current
>        locale.

And in 7.17p2:

wchar_t

which is an integer type whose range of values can represent
distinct codes for all members of the largest extended character
set specified among the supported locales; the null character
shall have the code value zero and each member of the basic
character set shall have a code value equal to its value when used
as the lone character in an integer character constant.
--
Jun, Woong (my******@hanmail.net)
Dept. of Physics, Univ. of Seoul

Nov 13 '05 #6

lawrence.jones

Jun Woong <my******@hanmail.net> wrote:

> My understanding is that the standard requires 'A' == L'A' by the fact
> that the basic character set must be a subset of the extended
> character set. Do this and what you mentioned above mean that a
> character set whose code values differ from ASCII's can't be the basic
> set on an implementation where code values of Unicode is used as those
> of the extended set?

Yes, but. That requirement is a hold-over from the very earliest days of
extended character set support, before there were functions to convert
between wide and narrow characters. Now that those functions exist,
there is no longer any reason for the requirement, and the committee has
voted to remove it. See the committee's response to DR #279:

<http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/dr_279.htm>
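Later revisions of the standard expose this relaxation to programs: C11 lists the predefined macro __STDC_MB_MIGHT_NEQ_WC__ among its environment macros, defined as 1 when a member of the basic character set need not have the same value as a wide character. A minimal sketch of how code might test for it; the function name is hypothetical.

```c
/* Hypothetical helper: returns 1 if the implementation announces that a
   basic character may have a different value as a wchar_t than it has
   as a char, and 0 if it makes no such announcement. */
int basic_chars_may_differ_when_wide(void)
{
#ifdef __STDC_MB_MIGHT_NEQ_WC__
    return 1;   /* 'A' == L'A' must not be relied on */
#else
    return 0;   /* macro absent: the old equality requirement still holds */
#endif
}
```

Code that must work either way can ignore the macro and convert explicitly with the library functions instead of comparing narrow and wide constants.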

-Larry Jones

Somebody's always running my life. I never get to do what I want to do.
-- Calvin

Nov 13 '05 #7

Dan Pop

In <be**********@news.hananet.net> "Jun Woong" <my******@hanmail.net> writes:

> And in 7.17p2:
>
>     wchar_t
>
>     which is an integer type whose range of values can represent
>     distinct codes for all members of the largest extended character
>     set specified among the supported locales; the null character
>     shall have the code value zero and each member of the basic
>     character set shall have a code value equal to its value when used
>     as the lone character in an integer character constant.

This requirement, carried on from C89, is simply broken: implementations
that don't use ASCII for normal characters wouldn't be able to use *any*
of the ASCII extensions (UCS, most importantly) for wide characters.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 13 '05 #8

Jun Woong

"Dan Pop" <Da*****@cern.ch> wrote in message news:be**********@sunnews.cern.ch...

> In <be**********@news.hananet.net> "Jun Woong" <my******@hanmail.net> writes:
>
>> And in 7.17p2:
>>
>>     wchar_t
>>
>>     which is an integer type whose range of values can represent
>>     distinct codes for all members of the largest extended character
>>     set specified among the supported locales; the null character
>>     shall have the code value zero and each member of the basic
>>     character set shall have a code value equal to its value when used
>>     as the lone character in an integer character constant.
>
> This requirement, carried on from C89, is simply broken: implementations
> that don't use ASCII for normal characters wouldn't be able to use *any*
> of the ASCII extensions (UCS, most importantly) for wide characters.

Then the proper answer to my previous question should have been a mention
of the DR in progress, not a citation of irrelevant wording.
--
Jun, Woong (my******@hanmail.net)
Dept. of Physics, Univ. of Seoul

Nov 13 '05 #9

Jun Woong

<la************@eds.com> wrote in message news:73***********@cvg-65-27-189-87.cinci.rr.com...
[...]

> Yes, but. That requirement is a hold-over from the very earliest days of
> extended character set support, before there were functions to convert
> between wide and narrow characters. Now that those functions exist,
> there is no longer any reason for the requirement,

Weren't there some conversion functions between wide and multibyte
characters in C90? Do you mean that the wording in question was
written before the C89 committee decided to put those functions into
the standard, or that we now have a more complete set of functions to
deal with wide and multibyte characters and so don't need the requirement
any more?
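For reference, C90's <stdlib.h> did already declare mblen, mbtowc, wctomb, mbstowcs and wcstombs. A minimal sketch of a round trip through two of them, which is the kind of explicit conversion that makes an equal-values rule unnecessary; the function name round_trips is hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if the non-null wide character wc
   survives a wide -> multibyte -> wide round trip in the current locale,
   and 0 otherwise (including when wc has no multibyte representation). */
int round_trips(wchar_t wc)
{
    char buf[MB_LEN_MAX];       /* large enough for any one character */
    wchar_t back;
    int len;

    /* Reset any shift state kept by the two functions. */
    (void)mbtowc(NULL, NULL, 0);
    (void)wctomb(NULL, L'\0');

    len = wctomb(buf, wc);      /* wide -> multibyte */
    if (len <= 0)
        return 0;
    if (mbtowc(&back, buf, (size_t)len) != len)   /* multibyte -> wide */
        return 0;
    return back == wc;
}
```

The check never compares a narrow constant with a wide one, so it does not depend on whether 'A' == L'A' holds on the implementation.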
--
Jun, Woong (my******@hanmail.net)
Dept. of Physics, Univ. of Seoul

Nov 13 '05 #10
