Multi-byte chars

I've been reading the C standard online and I'm puzzled as to what multibyte
chars are. Wide chars I believe would be characters for languages such as
Cantonese or Japanese. I know the ASCII character set specifies that each
character such as 'b' or 'B' is an 8 bit character. So what's a multibyte
character?
Also how would you use the function parameter main (char argc, char
**argv) if that's correct?

Bill

Nov 13 '05 #1
Bill Cunningham wrote:
> I've been reading the C standard online and I'm puzzled as to what
> multibyte chars are.

A multibyte character is a "sequence of one or more bytes representing a
member of the extended character set of either the source or the execution
environment", if I have the quote from 3.7.2 right.

> Wide chars I believe would be characters for
> languages such as Cantonese or Japanese.

C isn't as specific as that. See 3.7.3.

> I know the ASCII character set
> specifies that each character such as 'b' or 'B' is an 8 bit character.

7 bits, not 8. ASCII is a 7-bit code.

<snip>

--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton
Nov 13 '05 #2
Bill Cunningham <so**@some.net> wrote:

> I've been reading the C standard online and I'm puzzled as to what multibyte
> chars are. Wide chars I believe would be characters for languages such as
> Cantonese or Japanese. I know the ASCII character set specifies that each
> character such as 'b' or 'B' is an 8 bit character. So what's a multibyte
> character?


A single logical character that requires more than one byte to express.
For example, consider the UTF-8 encoding format for ISO 10646: normal
ASCII characters (between \x00 and \x7f) are encoded as a single byte
with the same value. Other characters are encoded as multiple bytes,
each of which has the top bit set; the first byte is in the range \xc0
to \xfd and indicates the number of bytes that follow, subsequent bytes
are in the range \x80 to \xbf. UTF-8 encoded characters can be any
length between one and six bytes. So 'A' is encoded as \x41 but '©'
(the copyright sign) is encoded as \xc2\xa9.
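
As an illustration of the byte layout described above, here is a minimal C sketch of a UTF-8 encoder. The function name utf8_encode is hypothetical, not a standard library call. One assumption differs from the post: the sketch follows the later RFC 3629 definition of UTF-8, which stops at U+10FFFF and therefore at four bytes, rather than the original one-to-six-byte scheme.

```c
#include <stddef.h>

/* Hypothetical helper: encode code point cp as UTF-8 into out, which must
   have room for at least 4 bytes. Returns the number of bytes written, or
   0 if cp cannot be encoded (a surrogate, or above U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp <= 0x7F) {
        /* ASCII range: a single byte with the same value. */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        /* Lead byte 110xxxxx, then one continuation byte 10xxxxxx. */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* surrogates are not characters */
        /* Lead byte 1110xxxx, then two continuation bytes. */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        /* Lead byte 11110xxx, then three continuation bytes. */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0xA9, buf) stores the two bytes \xc2 \xa9, which matches the copyright-sign example above, and utf8_encode(0x41, buf) stores the single byte \x41.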

Multibyte encodings can be very space efficient, but they are difficult
to process since different characters have different lengths. Wide
characters, on the other hand, are intended to be efficient for
processing, but not necessarily space efficient. Wide characters are
integers that are large enough so that every logical character can be
represented in just one wide character.
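
To make the "different characters have different lengths" point concrete, here is a small sketch that walks a multibyte string with the standard mbrlen function from <wchar.h> and counts characters rather than bytes. The helper name count_mb_chars is made up for this example. The result depends on the current locale: in the default "C" locale only the basic characters are guaranteed to be recognised, and a program would normally call setlocale(LC_CTYPE, "") first to pick up the user's encoding.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the multibyte characters in s under the
   current locale. Returns (size_t)-1 if s contains an invalid or
   incomplete multibyte sequence. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    size_t count = 0;

    memset(&state, 0, sizeof state);    /* initial conversion state */

    while (remaining > 0) {
        /* mbrlen reports how many bytes the next character occupies. */
        size_t n = mbrlen(s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (n == 0)
            break;                      /* reached a null character */

        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

For a string of plain basic characters the count equals strlen. Under a UTF-8 locale, a string containing \xc2\xa9 would be expected to count that pair as one character, which is exactly the extra bookkeeping that wide characters avoid.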

-Larry Jones

If I get a bad grade, it'll be YOUR fault for not doing the work for me!
-- Calvin
Nov 13 '05 #3

<la************@eds.com> wrote in message news:nv**********@cvg-65-27-189-87.cinci.rr.com...
> Bill Cunningham <so**@some.net> wrote:
>
> > I've been reading the C standard online and I'm puzzled as to what multibyte
> > chars are. Wide chars I believe would be characters for languages such as
> > Cantonese or Japanese. I know the ASCII character set specifies that each
> > character such as 'b' or 'B' is an 8 bit character. So what's a multibyte
> > character?
>
> A single logical character that requires more than one byte to express.
> For example, consider the UTF-8 encoding format for ISO 10646: normal
> ASCII characters (between \x00 and \x7f) are encoded as a single byte
> with the same value.


My understanding is that the standard requires 'A' == L'A' by the fact
that the basic character set must be a subset of the extended
character set. Do this and what you mentioned above mean that a
character set whose code values differ from ASCII's can't be the basic
set on an implementation where the code values of Unicode are used as
those of the extended set?
--
Jun, Woong (my******@hanmail.net)
Dept. of Physics, Univ. of Seoul

Nov 13 '05 #4
In <be**********@news.hananet.net> "Jun Woong" <my******@hanmail.net> writes:

> <la************@eds.com> wrote in message news:nv**********@cvg-65-27-189-87.cinci.rr.com...
> > Bill Cunningham <so**@some.net> wrote:
> > >
> > > I've been reading the C standard online and I'm puzzled as to what multibyte
> > > chars are. Wide chars I believe would be characters for languages such as
> > > Cantonese or Japanese. I know the ASCII character set specifies that each
> > > character such as 'b' or 'B' is an 8 bit character. So what's a multibyte
> > > character?
> >
> > A single logical character that requires more than one byte to express.
> > For example, consider the UTF-8 encoding format for ISO 10646: normal
> > ASCII characters (between \x00 and \x7f) are encoded as a single byte
> > with the same value.
>
> My understanding is that the standard requires 'A' == L'A' by the fact
> that the basic character set must be a subset of the extended
> character set.


Non sequitur. The fact that A belongs to the basic character set has
no relevance on the value of L'A', AFAICT. All the standard has to say
on the issue is:

11 A wide character constant has type wchar_t, an integer type
defined in the <stddef.h> header. The value of a wide character
constant containing a single multibyte character that maps to
a member of the extended execution character set is the wide
character corresponding to that multibyte character, as defined
by the mbtowc function, with an implementation-defined current
locale.

> Do this and what you mentioned above mean that a
> character set whose code values differ from ASCII's can't be the basic
> set on an implementation where the code values of Unicode are used as
> those of the extended set?

Nope, he was merely describing what happens on an implementation using
ASCII for normal characters and UCS for wide characters (therefore UTF-8
for multi-byte characters).

There is nothing preventing an implementation from using EBCDIC for
normal characters and UCS for wide characters, in which case it is foolish
to expect 'A' == L'A'.

Furthermore, there is nothing preventing an implementation from using
ASCII for normal characters and EBCDIC for wide characters (or vice
versa). The fact that C99 supports UCNs in source code means nothing WRT
the execution character set (whose extended version need not contain any
additional characters).
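
A short sketch may help show what "as defined by the mbtowc function" means in practice. The helper below (narrow_to_wide is a made-up name) asks mbtowc for the wide character that corresponds to a single-byte character in the current locale. Whether the result equals the narrow character's own value is exactly the point under discussion: it would be expected on an implementation that uses ASCII for normal characters and UCS for wide characters, but the mbtowc mapping alone does not promise it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: convert the single-byte character c to its wide
   character under the current locale. Returns 1 on success and stores
   the result in *out; returns 0 if c is not a valid character. */
int narrow_to_wide(char c, wchar_t *out)
{
    /* A null string pointer resets mbtowc's internal shift state. */
    (void)mbtowc(NULL, NULL, 0);

    /* mbtowc returns the number of bytes consumed, 0 if the character
       is the null character, or -1 if the byte sequence is invalid. */
    return mbtowc(out, &c, 1) >= 0;
}
```

A caller can then compare the stored value with L'A' or with 'A' to see how a particular implementation behaves, for example: wchar_t w; if (narrow_to_wide('A', &w) && w == L'A') { ... }.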

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #5

"Dan Pop" <Da*****@cern.ch> wrote in message news:be**********@sunnews.cern.ch...
> In <be**********@news.hananet.net> "Jun Woong" <my******@hanmail.net> writes:
> > <la************@eds.com> wrote in message news:nv**********@cvg-65-27-189-87.cinci.rr.com...
> > > Bill Cunningham <so**@some.net> wrote:
> > > >
> > > > I've been reading the C standard online and I'm puzzled as to what multibyte
> > > > chars are. Wide chars I believe would be characters for languages such as
> > > > Cantonese or Japanese. I know the ASCII character set specifies that each
> > > > character such as 'b' or 'B' is an 8 bit character. So what's a multibyte
> > > > character?
> > >
> > > A single logical character that requires more than one byte to express.
> > > For example, consider the UTF-8 encoding format for ISO 10646: normal
> > > ASCII characters (between \x00 and \x7f) are encoded as a single byte
> > > with the same value.
> >
> > My understanding is that the standard requires 'A' == L'A' by the fact
> > that the basic character set must be a subset of the extended
> > character set.
>
> Non sequitur. The fact that A belongs to the basic character set has
> no relevance on the value of L'A', AFAICT. All the standard has to say
> on the issue is:
>
> 11 A wide character constant has type wchar_t, an integer type
> defined in the <stddef.h> header. The value of a wide character
> constant containing a single multibyte character that maps to
> a member of the extended execution character set is the wide
> character corresponding to that multibyte character, as defined
> by the mbtowc function, with an implementation-defined current
> locale.


And in 7.17p2:

wchar_t

which is an integer type whose range of values can represent
distinct codes for all members of the largest extended character
set specified among the supported locales; the null character
shall have the code value zero and each member of the basic
character set shall have a code value equal to its value when used
as the lone character in an integer character constant.
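
One way to probe this requirement on a given implementation is sketched below. It uses the standard btowc function from <wchar.h> to get the wide character for each of a sample of basic characters and compares it with the character's value as a plain char. The function name basic_set_matches and the choice of sample characters are assumptions made for this example, and the result only describes the locale that is current when it runs.

```c
#include <wchar.h>

/* Hypothetical check: returns 1 if every sampled member of the basic
   character set has a wide-character code equal to its value as a
   narrow character, 0 otherwise. */
int basic_set_matches(void)
{
    /* A sample of the basic execution character set: letters, digits,
       punctuation, space, and a few control characters. */
    static const char basic[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789"
        "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~ \t\v\f\n";
    size_t i;

    for (i = 0; i < sizeof basic - 1; i++) {
        /* btowc yields WEOF if the byte is not a valid one-byte character. */
        if (btowc((unsigned char)basic[i]) != (wint_t)basic[i])
            return 0;
    }
    return 1;
}
```

On an implementation that uses ASCII for normal characters and UCS for wide characters this would be expected to return 1. An EBCDIC implementation with UCS wide characters is the case where it could not, which is the conflict raised later in the thread.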
--
Jun, Woong (my******@hanmail.net)
Dept. of Physics, Univ. of Seoul

Nov 13 '05 #6
Jun Woong <my******@hanmail.net> wrote:

> My understanding is that the standard requires 'A' == L'A' by the fact
> that the basic character set must be a subset of the extended
> character set. Do this and what you mentioned above mean that a
> character set whose code values differ from ASCII's can't be the basic
> set on an implementation where the code values of Unicode are used as
> those of the extended set?


Yes, but. That requirement is a hold-over from the very earliest days of
extended character set support, before there were functions to convert
between wide and narrow characters. Now that those functions exist,
there is no longer any reason for the requirement, and the committee has
voted to remove it. See the committee's response to DR #279:

<http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/dr_279.htm>

-Larry Jones

Somebody's always running my life. I never get to do what I want to do.
-- Calvin
Nov 13 '05 #7
In <be**********@news.hananet.net> "Jun Woong" <my******@hanmail.net> writes:

> And in 7.17p2:
>
> wchar_t
>
> which is an integer type whose range of values can represent
> distinct codes for all members of the largest extended character
> set specified among the supported locales; the null character
> shall have the code value zero and each member of the basic
> character set shall have a code value equal to its value when used
> as the lone character in an integer character constant.


This requirement, carried on from C89, is simply broken: implementations
that don't use ASCII for normal characters wouldn't be able to use *any*
of the ASCII extensions (UCS, most importantly) for wide characters.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #8

"Dan Pop" <Da*****@cern.ch> wrote in message news:be**********@sunnews.cern.ch...
> In <be**********@news.hananet.net> "Jun Woong" <my******@hanmail.net> writes:
>
> > And in 7.17p2:
> >
> > wchar_t
> >
> > which is an integer type whose range of values can represent
> > distinct codes for all members of the largest extended character
> > set specified among the supported locales; the null character
> > shall have the code value zero and each member of the basic
> > character set shall have a code value equal to its value when used
> > as the lone character in an integer character constant.
>
> This requirement, carried on from C89, is simply broken: implementations
> that don't use ASCII for normal characters wouldn't be able to use *any*
> of the ASCII extensions (UCS, most importantly) for wide characters.


Then the proper answer to my previous question would have been to mention
the DR in progress, not to cite irrelevant wording.
--
Jun, Woong (my******@hanmail.net)
Dept. of Physics, Univ. of Seoul

Nov 13 '05 #9

<la************@eds.com> wrote in message news:73***********@cvg-65-27-189-87.cinci.rr.com...
> [...]
>
> Yes, but. That requirement is a hold-over from the very earliest days of
> extended character set support, before there were functions to convert
> between wide and narrow characters. Now that those functions exist,
> there is no longer any reason for the requirement,


Weren't there some conversion functions between wide and multibyte
characters in C90? Do you mean that the wording in question was
written before the C89 committee decided to put those functions into
the standard, or that we now have a more complete set of functions to
deal with wide and multibyte characters and so no longer need the
requirement?
--
Jun, Woong (my******@hanmail.net)
Dept. of Physics, Univ. of Seoul

Nov 13 '05 #10
