On Thu, 21 Jul 2005 10:59:45 -0500, wob
<wo***@yahoo.com> wrote:
Many thanks to those who responded to my question about "putting greek char
into C string". In searching for a solution, I noticed that there is more
than one version of "Extended ASCII characters" (No. 128 to 255), e.g., in
one version No. 224 is the symbol alpha, in another it's an "a" with a ` on
it... How come?
There is no such thing as "Extended ASCII" in any meaningful form. It's
like "C with extensions": the extended parts are defined by whoever wants
them.
ASCII defines /only/ characters using the bottom 7 bits, thus the
characters numbered 0 to 127. Various people have decided that they
want more, so they allocated them to codes above 127 as they felt like
it. Line drawing characters, European accented characters (at least
four versions used commonly in Europe), mathematical symbols, Cyrillic
(Russian) characters, Greek, funny faces, you name it. And of course
Microsoft came up with its own sets, different from any others.
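To make the "No. 224" case concrete, here is a minimal sketch in C. The two sets it compares are my guess at the ones involved: IBM code page 437 (the old DOS set) has alpha at 224, and ISO 8859-1 has a-grave there. The names charset and byte_to_unicode are made up for illustration, and the table is deliberately incomplete.

```c
/* Two of the many 8 bit character sets (an illustrative choice). */
enum charset { CHARSET_CP437, CHARSET_LATIN1 };

/* Return the Unicode code point that byte b stands for in the given
 * set, or 0 if this sketch does not cover that byte. Only printable
 * ASCII and byte 224 (0xE0) are handled. */
unsigned long byte_to_unicode(unsigned char b, enum charset cs)
{
    if (b >= 0x20 && b <= 0x7E)
        return b;                 /* printable ASCII: both sets agree */

    if (b == 0xE0) {
        switch (cs) {
        case CHARSET_CP437:
            return 0x03B1UL;      /* GREEK SMALL LETTER ALPHA */
        case CHARSET_LATIN1:
            return 0x00E0UL;      /* LATIN SMALL LETTER A WITH GRAVE */
        }
    }
    return 0;                     /* not covered by this sketch */
}
```

The same byte value gives two different characters, which is why a string containing codes above 127 means nothing until you know which set it was written in.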
Recently (i.e. in the last 20 years) there have been attempts to
standardise, but because all of the characters can't fit into the
'spare' 128 available positions there are lots of variants in the
ISO-8859 standard (15 parts in all). See for instance
http://czyborra.com/charsets/iso8859.html
It was realised that what was really wanted was a much expanded
character space, to allow for the thousands of Chinese characters and
other languages to be added, so Unicode was born. This assigns each
character to only one position (a "code point"), usually stored in units
of 16 or 32 bits: a 32 bit unit always holds a whole character, while
the 16 bit form (UTF-16) needs a pair of units for characters beyond
the first 65,536. Some of the characters look alike but are in different
national or specific sets so they are treated as different characters.
Because much software still uses 8 bit strings (and 8 bit transport
paths), Unicode also specifies a method of converting a 'wide' (16 or
32 bit) character into a string of 8 bit characters. This system,
UTF-8 (Unicode Transformation Format, 8 bit), keeps each ASCII character
as a single byte with the top bit zero, so it is compatible with 7 bit
ASCII, and bytes with the top bit set are not valid on their own, only
as part of a "multi-byte character" sequence.
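As a rough sketch of how that conversion works, the function below turns one Unicode code point into its UTF-8 bytes. The name utf8_encode and its interface are my own invention, not a standard library call. It follows the usual UTF-8 bit layout: one byte for ASCII, otherwise a lead byte followed by continuation bytes of the form 10xxxxxx.

```c
#include <stddef.h>

/* Encode one code point as UTF-8 into out, which must have room for
 * 4 bytes. Returns the number of bytes written, or 0 if cp is a
 * surrogate (U+D800..U+DFFF) or lies above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80UL) {                      /* 7 bit ASCII, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800UL && cp <= 0xDFFFUL)
        return 0;                           /* surrogates are not characters */
    if (cp < 0x10000UL) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {                 /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* outside the Unicode range */
}
```

For the Greek alpha (U+03B1) this gives the two bytes 0xCE 0xB1, both with the top bit set, while 'A' (U+0041) stays the single byte 0x41.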
The web page above has descriptions of the ISO 8859 variants, and also
points to articles and descriptions of Unicode, UTF-8 and other matters.
This is relevant to C in the support for 'wide' characters and multibyte
characters, and the functions which transform and output them.
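For the C side, a small sketch of going from a wide string to a multibyte one with the standard wcstombs function. The wrapper name wide_to_multibyte is hypothetical, and treating a result that has no room for its terminator as a failure is my own choice, not something the standard requires.

```c
#include <stdlib.h>
#include <wchar.h>

/* Convert the wide string ws into buf using the multibyte encoding of
 * the current locale. Returns the number of bytes written (not counting
 * the terminating null), or (size_t)-1 if a character cannot be
 * represented in that locale or the result does not fit in bufsize. */
size_t wide_to_multibyte(const wchar_t *ws, char *buf, size_t bufsize)
{
    size_t n = wcstombs(buf, ws, bufsize);

    if (n == (size_t)-1)
        return (size_t)-1;   /* character not representable in this locale */
    if (n == bufsize)
        return (size_t)-1;   /* buffer filled up, result not terminated */
    return n;
}
```

Which encoding comes out depends on the locale: a program would normally call setlocale(LC_ALL, "") first, and whether a Greek letter then converts (and to which bytes) depends on what that locale is. In the default "C" locale only plain ASCII text is safe to rely on.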
Chris C