basic source character set

Hi

Please let me know if I have this clear. The basic source character
set is the list of (96) characters that all implementations must have
in their vocabulary. All other characters recognized by an
implementation are implementation defined, and will not necessarily be
the same across implementations. The key issue as far as developers
are concerned is that if they want their code to be perfectly
portable, then they must restrict their source files to using only
characters from the basic source character set, or use universal
character names to insert characters outside of the basic source
character set.

For example, the following code is not strictly portable:

char *str = "$";

since the "$" character is not a member of the basic source character
set. To make it portable, you would need to do the following:

char *str = "\u0024";
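
A minimal sketch that puts the two spellings side by side, assuming a C99 compiler that also happens to accept a literal $ in source (most do, but nothing requires it). The name spellings_match is made up for illustration.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the literal spelling and the
   universal-character-name spelling produce the same string. */
int spellings_match(void)
{
    const char *literal = "$";   /* relies on $ being accepted in source */
    const char *ucn = "\u0024";  /* C99 universal character name for $ */

    return strcmp(literal, ucn) == 0;
}
```

On an implementation that accepts both lines, the two strings would be expected to compare equal; the difference is only in which source files the compiler is obliged to accept.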

regards, B.

Aug 25 '07 #1
6 Replies


bo*******@gmail.com said:
> Hi
>
> Please let me know if I have this clear. The basic source character
> set is the list of (96) characters that all implementations must have
> in their vocabulary. All other characters recognized by an
> implementation are implementation defined, and will not necessarily be
> the same across implementations. The key issue as far as developers
> are concerned is that if they want their code to be perfectly
> portable, then they must restrict their source files to using only
> characters from the basic source character set, or use universal
> character names to insert characters outside of the basic source
> character set.

Yes, that's basically it. In practice, I think you'll be okay with all
the printable characters that are in the common subset of ASCII and
EBCDIC, although I await correction on the matter from those who have
used conforming C implementations that employ more esoteric source
character sets. Unfortunately, however, AFAICT this only extends the
basic character set by two: $ and @

> For example, the following code is not strictly portable:
>
> char *str = "$";
>
> since the "$" character is not a member of the basic source character
> set.

Strictly speaking, you are correct, yes. Of course, you can /read/ a '$'
character from an open stream at runtime without any trouble at all, if
one happens to be present and is representable as an unsigned char.
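
A rough sketch of that run-time case, assuming the character to look for is supplied as data (for example from a command-line argument) rather than written in the source. The name count_char is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: counts how many characters read from fp are
   equal to target. target is an unsigned char value obtained at run
   time, so the source file never has to spell the character itself. */
long count_char(FILE *fp, int target)
{
    long n = 0;
    int c;

    while ((c = getc(fp)) != EOF) {
        if (c == target)
            n++;
    }
    return n;
}
```

A caller might pass something like (unsigned char)argv[1][0] as target; whatever code the system uses for '$' then arrives as data and is compared as-is.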

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 25 '07 #2

bo*******@gmail.com wrote:
> Please let me know if I have this clear. The basic source
> character set is the list of (96) characters that all
> implementations must have in their vocabulary. All other
> characters recognized by an implementation are implementation
> defined, and will not necessarily be the same across
> implementations. The key issue as far as developers are
> concerned is that if they want their code to be perfectly
> portable, then they must restrict their source files to using
> only characters from the basic source character set, or use
> universal character names to insert characters outside of the
> basic source character set.

Not quite. Including space, there are 92 printing chars in the
basic set (not 96). Chars such as $ are language dependent, and
may therefore be different on other machines. Other missing chars
are '@', '`' and the rubout (hex 7f in ASCII). The following is an
extract from N869:

[#3] Both the basic source and basic execution character
sets shall have at least the following members: the 26
uppercase letters of the Latin alphabet

A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

the 26 lowercase letters of the Latin alphabet

a b c d e f g h i j k l m
n o p q r s t u v w x y z

the 10 decimal digits

0 1 2 3 4 5 6 7 8 9

the following 29 graphic characters

! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~

the space character, and control characters representing
horizontal tab, vertical tab, and form feed. The

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>


Aug 25 '07 #3

On Sat, 25 Aug 2007 07:07:56 -0400, CBFalconer wrote:
> bo*******@gmail.com wrote:
>> Please let me know if I have this clear. The basic source
>> character set is the list of (96) characters that all
>> implementations must have in their vocabulary. All other
>> characters recognized by an implementation are implementation
>> defined, and will not necessarily be the same across
>> implementations. The key issue as far as developers are
>> concerned is that if they want their code to be perfectly
>> portable, then they must restrict their source files to using
>> only characters from the basic source character set, or use
>> universal character names to insert characters outside of the
>> basic source character set.
>
> Not quite. Including space, there are 92 printing chars in the
> basic set (not 96).

He did not specify "printing characters", so he's only off by one.

[...]
> the space character, and control characters representing
> horizontal tab, vertical tab, and form feed. The

--
Army1987 (Replace "NOSPAM" with "email")
No-one ever won a game by resigning. -- S. Tartakower

Aug 26 '07 #4

On Aug 25, 5:39 pm, boroph...@gmail.com wrote:
> ...if [developers] want their code to be perfectly
> portable, then they must restrict their source files to
> using only characters from the basic source character set,

Yes.

> or use universal character names to insert characters
> outside of the basic source character set.

If you have a supporting compiler.

> For example, the following code is not strictly portable:
>
> char *str = "$";
>
> since the "$" character is not a member of the basic source
> character set.

Correct.

> To make it portable, you would need to do the following
>
> char *str = "\u0024";

That's fine for the source, but it won't actually help you
when the program executes. There is still no guarantee that
the dollar sign is a member of the execution character set,
even though you can now 'name' it.

You'll get a dollar sign on the systems that have them, but
you'll get an implementation defined character on the systems
that don't.
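
One way to see what a given system actually produced is to inspect the code stored in the string. A small sketch, assuming a C99 compiler; dollar_code and report_dollar are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical helper: the execution-character-set code that this
   implementation chose for the character named by \u0024. */
unsigned dollar_code(void)
{
    return (unsigned char)"\u0024"[0];
}

/* Hypothetical helper: prints that code so it can be compared
   across systems. */
void report_dollar(FILE *out)
{
    fprintf(out, "code for U+0024 here: %u\n", dollar_code());
}
```

On an ASCII-based system this would be expected to report 36; elsewhere the value, and the glyph it maps to, is up to the implementation.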

Given that programs that _need_ $ and @ invariably need 'A'
to be 65 as well, you might as well go ahead and use them in
the source.

[Aside: One of the pre-standard drafts of C99 actually
precluded the naming of $ and @ with universal character
escapes. Fortunately, someone alerted the Committee to
their apparent use in some circles. :-]

--
Peter

Aug 27 '07 #5

Peter Nilsson said:

<snip>
> Given that programs that _need_ $ and @ invariably need 'A'
> to be 65 as well, you might as well go ahead and use them in
> the source.

But this is not true. I've worked on a number of programs that needed a
'$' but which were quite happy for 'A' to have a non-65 code point (and
it's just as well, since they often had to run on systems where 'A' was
in fact not 65).
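
As an illustration of code that is indifferent to the value of 'A', here is a sketch that finds a letter's position by lookup rather than by arithmetic such as c - 'A'. The name letter_index is made up for this example.

```c
#include <string.h>

/* Hypothetical helper: position of an uppercase letter in the alphabet
   (0 for A, 25 for Z), or -1 if c is not an uppercase letter.
   It looks the character up instead of computing c - 'A', so it does
   not depend on the letters having consecutive codes. */
int letter_index(int c)
{
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;

    if (c == '\0')
        return -1;  /* strchr would otherwise match the terminator */
    p = strchr(upper, c);
    return p ? (int)(p - upper) : -1;
}
```

The same function should give the same answers whether 'A' is 65 or some other code point, which is the property such programs rely on.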

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 27 '07 #6

Peter Nilsson <ai***@acay.com.au> wrote:
> Given that programs that _need_ $ and @ invariably need 'A'
> to be 65 as well, you might as well go ahead and use them in
> the source.

A large amount of accounting software written to run on IBM systems
would be surprised to hear that (though I don't know whether any of that
software was written in C).

Richard
Aug 27 '07 #7
