basic source character set

Hi

Please let me know if I have this clear. The basic source character
set is the list of (96) characters that all implementations must have
in their vocabulary. All other characters recognized by an
implementation are implementation defined, and will not necessarily be
the same across implementations. The key issue as far as developers
are concerned is that if they want their code to be perfectly
portable, then they must restrict their source files to using only
characters from the basic source character set, or use universal
character names to insert characters outside of the basic source
character set.

For example, the following code is not strictly portable:

char *str = "$";

since the "$" character is not a member of the basic source character
set. To make it portable, you would need to do the following:

char *str = "\u0024";

regards, B.

Aug 25 '07 #1
bo*******@gmail.com said:
> Hi
>
> Please let me know if I have this clear. The basic source character
> set is the list of (96) characters that all implementations must have
> in their vocabulary. All other characters recognized by an
> implementation are implementation defined, and will not necessarily be
> the same across implementations. The key issue as far as developers
> are concerned is that if they want their code to be perfectly
> portable, then they must restrict their source files to using only
> characters from the basic source character set, or use universal
> character names to insert characters outside of the basic source
> character set.

Yes, that's basically it. In practice, I think you'll be okay with all
the printable characters that are in the common subset of ASCII and
EBCDIC, although I await correction on the matter from those who have
used conforming C implementations that employ more esoteric source
character sets. Unfortunately, however, AFAICT this only extends the
basic character set by two: $ and @

> For example, the following code is not strictly portable:
>
> char *str = "$";
>
> since the "$" character is not a member of the basic source character
> set.

Strictly speaking, you are correct, yes. Of course, you can /read/ a '$'
character from an open stream at runtime without any trouble at all, if
one happens to be present and is representable as an unsigned char.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 25 '07 #2
bo*******@gmail.com wrote:
> Please let me know if I have this clear. The basic source
> character set is the list of (96) characters that all
> implementations must have in their vocabulary. All other
> characters recognized by an implementation are implementation
> defined, and will not necessarily be the same across
> implementations. The key issue as far as developers are
> concerned is that if they want their code to be perfectly
> portable, then they must restrict their source files to using
> only characters from the basic source character set, or use
> universal character names to insert characters outside of the
> basic source character set.

Not quite. Including space, there are 92 printing chars in the
basic set (not 96). Chars such as $ are language dependent, and
may therefore be different on other machines. Other missing chars
are '@', '`' and the rubout (hex 7f in ASCII). The following is an
extract from N869:

[#3] Both the basic source and basic execution character
sets shall have at least the following members: the 26
uppercase letters of the Latin alphabet

A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

the 26 lowercase letters of the Latin alphabet

a b c d e f g h i j k l m
n o p q r s t u v w x y z

the 10 decimal digits

0 1 2 3 4 5 6 7 8 9

the following 29 graphic characters

! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~

the space character, and control characters representing
horizontal tab, vertical tab, and form feed. The

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>


Aug 25 '07 #3
On Sat, 25 Aug 2007 07:07:56 -0400, CBFalconer wrote:
> bo*******@gmail.com wrote:
>> Please let me know if I have this clear. The basic source
>> character set is the list of (96) characters that all
>> implementations must have in their vocabulary. All other
>> characters recognized by an implementation are implementation
>> defined, and will not necessarily be the same across
>> implementations. The key issue as far as developers are
>> concerned is that if they want their code to be perfectly
>> portable, then they must restrict their source files to using
>> only characters from the basic source character set, or use
>> universal character names to insert characters outside of the
>> basic source character set.
>
> Not quite. Including space, there are 92 printing chars in the
> basic set (not 96).

He did not specify "printing characters", so he's only off by one.
[...]
> the space character, and control characters representing
> horizontal tab, vertical tab, and form feed. The

--
Army1987 (Replace "NOSPAM" with "email")
No-one ever won a game by resigning. -- S. Tartakower

Aug 26 '07 #4
On Aug 25, 5:39 pm, boroph...@gmail.com wrote:
> ...if [developers] want their code to be perfectly
> portable, then they must restrict their source files to
> using only characters from the basic source character set,

Yes.

> or use universal character names to insert characters
> outside of the basic source character set.

If you have a supporting compiler.

> For example, the following code is not strictly portable:
>
> char *str = "$";
>
> since the "$" character is not a member of the basic source
> character set.

Correct.

> To make it portable, you would need to do the following
>
> char *str = "\u0024";

That's fine for the source, but it won't actually help you
when the program executes. There is still no guarantee that
the dollar sign is a member of the execution character set,
even though you can now 'name' it.

You'll get a dollar sign on the systems that have them, but
you'll get an implementation defined character on the systems
that don't.

Given that programs that _need_ $ and @ invariably need 'A'
to be 65 as well, you might as well go ahead and use them in
the source.

[Aside: One of the pre-standard drafts of C99 actually
precluded the naming of $ and @ with universal character
escapes. Fortunately, someone alerted the Committee to
their apparent use in some circles. :-]

--
Peter

Aug 27 '07 #5
Peter Nilsson said:

<snip>
> Given that programs that _need_ $ and @ invariably need 'A'
> to be 65 as well, you might as well go ahead and use them in
> the source.

But this is not true. I've worked on a number of programs that needed a
'$' but which were quite happy for 'A' to have a non-65 code point (and
it's just as well, since they often had to run on systems where 'A' was
in fact not 65).

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 27 '07 #6
Peter Nilsson <ai***@acay.com.au> wrote:
> Given that programs that _need_ $ and @ invariably need 'A'
> to be 65 as well, you might as well go ahead and use them in
> the source.

A large amount of accounting software written to run on IBM systems
would be surprised to hear that (though I don't know whether any of that
software was written in C).

Richard
Aug 27 '07 #7
