basic source character set

borophyll

Hi

Please let me know if I have this clear. The basic source character
set is the list of (96) characters that all implementations must have
in their vocabulary. All other characters recognized by an
implementation are implementation defined, and will not necessarily be
the same across implementations. The key issue as far as developers
are concerned is that if they want their code to be perfectly
portable, then they must restrict their source files to using only
characters from the basic source character set, or use universal
character names to insert characters outside of the basic source
character set.

For example, the following code is not strictly portable:

char *str = "$";

since the "$" character is not a member of the basic source character
set. To make it portable, you would need to do the following:

char *str = "\u0024";

regards, B.

Aug 25 '07 #1


Richard Heathfield

bo*******@gmail.com said:

> Hi
>
> Please let me know if I have this clear. The basic source character
> set is the list of (96) characters that all implementations must have
> in their vocabulary. All other characters recognized by an
> implementation are implementation defined, and will not necessarily be
> the same across implementations. The key issue as far as developers
> are concerned is that if they want their code to be perfectly
> portable, then they must restrict their source files to using only
> characters from the basic source character set, or use universal
> character names to insert characters outside of the basic source
> character set.

Yes, that's basically it. In practice, I think you'll be okay with all
the printable characters that are in the common subset of ASCII and
EBCDIC, although I await correction on the matter from those who have
used conforming C implementations that employ more esoteric source
character sets. Unfortunately, however, AFAICT this only extends the
basic character set by two characters: $ and @.

> For example, the following code is not strictly portable:
>
> char *str = "$";
>
> since the "$" character is not a member of the basic source character
> set.

Strictly speaking, you are correct, yes. Of course, you can /read/ a '$'
character from an open stream at runtime without any trouble at all, if
one happens to be present and is representable as an unsigned char.
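
To illustrate the runtime case, here is a minimal sketch that counts occurrences of a character in a stream. The function name `count_char` is made up for this example, and the target character is assumed to arrive at run time (from `argv` or user input, say), so the source file never has to spell a character outside the basic source character set.

```c
#include <assert.h>
#include <stdio.h>

/* Count how many characters read from fp compare equal to target.
   target is expected to be an unsigned char value held in an int,
   which is the form in which getc() returns characters. */
long count_char(FILE *fp, int target)
{
    long count = 0;
    int c;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF) {
        if (c == target) {
            ++count;
        }
    }
    return count;
}
```

A caller might write `count_char(stdin, (unsigned char)argv[1][0])`; the cast matters because plain `char` may be signed.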

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Aug 25 '07 #2

CBFalconer

bo*******@gmail.com wrote:

> Please let me know if I have this clear. The basic source
> character set is the list of (96) characters that all
> implementations must have in their vocabulary. All other
> characters recognized by an implementation are implementation
> defined, and will not necessarily be the same across
> implementations. The key issue as far as developers are
> concerned is that if they want their code to be perfectly
> portable, then they must restrict their source files to using
> only characters from the basic source character set, or use
> universal character names to insert characters outside of the
> basic source character set.

Not quite. Including space, there are 92 printing chars in the
basic set (not 96). Chars such as $ are language dependent, and
may therefore be different on other machines. Other missing chars
are '@', '`' and the rubout (hex 7f in ASCII). The following is an
extract from N869:

[#3] Both the basic source and basic execution character
sets shall have at least the following members: the 26
uppercase letters of the Latin alphabet

A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

the 26 lowercase letters of the Latin alphabet

a b c d e f g h i j k l m
n o p q r s t u v w x y z

the 10 decimal digits

0 1 2 3 4 5 6 7 8 9

the following 29 graphic characters

! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~

the space character, and control characters representing
horizontal tab, vertical tab, and form feed. The

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>


Aug 25 '07 #3

Army1987

On Sat, 25 Aug 2007 07:07:56 -0400, CBFalconer wrote:

> bo*******@gmail.com wrote:
>
>> Please let me know if I have this clear. The basic source
>> character set is the list of (96) characters that all
>> implementations must have in their vocabulary. All other
>> characters recognized by an implementation are implementation
>> defined, and will not necessarily be the same across
>> implementations. The key issue as far as developers are
>> concerned is that if they want their code to be perfectly
>> portable, then they must restrict their source files to using
>> only characters from the basic source character set, or use
>> universal character names to insert characters outside of the
>> basic source character set.
>
> Not quite. Including space, there are 92 printing chars in the
> basic set (not 96).

He did not specify "printing characters", so he's only off by one.
[...]

> the space character, and control characters representing
> horizontal tab, vertical tab, and form feed. The

--
Army1987 (Replace "NOSPAM" with "email")
No-one ever won a game by resigning. -- S. Tartakower

Aug 26 '07 #4

Peter Nilsson

On Aug 25, 5:39 pm, boroph...@gmail.com wrote:

> ...if [developers] want their code to be perfectly
> portable, then they must restrict their source files to
> using only characters from the basic source character set,

Yes.

> or use universal character names to insert characters
> outside of the basic source character set.

If you have a supporting compiler.

> For example, the following code is not strictly portable:
>
> char *str = "$";
>
> since the "$" character is not a member of the basic source
> character set.

Correct.

> To make it portable, you would need to do the following
>
> char *str = "\u0024";

That's fine for the source, but it won't actually help you
when the program executes. There is still no guarantee that
the dollar sign is a member of the execution character set,
even though you can now 'name' it.

You'll get a dollar sign on the systems that have them, but
you'll get an implementation defined character on the systems
that don't.
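
One way to see what a given implementation actually produces is to dump the bytes of the string at run time. This is only a sketch: `dollar` and `dump_bytes` are names invented here, and a compiler that accepts universal character names (C99 or later) is assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Spelled with a universal character name, so the source text uses
   only members of the basic source character set. */
const char dollar[] = "\u0024";

/* Print the execution-time encoding of s, one byte at a time, and
   return the number of bytes printed. */
size_t dump_bytes(const char *s)
{
    size_t i, n;

    assert(s != NULL);
    n = strlen(s);
    for (i = 0; i < n; ++i) {
        printf("%s0x%02X", i ? " " : "", (unsigned)(unsigned char)s[i]);
    }
    putchar('\n');
    return n;
}
```

Calling `dump_bytes(dollar)` prints 0x24 on an ASCII-based implementation. On a system whose execution character set has no dollar sign, the value is whatever the implementation defines.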

Given that programs that _need_ $ and @ invariably need 'A'
to be 65 as well, you might as well go ahead and use them in
the source.

[Aside: One of the pre-standard drafts of C99 actually
precluded the naming of $ and @ with universal character
escapes. Fortunately, someone alerted the Committee to
their apparent use in some circles. :-]

--
Peter

Aug 27 '07 #5

Richard Heathfield

Peter Nilsson said:

<snip>

> Given that programs that _need_ $ and @ invariably need 'A'
> to be 65 as well, you might as well go ahead and use them in
> the source.

But this is not true. I've worked on a number of programs that needed a
'$' but which were quite happy for 'A' to have a non-65 code point (and
it's just as well, since they often had to run on systems where 'A' was
in fact not 65).
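
Code can be written so that it does not care what value 'A' has. As a hedged illustration (the function name `upper_index` is invented here), this maps an uppercase letter to its position in the alphabet by table lookup rather than by arithmetic on code points:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map an uppercase letter to 0..25 without assuming any code point
   values; return -1 for anything else. The common shortcut c - 'A'
   relies on the letters being contiguous, which the C standard does
   not guarantee and which does not hold in EBCDIC. */
int upper_index(int c)
{
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;

    if (c == 0) {
        return -1;  /* strchr would otherwise match the terminator */
    }
    p = strchr(upper, c);
    return p != NULL ? (int)(p - upper) : -1;
}
```

The lookup gives the same result on an ASCII system and on one where 'A' is not 65.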


Aug 27 '07 #6

Richard Bos

Peter Nilsson <ai***@acay.com.au> wrote:

> Given that programs that _need_ $ and @ invariably need 'A'
> to be 65 as well, you might as well go ahead and use them in
> the source.

A large amount of accounting software written to run on IBM systems
would be surprised to hear that (though I don't know whether any of that
software was written in C).

Richard

Aug 27 '07 #7
