Re: Backtick, at-sign, dollar-sign: legal in source?

Some months ago, I'd asked this group:

"Backtick, at-sign, dollar-sign: legal in source?"

I got 20 replies, most of which were off-topic and did not
answer the question.

Keith Thompson provided the only direct, on-topic reply.
He wrote:
> Those three characters cannot be used at all in portable
> C programs.

Thank you. I'd always wondered about that.

I've never had a compiler reject a source file because it
contained string literals or comments containing the three
characters in question (or indeed any glyphical iso-8859-1
characters). But still, if the standard doesn't require
compilers to properly process source code containing characters
outside the "basic source character set", it's probably better
not to use such characters.

For portability, I suppose instead of

char asdf[] = "Three @ `$455.50' each";
printf("%s\n", asdf);

one could write

char asdf[] = "Three \100 \140\44" "455.50' each";
printf("%s\n", asdf);

(This has the disadvantage, though, that if the execution
environment is using a character encoding other than ASCII or
iso-8859-1, this might not print what the programmer expects.)

--
Cheers,
Robbie Hatley
lonewolf aatt well dott com
www dott well dott com slant user slant lonewolf slant
Jul 14 '08 #1
4 Replies


"Robbie Hatley" <se**************@for.my.email.address> writes:
> Some months ago, I'd asked this group:
>
> "Backtick, at-sign, dollar-sign: legal in source?"
>
> I got 20 replies, most of which were off-topic and did not
> answer the question.
>
> Keith Thompson provided the only direct, on-topic reply.
> He wrote:
> > Those three characters cannot be used at all in portable
> > C programs.
>
> Thank you. I'd always wondered about that.
>
> I've never had a compiler reject a source file because it
> contained string literals or comments containing the three
> characters in question (or indeed any glyphical iso-8859-1
> characters). But still, if the standard doesn't require
> compilers to properly process source code containing characters
> outside the "basic source character set", it's probably better
> not to use such characters.
>
> For portability, I suppose instead of
>
> char asdf[] = "Three @ `$455.50' each";
> printf("%s\n", asdf);
>
> one could write
>
> char asdf[] = "Three \100 \140\44" "455.50' each";
> printf("%s\n", asdf);

Well, that's not what I'd do.

If a compiler doesn't support those characters, then the original
version has the advantage that the compiler is very likely to reject
it -- which is exactly what you want. (I'm assuming that the compiler
will accept the characters in a source file if and only if they can be
displayed in the target environment.)

> (This has the disadvantage, though, that if the execution
> environment is using a character encoding other than ASCII or
> iso-8859-1, this might not print what the programmer expects.)

Right.

My answer was in terms of what the standard requires of all conforming
implementations. In real life, the odds of '$' not being supported
are so low that I just wouldn't worry about it. If I want to display
a '$' character, I'll write '$' in my source file. The fact that this
limits the portability of my code (to all implementations that
actually exist) isn't much of a problem.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 15 '08 #2

Keith Thompson wrote:
> > one could write
> >
> > char asdf[] = "Three \100 \140\44" "455.50' each";
> > printf("%s\n", asdf);
>
> Well, that's not what I'd do.

Me neither. Could it be possible under some obscure imaginary character
set that the above code compiles to

char asdf[] = "Three ? %s" "455.50' each";

which would lead to UB? Or are the positions of '%' and 's' in the
character set specified?

> > (This has the disadvantage, though, that if the execution
> > environment is using a character encoding other than ASCII or
> > iso-8859-1, this might not print what the programmer expects.)
>
> Right.

Well, the original code has the exact same problem, when the character
set is wrong.

char asdf[] = "Das ist ein Ümlaut";
printf("%s\n", asdf);

When typed and executed on my (UTF-8) machine, this prints

Das ist ein Ümlaut

but when compiled and executed on an ISO-8859-1 machine, the two UTF-8
bytes of 'Ü' are displayed as two separate characters, something like

Das ist ein Ãœmlaut

Kind regards,
Johannes

--
"Someone who criticizes something by no means has to be able to do it
better himself. It is enough to know that others can do it better, and
that others do in fact do it better, in order to draw a comparison."
- Wolfgang Gerber
in de.sci.electronics <47***********************@news.freenet.de>
Jul 15 '08 #3

Johannes Bauer said:
> Keith Thompson wrote:
> > > one could write
> > >
> > > char asdf[] = "Three \100 \140\44" "455.50' each";
> > > printf("%s\n", asdf);
> >
> > Well, that's not what I'd do.
>
> Me neither.

Right. What I'd do is read the appropriate characters in from file, leaving
the local expertise to provide a file that gives the best representation
of the glyphs I want, using the characters available in the execution
character set. Either that or use words instead of symbols (I don't know
what the above is supposed to represent, but guessing sensibly, I might
write something like "Three at GBP 455.50 each").

> Could it be possible under some obscure imaginary character
> set that the above code compiles to
>
> char asdf[] = "Three ? %s" "455.50' each";
>
> which would lead to UB?

Yes, it's possible. Whilst the ?, %, and s symbols must be in the source
character set and must have positive values, there's nothing in the rules
to stop them having the specified code points.

<snip>

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Jul 15 '08 #4

"Robbie Hatley" <se**************@for.my.email.address> wrote:
> For portability, I suppose instead of
>
> char asdf[] = "Three @ `$455.50' each";
> printf("%s\n", asdf);
>
> one could write
>
> char asdf[] = "Three \100 \140\44" "455.50' each";
> printf("%s\n", asdf);
>
> (This has the disadvantage, though, that if the execution
> environment is using a character encoding other than ASCII or
> iso-8859-1, this might not print what the programmer expects.)

It is, in fact, worse than the original. If you transfer code which
contains a $ to a system which uses another character set, doing so will
usually include a translation process between the two character sets,
and the translation tool should complain about untranslatable characters
if the new charset has no $. You can then take measures.
If, OTOH, you translate \44 to another charset, you get \44, no
warnings, no questions asked. This is probably not going to be a $; what
is worse, it might even be another currency sign. Imagine suddenly
having to pay 50% more than you did, just because \44 happens to be the
euro sign on the new system...

Richard
Jul 16 '08 #5
