Binary storage of string constants

Suppose I define a char* variable as follows:

char *s = "€";

What actually gets put into the binary? Presumably, it gets stored in
the encoding of the source file. Am I right? Or is it compiler/platform
dependent?

The C spec suggests that string constants get mapped in an
implementation-defined manner to members of the execution character
set. Does this mean that some compilers perform iconv-esque conversion
between the source and execution character sets at runtime? If so, does
this mean the result of strlen(s) may vary depending on the execution
character set?

Thanks in advance.

Jun 29 '06 #1
10 Replies


Ross said:
> Suppose I define a char* variable as follows:
>
> char *s = "€";

I don't know what you wrote there, but on my display I can see a little
white square. (Just a quick tip: use const char * when pointing at string
literals.)

> What actually gets put into the binary?

It depends. The value might not even make it into the binary, depending on
whether s gets used. But typically the code point of the character will
appear in the binary somewhere.

> Presumably, it gets stored in
> the encoding of the source file. Am I right? Or is it compiler/platform
> dependent?

Very much so.

<snip>

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Jun 29 '06 #2

Richard Heathfield wrote:
> Ross said:
>> Suppose I define a char* variable as follows:
>>
>> char *s = "€";
>
> I don't know what you wrote there, but on my display I can see a little
> white square.

Looks like a Euro here in this newsreader (knode). And when I tried pasting
it into a command window (konsole), it seemed to become a zero-width
character - and backspacing rubbed out a space in the command prompt!

--
Chris "nice icon for a planeship flying left" Dollin
A rock is not a fact. A rock is a rock.

Jun 29 '06 #3

Yeah, it was supposed to be a Euro symbol.

Any idea what happens at runtime, then? Is it possible that the string
gets converted into the execution character set, or will it just remain
'as is' in the source character set? Does the same apply to character
constants?

Jun 29 '06 #4

Chris Dollin said:
> Richard Heathfield wrote:
>> Ross said:
>>> Suppose I define a char* variable as follows:
>>>
>>> char *s = "€";
>>
>> I don't know what you wrote there, but on my display I can see a little
>> white square.
>
> Looks like a Euro here in this newsreader (knode).

I'm using knode too. Perhaps Euros look like little white squares. (I must
admit I thought they were triangular rubber coins 6800 miles on a side, but
I've never actually seen one, so I could be wrong about that.)

> And when I tried
> pasting it into a command window (konsole), it seemed to become a
> zero-width character - and backspacing rubbed out a space in the command
> prompt!

Oopsie. If I were you, I'd sue the OP for breach of command prompt.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Jun 29 '06 #5

Richard Heathfield posted:

>> Looks like a Euro here in this newsreader (knode).
>
> I'm using knode too. Perhaps Euros look like little white squares. (I
> must admit I thought they were triangular rubber coins 6800 miles on a
> side, but I've never actually seen one, so I could be wrong about
> that.)

For anyone who's interested:

http://www.joerch.org/coins/euro-r.html

--

Frederick Gotham
Jun 29 '06 #6

Ross wrote:
> Suppose I define a char* variable as follows:
>
> char *s = "€";
>
> What actually gets put into the binary?

You don't know.

> Presumably, it gets stored in the encoding of the source file. Am I
> right?

No. The encoding of the source file is in principle completely immaterial to
whatever the compiler output is. Theoretically, the compiler could even
produce code that "computes" your strings just-in-time, so there aren't any
characters in the binary at all.

> Or is it compiler/platform dependent?

Yes.

> The C spec suggests that string constants get mapped in an
> implementation-defined manner to members of the execution character
> set. Does this mean that some compilers perform iconv-esque conversion
> between the source and execution character sets at runtime?

The compiler isn't allowed to do that. Mapping of characters in the source
character set to the execution character set takes place at translation
time. "Implementation-defined" just means that the way of mapping has to be
documented.

> If so, does this mean the result of strlen(s) may vary depending on the
> execution character set?

Only insofar as the results of strlen() depend on the execution character
set used at translation (which is when the mapping from source character set
to execution character set happens).

When strlen() gets around, all that's left are characters stored in bytes.
strlen() counts these characters, which is the same as the number of bytes
they occupy. The result of strlen() on "the same" string may therefore vary
with platform, and even with compilation on the same platform, but not with
execution of the same translated program.

S.
Jun 29 '06 #7

Does this mean that 'execution character set' is referring to the
execution of the compiler, rather than the execution of the compiled
program (as I had assumed)?

Jun 30 '06 #8

Scrub that. I'm pretty sure that 'execution character set' is referring
to the execution of the compiled program. I guess the real question
should be: what is meant by 'translation time'? If it's synonymous with
'compilation time', how can the compiler know what the execution
character set is going to be? Surely this depends on the locale of the
system on which the compiled program is executed?

Jun 30 '06 #9

OK, I see where I'm going wrong. The execution character set is fixed
at compile time and is in no way affected by the locale of the system
in which the binary is executed.

Thanks to one and all.

Jun 30 '06 #10

"Ross" <ro************@yahoo.co.uk> writes:
> Does this mean that 'execution character set' is referring to the
> execution of the compiler, rather than the execution of the compiled
> program (as I had assumed)?

Does *what* mean that?

Google Groups, for a long time, made it gratuitously difficult to
provide proper context when posting a followup, but I believe the
problem has been corrected.

Please leave enough context in your followups so that they make some
sense even if the previous article isn't available. Those of us who
read everything in this newsgroup can't remember all the details of
every thread.

<http://cfaj.freeshell.org/google/> has information (now obsolete)
about how to work around Google's former bug; it also has a number of
links to articles with good information about posting to Usenet.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jun 30 '06 #11
