Bytes | Developer Community

String constants

Hi,

I have a question about string constants. I compile the following program:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char str1[] = "\007";
    char str2[] = "\0" "07";
    char str3[] = { '\0', '0', '7', '\0' };

    printf("str1 = %s\n" "str2 = %s\n" "str3 = %s\n", str1, str2, str3);
    /* sizeof and strlen yield size_t, which C99 prints with %zu */
    printf("sizeof(str1) = %zu\n" "sizeof(str2) = %zu\n"
           "sizeof(str3) = %zu\n", sizeof(str1), sizeof(str2),
           sizeof(str3));
    printf("strlen(str1) = %zu\n" "strlen(str2) = %zu\n"
           "strlen(str3) = %zu\n", strlen(str1), strlen(str2),
           strlen(str3));

    return 0;
}

Here is the output:

str1 =
str2 =
str3 =
sizeof(str1) = 2
sizeof(str2) = 4
sizeof(str3) = 4
strlen(str1) = 1
strlen(str2) = 0
strlen(str3) = 0

I understand that yet another obscure C feature is the octal character
specification so that \ddd is one character. However, should not str1 and str2
be the same? Obscure feature conflict (\ddd vs string concatenation)?
Jul 31 '06 #1
MQ

gtho...@ee.ryerson.ca wrote:
>[original program and output snipped]
>
>I understand that yet another obscure C feature is the octal character
>specification so that \ddd is one character. However, should not str1 and str2
>be the same?
No, str1 contains a single ASCII character with value 7, followed by a
null terminator, which gives a size of two (its strlen() is one). str2
is actually three characters: '\0', which is a null character, followed
by the '0' character, followed by the '7' character. With the null
terminator added at the end, you have four characters.

str1 appears empty because ASCII 7 (BEL) is a non-printable character.
In str2 and str3 you have actually created a string which starts with a
null terminator, making the string appear to be empty (which is why
strlen returns 0 in both of these cases).

Jul 31 '06 #2
In article <ea**********@news.ryerson.ca>, <gt*****@ee.ryerson.ca> wrote:
>char str1[] = "\007";
>char str2[] = "\0" "07";
>char str3[] = { '\0', '0', '7', '\0' };
>I understand that yet another obscure C feature is the octal character
>specification so that \ddd is one character.
True (provided the d's are all in the range 0 through 7).
>However, should not str1 and str2
>be the same? Obscure feature conflict (\ddd vs string concatenation)?
Adjacent string literals are not concatenated until after each
literal has been tokenized and its escape sequences interpreted.

In str3, there is no concatenation taking place: you have
specified, char by char, exactly what should be put into adjacent
locations in the array.

Going back to your second string: would you expect that "\" "007"
would compile the same as "\007"? It doesn't, of course: the
backslash escapes the double quote, rather than being held in
suspension in case something is going to show up later.

The behaviour is well specified in C89: the octal sequence stops
at the first non-octal character, or after three octal digits,
whichever comes first.

Consider a problem in the hex escape sequences: "\xABCD".
That is treated as a single escape with four hex digits, because a \x
escape has no length limit; with 8-bit chars its value is out of range,
so the compiler must issue a diagnostic.
Suppose, though, that you wanted to stop after the \xAB and you
wanted literal C and literal D: how would you do it?
The solution from the standard is that you can use "\xAB" "CD"
because the sequence ends at the first non-hex character
(the closing double quote of "\xAB"). But suppose it were otherwise, that
concatenation took place first and then the result was maximally
tokenized: then in order to get the C to be a C, you would have to
put in the hex value corresponding to "C", and then you'd have to
put in the hex value corresponding to "D", and you'd have to
keep on encoding until finally your text happened to include something
that wasn't interpretable as hex.
Jul 31 '06 #3
gt*****@ee.ryerson.ca wrote:
>[original program, output, and question snipped]
Also, if you wanted to, for example, use a string containing '\01' '0', how
would you do this unambiguously? As in str2? How about '\0' "01" or '\0' '0'???

Jul 31 '06 #4
MQ

gtho...@ee.ryerson.ca wrote:
>Also, if you wanted to, for example, use a string containing '\01' '0', how
>would you do this unambiguously? As in str2? How about '\0' "01" or '\0' '0'???

I'm not sure what you are trying to achieve, but it seems you are not
understanding how strings work. '\0' is ASCII 0. You cannot just
append it to a string of digits and get a single character out of
it. You will get a string with ASCII 0 at the start (the null
character) followed by the digits. Can you explain what you are
trying to do so we can suggest a better way?

MQ

Jul 31 '06 #5
gt*****@ee.ryerson.ca wrote:
>Also, if you wanted to, for example, use a string containing '\01' '0', how
>would you do this unambiguously?
Since '\01' has value 1, I assume that what you want is a
string whose first byte has the value 1 and second byte the
value '0'. You do that with char str[] = {1, '0', '\0'};
(or simply "\1" "0", which adds the '\0' for you).
>How about '\0' "01" or '\0' '0'???
It would make your thoughts clearer to us if you
wrote out the complete statement you have in mind.

Spiros Bousbouras

Jul 31 '06 #6
In article <ea**********@news.ryerson.ca>, <gt*****@ee.ryerson.ca> wrote:
>I have a question about string constants.
There are a number of tricks you need to "get straight in your head"
in order to deal with this.

First, a C string is actually a data structure, namely, an array
of "char"s in which the first zero-byte is considered the end of the
string.

Second, escapes like '\007' are interpreted by the compiler, and
the lexical rules for the octal version are:

From the backslash, consume up to (but no more than) three
octal digits, stopping when you run out of digits or when
the first "invalid" character occurs.

Hence, if you encounter

\1\29\00345

this "means" \1, then \2, then 9, then \003, then 4, then 5.

Third, string literals usually -- but not always[%] -- mean "generate
an anonymous array containing the characters given in the literal,
with a \0 character appended".

Last, adjacent string literals are concatenated after escape sequence
interpretation, but before adding the final \0.
>char str1[] = "\007";
This string literal has one \7 character inside, so generates an
array containing two characters, namely \7 and \0.
>char str2[] = "\0" "07";
Here there are two adjacent string literals. The first has one
\0 character inside, and the second has two characters inside,
'0' and '7'. These are concatenated -- giving '\0' '0' '7'
in that order -- and a final \0 is added. The result is the same
as if you wrote either:

char str2[] = "\00007";

or the initializer you gave for str3:
>char str3[] = { '\0', '0', '7', '\0' };
Both of these create an array of size 4, containing the four
specified "char"s. Since str2 and str3 both begin with a zero
byte, their strlen()s are zero, even though both arrays continue
(always) to hold four "char"s.
>I understand that yet another obscure C feature is the octal character
>specification so that \ddd is one character.
Right -- but only if the digits are uninterrupted, and all octal.
(The situation is quite different for \x escapes, as someone else
noted elsethread.)
>However, should not str1 and str2 be the same?
No; the order in which the escape-interpretation and
string-literal-concatenation occurs forbids this.

[% The two exceptions are: when the literal is not the last in an
adjacent sequence, so that concatenation occurs before adding the
\0, or when the literal is used as an initializer for an array
whose size was specified, and whose specified size is exactly large
enough to hold the characters in the literal without adding the
\0. Making use of this second exception is particularly annoying;
it reminds me of the Bad Old Days of Hollerith constants in Fortran.]
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (4039.22'N, 11150.29'W) +1 801 277 2603
Jul 31 '06 #7
gt*****@ee.ryerson.ca writes:
[...]
>Also, if you wanted to, for example, use a string containing '\01'
>'0', how would you do this unambiguously? As in str2? How about '\0'
>"01" or '\0' '0'???

If you want a string containing the characters '\1' and '0', you can
use "\0010", since an octal escape has at most 3 digits. Or you
can split it into two string literals: "\1" "0".

Your second example, '\0' "01", is ill-formed; adjacent string literals
are concatenated, but character constants are not. Assuming you want
{ '\0', '0', '1' }, you can write "\00001", or, more clearly,
"\0" "01".

Similarly, for your third example, you can write "\0000" or "\0" "0".

In each case, of course, there's an implicit trailing '\0' at the end
of each string literal (after concatenation), even if the last
character is an explicit '\0'; but this is suppressed if the string
literal is an initializer for a character array of exactly the right
size. For example:
const char x[3] = "abc";
initializes x to { 'a', 'b', 'c' }, but
const char y[] = "abc";
initializes y to { 'a', 'b', 'c', '\0' }.
--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
Jul 31 '06 #8
On 2006-07-31, gt*****@ee.ryerson.ca <gt*****@ee.ryerson.ca> wrote:
>[greeting, #include lines and start of main() snipped]
>char str1[] = "\007";
If you're using ASCII, \007 is an unprintable character. Hence the
string appears empty.
>char str2[] = "\0" "07";
Here you begin a string with a null character ('\0'), which in C
terminates a string. Hence, the string /is/ empty.
>char str3[] = { '\0', '0', '7', '\0' };
And you have the same problem here: '\0' signifies the end of a string.

<legal string-printing code and output snipped>

>I understand that yet another obscure C feature is the octal character
>specification so that \ddd is one character. However, should not str1 and
>str2 be the same? Obscure feature conflict (\ddd vs string concatenation)?
Concatenation occurs after escape sequences, including hexadecimal and
octal ones, have been replaced.

--
Andrew Poelstra
Jul 31 '06 #9
