
Character array initialization

Hi,

if I want to store the string "123456" in a variable of type char[], I can
do it like this:

char s[] = "123456";

Or like this:

char s[] = { '1', '2', '3', '4', '5', '6', '\0' };

Or like this:

char s[7] = "123456";

These are all equivalent because string literals have an implicit '\0'
character at the end. That's why it is a mistake to write this:

char s[6] = "123456";

Here we reserve one byte less than we need. Now here's my question: Doesn't
that mistake warrant a diagnostic? I was quite baffled today when four
different compilers failed to generate even a warning when fed the above
code, even though the mistake should be easy to detect. Or am I
misunderstanding string literals?
Thanks,
Christian
Nov 14 '05 #1
In article <2n************@uni-berlin.de>,
Christian Kandeler <ch****************@hob.de> wrote:
> That's why it is a mistake to write this:
>
> char s[6] = "123456";


It's *often* a mistake.

But try this:

char s[5] = "123456";

I expect your compilers will generate a diagnostic.

What's the difference? There's no requirement in C that arrays of
characters be null-terminated. There are just a bunch of functions
that expect it. It's quite reasonable to initialize an array of 6
characters with 6 non-null characters. On the other hand, it makes no
sense to initialize an array of 5 characters with 6 characters.

-- Richard
Nov 14 '05 #2
Christian Kandeler wrote:
> That's why it is a mistake to write this:
>
> char s[6] = "123456";


It's not always a mistake.
It's my preferred way of writing:
char s[] = { '1', '2', '3', '4', '5', '6'};

You could have code like this:

const char letter[4] = "DCBA";

if (number > 60) {
    if (number > 99) {
        number = 99;
    }
    grade = letter[(number - 60) / 10];
} else {
    grade = 'F';
}
--
pete
Nov 14 '05 #3
Christian Kandeler wrote:
> Here we reserve one byte less than we need. Now here's my question: Doesn't
> that mistake warrant a diagnostic? I was quite baffled today when four
> different compilers failed to generate even a warning when fed the above
> code, even though the mistake should be easy to detect. Or am I
> misunderstanding string literals?


No, because it's legal in C. From the C99 draft standard:
6.7.8 Initialization

[#14] An array of character type may be initialized by a
character string literal, optionally enclosed in braces.
Successive characters of the character string literal
(including the terminating null character if there is room
or if the array is of unknown size) initialize the elements
of the array.


Brian Rodenborn
Nov 14 '05 #4
In article <2n************@uni-berlin.de>
Christian Kandeler <ch****************@hob.de> wrote:
> [string literals add a '\0' terminator, and] That's why it is a mistake
> to write this:
> char s[6] = "123456";
> Here we reserve one byte less than we need. Now here's my question: Doesn't
> that mistake warrant a diagnostic?


Back in the 1980s when the X3J11 committee was standardizing C
for the first time, this *was* an error and *did* get a diagnostic
from all existing (pre-ANSI) C compilers worthy of the name "C
compiler" (there were some strange compilers back then :-) ).

But then someone on the committee decided it would be nice to
have a way to suppress the automatic '\0'-adding for certain
special cases. Whoever it was, proposed that if the programmer
manually counted up the bytes and put the size in an array definition,
and then used a string literal to initialize an array of "char",
that it would be OK to have the array be "just one too small" to
hold the final '\0', in which case the '\0' would be suppressed.

Of course, a MUCH BETTER suggestion was sent in during the review
periods -- a new escape sequence, \z, could be used at the end of
string literals to suppress the final zero byte, so that one could
even write things like x = "\10\2\4\1\z"[i & 3], for instance, to
make a four-byte literal array without the unnecessary '\0' at the
end -- but it was "Not Invented Here" and rejected. Had \z been
accepted, you would have been able to write:

char s[] = "123456\z";

and get an array of size 6, without having to manually count -- or
perhaps mis-count -- the bytes. Leaving out the \z would get a
diagnostic, just as one would expect.

The X3J11 committee folks went along with the dumb idea :-) , so now
that is what we have.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #5
Christian Kandeler wrote:
> These are all equivalent because string literals have an implicit '\0'
> character at the end. That's why it is a mistake to write this:
>
> char s[6] = "123456";

It is *not* a mistake. It declares an array of 6 chars, but not a
string. There is no law that every array of chars must be a string.

> Here we reserve one byte less than we need.

No, we reserve exactly the amount of space we need for the char array.

> Now here's my question: Doesn't
> that mistake warrant a diagnostic?

It is not a mistake and does not require a diagnostic.

> I was quite baffled today when four
> different compilers failed to generate even a warning when fed the above
> code, even though the mistake should be easy to detect.

It is not a mistake.

> Or am I
> misunderstanding string literals?

You are misunderstanding the initialization of char arrays.
Nov 14 '05 #6
Chris Torek wrote:
[...]
> Of course, a MUCH BETTER suggestion was sent in during the review
> periods -- a new escape sequence, \z, could be used at the end of
> string literals to suppress the final zero byte, so that one could
> even write things like x = "\10\2\4\1\z"[i & 3], for instance, to
> make a four-byte literal array without the unnecessary '\0' at the
> end -- but it was "Not Invented Here" and rejected. Had \z been
> accepted, you would have been able to write:
>
> char s[] = "123456\z";
>
> and get an array of size 6, without having to manually count -- or
> perhaps mis-count -- the bytes. Leaving out the \z would get a
> diagnostic, just as one would expect.


Interesting -- but an escape sequence whose effect is
to *suppress* a character rather than generate one would
certainly be an oddity. An escape sequence that has
different effects at different positions in a literal
would be peculiar. An escape sequence that could be
used in a string literal but not in a character literal
would be downright weird!

Hmmm: Would the proposal have outlawed \z except at
the end of a string literal, or would it have had some
other meaning (perhaps implementation-defined or undefined)
at other positions? I'm thinking of things like

#define ABC "ABC\z"
char abc[] = ABC;
#define XYZ "XYZ\z"
char xyz[] = XYZ;
char abcxyz[] = ABC XYZ; // what happens?

Also, whenever a new notation crops up somebody writes
a coding style guide that recommends its use, as in

puts ("Hello, world!\0\z");

The "benefit," of course, is that the programmer can see the
formerly invisible terminator, and perhaps be less likely to
commit the common mistake of overlooking it ... Ugly!

--
Er*********@sun.com

Nov 14 '05 #7
>Chris Torek wrote:
>> Of course, a MUCH BETTER suggestion was sent in during the review
>> periods -- a new escape sequence, \z, could be used at the end of
>> string literals to suppress the final zero byte ...

In article <news:41**************@sun.com>
Eric Sosman <Er*********@Sun.COM> writes:
> Interesting -- but an escape sequence whose effect is
> to *suppress* a character rather than generate one would
> certainly be an oddity.

True enough.

> ... An escape sequence that could be
> used in a string literal but not in a character literal
> would be downright weird!

It might either be ignored, or diagnosed, when used in "strange"
places:

> Hmmm: Would the proposal have outlawed \z except at
> the end of a string literal, or would it have had some
> other meaning (perhaps implementation-defined or undefined)
> at other positions?

I do not recall whether the proposal even covered this, much less
what it might have said. If I were proposing it myself (e.g., if I
thought anyone might listen :-) ), I would say it gets ignored in
other positions in string literals:

> I'm thinking of things like
>
> #define ABC "ABC\z"
> char abc[] = ABC;
> #define XYZ "XYZ\z"
> char xyz[] = XYZ;
> char abcxyz[] = ABC XYZ; // what happens?

Here the concatenation would result in "ABC\zXYZ\z" which "means"
the same as just "ABCXYZ\z", i.e., sizeof abcxyz would be 6.

I would probably vote for a required diagnostic if \z is used in
character constants, so that both 'a\z' and just '\z' are errors.

> Also, whenever a new notation crops up somebody writes
> a coding style guide that recommends its use, as in
>
> puts ("Hello, world!\0\z");
>
> The "benefit," of course, is that the programmer can see the
> formerly invisible terminator, and perhaps be less likely to
> commit the common mistake of overlooking it ... Ugly!


Indeed. But note that we can already do something similar in C99:

puts((const char []){'H', 'e', 'l', 'l', 'o', ',', ' ',
                     'w', 'o', 'r', 'l', 'd', '!', '\0'});

Again, the programmer can now see the formerly-invisible terminator
(if said programmer can see anything at all, in amongst all that
syntax! :-) ).
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #8
Chris Torek wrote:
> .... snip ...
> The X3J11 committee folks went along with the dumb idea :-) , so
> now that is what we have.


And I wonder which version you supported :-) In point of fact I
can see arguments both ways, and I suspect the one that won was
something like: Why have another escape which nobody ever heard
of before for a single purpose that can be handled anyhow?

--
"I'm a war president. I make decisions here in the Oval Office
in foreign policy matters with war on my mind." - Bush.
"Churchill and Bush can both be considered wartime leaders, just
as Secretariat and Mr Ed were both horses." - James Rhodes.
"If I knew then what I know today, I would still have invaded
Iraq. It was the right decision" - G.W. Bush, 2004-08-02
Nov 14 '05 #9
Martin Ambuhl wrote:
> char s[6] = "123456";
>
> It is *not* a mistake. It declares an array of 6 chars, but not a
> string. There is no law that every array of chars must be a string.


I was aware of the fact that not all character arrays are strings. I was
also aware of the fact that string literals have an implicit '\0' at the
end. What I was not aware of is the fact that this - and only this - last
character of the string literal can legally just magically disappear if
that makes it fit into a character array. And I must say that I find this
behaviour rather, umh, peculiar. Anyway, thanks to all who have answered.
Christian
Nov 14 '05 #10
