
Assigning values to char arrays

Hi all,

here's an elementary question. Assume I have declared two variables,

char *a, **b;

I can then give a value to a like

a="hello world";

The question is, how should I assign values to b? A simple

b[0]="string";

results in a segmentation fault.

Answers greatly appreciated.

Regards, Emyl.

Nov 2 '07 #1


emyl <kw****@yahoo.com> writes:
Hi all,

here's an elementary question. Assume I have declared two variables,

char *a, **b;
b is a pointer to a char pointer.
>
I can then give a value to a like

a="hello world";

The question is, how should I assign values to b? A simple

b[0]="string";
*b = "s";
>
results in a segmentation fault.

Answers greatly appreciated.

Regards, Emyl.
Nov 2 '07 #2

emyl said:
Hi all,

here's an elementary question. Assume I have declared two variables,

char *a, **b;

I can then give a value to a like

a="hello world";

The question is, how should I assign values to b? A simple

b[0]="string";

results in a segmentation fault.

Answers greatly appreciated.
Your definition:

char *a, **b;

reserves sufficient storage for a pointer-to-char named a, and a
pointer-to-pointer-to-char named b.

a="hello world";

assigns a value to this pointer-to-char, the value in question being the
address of the first character in the given string literal.

But b[0]="string"; is a problem, not because there's anything wrong with
the syntax, but because you've made an incorrect assumption.

b is a pointer-to-pointer-to-char, but you haven't pointed it to any
pointers-to-char, so it is currently indeterminate. b[0]="string"; is
*not* an attempt to give a value to b. It is an attempt to give a value to
b[0]. But b[0] is meaningless unless b has a meaningful value.

You can give b a meaningful value in any of several ways, but the most
obvious is to allocate some fresh memory for it:

#include <stdlib.h>

/* allocate memory for some pointers-to-char */
char **cpalloc(size_t n)
{
    char **ptr = malloc(n * sizeof *ptr);
    if(ptr != NULL)
    {
        while(n--)
        {
            ptr[n] = NULL;
        }
    }
    return ptr;
}

#include <stdio.h>

int main(void)
{
    char *a, **b;
    a = "what has it got in its pocketses?";
    b = cpalloc(2);
    if(b != NULL)
    {
        b[0] = "string";
        b[1] = "nothing";

        printf("%s\n", a);
        printf("%s or %s\n", b[0], b[1]);

        free(b);
    }
    return 0;
}

Be careful. The cpalloc function written above does not allocate storage
for strings, only for a collection of pointers to char. A pointer to char
is sufficient for pointing at a string, but not for storing it.
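To make that warning concrete: if the strings reached through b[i] must be modifiable, or must outlive the buffers they came from, each one needs storage of its own as well as a pointer. The sketch below is one way to do that, not code from this thread; `copy_strings`, `free_strings` and `dupstr` are hypothetical names, and `dupstr` stands in for `strdup`, which POSIX provides but C89/C99 do not.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: malloc'd, modifiable copy of s, or NULL on failure. */
static char *dupstr(const char *s)
{
    size_t len = strlen(s) + 1;          /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Free a table built by copy_strings: each string first, then the pointers. */
static void free_strings(char **tab, size_t n)
{
    if (tab == NULL)
        return;
    while (n--)
        free(tab[n]);                    /* free(NULL) is a no-op */
    free(tab);
}

/* Allocate n pointers AND storage for a private copy of each src[i]. */
static char **copy_strings(const char *const src[], size_t n)
{
    size_t i;
    char **tab = malloc(n * sizeof *tab);
    if (tab == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        tab[i] = NULL;                   /* lets free_strings clean up a partial table */
    for (i = 0; i < n; i++) {
        tab[i] = dupstr(src[i]);
        if (tab[i] == NULL) {
            free_strings(tab, n);
            return NULL;
        }
    }
    return tab;
}
```

Each string has to be freed before the pointer array. Calling only free(b), as the program above does, is correct there because its pointers refer to string literals, which were never allocated.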

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Nov 2 '07 #3

emyl wrote:
>
here's an elementary question. Assume I have declared two variables,
char *a, **b;
I can then give a value to a like
a="hello world";
The question is, how should I assign values to b? A simple
b[0]="string";
results in a segmentation fault.
"char **b;" declares a pointer to a pointer to char. You could
initialize it with "b = &a;" (provided the a declaration is
present). Then **b is a[0].

However note that your initialization of <a = "hello world";>
leaves a pointing to an unmodifiable string.
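A minimal sketch of this suggestion, using a hypothetical `alias_demo` function: once b holds the address of a real `char *` object, *b is that object and **b is its first character. It also illustrates the caveat about the literal: to modify the text, point a at a writable array rather than at the literal.

```c
#include <string.h>

/* Returns 1 if b = &a behaves as described: *b is a, **b is a[0]. */
static int alias_demo(void)
{
    char *a = "hello world";    /* points at a string literal: do not modify */
    char **b = &a;              /* b points at the pointer object a */
    char buf[] = "hello world"; /* writable array initialised from the literal */

    /* *b is a itself; **b is a[0]; (*b)[6] is a[6]. */
    if (*b != a || **b != 'h' || (*b)[6] != 'w')
        return 0;

    *b = buf;                   /* assigning through b changes a */
    a[0] = 'H';                 /* safe now: a points into buf, not the literal */
    return a == buf && strcmp(buf, "Hello world") == 0;
}
```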

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>


Nov 2 '07 #4

Richard <rg****@gmail.com> writes:
emyl <kw****@yahoo.com> writes:
>Hi all,

here's an elementary question. Assume I have declared two variables,

char *a, **b;

b is a pointer to a char pointer.
>>
I can then give a value to a like

a="hello world";

The question is, how should I assign values to b? A simple

b[0]="string";

*b = "s";
*slaps cold water on face*

Sorry. That was bullshit. I thought I had included the initialisation
line. See other replies.

Scary.
>>
results in a segmentation fault.

Answers greatly appreciated.

Regards, Emyl.
Nov 2 '07 #5

BIG sigh of relief... Thanks all for your replies. More
specifically,

Richard: Your suggestion is one of the bazillion things I had tried
more or less at random. As you noticed, it won't work. Thanks for taking
the time to answer.

Richard H.: My heartfelt thanks for a crystal clear answer that works
like a charm and is general enough to use in the broader context of my
project. If I could, I'd buy you a beer and candy for your kids...

Chuck F.: Thanks a lot for your answer. Of all the simple possibilities
it was the only one I didn't try, and I guess the only correct one. If I
had thought of that, I'd have had enough of a clue to solve the problem.

Nov 2 '07 #6

On Nov 2, 5:32 am, Richard Heathfield <r...@see.sig.invalid> wrote:
emyl said:


Hi all,
here's an elementary question. Assume I have declared two variables,
char *a, **b;
I can then give a value to a like
a="hello world";
The question is, how should I assign values to b? A simple
b[0]="string";
results in a segmentation fault.
Answers greatly appreciated.

Your definition:

char *a, **b;

reserves sufficient storage for a pointer-to-char named a, and a
pointer-to-pointer-to-char named b.

a="hello world";

assigns a value to this pointer-to-char, the value in question being the
address of the first character in the given string literal.

But b[0]="string"; is a problem, not because there's anything wrong with
the syntax, but because you've made an incorrect assumption.

b is a pointer-to-pointer-to-char, but you haven't pointed it to any
pointers-to-char, so it is currently indeterminate. b[0]="string"; is
*not* an attempt to give a value to b. It is an attempt to give a value to
b[0]. But b[0] is meaningless unless b has a meaningful value.

You can give b a meaningful value in any of several ways, but the most
obvious is to allocate some fresh memory for it:

#include <stdlib.h>

/* allocate memory for some pointers-to-char */
char **cpalloc(size_t n)
{
char **ptr = malloc(n * sizeof *ptr);
if(ptr != NULL)
{
while(n--)
{
ptr[n] = NULL;
}
}
I have one question. Can memset be used as
memset(ptr,0,n);
instead of the while loop?
return ptr;
Nov 2 '07 #7

somenath said:
On Nov 2, 5:32 am, Richard Heathfield <r...@see.sig.invalid> wrote:
<snip>
> char **ptr = malloc(n * sizeof *ptr);
> if(ptr != NULL)
{
while(n--)
{
ptr[n] = NULL;
}
}
I have one question. Can memset be used as
memset(ptr,0,n);
instead of the while loop?
Not unless you can guarantee that the representation of null pointers on
all target platforms is all-bits-zero. I don't recall that the OP
mentioned any platforms. The code I supplied was portable to any hosted
implementation.

In situations where you /can/ use memset, don't bother - just calloc it
instead.
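A sketch of the two shortcuts, valid only under the stated assumption that null pointers on the target are all-bits-zero (true on mainstream platforms, not guaranteed by C). Note also that memset's third argument counts bytes, so `memset(ptr,0,n)` as written in the question would clear only n bytes, not n pointers; the byte count has to be `n * sizeof *ptr`. The function names are hypothetical variants of cpalloc.

```c
#include <stdlib.h>
#include <string.h>

/* NOT portable: assumes a null char * is represented as all-bits-zero. */
static char **cpalloc_memset(size_t n)
{
    char **ptr = malloc(n * sizeof *ptr);
    if (ptr != NULL)
        memset(ptr, 0, n * sizeof *ptr);   /* byte count, not element count */
    return ptr;
}

/* Same assumption; calloc multiplies and zero-fills in one call. */
static char **cpalloc_calloc(size_t n)
{
    return calloc(n, sizeof(char *));
}
```

The original while loop remains the only one of the three that is correct whatever bit pattern the implementation uses for a null pointer.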

--
Richard Heathfield <http://www.cpax.org.uk>
Nov 2 '07 #8

On Nov 2, 11:07 am, Richard Heathfield <r...@see.sig.invalid> wrote:
somenath said:
On Nov 2, 5:32 am, Richard Heathfield <r...@see.sig.invalid> wrote:

<snip>
char **ptr = malloc(n * sizeof *ptr);
if(ptr != NULL)
{
while(n--)
{
ptr[n] = NULL;
}
}
I have one question. Can memset be used as
memset(ptr,0,n);
instead of the while loop?

Not unless you can guarantee that the representation of null pointers on
all target platforms is all-bits-zero. I don't recall that the OP
mentioned any platforms. The code I supplied was portable to any hosted
implementation.

In situations where you /can/ use memset, don't bother - just calloc it
instead.
Many thanks for the response.

But my understanding was that in pointer context 0 and NULL are
converted to a null pointer, and that converting to a null pointer is
the compiler's responsibility. So I thought the 0 in memset would be
converted to a null pointer (which is system specific).

I would request you to correct me, as I feel I may have misunderstood
some concept.
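The distinction behind this question can be sketched in code (helper names are hypothetical). The 0 passed to memset is an ordinary int argument: memset converts it to unsigned char and stores it in every byte, so no pointer conversion happens. A 0 assigned to a pointer object, by contrast, is a null pointer constant, and the compiler stores whatever representation the platform uses for a null pointer. So after `clear_bytes` every byte is guaranteed to be zero but the elements are not guaranteed to be null; after `set_null` the elements are guaranteed to be null but their bytes are not guaranteed to be zero.

```c
#include <stddef.h>
#include <string.h>

/* 1 if all n bytes of the object at p are zero, examined through
   unsigned char, which may inspect the representation of any object. */
static int all_bytes_zero(const void *p, size_t n)
{
    const unsigned char *bytes = p;
    while (n--)
        if (bytes[n] != 0)
            return 0;
    return 1;
}

/* memset's 0 is an int, converted to unsigned char and stored in each
   byte; nothing here is a pointer conversion. */
static void clear_bytes(char **arr, size_t n)
{
    memset(arr, 0, n * sizeof *arr);
}

/* Here 0 appears in pointer context: it is a null pointer constant, so
   each element receives a genuine null pointer, whatever its bits are. */
static void set_null(char **arr, size_t n)
{
    while (n--)
        arr[n] = 0;
}
```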
Nov 2 '07 #9

Ark Khasin wrote:
Richard wrote:
<snip>
I am not to argue who of us two is more of a newbie, but your post
sheds no light on the question asked. Ego bubbling?
This is the most hilarious sentence I've read in c.l.c. this year.

Nov 3 '07 #10

Ark Khasin wrote:
Ben Bacarisse wrote:
<snip>
>No. unsigned char may not have padding bits. All the bits must be
value bits.
Why?
6.2.6.2 says "For unsigned integer types other than unsigned char, the
bits of the object representation shall be divided into two groups:
value bits and padding bits (there need not be any of the latter).
But I couldn't find anything saying that unsigned char *may not* have
padding bits.
Well the above quote says that unsigned char may not have _both_ padding
and value bits. Obviously the bit type left out has to be padding
bits - otherwise one would not be able to portably use unsigned char
objects.

<snip>

Nov 3 '07 #11

santosh wrote:
Ark Khasin wrote:
>Ben Bacarisse wrote:

<snip>
>>No. unsigned char may not have padding bits. All the bits must be
value bits.
>Why?
6.2.6.2 says "For unsigned integer types other than unsigned char, the
bits of the object representation shall be divided into two groups:
value bits and padding bits (there need not be any of the latter).
But I couldn't find anything saying that unsigned char *may not* have
padding bits.

Well the above quote says that unsigned char may not have _both_ padding
and value bits. Obviously the bit type left out has to be padding
bits - otherwise one would not be able to portably use unsigned char
objects.
Is this "just a theory"? IMHO, 6.2.6.2 says *exactly nothing* about
unsigned char.
Nov 3 '07 #12

santosh wrote:
Ark Khasin wrote:
>Richard wrote:

<snip>
>I am not to argue who of us two is more of a newbie, but your post
sheds no light on the question asked. Ego bubbling?

This is the most hilarious sentence I've read in c.l.c. this year.
Ty. But Richard offered a satisfactory explanation.
--
Ark
Nov 3 '07 #13

Ark Khasin wrote:
santosh wrote:
>Ark Khasin wrote:
>>Ben Bacarisse wrote:

<snip>
>>>No. unsigned char may not have padding bits. All the bits must be
value bits.
>>Why?
6.2.6.2 says "For unsigned integer types other than unsigned char,
the bits of the object representation shall be divided into two
groups: value bits and padding bits (there need not be any of the
latter). But I couldn't find anything saying that unsigned char *may
not* have padding bits.

Well the above quote says that unsigned char may not have _both_
padding and value bits. Obviously the bit type left out has to be
padding bits - otherwise one would not be able to portably use
unsigned char objects.
Is this "just a theory"? IMHO, 6.2.6.2 says *exactly nothing* about
unsigned char.
<quote n1256.pdf>

6.2.6.2 Integer types

1 For unsigned integer types other than unsigned char, the bits of the
object representation shall be divided into two groups: value bits and
padding bits (there need not be any of the latter).

<endquote>

Note closely the text within the parenthesis. To me it _strongly_
implies, to say the least, that value bits are mandatory for objects of
all unsigned integer types. Since unsigned char is disallowed from
having padding bits, it must be composed only of value bits.

<quote n1256.pdf>

If there are N value bits, each bit shall represent a different power of
2 between 1 and 2^(N-1), so that objects of that type shall be capable of
representing values from 0 to 2^N - 1 using a pure binary representation;
this shall be known as the value representation. The values of any
padding bits are unspecified.44)

6.2.6.1

3 Values stored in unsigned bit-fields and objects of type unsigned char
shall be represented using a pure binary notation.40)

<endquote>

Again 6.2.6.1(3) in conjunction with 6.2.6.2(1) reinforces the
requirement that unsigned char may not have padding bits.

<quote n1256.pdf>

4 Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object of that
type, in bytes. The value may be copied into an object of type unsigned
char [n] (e.g., by memcpy); the resulting set of bytes is
called the object representation of the value. Values stored in
bit-fields consist of m bits, where m is the size specified for the
bit-field. The object representation is the set of m bits the bit-field
comprises in the addressable storage unit holding it. Two values (other
than NaNs) with the same object representation compare equal, but values
that compare equal may have different object representations.

<endquote>

This answers the other issue that you raised concerning null pointers
not being all bits zero.

Nov 3 '07 #14

Ark Khasin wrote:
>
santosh wrote:
Ark Khasin wrote:
Ben Bacarisse wrote:
<snip>
>No. unsigned char may not have padding bits.
Is this "just a theory"?
No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.
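pete's quotation can be turned into a check. Since UCHAR_MAX + 1 equals 2 raised to the power CHAR_BIT, UCHAR_MAX consists of CHAR_BIT one-bits, so even the top bit of a byte carries value and no bit is left over for padding. The sketch below (the function name is hypothetical) tests the top bit once at preprocessing time and once at run time; neither test should ever fail on a conforming implementation.

```c
#include <limits.h>

/* UCHAR_MAX == 2^CHAR_BIT - 1, so shifting it right by CHAR_BIT - 1
   must leave exactly 1: the highest of the CHAR_BIT bits is a value bit. */
#if (UCHAR_MAX >> (CHAR_BIT - 1)) != 1
#error "unsigned char has fewer than CHAR_BIT value bits"
#endif

/* Run-time form of the same test: returns 1 on any conforming implementation. */
static int uchar_top_bit_is_value_bit(void)
{
    return (UCHAR_MAX >> (CHAR_BIT - 1)) == 1;
}
```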

--
pete
Nov 3 '07 #15

pete wrote:
Ark Khasin wrote:
>santosh wrote:
>>Ark Khasin wrote:
Ben Bacarisse wrote:
<snip>

No. unsigned char may not have padding bits.
>Is this "just a theory"?

No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.
Thanks for adding to my confusion :)
So I have an 11-bit machine bytes and UCHAR_MAX==8 and 3 padding most
significant bits. Anything wrong?
BTW, if I am not mistaken, in other integer types padding bits don't
have to be contiguous.
--
Ark
Nov 3 '07 #16

Ark Khasin wrote:
pete wrote:
>Ark Khasin wrote:
>>santosh wrote:
Ark Khasin wrote:
Ben Bacarisse wrote:
<snip>

>No. unsigned char may not have padding bits.
>>Is this "just a theory"?

No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.
Thanks to adding to my confusion :)
So I have an 11-bit machine bytes and UCHAR_MAX==8 and 3 padding most
significant bits. Anything wrong?
What do you mean by UCHAR_MAX==8? Do you mean CHAR_BIT==8?

As far as the Standard is concerned a char i.e., a byte (as defined by
C) contains CHAR_BIT bits. Additionally unsigned char may not contain
padding bits.

I don't know what you mean by "machine bytes" above. Are they supposed
to be different from C bytes?
BTW, if I am not mistaken, in other integer types padding bits don't
have to be contiguous.
Yes. Padding bits need not be contiguous.

Nov 3 '07 #17

Ark Khasin wrote, On 03/11/07 20:05:
pete wrote:
>Ark Khasin wrote:
>>santosh wrote:
Ark Khasin wrote:
Ben Bacarisse wrote:
<snip>

>No. unsigned char may not have padding bits.
>>Is this "just a theory"?

No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1 shall equal 2 raised to the
power CHAR_BIT.
Thanks to adding to my confusion :)
So I have an 11-bit machine bytes and UCHAR_MAX==8 and 3 padding most
significant bits. Anything wrong?
CHAR_BIT is the number of bits in a signed, unsigned and plain char.
Note, the number of bits, NOT the number of value bits. Therefore, as
UCHAR_MAX+1 is 2 raised to the power of CHAR_BIT, all of the bits must
be value bits.
BTW, if I am not mistaken, in other integer types padding bits don't
have to be contiguous.
The padding bits can be anywhere, but short of using an unsigned char
pointer to look at the representation they are hard to get at since the
bitwise operations are defined as operating on values.
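The remark about using an unsigned char pointer describes the one portable window onto an object's representation: reading it byte by byte. A minimal sketch with a hypothetical `format_bytes` helper; it assumes CHAR_BIT == 8 so that each byte prints as two hex digits, and the byte order (and any padding bits) it reveals are whatever the implementation uses.

```c
#include <stddef.h>
#include <stdio.h>

/* Write the n bytes at p to out as space-separated hex, e.g. "2a 00 00 00".
   out must have room for 3 * n chars (at least 1 when n == 0).
   Assumes CHAR_BIT == 8 so each byte fits in two hex digits. */
static void format_bytes(const void *p, size_t n, char *out)
{
    const unsigned char *bytes = p;  /* unsigned char may inspect any object */
    size_t i;

    out[0] = '\0';
    for (i = 0; i < n; i++)
        sprintf(out + 3 * i, (i + 1 < n) ? "%02x " : "%02x",
                (unsigned)bytes[i]);
}
```

For example, formatting an `unsigned int` holding 42 shows the 42 (hex 2a) in the first byte on a little-endian machine and in the last byte on a big-endian one.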
--
Flash Gordon
Nov 3 '07 #18

Ark Khasin wrote:
Ben Bacarisse wrote:
>Ark Khasin <ak*****@macroexpressions.com> writes:
[>Ben Bacarisse wrote:]
....
>RH's point was something else altogether -- that all bits zero is not
guaranteed to produce a null pointer (to be scrupulously correct, it
is not guaranteed to produce a value that compares equal to a null
pointer constant).
The parenthesized comment was not actually needed to make the statement
"scrupulously correct"; it would have been just as correct, and less
confusing, without it.
That's where I am lost and reading the standard doesn't help:
What's the difference between a value of an object and how it compares
equal? I mean, if a==b, whatever their representations, in what
context(s) does it make sense to say they may have different values?
There is no difference. Don't let the unnecessary "clarification"
confuse you. The issue isn't having different values with the same
representation in a single type - that can't happen. The issue is that
there can be multiple different representations of the same value in a
given type. However, the values of objects of that type containing those
different representations must compare equal.

You're tripping over a minor issue; the fact that there can be multiple
representations of a null pointer. However, you've lost track of the key
issue: that a pointer object with all of its bits set to 0 doesn't have
to be one of those representations. In fact, it doesn't have to
represent a valid pointer value of any kind.
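When what is wanted is an array of pointers that really are null, the language already provides ways that do not depend on the representation at all. A sketch with hypothetical names: objects with static storage duration and no initializer, and the elements an initializer leaves out, are initialized as if assigned 0, which for pointers means a genuine null pointer whatever its bit pattern.

```c
#include <stddef.h>

/* Static storage, no initializer: every element starts as a null pointer. */
static char *table[8];

/* Returns 1 if both arrays hold only null pointers. */
static int all_null_demo(void)
{
    char *local[8] = {0};   /* elements not explicitly initialized are null too */
    size_t i;

    for (i = 0; i < 8; i++)
        if (table[i] != NULL || local[i] != NULL)
            return 0;
    return 1;
}
```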
[NEGATIVE_ZERO comes to mind - and goes away. BTW, is it fair to say
that bitwise logic is a magic performed on representations, and not on
values?]
No. In general, the bitwise operations are defined in terms of their
actions on the values, not the representations. For instance, E>>1 is
defined as dividing the value of E by 2. The complicated exceptions all
involve sign bits, and most result in undefined behavior, which is why
it's strongly recommended that bitwise operations be restricted to
unsigned types, or at least restricted to values which are guaranteed to
be positive both before and after the operation.
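For unsigned operands, where there is no sign bit to complicate matters, the shift and mask operators can indeed be stated purely in terms of values. A small check of that claim; the function name is hypothetical, and it assumes k is less than the width of unsigned int.

```c
/* For unsigned x: x >> k equals x / 2^k, and x & (2^k - 1) equals x % 2^k.
   Assumes k < width of unsigned int, so 1u << k is well defined. */
static int shift_matches_division(unsigned x, unsigned k)
{
    unsigned p = 1u << k;   /* 2^k */
    return (x >> k) == x / p && (x & (p - 1u)) == x % p;
}
```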
> void *a;;
memset_as_above(&a, 0, sizeof a);
There is, at this point, no guarantee that 'a' contains a valid pointer
representation. Therefore, the next line renders the behavior of your
entire program undefined:
> if (a == 0) {
/* not guaranteed */
//Which is correct but implies
{
void **pNULL = 0;
if(a==*pNULL) {
/* not guaranteed */
I'm not sure what your point was; but you've just attempted to
dereference a null pointer, again making the behavior undefined.
Nov 3 '07 #19

Ark Khasin wrote:
>
pete wrote:
Ark Khasin wrote:
santosh wrote:
Ark Khasin wrote:
Ben Bacarisse wrote:
<snip>

No. unsigned char may not have padding bits.
Is this "just a theory"?
No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.
Thanks to adding to my confusion :)
So I have an 11-bit machine bytes
That's what "CHAR_BIT equals eleven" means.

--
pete
Nov 3 '07 #20

James Kuyper <ja*********@verizon.net> writes:
Ark Khasin wrote:
>Ben Bacarisse wrote:
>>Ark Khasin <ak*****@macroexpressions.com> writes:
[>Ben Bacarisse wrote:]
...
>>RH's point was something else altogether -- that all bits zero is not
guaranteed to produce a null pointer (to be scrupulously correct, it
is not guaranteed to produce a value that compares equal to a null
pointer constant).

The parenthesized comment was not actually needed to make the
statement "scrupulously correct"; it would have been just as correct,
and less confusing, without it.
Sorry if I've confused the issue. I was worried about suggesting that
there was only one such thing (one null pointer) but I can see that I
clearly don't. Maybe I did at some point as I was editing the text.

<snip>
>[NEGATIVE_ZERO comes to mind - and goes away. BTW, is it fair to say
that bitwise logic is a magic performed on representations, and not
on values?]

No. In general, the bitwise operations are defined in terms of their
actions on the values, not the representations.
Is that true for &, |, ^ and ~? The definitions are very bland, but
they suggest (simply by saying so little) that the interpretation is
to be based on the representation. This is backed up by section
6.5p4.

--
Ben.
Nov 4 '07 #21

pete wrote:
>>>><snip>
>
>>No. unsigned char may not have padding bits.
Is this "just a theory"?
No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.
Thanks to adding to my confusion :)
So I have an 11-bit machine bytes

That's what "CHAR_BIT equals eleven" means.
Sorry for being that stubborn, but:
Why?
Why can't I have CHAR_BIT==8 on a 11-bit machine?
E.g. my int would be something like say 11(lower)+8(upper)=19 bits.
Is it postulated somewhere that
UINT_MAX+1==(UCHAR_MAX+1)*sizeof(unsigned)
?
I don't think so.
Nov 4 '07 #22

Ben Bacarisse wrote:
James Kuyper <ja*********@verizon.net> writes:

<snip>
>>[NEGATIVE_ZERO comes to mind - and goes away. BTW, is it fair to say
that bitwise logic is a magic performed on representations, and not
on values?]
No. In general, the bitwise operations are defined in terms of their
actions on the values, not the representations.

Is that true for &, |, ^ and ~? The definitions are very bland, but
they suggest (simply by saying so little) that the interpretation is
to be based on the representation. This is backed up by section
6.5p4.
Yes, I took a beating in this ng recently for proposing, as an academic
exercise,
int cmpneq(int a, int b){ return a^b; }
At the time, I agreed that the beating was well deserved. But as far as
I can tell, it depended on ^ operating on representations.
An authoritative and well-substantiated clarification would be more than
welcome!
--
Ark
Nov 4 '07 #23

Ark Khasin wrote:
pete wrote:
>>>>><snip>
>>
>>>No. unsigned char may not have padding bits.
Is this "just a theory"?
No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.

Thanks to adding to my confusion :)
So I have an 11-bit machine bytes

That's what "CHAR_BIT equals eleven" means.
Sorry for being that stubborn, but:
Why?
Why can't I have CHAR_BIT==8 on a 11-bit machine?
E.g. my int would be something like say 11(lower)+8(upper)=19 bits.
Given sizeof(char) == 1 by definition, how would you express
sizeof(int)? 2.38 doesn't fit into size_t very well....

--
Ian Collins.
Nov 4 '07 #24

Ark Khasin wrote:
pete wrote:
>>>>><snip>
>>
>>>No. unsigned char may not have padding bits.
Is this "just a theory"?
No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.

Thanks to adding to my confusion :)
So I have an 11-bit machine bytes

That's what "CHAR_BIT equals eleven" means.
Sorry for being that stubborn, but:
Why?
Why can't I have CHAR_BIT==8 on a 11-bit machine?
E.g. my int would be something like say 11(lower)+8(upper)=19 bits.
Is it postulated somewhere that
UINT_MAX+1==(UCHAR_MAX+1)*sizeof(unsigned)
?
I don't think so.
Sorry for posting nonsense contradicting 6.2.6.1 #4.
It appears indeed that I cannot have 11+9-bit int. While I can have
8+8=16-bit int etc, such a C machine would simply ignore the 3 of 11
bits. Or it can use for padding, which demonstrates that padding of
unsigned char is possible e.g. for trap values (for instance,
uninitialized or truncated on assignment or whatever).
Would it be a legit implementation?
--
Ark
Nov 4 '07 #25

P: n/a
Ark Khasin <ak*****@macroexpressions.com> writes:
Why can't I have CHAR_BIT==8 on a 11-bit machine?
E.g. my int would be something like say 11(lower)+8(upper)=19 bits.
Because the individual bytes in an object must be able to be
inspected and modified. If I understand what you are proposing,
there would be 3 bits in the lower byte of your 19-bit int that
would not appear when that byte was inspected, because a char
would only be 8 bits wide.
--
Ben Pfaff
http://benpfaff.org
Nov 4 '07 #26

Ark Khasin <ak*****@macroexpressions.com> writes:
Ben Bacarisse wrote:
>James Kuyper <ja*********@verizon.net> writes:

<snip>
>>>[NEGATIVE_ZERO comes to mind - and goes away. BTW, is it fair to say
that bitwise logic is a magic performed on representations, and not
on values?]
No. In general, the bitwise operations are defined in terms of their
actions on the values, not the representations.

Is that true for &, |, ^ and ~? The definitions are very bland, but
they suggest (simply by saying so little) that the interpretation is
to be based on the representation. This is backed up by section
6.5p4.
Yes, I took a beating in this ng recently for proposing, as an
academic exercise,
int cmpneq(int a, int b){ return a^b; }
At the time, I agreed that the beating was well deserved. But as far
as I can tell, it depended on ^ operating on representations.
An authoritative and well-substantiated clarification would be more
than welcome!
If you think about it, you can *always* define the meaning in terms of
values even if it is more natural to think of it in terms of
representations. However, that would be stretching a point. An
expression like '-1 | -2' does not invoke undefined behaviour and the
result is most easily explained in terms of the representation of the
operands. (Of course it is daft, but that is not really the point.)

--
Ben.
Nov 4 '07 #27

Ark Khasin <ak*****@macroexpressions.com> writes:
It appears indeed that I cannot have 11+9-bit int. While I can have
8+8=16-bit int etc, such a C machine would simply ignore the 3 of 11
bits. Or it can use for padding, which demonstrates that padding of
unsigned char is possible e.g. for trap values (for instance,
uninitialized or truncated on assignment or whatever).
Would it be a legit implementation?
You mean an implementation with CHAR_BIT == 11 but only 8 value
bits in an unsigned char? No, that would not be a legitimate
implementation because unsigned char may not have padding bits.
--
"Am I missing something?"
--Dan Pop
Nov 4 '07 #28

Ark Khasin <ak*****@macroexpressions.com> writes:
Ark Khasin wrote:
>pete wrote:
>>>>>><snip>
>>>
>>>>No. unsigned char may not have padding bits.
>Is this "just a theory"?
No.
>
N869
5.2.4.2.1 Sizes of integer types <limits.h>
>
[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.
>
Thanks to adding to my confusion :)
So I have an 11-bit machine bytes

That's what "CHAR_BIT equals eleven" means.
Sorry for being that stubborn, but:
Why?
Why can't I have CHAR_BIT==8 on a 11-bit machine?
E.g. my int would be something like say 11(lower)+8(upper)=19 bits.
Is it postulated somewhere that
UINT_MAX+1==(UCHAR_MAX+1)*sizeof(unsigned)
?
I don't think so.
Sorry for posting nonsense contradicting 6.2.6.1 #4.
It appears indeed that I cannot have 11+9-bit int. While I can have
8+8=16-bit int etc, such a C machine would simply ignore the 3 of 11
bits. Or it can use for padding, which demonstrates that padding of
unsigned char is possible e.g. for trap values (for instance,
uninitialized or truncated on assignment or whatever).
Would it be a legit implementation?
No. Unsigned char can't have padding bits. It is not permitted.
Neither are trap representations.

If you choose to fake an 8-bit char on your 11-bit hardware you must
do so in such a way as to hide all evidence of the extra bits.

Padding bits are visible. You can tell they are there because the number
of representable values in type T is at most 2**(CHAR_BIT *
sizeof(T) - 1), i.e. at least one bit does not contribute to the set
of values.
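That test can be applied directly to an unsigned type: count the value bits from the type's maximum value and compare with the total number of bits in its object representation. A sketch with hypothetical function names; on most current platforms the result for unsigned int is 0, but the standard allows it to be positive (and, per the discussion above, never for unsigned char).

```c
#include <limits.h>

/* Number of value bits in an unsigned type whose maximum value is max. */
static int value_bits(unsigned long max)
{
    int bits = 0;
    while (max != 0) {
        max >>= 1;
        bits++;
    }
    return bits;
}

/* Padding bits in unsigned int: total bits minus value bits. */
static int uint_padding_bits(void)
{
    return (int)(CHAR_BIT * sizeof(unsigned int)) - value_bits(UINT_MAX);
}
```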

--
Ben.
Nov 4 '07 #29

Ben Bacarisse wrote:
Ark Khasin <ak*****@macroexpressions.com> writes:
>Ben Bacarisse wrote:
>>James Kuyper <ja*********@verizon.net> writes:

<snip>
[NEGATIVE_ZERO comes to mind - and goes away. BTW, is it fair to say
that bitwise logic is a magic performed on representations, and not
on values?]
No. In general, the bitwise operations are defined in terms of their
actions on the values, not the representations.
Is that true for &, |, ^ and ~? The definitions are very bland, but
they suggest (simply by saying so little) that the interpretation is
to be based on the representation. This is backed up by section
6.5p4.
Yes, I took a beating in this ng recently for proposing, as an
academic exercise,
int cmpneq(int a, int b){ return a^b; }
At the time, I agreed that the beating was well deserved. But as far
as I can tell, it depended on ^ operating on representations.
An authoritative and well-substantiated clarification would be more
than welcome!

If you think about it, you can *always* define the meaning in terms of
values even if it is more natural to think of it in terms of
representations. However, that would be stretching a point. An
expression like '-1 | -2' does not invoke undefined behaviour and the
result is most easily explained in terms of the representation of the
operands. (Of course it is daft, but that is not really the point.)
So, -1 | -2 (or better yet, (-1)^1) is... what? Does it not depend on
one of the 3 models of negatives C recognizes - 2's complement, 1's
complement and sign+magnitude?
--
Ark
Nov 4 '07 #30

James Kuyper wrote:
Ark Khasin wrote:
>Ben Bacarisse wrote:
>>Ark Khasin <ak*****@macroexpressions.com> writes:
[>Ben Bacarisse wrote:]
...
>>RH's point was something else altogether -- that all bits zero is not
guaranteed to produce a null pointer (to be scrupulously correct, it
is not guaranteed to produce a value that compares equal to a null
pointer constant).

The parenthesized comment was not actually needed to make the statement
"scrupulously correct"; it would have been just as correct, and less
confusing, without it.
>That's where I am lost and reading the standard doesn't help:
What's the difference between a value of an object and how it compares
equal? I mean, if a==b, whatever their representations, in what
context(s) does it make sense to say they may have different values?

There is no difference. Don't let the unnecessary "clarification"
confuse you. The issue isn't having different values with the same
representation in a single type - that can't happen. The issue is that
there can be multiple different representations of the same value in a
given type. However, the values of objects of that type containing those
different representations must compare equal.

You're tripping over a minor issue; the fact that there can be multiple
representations of a null pointer.
I don't think I am
However, you've lost track of the key
issue: that a pointer object with all of its bits set to 0 doesn't have
to be one of those representations. In fact, it doesn't have to
represent a valid pointer value of any kind.
I don't think I have. But I find it, while correct, grotesque.
>
>[NEGATIVE_ZERO comes to mind - and goes away. BTW, is it fair to say
that bitwise logic is a magic performed on representations, and not on
values?]

No. In general, the bitwise operations are defined in terms of their
actions on the values, not the representations. For instance, E>>1 is
defined as dividing the value of E by 2.
No. E.g. with 2's complement machine and C99 (and perhaps 90% of C90)
a/2 is (a>=0)?(a>>1):((a+1)>>1)
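This identity can be checked mechanically, but only under its stated conditions. The sketch below (hypothetical function name) assumes a two's complement int whose `>>` on negative values is an arithmetic, sign-propagating shift; both properties are implementation-defined. It also relies on C99's rule that `/` truncates toward zero, so it describes common compilers rather than C as such.

```c
/* Ark's identity for a / 2. Assumes two's complement int and an
   arithmetic >> on negative values (both implementation-defined),
   plus C99 truncation of / toward zero. */
static int halves_agree(int a)
{
    int by_shift = (a >= 0) ? (a >> 1) : ((a + 1) >> 1);
    return by_shift == a / 2;
}
```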
The complicated exceptions all
involve sign bits, and most result in undefined behavior, which is why
it's strongly recommended that bitwise operations be restricted to
unsigned types, or at least restricted to values which are guaranteed to
be positive both before and after the operation.
>> void *a;;
memset_as_above(&a, 0, sizeof a);

There is, at this point, no guarantee that 'a' contains a valid pointer
representation.
Sure. But I find it, while correct, grotesque.
Therefore, the next line renders the behavior of your
entire program undefined:
>> if (a == 0) {
/* not guaranteed */
//Which is correct but implies
{
void **pNULL = 0;
if(a==*pNULL) {
/* not guaranteed */

I'm not sure what your point was; but you've just attempted to
dereference a null pointer, again making the behavior undefined.
Oops.
I meant
void *pNULL = 0; //pNULL == NULL
a == pNULL not guaranteed.
Not that I don't grasp it; it just seems grotesque
--
Ark

Nov 4 '07 #31

Ark Khasin <ak*****@macroexpressions.com> writes:

<snip>
So, -1 | -2 (or better yet, (-1)^1) is... what? Does it not depend on
one of the 3 models of negatives C recognizes - 2's complement, 1's
complement and sign+magnitude?
Yes. My reading of the standard is that whatever bits are set to
represent -1 and -2 (and that depends on the kind of negative number
system used) are OR'd to get the result.

           1's comp     s+mag        2's comp
-1         1..1111110   1..0000001   1..1111111
-2         1..1111101   1..0000010   1..1111110
-1 | -2    1..1111111   1..0000011   1..1111111
value      -0           -3           -1

-1         1..1111110   1..0000001   1..1111111
1          0..0000001   0..0000001   0..0000001
-1 ^ 1     1..1111111   1..0000000   1..1111110
value      -0           -0           -2

I prefer my example since it can result in three values (or two values
and a trap representation).

--
Ben.
Nov 4 '07 #32

Ark Khasin wrote:
So, -1 | -2 (or better yet, (-1)^1) is... what? Does it not depend on
one of the 3 models of negatives C recognizes - 2's complement, 1's
complement and sign+magnitude?
(-1 | -2) == {-1,-0,-3}
(-1 ^ 1) == {-2,-0,-0}
(-1) | (-2)
1111.1111 | 1111.1110 == (-1)
1111.1110 | 1111.1101 == (-0)
1000.0001 | 1000.0010 == (-3)
(-1) ^ (1)
1111.1111 ^ 0000.0001 == (-2)
1111.1110 ^ 0000.0001 == (-0)
1000.0001 ^ 0000.0001 == (-0)

--
pete
Nov 4 '07 #33

Ark Khasin <ak*****@macroexpressions.com> writes:
James Kuyper wrote:
>Ark Khasin wrote:
<snip>
>>[NEGATIVE_ZERO comes to mind - and goes away. BTW, is it fair to
say that bitwise logic is a magic performed on representations, and
not on values?]

No. In general, the bitwise operations are defined in terms of their
actions on the values, not the representations. For instance, E>>1
is defined as dividing the value of E by 2.
No. E.g. with 2's complement machine and C99 (and perhaps 90% of C90)
a/2 is (a>=0)?(a>>1):((a+1)>>1)
That is rather unfair. James Kuyper goes on, immediately, to say that
the exceptions involve sign bits. He says that "most" of these cases
are undefined (which is correct), but you are also largely correct in
that shifting a signed and negative value one place right is
implementation defined (it is not connected to 2's complement, it is
simply an implementation defined operation -- which may be undefined,
of course).
>The complicated exceptions all
involve sign bits, and most result in undefined behavior, which is
why it's strongly recommended that bitwise operations be restricted
to unsigned types, or at least restricted to values which are
guaranteed to be positive both before and after the operation.
<snip>

--
Ben.
Nov 4 '07 #34

Ark Khasin wrote:
James Kuyper wrote:
>Ark Khasin wrote:
>>Ben Bacarisse wrote:
....
>issue: that a pointer object with all of its bits set to 0 doesn't
have to be one of those representations. In fact, it doesn't have to
represent a valid pointer value of any kind.
I don't think I have. But I find it, while correct, grotesque.
....
>No. In general, the bitwise operations are defined in terms of their
actions on the values, not the representations. For instance, E>>1 is
defined as dividing the value of E by 2.
No. E.g. with 2's complement machine and C99 (and perhaps 90% of C90)
a/2 is (a>=0)?(a>>1):((a+1)>>1)
>The complicated exceptions all
involve sign bits, and most result in undefined behavior, which is why
As I said, the exceptions involve sign bits, and the behavior when the
sign bit is set is the key difference between what I wrote and what you
wrote. Please note that the behavior of (a+1)>>1 produces an
implementation-defined result when a+1 is negative; while the most
plausible behavior is to handle it in the fashion you expect, the
standard does not require it.
>>> void *a;
memset_as_above(&a, 0, sizeof a);

There is, at this point, no guarantee that 'a' contains a valid
pointer representation.
Sure. But I find it, while correct, grotesque.
I can't help you with the aesthetic judgments. As a practical matter, I
believe that the standard lets these things depend upon the
implementation, precisely because there are real platforms where the
rules you'd like to see would make it unacceptably difficult to create
an efficient implementation of C. It wasn't just invented to make things
complicated for programmers.
Therefore, the next line renders the behavior of your
>entire program undefined:
>>> if (a == 0) {
/* not guaranteed */
//Which is correct but implies
{
void **pNULL = 0;
if(a==*pNULL) {
/* not guaranteed */

I'm not sure what your point was; but you've just attempted to
dereference a null pointer, again making the behavior undefined.
Oops.
I meant
void *pNULL = 0; //pNULL == NULL
a == pNULL not guaranteed.
With those changes, if it weren't for the fact that a had been filled in
by a call to memset that renders any use of 'a' dangerous, the rest of
this code would have been fine. If 'a' compared equal to 0, that would
normally have been sufficient to guarantee that it would also compare
equal to pNULL. All null pointers must compare equal, regardless of
representation, and no non-null pointer is allowed to compare equal to a
null pointer.
Nov 4 '07 #35

pete <pf*****@mindspring.com> writes:
Ark Khasin wrote:
>So, -1 | -2 (or better yet, (-1)^1) is... what? Does it not depend on
one of the 3 models of negatives C recognizes - 2's complement, 1's
complement and sign+magnitude?

(-1 | -2) == {-1,-0,-3}
(-1 ^ 1) == {-2,-0,-0}
snap!

--
Ben.
Nov 4 '07 #36

James Kuyper wrote:
Ark Khasin wrote:
>>No. In general, the bitwise operations are defined in terms of their
actions on the values, not the representations. For instance, E>>1 is
defined as dividing the value of E by 2.
No. E.g. with 2's complement machine and C99 (and perhaps 90% of C90)
a/2 is (a>=0)?(a>>1):((a+1)>>1)
>>The complicated exceptions all
involve sign bits, and most result in undefined behavior, which is why

As I said, the exceptions involve sign bits, and the behavior when the
sign bit is set is the key difference between what I wrote and what you
wrote. Please note that the behavior of (a+1)>>1 produces an
implementation-defined result when a+1 is negative; while the most
plausible behavior is to handle it in the fashion you expect, the
standard does not require it.
I have to offer my apologies. It was unfair indeed to object to the
first 40% of the statement without parsing the meaning of the remaining 60%.
Sorry.
--
Ark
Nov 4 '07 #37

Ark Khasin wrote:
>
pete wrote:
>>><snip>

>No. unsigned char may not have padding bits.
Is this "just a theory"?
No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.

Thanks for adding to my confusion :)
So I have a machine with 11-bit bytes
That's what "CHAR_BIT equals eleven" means.
Sorry for being that stubborn, but:
Why?
Because that's what "CHAR_BIT" means.

N869
5.2.4.2 Numerical limits
[#1]
-- number of bits for smallest object that is not a bit-
field (byte)
CHAR_BIT

--
pete
Nov 4 '07 #38

Ark Khasin wrote:
....
Is it so that a consensus emerges that ^ | & on negative numbers depend
I can't speak for the consensus; this is just my understanding of what
the standard says. In many cases my understanding differs distinctly
from the consensus understanding.
on representation (or, to tell the truth, act on representations) and so
are implementation-defined (although only 3 ways to implement are
recognized)?
Those operators don't act on the representations of objects; otherwise
cases like

x += (a+1) | (b-2);

would be undefined. The way that they operate on or generate negative
values must, however, be consistent with whichever of 3 permitted ways
of handling the sign bits that the implementation chose for the relevant
type. I see no way for those operators to operate on the padding bits
in any sense that is meaningful within the context of the standard. I
would expect that what happens is that the padding bits would be handled
just like the value bits, but that an implementation is not required to
do so. If you want to ensure that padding bits are handled in the same
fashion, you need to access the object as an array of unsigned char, and
perform the operations on a byte-by-byte basis.
Nov 4 '07 #39

Ark Khasin <ak*****@macroexpressions.com> writes:

<snip discussion of value vs. representation operations>
Is it so that a consensus emerges that ^ | & on negative numbers
depend on representation (or, to tell the truth, act on
representations) and so are implementation-defined (although only 3
ways to implement are recognized)?
Well, not many people have expressed a view, and by this time in the
thread I don't think anyone is in any doubt about what the various
operations *do*, so the discussion comes down to how the term
"representation" is used.

James Kuyper takes the view that representations only exist in objects
(i.e. when values are stored) but, although that is a reasonable meaning
of the term, I don't think that is supported by the wording of the
standard (see my other answer to him in this thread).

In any case, the meaning of ^, |, and & on signed values is most
definitely implementation defined because it depends on the way
negative numbers are represented by the implementation. To me, that
means they are defined in terms of the representation, even though no
storage is involved in an expression like '-1 | -2'. You decide if
you like this meaning of the term.

Is that a consensus? No, but to some extent one can interpret silence
as consent here. Alternatively, since such things are inherently not
portable, the silence might just mean that no one cares. I certainly
care about this much less than the volume of words I have written
about it suggests -- I'd never use such a construct.

--
Ben.
Nov 5 '07 #40

James Kuyper <ja*********@verizon.net> writes:
Ark Khasin wrote:
...
>Is it so that a consensus emerges that ^ | & on negative numbers
depend

I can't speak for the consensus; this is just my understanding of what
the standard says. In many cases my understanding differs distinctly
from the consensus understanding.
>on representation (or, to tell the truth, act on representations)
and so are implementation-defined (although only 3 ways to implement
are recognized)?

Those operators don't act on the representations of objects; otherwise
cases like

x += (a+1) | (b-2);
You have written a correct statement because, of course, a+1 is not an
object. The question is, does '(a+1) | (b-2)' depend (or act) on the
representation of 'a+1' and 'b-2' and I'd say it does.

Section 6.2.6 is called "Representations of types" (not objects) and
6.2.6.1 p4 takes some pains to define a new term -- the "object
representation" -- which is not quite the same thing as the
representation of the type.

I don't want to suggest that | operates on padding bits. I don't
think one can determine that either way, but it seems to be pushing the
meaning of representation to its limits (and beyond that used in the
standard) to say that |, &, and ^ act on the value not the
representation.
would be undefined. The way that they operate on or generate negative
values must, however, be consistent with whichever of 3 permitted ways
of handling the sign bits that the implementation chose for the
relevant type.
These "permitted ways" are described in the section called
"Representation of types". Your sentence would be simpler if you had
said that the bits they operate on (and the meaning of the bits that
are produced) are determined by the representation used for signed
integer types.

It is possible to define the result of | on signed types by talking
only about the value of the operands, but it is complicated to do so
and the standard clearly distinguishes between the terms
"representation" and "object representation" so that there is a simple
way to understand the operation. The standard uses only a few words to
define | (and the others) because the term "corresponding bits" is
clearly intended to refer back to the bits used to represent the
value as described in 6.2.6.
I see no way for those operators to operate on the
padding bits in any sense that is meaningful within the context of the
standard.
I agree, but padding bits are only one part of the representation of
the type.

--
Ben.
Nov 5 '07 #41

Ben Bacarisse wrote:
James Kuyper <ja*********@verizon.net> writes:
....
>Those operators don't act on the representations of objects; otherwise
cases like

x += (a+1) | (b-2);

You have written a correct statement because, of course, a+1 is not an
object. The question is, does '(a+1) | (b-2)' depend (or act) on the
representation of 'a+1' and 'b-2' and I'd say it does.

Section 6.2.6 is called "Representations of types" (not objects) and
6.2.6.1 p4 takes some pains to define a new term -- the "object
representation" -- which is not quite the same thing as the
representation of the type.
That is not clear to me. The standard defines what "object
representation" means, but it never defines what "representation" means.
As far as I can tell, the term "representation" is almost always used
in the standard as a short form for "object representation". The
standard uses the term "value representation" exactly once, in
6.2.6.2p2, and oddly enough, while it is worded as a definition, it is not
italicized as a definition should be. If it were an official definition,
from context it would appear to be defined only for unsigned types.

When the word "representation" is used without either "object" or
"value" preceding it, I found only a few cases where it was clear that
it was not a reference to the object representation; in every case, it
was also clear that it did not refer to the value representation.
Examples include such things as the representation of characters on the
display screen, or the representation of data in the output file. In
both of those cases, the only thing the standard says is that those are
details outside of its scope.

I'm not saying there are no instances where a lone "representation"
clearly refers to the value representation, where inserting the word
"object" before "representation" would clearly change the intended
meaning. There are too many instances for me to check reliably. However, I
didn't find any - do you know of any?
>would be undefined. The way that they operate on or generate negative
values must, however, be consistent with whichever of 3 permitted ways
of handling the sign bits that the implementation chose for the
relevant type.

These "permitted ways" are described in the section called
"Representation of types". Your sentence would be simpler if you had
said that the bits they operate on (and the meaning of the bits that
are produced) are determined by the representation used for signed
integer types.
Which is the way that the standard says the same thing. I don't think
there's a difference in meaning, just a difference in clarity.

....
and the standard clearly distinguishes between the terms
"representation" and "object representation" so that there is a simple
Citation, please? The definition of "object representation" does not
clearly distinguish them.
Nov 6 '07 #42

Ben Bacarisse wrote:
....
If section 6.2.6 was called "Object representation of types" so that
it was unambiguously about how values are stored and nothing else,
then I believe '-1 | -2' would be undefined. If values, not stored in
objects, do not have a representation, then what "bits" are there to
or together? It is because values may be represented as collections
of bits (as it happens in three different ways) that such expressions
can have a meaning defined by combining "corresponding bits".
6.5p4 says "Some operators (the unary operator ~, and the binary
operators <<, >>, &, ^, and |, collectively described as bitwise
operators) are required to have operands that have integer type. These
operators yield values that depend on the internal representations of
integers, and have implementation-defined and undefined aspects for
signed types".

As I understand it, that statement about the "internal representations"
refers to the object representation, and does not imply that the
standard attaches any meaning to the representation of a value that is
not currently stored in any C object. It merely requires that the
bitwise operators act on values in such a way that if the resulting
value were saved in an object of that value's type, it would have the
correct bit pattern.

....
I have no other citation than the definition I already cited. I agree
the distinction is not crystal clear, but if your reading of the words
is that there is essentially no difference between "representation"
and "object representation" then how do you give '-1 | -2' a meaning?
You could, of course, say that | (and friends) behaves as if its
operands were stored in objects and the corresponding bits are
combined together, but you'd *still* be referencing the allowed
representations.
That's exactly how I understand it. I never denied that the allowed
representations were referenced, only that the concept of a
representation of a value only acquires a meaning in the event that it is
stored in an object.
Nov 7 '07 #43

James Kuyper <ja*********@verizon.net> writes:
Ben Bacarisse wrote:
...
>If section 6.2.6 was called "Object representation of types" so that
it was unambiguously about how values are stored and nothing else,
then I believe '-1 | -2' would be undefined. If values, not stored in
objects, do not have a representation, then what "bits" are there to
or together? It is because values may be represented as collections
of bits (as it happens in three different ways) that such expressions
can have a meaning defined by combining "corresponding bits".

6.5p4 says "Some operators (the unary operator ~, and the binary
operators <<, >>, &, ^, and |, collectively described as bitwise
operators) are required to have operands that have integer type. These
operators yield values that depend on the internal representations of
integers, and have implementation-defined and undefined aspects for
signed types".

As I understand it, that statement about the "internal
representations" refers to the object representation, and does not
imply that the standard attaches any meaning to the representation of
a value that is not currently stored in any C object.
I can accept that this is the intent, but I do not think it is clear
and unambiguous. Since it can't be detected by a C program, I don't
really care if there is one representation (when a value is stored) or
another, transient one, as well. The latter just seemed to me a
convenient expository device for exactly these cases, but if it is a
fiction of my imagination, so be it!
It merely
requires that the bitwise operators act on values in such a way that
if the resulting value were saved in an object of that value's type,
it would have the correct bit pattern.

...
>I have no other citation than the definition I already cited. I agree
the distinction is not crystal clear, but if your reading of the words
is that there is essentially no difference between "representation"
and "object representation" then how do you give '-1 | -2' a meaning?
You could, of course, say that | (and friends) behaves as if its
operands were stored in objects and the corresponding bits are
combined together, but you'd *still* be referencing the allowed
representations.

That's exactly how I understand it. I never denied that the allowed
representations were referenced,
Well that was exactly what I thought you were doing when you said:

"In general, the bitwise operations are defined in terms of their
actions on the values, not the representations."

I just can't square this with your last remark above. For four of the
six, the representation is a required part of the definition.
only that the concept of a
representation of a value only acquires a meaning in the event that it
is stored in an object.

--
Ben.
Nov 7 '07 #44
