Associativity of ++ and --

Chad

The following question stems from p.132 in the book "The C Programming
Language", second edition by K & R.

The have

struct {
int len;
char *str;
} *p;

The question is how come parentheses are need when you go like:
(++p)->len;

but you don't need them when you go:
p++->len;

The only thing I can think of would be that in the former, the prefix
operator associates right to left,
whereas in the former, the postfix operator associates left to right.
I'm I close?

Chad

Sep 28 '06 #1

Subscribe Post Reply

2671

Zara

On 27 Sep 2006 20:13:47 -0700, "Chad" <cd*****@gmail.comwrote:

>The following question stems from p.132 in the book "The C Programming
Language", second edition by K & R.

The have

struct {
int len;
char *str;
} *p;

The question is how come parentheses are need when you go like:
(++p)->len;

but you don't need them when you go:
p++->len;

<..>

It is only a question of natural evaluation of expressions and
precedence of operators.

In the second case, it is only natural that ++ affects p, there is no
way it could affect ->len.

In the first case, operator -has higher precedence than operator ++.
Thus, ++p->len should be parsed as ++(p->len)

regards,

zara

Sep 28 '06 #2

Chris Torek

In article <11**********************@i3g2000cwc.googlegroups. com>
Chad <cd*****@gmail.comwrote:

>The following question stems from p.132 in the book "The C Programming
Language", second edition by K & R.

The have

struct {
int len;
char *str;
} *p;

The question is how come parentheses are need when you go like:
(++p)->len;

but you don't need them when you go:
p++->len;

The only thing I can think of would be that in the former, the prefix
operator associates right to left,
whereas in the former, the postfix operator associates left to right.
I'm I close?

Not really, no. :-)

The issue is not associativity (or, in a sense, even precedence).
K&R (either edition), and most sensible humans, use an "operator
precedence" grammar to describe expressions in C. The C standards
(C89 and C99 both) instead use a fully factored grammar, in which
the problems solved by "operator precedence and associativity"
never even arise.

So what are these problems? Well, they are the obvious ones we
always ask: given an expression like:

f() + g() * h()

"Do I do the times first and then the plus, or the plus first and
then the times?"

Unfortunately, these questions imply something that is not true.
Some may recall a book that came out in the 1990s titled "Why Cats
Paint". Why *do* cats paint? Well, that question presupposes
something: we skip right over the question of whether cats paint
at all [%]. So does the question "which one do we do first" -- it
presupposes that there is some order of operations, so that each
individual step is done one at a time, and completed, and then
another one is done. A C compiler is allowed to do them at the
*same time* as long as it gets the "right answer". So the trick
is determining which answer(s) are "right" without implying that
there is a specific sequence.

[% They must, there was a whole book about it :-) ]

Anyway, given an "operator precedence" grammar -- the kind that
humans like -- we end up with those questions about "precedence"
and "associativity". Answering them will give us the key to getting
the "right answer", as long as we do not assume that "precedence"
means "runtime order of evaluation".

The problems occur when there is what, in language theory, is called
an "ambiguous parse". Given a "sentence" (or even just a "phrase", a
sort of sentence fragment) written in some language, we have to
find what words or symbols bind together with what other words or
symbols.

(This situation occurs in English too. Commenting on squirrels
one day, someone wrote: "Damn those hairy nut eaters!" Are the
squirrels both hairy *and* nut-eaters, or are they simply eaters
of hairy nuts?)

So, given:

a + b * c

does this mean:

(a + b) * c

or does it mean:

a + (b * c)

? (If you use the Standard's grammar, you find that it can *only*
mean the latter. Doing so is a tedious exercise, so I leave it to
the reader. :-) ) If you use an operator-precedence grammar, you
find that it can mean either one, and it is at this point that you
consult your precedence-and-associativity table, for only one
reason: it is time to "break ties" between possible parses.

In computer-language parsing, we can build an "expression tree"
in which each tree-node has an operator and one or two (or, for
C's ?: expressions, three) operands, and remove all the ambiguity.
We will end up with just one of these two:

+ *
/ \ / \
a * + c
/ \ / \
b c a b

(This is something like drawing a sentence diagram to figure out
whether those were hairy eaters, or hairy nuts.) The tree version
makes it clear which operators are bound to which operands. But
of course we do not use this while writing the source code; instead,
we just have to use parentheses to force a particular binding, if
needed.

This is the key -- the item C programmers need to memorize -- about
parsing: the compile-time "precedence" and "associativity" that
we like to use in thinking about our code simply determines the
*compile time* binding of operators and operands. It is, I think,
a good idea to use the word "binding" here, because it helps keep
this concept separate from run-time order of evaluation.

Let me go back to:

f() + g() * h()

again. Let me also show the contents of functions f(), g(), and
h():

int f(void) { puts("f"); return 5; }
int g(void) { puts("g"); return 6; }
int h(void) { puts("h"); return 7; }

Whenever any of these three functions is called (at runtime), it
announces the fact that it has just been called (by printing its
name to stdout). Then it returns the corresponding value. This
sequence of events is guaranteed by C's "sequence points", which
-- as their name implies -- tell you about runtime sequencing.

Now, if you do:

int x = f() + g() * h();

the fact is that x *must* be set to 47 -- the result of adding 5
to 42 (i.e., 6 times 7) -- but the lines "f", "g", and "h" can
come out in any order. A compiler can, if it likes, call f()
first. On a stack-oriented machine, it might make a lot of sense
to do this -- the machine code might be:

call f
call g
call h
mul
add

which means "call f, and push the result on a stack; then call g,
and push its result; then call h and push its result; then pop the
top two stack elements, multiply, and push the result; then pop
the top two stack elements, add, and push." This leaves 47 at
the top of the stack, which is the right answer.

On the other hand, a compiler can, if it likes, call either g or
h first. In fact, since f() is known to return 5, g() to return
6, and h() to return 7, it can call all three in any old order,
and then just set x to 47 without doing any arithmetic at all.

If you need a specific runtime order, you must use C's sequence
points to get it. For instance, if for some reason you must have
g() called first, then f(), and then finally h(), you can do this:

int fval, gval, hval;
int x;

gval = g();
fval = f();
hval = h();
x = fval + gval * hval;

Since semicolon-separated statements force sequence points, the
calls will now occur in the required order.

With all that out of the way, let me get back to the first of the
original expressions:

(++p)->len;

Without the parentheses, this would have two possible parses,
given the grammar from K&R (the name "pre++" distinguishes this
"++" operator from postfix-++; in tree form, they look the same).
The prefix++ could bind to p first, leaving the -to apply to its
result and len; or the -could bind to p and len first, leaving
the prefix++ to bind to its result:

[if -binds first] [if pre++ binds first]
pre++ ->
| / \
- pre++ len
/ \ |
p len p

Which one should you choose? Well, you should choose the one
the Standard's grammar would *force* you to pick; but if you
want to do that without the pain of using the Standard's grammar,
you look in your "precedence and associativity" table, starting
with the "precedence" part. It says that "->" has "higher
precedence" than "++", so the "->" will bind more tightly than
the ++. That is, the -operator will suck up its operands
first, leaving "++" to apply to the now-bound version. This
gives the left-hand parse tree, which is not the one we wanted.
So we can add parentheses, forcing the "++" to bind first,
leaving "->" to operate on the result of "++p".

Now consider the second expression:

p++->len;

Here, we have the postfix++ operator instead of the prefix++
operator. As before, our table says that postfix++ has "lower
precedence" than the -operator. But what can the -operator
bind to? Can it have "postfix++" on its left side, without the
postfix++ applying to anything?

->
/ \
++ len [and what happened to p?]

This is not a valid parse at all. It simply "cannot happen",
so it *does* not happen. "p++", on the other hand, is a valid
left hand side for the "->" operator, so we *can* build *this*
tree:

->
/ \
post++ len
|
p

So there is only one possible parse: the -binds to len on its
right, and to the result of "postfix++ applied to p" on its left.
We never even have to *look* at our operator table, because only
one parse makes sense.

This, incidentally, is also why parentheses are not required
around "b = c" in:

result = (a ? b = c : d);

There is only one possible parse for the middle part. (In effect,
the punctuators ? and : act like parentheses around the middle.
They only do not *look* like parentheses because they do not
have any shape to them.) On the other hand:

a ? b = c : d = e;

has two possible parses (well, two that are "grammatically correct"
even though one is meaningless):

?: =
/ | \ / \
/ | \ / e
a = = ?:
/ \ / \ / | \
b c d e a = d
/ \
b c

The grammar in the C Standard chooses the right hand version, which
turns out to be semantically incorrect, so that you get an error
message. Hence, to misuse ?: as an if-then-else and get the first
parse, you must parenthesize the "d = e" part:

a ? b = c : (d = e);

This forcibly binds the "d = e" together, so that the ?: can then
take it as a unit for its third operand.

Incidentally, depending on the compiler internals, ?: might be
represented as a pair of binary nodes, instead of a trinary node:

x ? y : z

might look more like this:

?
/ \
x :
/ \
y z

GCC's internals use (used? I have not touched them in ages) an
extensible representation so that there is no need to squeeze ?:
expressions into binary trees, but other compilers have only binary
trees, even using them for function calls:

call
/ \
fn args
/ \
a1 args
/ \
a2 args
/ \
a3 a4

is one way to represent "fn(a1, a2, a3, a4)". (Note that the left
operand of a "call" operator is actually a pointer to a function,
at least in principle. If "fn" is the name of a function, rather
than a pointer, it may need to be predeced by a "unary&" node. A
lot of this depends on compiler internal details, though.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Sep 28 '06 #3

Rod Pemberton

"Chris Torek" <no****@torek.netwrote in message
news:ef*********@news3.newsguy.com...

In article <11**********************@i3g2000cwc.googlegroups. com>
Chad <cd*****@gmail.comwrote:
The following question stems from p.132 in the book "The C Programming
Language", second edition by K & R.

The have

struct {
int len;
char *str;
} *p;

The question is how come parentheses are need when you go like:
(++p)->len;

but you don't need them when you go:
p++->len;

The only thing I can think of would be that in the former, the prefix
operator associates right to left,
whereas in the former, the postfix operator associates left to right.
I'm I close?

Not really, no. :-)

The issue is not associativity (or, in a sense, even precedence).
K&R (either edition), and most sensible humans, use an "operator
precedence" grammar to describe expressions in C. The C standards
(C89 and C99 both) instead use a fully factored grammar, in which
the problems solved by "operator precedence and associativity"
never even arise.

So what are these problems? Well, they are the obvious ones we
always ask: given an expression like:

f() + g() * h()

"Do I do the times first and then the plus, or the plus first and
then the times?"

Unfortunately, these questions imply something that is not true.
Some may recall a book that came out in the 1990s titled "Why Cats
Paint". Why *do* cats paint? Well, that question presupposes
something: we skip right over the question of whether cats paint
at all [%]. So does the question "which one do we do first" -- it
presupposes that there is some order of operations, so that each
individual step is done one at a time, and completed, and then
another one is done. A C compiler is allowed to do them at the
*same time* as long as it gets the "right answer". So the trick
is determining which answer(s) are "right" without implying that
there is a specific sequence.

[% They must, there was a whole book about it :-) ]

Anyway, given an "operator precedence" grammar -- the kind that
humans like -- we end up with those questions about "precedence"
and "associativity". Answering them will give us the key to getting
the "right answer", as long as we do not assume that "precedence"
means "runtime order of evaluation".

The problems occur when there is what, in language theory, is called
an "ambiguous parse". Given a "sentence" (or even just a "phrase", a
sort of sentence fragment) written in some language, we have to
find what words or symbols bind together with what other words or
symbols.

(This situation occurs in English too. Commenting on squirrels
one day, someone wrote: "Damn those hairy nut eaters!" Are the
squirrels both hairy *and* nut-eaters, or are they simply eaters
of hairy nuts?)

So, given:

a + b * c

does this mean:

(a + b) * c

or does it mean:

a + (b * c)

? (If you use the Standard's grammar, you find that it can *only*
mean the latter. Doing so is a tedious exercise, so I leave it to
the reader. :-) ) If you use an operator-precedence grammar, you
find that it can mean either one, and it is at this point that you
consult your precedence-and-associativity table, for only one
reason: it is time to "break ties" between possible parses.

In computer-language parsing, we can build an "expression tree"
in which each tree-node has an operator and one or two (or, for
C's ?: expressions, three) operands, and remove all the ambiguity.
We will end up with just one of these two:

+ *
/ \ / \
a * + c
/ \ / \
b c a b

(This is something like drawing a sentence diagram to figure out
whether those were hairy eaters, or hairy nuts.) The tree version
makes it clear which operators are bound to which operands. But
of course we do not use this while writing the source code; instead,
we just have to use parentheses to force a particular binding, if
needed.

This is the key -- the item C programmers need to memorize -- about
parsing: the compile-time "precedence" and "associativity" that
we like to use in thinking about our code simply determines the
*compile time* binding of operators and operands. It is, I think,
a good idea to use the word "binding" here, because it helps keep
this concept separate from run-time order of evaluation.

Let me go back to:

f() + g() * h()

again. Let me also show the contents of functions f(), g(), and
h():

int f(void) { puts("f"); return 5; }
int g(void) { puts("g"); return 6; }
int h(void) { puts("h"); return 7; }

Whenever any of these three functions is called (at runtime), it
announces the fact that it has just been called (by printing its
name to stdout). Then it returns the corresponding value. This
sequence of events is guaranteed by C's "sequence points", which
-- as their name implies -- tell you about runtime sequencing.

Now, if you do:

int x = f() + g() * h();

the fact is that x *must* be set to 47 -- the result of adding 5
to 42 (i.e., 6 times 7) -- but the lines "f", "g", and "h" can
come out in any order. A compiler can, if it likes, call f()
first. On a stack-oriented machine, it might make a lot of sense
to do this -- the machine code might be:

call f
call g
call h
mul
add

which means "call f, and push the result on a stack; then call g,
and push its result; then call h and push its result; then pop the
top two stack elements, multiply, and push the result; then pop
the top two stack elements, add, and push." This leaves 47 at
the top of the stack, which is the right answer.

On the other hand, a compiler can, if it likes, call either g or
h first. In fact, since f() is known to return 5, g() to return
6, and h() to return 7, it can call all three in any old order,
and then just set x to 47 without doing any arithmetic at all.

If you need a specific runtime order, you must use C's sequence
points to get it. For instance, if for some reason you must have
g() called first, then f(), and then finally h(), you can do this:

int fval, gval, hval;
int x;

gval = g();
fval = f();
hval = h();
x = fval + gval * hval;

Since semicolon-separated statements force sequence points, the
calls will now occur in the required order.

With all that out of the way, let me get back to the first of the
original expressions:

(++p)->len;

Without the parentheses, this would have two possible parses,
given the grammar from K&R (the name "pre++" distinguishes this
"++" operator from postfix-++; in tree form, they look the same).
The prefix++ could bind to p first, leaving the -to apply to its
result and len; or the -could bind to p and len first, leaving
the prefix++ to bind to its result:

[if -binds first] [if pre++ binds first]
pre++ ->
| / \
- pre++ len
/ \ |
p len p

Which one should you choose? Well, you should choose the one
the Standard's grammar would *force* you to pick; but if you
want to do that without the pain of using the Standard's grammar,
you look in your "precedence and associativity" table, starting
with the "precedence" part. It says that "->" has "higher
precedence" than "++", so the "->" will bind more tightly than
the ++. That is, the -operator will suck up its operands
first, leaving "++" to apply to the now-bound version. This
gives the left-hand parse tree, which is not the one we wanted.
So we can add parentheses, forcing the "++" to bind first,
leaving "->" to operate on the result of "++p".

Now consider the second expression:

p++->len;

Here, we have the postfix++ operator instead of the prefix++
operator. As before, our table says that postfix++ has "lower
precedence" than the -operator. But what can the -operator
bind to? Can it have "postfix++" on its left side, without the
postfix++ applying to anything?

->
/ \
++ len [and what happened to p?]

This is not a valid parse at all. It simply "cannot happen",
so it *does* not happen. "p++", on the other hand, is a valid
left hand side for the "->" operator, so we *can* build *this*
tree:

->
/ \
post++ len
|
p

So there is only one possible parse: the -binds to len on its
right, and to the result of "postfix++ applied to p" on its left.
We never even have to *look* at our operator table, because only
one parse makes sense.

This, incidentally, is also why parentheses are not required
around "b = c" in:

result = (a ? b = c : d);

There is only one possible parse for the middle part. (In effect,
the punctuators ? and : act like parentheses around the middle.
They only do not *look* like parentheses because they do not
have any shape to them.) On the other hand:

a ? b = c : d = e;

has two possible parses (well, two that are "grammatically correct"
even though one is meaningless):

?: =
/ | \ / \
/ | \ / e
a = = ?:
/ \ / \ / | \
b c d e a = d
/ \
b c

The grammar in the C Standard chooses the right hand version, which
turns out to be semantically incorrect, so that you get an error
message. Hence, to misuse ?: as an if-then-else and get the first
parse, you must parenthesize the "d = e" part:

a ? b = c : (d = e);

This forcibly binds the "d = e" together, so that the ?: can then
take it as a unit for its third operand.

Incidentally, depending on the compiler internals, ?: might be
represented as a pair of binary nodes, instead of a trinary node:

x ? y : z

might look more like this:

?
/ \
x :
/ \
y z

GCC's internals use (used? I have not touched them in ages) an
extensible representation so that there is no need to squeeze ?:
expressions into binary trees, but other compilers have only binary
trees, even using them for function calls:

call
/ \
fn args
/ \
a1 args
/ \
a2 args
/ \
a3 a4

is one way to represent "fn(a1, a2, a3, a4)". (Note that the left
operand of a "call" operator is actually a pointer to a function,
at least in principle. If "fn" is the name of a function, rather
than a pointer, it may need to be predeced by a "unary&" node. A
lot of this depends on compiler internal details, though.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to

spammers.

Why do you always provide a link to your website and "C for Smarties," when
you never update it with excellent posts such as this one? While you're
waiting for Thompson or Healthfield to redirect your post on parsing and
abstract syntax trees to comp.compilers, you could be using Google Groups
advanced search to consolidate your posts into a C compendium. Or at least,
you could fix them up a bit and put them into Wikipedia.
Rod Pemberton

Sep 28 '06 #4

Skarmander

Rod Pemberton wrote:

"Chris Torek" <no****@torek.netwrote in message
news:ef*********@news3.newsguy.com...

[on precedence, associativity and parsing in C]

Why do you always provide a link to your website and "C for Smarties," when
you never update it with excellent posts such as this one? While you're
waiting for Thompson or Healthfield to redirect your post on parsing and
abstract syntax trees to comp.compilers,

That would ignore the part where it ties in with the C grammar. To suggest
that the post would be subject to topicality policing is unfair to the
topicality cops...

you could be using Google Groups advanced search to consolidate your
posts into a C compendium. Or at least, you could fix them up a bit and
put them into Wikipedia.

Wikibooks would be more appropriate. Although consistency is hard to find on
an encyclopedia that anyone can edit, a certain writing style and approach
is agreed on, and detailed explanations like these don't fit (concise and
incorrect explanations are preferred). It would fit perfectly in a book to
teach C, however, the kind of which Wikibooks would host.

Of course, Wikibooks doesn't have the same kind of sex appeal.

S.

Sep 28 '06 #5

Associativity of ++ and --

Similar topics