ju**********@yahoo.co.in wrote:
Hi,
I have a very basic doubt about associativity and precedence of
operators. I am not a computer science person and may find it quite
weird.
This isn't really topical for comp.lang.c, but since precedence and
associativity (not to mention their half-brother, evaluation order)
seem to cause problems to C beginners, I'll give it a go.
Consider an expression
4+5*2
As per the tutorials that I read earlier,
(which ones?)
there could be two different parse trees possible for this.
No parse tree without a grammar. What the parse tree is depends
on the grammar that's specified. For an expression grammar,
likely productions are
exp ::= number | exp + exp | exp * exp ...
This grammar is ambiguous, as you say, since:
+ and *
/ \ / \
4 * + 3
/ \ / \
5 2 4 5
would both be valid parse trees.
So, to avoid such ambiguities, the language specify the associativity
and precedence rules so that the compiler always generate a unique
parse tree for an expression.
No. To avoid such ambiguities, an unambiguous grammar is used. /One
way/ to disambiguate is to apply precedence and associativity rules.
Another is to write the grammar so that they're unnecessary:
exp ::= additive
additive ::= additive | additive + multiplicative
multiplicative ::= operand | operand * multiplicative
operand ::= integer
Here we can say "+ associates to the left", because we can't
parse "1 + 2 + 3" as "1 + additive" because the right operand
of `+` has to be multiplicative and hence (in this grammar) can't
include a `+` operator.
Similarly we can say that `* has precedence over +` because
"1 + 2 * 3" can't be parsed as "operand * multiplicative" because
`operand` can't include `+` in this grammar.
But these "associative" and "precedence" rules are just shorthands
for talking about the grammar. The C grammar of the (either) Standard
/does not use them/; it has explicit productions in the grammar
instead.
It is, I think, a matter of opinion/style (and parser-generation tool)
which is the "better" way of describing how operators and operands
group in a language. [I lean toward defining it with a full grammar
but ensuring that the shorthand terms work clearly with it.]
But, what I feel that the lexical analyzer always starts reading the
tokens from left. If this is the case it will always generate the
second parse tree and the expression is equivalent to
(4+5)*2.
Well ... no.
First, a piece of terminology: it's not the lexical analyser that parses
the expression, it's the parser. The parser eats the lexical tokens
the lexical analyser provides.
and there will be no ambiguity. If we start from left we will always
have only one parse tree and there is no need for any associativity and
precedence.
If the parser starts from the left and eats as few tokens at a time as
possible, it has /imposed/ (left) associativity and (no) precedence on
the grammar.
Here's an expression parser. (Incomplete pseudo-code, alas, for brevity.)
parseExpression():
var L = parseOperand()
if next token is operator then
var op = parseOperator()
var R = parseOperand()
return infix( L, op, R )
else
return L
endif
This does /right/-association, without precedence. If you find an
operator, you have to /decide/ whether you want to eat the
minimum (left association) or maximum (right association) or
some intermediate (precedence) size of operand.
Consider other example,
4+5+8
The articles say that it may be treated as (4+5)+8 Or 4+(5+8) and to
avoid this associativity rule is required that says "+" operator is
left associative.
You have to disambiguate /somehow/, yes: you have to answer the
question "what should this mean?", or, at least, "what structure
should this have?"
I feel that if the lexical analyzer starts from left it should always
be equivalent to (4+5)+8 and therefore no associativity rule is
required.
The choice doesn't go away if you go left-to-right, but it's not so
easy to /see/ that it's a choice.
I made a mistake like this once [1]. I was designing and implementing
a (Pure-)functional programming language, in which I had the usual
expressions (leaving out a whole host of details):
Exp ::=
let Dec in Exp [Dec ::= Id = Exp, if it matters]
| Exp Exp
| Id
Writing the parser is dead easy:
parseExp()
var F = parseOperand()
while current token can start an Exp do
F := apply( F, parseExp() )
endwhile
return F
parseOperand()
if current token is `let` then
skip token, var D = parseDec()
skip `in`, var E = parseExp()
return let( D, E )
else
return parseId()
endif
Brill, no problem, this ensures that the Exp of a let-expression is as
long as possible, as tradition required.
What I /hadn't/ noticed was that this grammar is ambiguous, because
not only does
let D in F X
have two parses /according to the grammar/, viz
let D in (F X)
(let D in F) X
but there's a whole other issue of
F let D in X Y
which I hadn't even noticed. It took exposure to an LL(1) parser
generator to show me that although my /code/ was one-token-lookahead,
no backtracking, potter along left-to-right, my /grammar/ was not LL(1)
and Decisions Had To Be Made.
I made them by requiring that `let`s had to appear as "full expressions",
ie not embedded in other expressions unless bracketed. Since this covered
about 99.17254% of the actual code written, it was not a problem. (I
paid off the other 0.82746% of the code with a promise of a quick
assignment with the garbage collector.)
[1] OK, once /that I noticed/.
--
Chris "Essen -6 and counting" Dollin
A rock is not a fact. A rock is a rock.