
Question about comment parsing between C and C++ compiler

P: n/a
Hi,
I am reading the book "Expert C Programming", which has the following
quiz:

a //*
//*/ b

What does the above code turn out to be under a C compiler and under
a C++ compiler?

I think it is simple for a C compiler: it is a/b.

But for a C++ compiler, the book says it is a. The reason is that "//"
makes the rest of the line a comment.

I am wondering about this.

Just a couple of pages back, the book mentions that the compiler has a
"maximal munch strategy". To me, when the C++ compiler reads the first
line, the interpretation is ambiguous: it could be "a// *" or "a / /*".
If we then apply the "maximal munch strategy", it should use the second
one and parse the code as

a / /*
// */ b

which is a/b.

I think I am confused somewhere; could you shed some light?

Thanks.

May 7 '07 #1
15 Replies


P: n/a
li*****@hotmail.com wrote:
> Hi,
> I am reading the book "Expert C Programming", which has the following
> quiz:
>
> a //*
> //*/ b
>
> What does the above code turn out to be under a C compiler and under
> a C++ compiler?
>
> I think it is simple for a C compiler: it is a/b.
>
> But for a C++ compiler, the book says it is a. The reason is that "//"
> makes the rest of the line a comment.
>
> I am wondering about this.
>
> Just a couple of pages back, the book mentions that the compiler has a
> "maximal munch strategy". To me, when the C++ compiler reads the first
> line, the interpretation is ambiguous: it could be "a// *" or "a / /*".
> If we then apply the "maximal munch strategy", it should use the second
> one and parse the code as
>
> a / /*
> // */ b
>
> which is a/b.
>
> I think I am confused somewhere; could you shed some light?

What does "maximal munch strategy" mean?

--
pete
May 7 '07 #2

P: n/a
li*****@hotmail.com said:
> Hi,
> I am reading the book "Expert C Programming", which has the following
> quiz:
>
> a //*
> //*/ b
>
> What does the above code turn out to be under a C compiler and under
> a C++ compiler?
>
> I think it is simple for a C compiler: it is a/b.

It's fairly simple, but nowadays it is not quite as simple as you make
out. What PvdL didn't realise was that //-comments would be introduced
into C in the 1999 language revision!

> But for a C++ compiler, the book says it is a. The reason is that "//"
> makes the rest of the line a comment.

Yeah.

> I am wondering about this.
>
> Just a couple of pages back, the book mentions that the compiler has a
> "maximal munch strategy".

Grab the biggest token you can, yes.

> To me, when the C++ compiler reads the first line, the interpretation
> is ambiguous: it could be "a// *" or "a / /*",

No, it can't be either of those. It could be

a / /*

or

a // *

and maximal munch dictates the second.

Whether C++ actually has a maximal munch rule is a question that our
friends in comp.lang.c++ would undoubtedly be able to answer.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
May 7 '07 #3

P: n/a
pete <pf*****@mindspring.com> writes:
[...]
> What does "maximal munch strategy" mean?

It means that, when determining the next token, the compiler grabs as
many characters as possible to get a valid token.

For example, this:

x+++++y

is tokenized as

x ++ ++ + y

which results in a syntax error, even though this:

x ++ + ++ y

would result in a valid parse. (Tokenization doesn't account for
later phases.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
May 7 '07 #4

P: n/a
Rg
On 7 May, 03:14, Keith Thompson <k...@mib.org> wrote:

[...]

> It means that, when determining the next token, the compiler grabs as
> many characters as possible to get a valid token.

[...]

In other words, it means the lexical analyzer is greedy.

Ain't that much simpler to say?

May 7 '07 #5

P: n/a
On May 7, 4:05 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> Whether C++ actually has a maximal munch rule is a question that our
> friends in comp.lang.c++ would undoubtedly be able to answer.

C++98 does (I'll save the OP the effort of making a new post there).

Off-topic but possibly interesting aside: C++ uses "<" and
">" like brackets in some contexts, but the maximal munch
rule has the effect that <a<b>> gets parsed unexpectedly
because the closing chevrons get tokenised as the right-shift
operator.

There's been a DR accepted to change this so that >> is not
maximally munched in this situation -- for better or worse.

May 7 '07 #6

P: n/a
Old Wolf said:
> On May 7, 4:05 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
>> Whether C++ actually has a maximal munch rule is a question that our
>> friends in comp.lang.c++ would undoubtedly be able to answer.
>
> C++98 does (I'll save the OP the effort of making a new post there).
>
> Off-topic but possibly interesting aside: C++ uses "<" and
> ">" like brackets in some contexts, but the maximal munch
> rule has the effect that <a<b>> gets parsed unexpectedly
> because the closing chevrons get tokenised as the right-shift
> operator.

I don't see why that's unexpected. Maximum munch is hardly a secret in
C, and I presume it's no secret in C++ either. I didn't know it applied
in C++, but in my C++ programming I have always conservatively assumed
that it does.

> There's been a DR accepted to change this so that >> is not
> maximally munched in this situation -- for better or worse.

It's for worse. Hard cases make bad law.

--
Richard Heathfield
May 7 '07 #7

P: n/a
In article <t9******************************@bt.com>,
Richard Heathfield <rj*@see.sig.invalid> wrote:
>> There's been a DR accepted to change this so that >> is not
>> maximally munched in this situation -- for better or worse.
>
> It's for worse. Hard cases make bad law.

Unless the exception proves the rule.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
May 7 '07 #8

P: n/a
Richard Tobin said:
> In article <t9******************************@bt.com>,
> Richard Heathfield <rj*@see.sig.invalid> wrote:
>>> There's been a DR accepted to change this so that >> is not
>>> maximally munched in this situation -- for better or worse.
>>
>> It's for worse. Hard cases make bad law.
>
> Unless the exception proves the rule.

No, not really. Exceptions are sometimes necessary, but never elegant.

--
Richard Heathfield
May 7 '07 #9

P: n/a
On May 8, 10:12 am, Richard Heathfield <r...@see.sig.invalid> wrote:
> Old Wolf said:
>> There's been a DR accepted to change this so that >> is not
>> maximally munched in this situation -- for better or worse.
>
> It's for worse. Hard cases make bad law.

Funny situation really. I assume the DR came about because
many newbies were being tripped up by the situation; maximal
munch must be 'unintuitive' for most people. As it is for me,
I might add; my mind tends to parse a sentence in the way
that makes the most sense and I suspect others' minds work
that way too (as evinced by the fact that people can read
all sorts of mis-spelled garbage). In the <a<b>> case, the
pairs of matching chevron brackets are clearly what was intended.

Of course it makes the compiler writers' job harder too, but
C++ parsing is already so convoluted and context sensitive
that the horse has long since bolted on the idea of having
an easily-parsable syntax.

May 7 '07 #10

P: n/a
Richard Heathfield <rj*@see.sig.invalid> writes:
> Richard Tobin said:
>> In article <t9******************************@bt.com>,
>> Richard Heathfield <rj*@see.sig.invalid> wrote:
>>>> There's been a DR accepted to change this so that >> is not
>>>> maximally munched in this situation -- for better or worse.
>>>
>>> It's for worse. Hard cases make bad law.
>>
>> Unless the exception proves the rule.
>
> No, not really. Exceptions are sometimes necessary, but never elegant.

Except when they are, of course. 8-)}

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
May 7 '07 #11

P: n/a
Old Wolf wrote:
> On May 8, 10:12 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>> Old Wolf said:
>>> There's been a DR accepted to change this so that >> is not
>>> maximally munched in this situation -- for better or worse.
>>
>> It's for worse. Hard cases make bad law.
>
> Funny situation really. I assume the DR came about because
> many newbies were being tripped up by the situation; maximal
> munch must be 'unintuitive' for most people.

Shortly after I had started my current position at work, I "solved"
that for a guy. Of course, I was able to do so because I'd just read
about it on clc++, but hey, never let them see behind the curtain.

Brian
May 7 '07 #12

P: n/a
Default User said:
> Old Wolf wrote:

<snip>

>> Funny situation really. I assume the DR came about because
>> many newbies were being tripped up by the situation; maximal
>> munch must be 'unintuitive' for most people.
>
> Shortly after I had started my current position at work, I "solved"
> that for a guy. Of course, I was able to do so because I'd just read
> about it on clc++, but hey, never let them see behind the curtain.

It wouldn't matter if you did. With a few honorable exceptions,
comp.lang.c can be viewed as an inordinately long series of
unsuccessful attempts to persuade people to lift the curtain.

--
Richard Heathfield
May 7 '07 #13

P: n/a
On May 6, 10:05 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> [...]
> No, it can't be either of those. It could be
>
> a / /*
>
> or
>
> a // *
>
> and maximal munch dictates the second.
>
> [...]

We agree on what the options for the parser are. In your format, the
two options are

a / /*

or

a // *

The reason I think the "maximal munch strategy" should take option 1 is
that, if the parser sees /*, it would take everything up to the
matching */ together as one token, and this definitely has more
characters than option 2.

I have not checked the compiler's parser implementation or whether the
standard mandates this, so could this be compiler dependent?

Thanks.

May 8 '07 #14

P: n/a
li*****@hotmail.com said:

<snip>
> We agree on what the options for the parser are. In your format, the
> two options are
>
> a / /*
>
> or
>
> a // *
>
> The reason I think the "maximal munch strategy" should take option 1 is
> that, if the parser sees /*, it would take everything up to the
> matching */ together as one token, and this definitely has more
> characters than option 2.

6.4.9 Comments
1 Except within a character constant, a string literal, or a comment,
the characters /* introduce a comment. The contents of such a comment
are examined only to identify multibyte characters and to find the
characters */ that terminate it.69)
2 Except within a character constant, a string literal, or a comment,
the characters // introduce a comment that includes all multibyte
characters up to, but not including, the next new-line character. The
contents of such a comment are examined only to identify multibyte
characters and to find the terminating new-line character.

As you can see if you read carefully, //* falls within para 2, not para
1. The characters // are encountered first, so they fall within the
purview of para 2 before we get as far as the * which would otherwise
have invoked para 1.

Or, if you prefer, we can think of it in maximum munch terms again.
Maximum munch is not predictive. We don't say "which parse will give us
the biggest tokens possible?" but "starting with and including THIS
CHARACTER, what is the biggest token we can grab?"

And thus we take // rather than / /*, because // is bigger than /
whichever way you slice it.

> I have not checked the compiler's parser implementation or whether the
> standard mandates this, so could this be compiler dependent?

No. Your decision not to check whether the Standard mandates a given
behaviour does not affect the wording of the Standard or the
conformance of implementations to that Standard.

--
Richard Heathfield
May 8 '07 #15

P: n/a
On May 8, 12:57 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> [...]
> Or, if you prefer, we can think of it in maximum munch terms again.
> Maximum munch is not predictive. We don't say "which parse will give us
> the biggest tokens possible?" but "starting with and including THIS
> CHARACTER, what is the biggest token we can grab?"
>
> And thus we take // rather than / /*, because // is bigger than /
> whichever way you slice it.
>
> [...]

That is it.

Thanks for the elaboration.

May 8 '07 #16
