Question about comment parsing between C and C++ compiler

Hi,
I am reading the book "Expert C Programming", which has the following
quiz:

a //*
//*/ b

What does the above code turn into under a C compiler and under a C++ compiler?

I think it is simple for a C compiler: it is a/b.

But for a C++ compiler, the book says it is a. The reason is that "//"
makes the rest of the line a comment.

I am puzzled by this.

Just a couple of pages back, the book mentions that the compiler uses a
"maximal munch strategy". To me, when the C++ compiler reads the first line,
the interpretation is ambiguous: it could be "a// *" or "a / /*".
If we apply the "maximal munch strategy", it should choose the second
one and parse the code as

a / /*
// */ b

which is a/b.

I think I am confused somewhere. Could you shed some light?

Thanks.

May 7 '07 #1
li*****@hotmail.com wrote:
> Hi,
> I am reading the book "Expert C Programming", which has the following
> quiz:
>
> a //*
> //*/ b
>
> What does the above code turn into under a C compiler and under a C++ compiler?
>
> I think it is simple for a C compiler: it is a/b.
>
> But for a C++ compiler, the book says it is a. The reason is that "//"
> makes the rest of the line a comment.
>
> I am puzzled by this.
>
> Just a couple of pages back, the book mentions that the compiler uses a
> "maximal munch strategy". To me, when the C++ compiler reads the first line,
> the interpretation is ambiguous: it could be "a// *" or "a / /*".
> If we apply the "maximal munch strategy", it should choose the second
> one and parse the code as
>
> a / /*
> // */ b
>
> which is a/b.
>
> I think I am confused somewhere. Could you shed some light?

What does "maximal munch strategy" mean?

--
pete
May 7 '07 #2
li*****@hotmail.com said:
> Hi,
> I am reading the book "Expert C Programming", which has the following
> quiz:
>
> a //*
> //*/ b
>
> What does the above code turn into under a C compiler and under a C++ compiler?
>
> I think it is simple for a C compiler: it is a/b.
It's fairly simple, but nowadays it is not quite as simple as you make
out. What PvdL didn't realise was that //-comments would be introduced
into C in the 1999 language revision!
> But for a C++ compiler, the book says it is a. The reason is that "//"
> makes the rest of the line a comment.
Yeah.
> I am puzzled by this.
>
> Just a couple of pages back, the book mentions that the compiler uses a
> "maximal munch strategy".
Grab the biggest token you can, yes.

> To me, when the C++ compiler reads the first line,
> the interpretation is ambiguous: it could be "a// *" or "a / /*",
No, it can't be either of those. It could be

a / /*

or

a // *

and maximal munch dictates the second.

Whether C++ actually has a maximal munch rule is a question that our
friends in comp.lang.c++ would undoubtedly be able to answer.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
May 7 '07 #3
pete <pf*****@mindspring.com> writes:
> [...]
> What does "maximal munch strategy" mean?
It means that, when determining the next token, the compiler grabs as
many characters as possible to get a valid token.

For example, this:

x+++++y

is tokenized as

x ++ ++ + y

which results in a syntax error, even though this:

x ++ + ++ y

would result in a valid parse. (Tokenization doesn't account for
later phases.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
May 7 '07 #4
Rg
On May 7, 03:14, Keith Thompson <k...@mib.org> wrote:
> [...]
>
> It means that, when determining the next token, the compiler grabs as
> many characters as possible to get a valid token.
>
> [...]

In other words, it means the lexical analyzer is greedy.

Ain't that much simpler to say?

May 7 '07 #5
On May 7, 4:05 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> Whether C++ actually has a maximal munch rule is a question that our
> friends in comp.lang.c++ would undoubtedly be able to answer.

C++98 does (I'll save the OP the effort of making a new post there).

Off-topic but possibly interesting aside: C++ uses "<" and
">" like brackets in some contexts, but the maximal munch
rule has the effect that <a<b>> gets parsed unexpectedly
because the closing chevrons get tokenised as the right-shift
operator.

There's been a DR accepted to change this so that >> is not
maximally munched in this situation -- for better or worse.

May 7 '07 #6
Old Wolf said:
> On May 7, 4:05 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
>> Whether C++ actually has a maximal munch rule is a question that our
>> friends in comp.lang.c++ would undoubtedly be able to answer.
>
> C++98 does (I'll save the OP the effort of making a new post there).
>
> Off-topic but possibly interesting aside: C++ uses "<" and
> ">" like brackets in some contexts, but the maximal munch
> rule has the effect that <a<b>> gets parsed unexpectedly
> because the closing chevrons get tokenised as the right-shift
> operator.

I don't see why that's unexpected. Maximum munch is hardly a secret in
C, and I presume it's no secret in C++ either. I didn't know it applied
in C++, but in my C++ programming I have always conservatively assumed
that it does.

> There's been a DR accepted to change this so that >> is not
> maximally munched in this situation -- for better or worse.

It's for worse. Hard cases make bad law.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
May 7 '07 #7
In article <t9******************************@bt.com>,
Richard Heathfield <rj*@see.sig.invalid> wrote:
>> There's been a DR accepted to change this so that >> is not
>> maximally munched in this situation -- for better or worse.
> It's for worse. Hard cases make bad law.

Unless the exception proves the rule.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
May 7 '07 #8
Richard Tobin said:
> In article <t9******************************@bt.com>,
> Richard Heathfield <rj*@see.sig.invalid> wrote:
>>> There's been a DR accepted to change this so that >> is not
>>> maximally munched in this situation -- for better or worse.
>> It's for worse. Hard cases make bad law.
>
> Unless the exception proves the rule.

No, not really. Exceptions are sometimes necessary, but never elegant.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
May 7 '07 #9
On May 8, 10:12 am, Richard Heathfield <r...@see.sig.invalid> wrote:
> Old Wolf said:
>> There's been a DR accepted to change this so that >> is not
>> maximally munched in this situation -- for better or worse.
>
> It's for worse. Hard cases make bad law.

Funny situation really. I assume the DR came about because
many newbies were being tripped up by the situation; maximal
munch must be 'unintuitive' for most people. As it is for me,
I might add; my mind tends to parse a sentence in the way
that makes the most sense and I suspect others' minds work
that way too (as evinced by the fact that people can read
all sorts of mis-spelled garbage). In the <a<b>> case, the
pairs of matching chevron brackets are clearly what was intended.

Of course it makes the compiler writers' job harder too, but
C++ parsing is already so convoluted and context sensitive
that the horse has long since bolted on the idea of having
an easily-parsable syntax.

May 7 '07 #10
Richard Heathfield <rj*@see.sig.invalid> writes:
> Richard Tobin said:
>> In article <t9******************************@bt.com>,
>> Richard Heathfield <rj*@see.sig.invalid> wrote:
>>>> There's been a DR accepted to change this so that >> is not
>>>> maximally munched in this situation -- for better or worse.
>>> It's for worse. Hard cases make bad law.
>>
>> Unless the exception proves the rule.
>
> No, not really. Exceptions are sometimes necessary, but never elegant.

Except when they are, of course. 8-)}

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
May 7 '07 #11
Old Wolf wrote:
> On May 8, 10:12 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>> Old Wolf said:
>>> There's been a DR accepted to change this so that >> is not
>>> maximally munched in this situation -- for better or worse.
>> It's for worse. Hard cases make bad law.
>
> Funny situation really. I assume the DR came about because
> many newbies were being tripped up by the situation; maximal
> munch must be 'unintuitive' for most people.

Shortly after I had started my current position at work, I "solved"
that for a guy. Of course, I was able to do so because I'd just read
about it on clc++, but hey, never let them see behind the curtain.

Brian
May 7 '07 #12
Default User said:
> Old Wolf wrote:
> <snip>
>> Funny situation really. I assume the DR came about because
>> many newbies were being tripped up by the situation; maximal
>> munch must be 'unintuitive' for most people.
>
> Shortly after I had started my current position at work, I "solved"
> that for a guy. Of course, I was able to do so because I'd just read
> about it on clc++, but hey, never let them see behind the curtain.

It wouldn't matter if you did. With a few honorable exceptions,
comp.lang.c can be viewed as an inordinately long series of
unsuccessful attempts to persuade people to lift the curtain.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
May 7 '07 #13
On May 6, 10:05 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> linq...@hotmail.com said:
>> Hi,
>> I am reading the book "Expert C Programming", which has the following
>> quiz:
>>
>> a //*
>> //*/ b
>>
>> What does the above code turn into under a C compiler and under a C++ compiler?
>>
>> I think it is simple for a C compiler: it is a/b.
>
> It's fairly simple, but nowadays it is not quite as simple as you make
> out. What PvdL didn't realise was that //-comments would be introduced
> into C in the 1999 language revision!
>
>> But for a C++ compiler, the book says it is a. The reason is that "//"
>> makes the rest of the line a comment.
>
> Yeah.
>
>> I am puzzled by this.
>>
>> Just a couple of pages back, the book mentions that the compiler uses a
>> "maximal munch strategy".
>
> Grab the biggest token you can, yes.
>
>> To me, when the C++ compiler reads the first line,
>> the interpretation is ambiguous: it could be "a// *" or "a / /*",
>
> No, it can't be either of those. It could be
>
> a / /*
>
> or
>
> a // *
>
> and maximal munch dictates the second.
>
> Whether C++ actually has a maximal munch rule is a question that our
> friends in comp.lang.c++ would undoubtedly be able to answer.
>
> --
> Richard Heathfield
> "Usenet is a strange place" - dmr 29/7/1999
> http://www.cpax.org.uk
> email: rjh at the above domain, - www.

We agree on what the options for the parser are. In your format, the
two options are

a / /*

or

a // *

The reason I think the "maximal munch strategy" should take option 1 is
that, if the parser sees /*, it would take everything up to the
matching */ as one token, and that definitely has more characters than
option 2.

I have not checked a compiler's parser implementation or whether the
standard mandates this, so could this be compiler dependent?

Thanks.

May 8 '07 #14
li*****@hotmail.com said:

<snip>
> We agree on what the options for the parser are. In your format, the
> two options are
>
> a / /*
>
> or
>
> a // *
>
> The reason I think the "maximal munch strategy" should take option 1 is
> that, if the parser sees /*, it would take everything up to the
> matching */ as one token, and that definitely has more characters than
> option 2.

6.4.9 Comments
1 Except within a character constant, a string literal, or a comment,
the characters /* introduce a comment. The contents of such a comment
are examined only to identify multibyte characters and to find the
characters */ that terminate it.
2 Except within a character constant, a string literal, or a comment,
the characters // introduce a comment that includes all multibyte
characters up to, but not including, the next new-line character. The
contents of such a comment are examined only to identify multibyte
characters and to find the terminating new-line character.

As you can see if you read carefully, //* falls within para 2, not para
1. The characters // are encountered first, so they fall within the
purview of para 2 before we get as far as the * which would otherwise
have invoked para 1.

Or, if you prefer, we can think of it in maximum munch terms again.
Maximum munch is not predictive. We don't say "which parse will give us
the biggest tokens possible?" but "starting with and including THIS
CHARACTER, what is the biggest token we can grab?"

And thus we take // rather than / /*, because // is bigger than /
whichever way you slice it.

> I have not checked a compiler's parser implementation or whether the
> standard mandates this, so could this be compiler dependent?

No. Your decision not to check whether the Standard mandates a given
behaviour does not affect the wording of the Standard or the
conformance of implementations to that Standard.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
May 8 '07 #15
On May 8, 12:57 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> linq...@hotmail.com said:
>
> <snip>
>> We agree on what the options for the parser are. In your format, the
>> two options are
>>
>> a / /*
>>
>> or
>>
>> a // *
>>
>> The reason I think the "maximal munch strategy" should take option 1 is
>> that, if the parser sees /*, it would take everything up to the
>> matching */ as one token, and that definitely has more characters than
>> option 2.
>
> 6.4.9 Comments
> 1 Except within a character constant, a string literal, or a comment,
> the characters /* introduce a comment. The contents of such a comment
> are examined only to identify multibyte characters and to find the
> characters */ that terminate it.
> 2 Except within a character constant, a string literal, or a comment,
> the characters // introduce a comment that includes all multibyte
> characters up to, but not including, the next new-line character. The
> contents of such a comment are examined only to identify multibyte
> characters and to find the terminating new-line character.
>
> As you can see if you read carefully, //* falls within para 2, not para
> 1. The characters // are encountered first, so they fall within the
> purview of para 2 before we get as far as the * which would otherwise
> have invoked para 1.
>
> Or, if you prefer, we can think of it in maximum munch terms again.
> Maximum munch is not predictive. We don't say "which parse will give us
> the biggest tokens possible?" but "starting with and including THIS
> CHARACTER, what is the biggest token we can grab?"
>
> And thus we take // rather than / /*, because // is bigger than /
> whichever way you slice it.
>
>> I have not checked a compiler's parser implementation or whether the
>> standard mandates this, so could this be compiler dependent?
>
> No. Your decision not to check whether the Standard mandates a given
> behaviour does not affect the wording of the Standard or the
> conformance of implementations to that Standard.
>
> --
> Richard Heathfield
> "Usenet is a strange place" - dmr 29/7/1999
> http://www.cpax.org.uk
> email: rjh at the above domain, - www.

That is it.

Thanks for the elaboration.

May 8 '07 #16
