C, lexical

Is there any Lex code available that describes how to scan C programs?
I'd like to read something related to this. One of my doubts is how C
deals with ambiguities, for example, `a = x/*p;' or `a = x//*...*/-3;'
(considering C99's `//').

thanks in advance,

n.

Nov 15 '05 #1
"Lucas Zimmerman" <ne******@gmail.com> wrote:
> Is there any Lex code available that describes how to scan C programs?
> I'd like to read something related to this. One of my doubts is how C
> deals with ambiguities, for example, `a = x/*p;' or `a = x//*...*/-3;'
> (considering C99's `//').


Well, it's not C99, but maybe a good starting point:

http://www.lysator.liu.se/c/ANSI-C-grammar-l.html

Best Regards
--
Irrwahn Grausewitz (ir*******@freenet.de)
welcome to clc : http://www.ungerhu.com/jxh/clc.welcome.txt
clc faq-list : http://www.faqs.org/faqs/C-faq/faq/
clc frequent answers: http://benpfaff.org/writings/clc.
Nov 15 '05 #2
Irrwahn Grausewitz wrote:
> Well, it's not C99, but maybe a good starting point:
>
> http://www.lysator.liu.se/c/ANSI-C-grammar-l.html


Amazing document! Thanks a lot, Irrwahn.
Interesting how `char x<:N:>;' is valid in C. Is this C99 too?
I'm still learning C after 3 years of studying it!! There is always
something new to know about this language.

thanks once again,

n.

Nov 15 '05 #3
Lucas Zimmerman wrote:
> One of my doubts is how C deals with ambiguities, for example,
> `a = x/*p;' or `a = x//*...*/-3;' (considering C99's `//').


Those are not ambiguous, because C specifies the processing order. The
first example contains the start of a comment. The second example
performs a division in C90 and is the fragment "a = x" in C99.

Thad

Nov 15 '05 #4
Irrwahn Grausewitz wrote:
> Well, it's not C99, but maybe a good starting point:
>
> http://www.lysator.liu.se/c/ANSI-C-grammar-l.html


I'm not sure, but I think I found a bug in this code.
....
L?\"(\\.|[^\\"])*\" { count(); return(STRING_LITERAL); }
....

If I'm right, there is one backslash missing, so we would have this:

L?\"(\\.|[^\\\"])*\" { count(); return(STRING_LITERAL); /* right? */ }

instead of the original. It makes sense to me, since '\' is a lex regex
operator.

n.

Nov 15 '05 #5
"Lucas Zimmerman" <ne******@gmail.com> wrote:
<snip>
> Interesting how `char x<:N:>;' is valid in C. Is this C99 too?


Yup, digraphs are still mentioned in the standard, and I do not
expect them to be dropped any time soon.

ISO/IEC 9899:1999 (E) 6.4.6p3:

In all aspects of the language, the six tokens (*)
<: :> <% %> %: %:%:
behave, respectively, the same as the six tokens
[ ] { } # ##
except for their spelling.

(*) These tokens are sometimes called ‘‘digraphs’’.

Addition: note that the *trigraphs* are missing from the document I
mentioned upthread.

ISO/IEC 9899:1999 (E) 5.2.1.1p1

All occurrences in a source file of the following sequences of three
characters (called trigraph sequences) are replaced with the
corresponding single character.
??= # ??) ] ??! |
??( [ ??' ^ ??> }
??/ \ ??< { ??- ~
No other trigraph sequences exist. Each ? that does not begin one of
the trigraphs listed above is not changed.

Should you ever notice that printf("Huh???/n"); prints Huh? followed
by a new-line, you now know why. :)

Best regards
--
Irrwahn Grausewitz (ir*******@freenet.de)
Nov 15 '05 #6
Lucas Zimmerman wrote:
> One of my doubts is how C deals with ambiguities, for example,
> `a = x/*p;' or `a = x//*...*/-3;' (considering C99's `//').


C uses a "greedy" tokenizer, i.e. it tries to make the largest token
possible at each point. So, x/*p is always the start of a comment, not x
divided by whatever p points to.

Your second example is equivalent to a = x/ -3; in C89, but equivalent
to a = x (with no semicolon) in C99. One of the stranger ways to tell
the difference at run time is:

[sbiber@eagle c]$ cat version.c
#include <stdio.h>

int main(void)
{
if(1//**/2
) printf("C99\n");
else printf("C89\n");

return 0;
}
[sbiber@eagle c]$ c89 version.c && ./a.out
C89
[sbiber@eagle c]$ c99 version.c && ./a.out
C99

Note how the closing parenthesis of the if statement must be on the next
line, so that it is not part of the C99 comment.

--
Simon.
Nov 15 '05 #7
Irrwahn Grausewitz wrote:

> Should you ever notice that printf("Huh???/n"); prints Huh?
> followed by a new-line, you now know why. :)


A more insidious example (plagiarized from www.gotw.ca article 86):

#include <stdio.h>

int main(void)
{
int x = 1;
int i;
for( i = 0; i < 100; ++i )
// What will the next line do? Increment???????????/
++x;
printf("%d\n", x);
}

Nov 15 '05 #8
"Lucas Zimmerman" <ne******@gmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegroups.com...
> Amazing document! Thanks a lot, Irrwahn.
> Interesting how `char x<:N:>;' is valid in C. Is this C99 too?
> I'm still learning C after 3 years of studying it!! There is always
> something new to know about this language.


It's been almost 25 years, and I'm still learning as well ;-)

Enjoy!

Chqrlie.
Nov 15 '05 #9
Another question...

I tried to compile the following code with gcc:
------
#include <stdio.h>
@

int main(void) {
return 0;
}
-------

the output was:
t.c:2: error: syntax error at '@' token

My question then is: why does gcc say `syntax error'? I'm not
sure what is happening here, but I think the lexical analyzer
is passing '@' as a valid token to the parser, and then the parser
says `ok, I'm not expecting a @, so syntax error'.

Am I missing something? I thought lex would be responsible
for giving this error message, since '@' is (AFAIK) not a valid
C token.

thanks a lot in advance once again,

n.

Nov 15 '05 #10
In article <11**********************@g49g2000cwa.googlegroups .com>,
Lucas Zimmerman <ne******@gmail.com> wrote:
> I tried to compile the following code with gcc:
> ------
> #include <stdio.h>
> @
>
> int main(void) {
> return 0;
> }
> -------
>
> the output was:
> t.c:2: error: syntax error at '@' token
>
> My question then is: why does gcc say `syntax error'?

Why not?

> I'm not sure what is happening here, but I think the lexical analyzer
> is passing '@' as a valid token to the parser, and then the parser
> says `ok, I'm not expecting a @, so syntax error'. Am I missing
> something? I thought lex would be responsible for giving this error
> message, since '@' is (AFAIK) not a valid C token.


It appears to me that you are assuming that the program 'lex' is
being used to do lexical analysis, and that the result is passed
to gcc. gcc does not, however, use 'lex': it has its own built-in
lexical analyzer as -part- of its processing. gcc doesn't even
have a separate preprocessing program (e.g., "cpp"): it does
everything up to an intermediate code representation in a single
unified program. There might be a bunch of different routines
that that unified program calls upon, but that part is all one
program, so all the error messages are going to appear to be
from the same program.
--
I was very young in those days, but I was also rather dim.
-- Christopher Priest
Nov 15 '05 #11
ro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes:

> It appears to me that you are assuming that the program 'lex' is
> being used to do lexical analysis, and that the result is passed
> to gcc. gcc does not, however, use 'lex': it has its own built-in
> lexical analyzer as -part- of its processing. gcc doesn't even
> have a separate preprocessing program (e.g., "cpp"): it does
> everything up to an intermediate code representation in a single
> unified program. There might be a bunch of different routines
> that that unified program calls upon, but that part is all one
> program, so all the error messages are going to appear to be
> from the same program.


Or perhaps he was using "lex" as an abbreviation of "lexical
analyzer". (In any case, the "lex" program *generates* a lexical
analyzer.)

Some versions of gcc do use a separate preprocessor. For example,
"gcc -v" with version 2.95.2 shows that it invokes "cpp" followed by
"cc1". Later versions just invoke "cc1". (Later phases aren't
invoked if there's a failure in an earlier phase.)

This is off-topic, except that it illustrates that a compiler has a
lot of freedom in how it implements the translation phases described
in section 5.1.1.2 of the standard.

With gcc versions 3.4.4 and 4.0.0, the error message I get is
"error: stray '@' in program".

Also, note that a lone @ character *is* a valid preprocessor token,
though it isn't a valid token. This means that this:

#if 0
@
#endif
int main(void){}

is a legal program, but this:

#if 0
"
#endif
int main(void){}

isn't (it invokes undefined behavior).

The point of all this is that, although the standard defines 8
distinct translation phases, an implementation is not required to
implement them as separate sequential phases. As long as it processes
legal programs correctly and issues diagnostics where required, it can
do whatever it likes.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 15 '05 #12
In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.org> wrote:
> Also, note that a lone @ character *is* a valid preprocessor token,
> though it isn't a valid token. This means that this:
>
> #if 0
> @
> #endif
> int main(void){}
>
> is a legal program,


Keith, I'm not quite sure how you get that? @ is not part of
the basic C character set, so how can its behaviour be well defined?

As the validity of the presence of @ would appear to be an
implementation extension, then that implementation extension could
treat @ as an alias for " for example.
--
Any sufficiently old bug becomes a feature.
Nov 15 '05 #13
ro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes:
> Keith, I'm not quite sure how you get that? @ is not part of
> the basic C character set, so how can its behaviour be well defined?
>
> As the validity of the presence of @ would appear to be an
> implementation extension, then that implementation extension could
> treat @ as an alias for " for example.


You're right (at least partly); I didn't think of that.

C99 5.2.1 says that the source character set includes *at least* a
specified set of characters (upper and lower case letters, digits,
space, horizontal tab, vertical tab, form feed, and 29 punctuation
characters, *not* including '@'). But '@' can be, and often is, an
"extended character".

For an implementation that doesn't define '@' as part of the source
character set, any occurrence of @ in a source file invokes undefined
behavior (which, as you say, can include treating it as an alias for ").
But if '@' *is* part of the source character set, then it's a legal
preprocessor token (but not a legal token).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nov 15 '05 #14
