Bill Cunningham wrote:
> Lexical generators such as Bison, Flex, Lex, &c produce C tokens for the
> parser or compiler. What do these C tokens look like? According to ANSI C,
> what does the standard have to say about C tokens? Does anyone know?
The Standard defines "preprocessing token" and "token,"
but nothing called a "C token."
A "preprocessing token" is a header name, an identifier,
a "pp-number," a character constant, a string literal, or a
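punctuator. Any other single non-white-space character that fits
none of those categories is also a preprocessing token.
A "token" is a keyword, an identifier, a constant, a string
literal, or a punctuator. (The original ANSI C Standard also
listed operators as a separate category; C99 folded them into
the punctuators.)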
You'll notice a certain amount of overlap. This arises from
the way the Standard describes the translation of C source code
into executable programs: in the early stages of translation
(roughly speaking, up through the point where the preprocessor
has finished its work), the translation is described in terms of
converting incoming characters into preprocessing tokens and
performing various manipulations on them. Later stages convert
the preprocessing tokens into tokens, and attach various meanings
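to them. For example, the two terms make it easy to explain why
`sizeof' cannot be evaluated by the preprocessor: keywords exist
only among tokens, so in a #if line `sizeof' is just an identifier
that is not a macro name, and the preprocessor replaces it with 0.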
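Here is a small sketch of the difference between the two kinds of token, using nothing beyond a standard C preprocessor. The macro STR and the names odd_pp_number, INT_WAS_A_KEYWORD and int_was_a_keyword are made up for this illustration; none of them comes from the Standard.

```c
#include <string.h>

/* STR is a hypothetical helper: # turns the preprocessing tokens of
   its argument into a string literal, before any conversion to tokens. */
#define STR(x) #x

/* `1.2.3' is a valid pp-number, although it could never become a
   valid constant.  It is stringified first, so the compiler accepts it. */
const char *odd_pp_number(void)
{
    return STR(1.2.3);
}

/* In a #if line there are no keywords yet: `int' is an identifier that
   is not a macro name, so it is replaced by 0 and the test is 0 == 0. */
#if int == 0
#define INT_WAS_A_KEYWORD 0
#else
#define INT_WAS_A_KEYWORD 1
#endif

int int_was_a_keyword(void)
{
    return INT_WAS_A_KEYWORD;
}

/* By the same rule, `#if sizeof(int) == 4' would turn into
   `#if 0(0) == 4', which is a syntax error rather than a size test. */
```

Calling odd_pp_number() should hand back the text 1.2.3 unchanged, and int_was_a_keyword() should report that the preprocessor never saw `int' as a keyword.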
What do these preprocessing tokens and tokens "look like?"
Whatever the implementor finds convenient and pleasing. The
compiler will typically build data structures describing the
preprocessing tokens and tokens constructed from the source, and
will record various bits of useful information to assist the
further actions of the translation. An "identifier," for example,
will probably carry an indication of its scope, of its linkage
(internal, external, or none), and a description of the thing
it names. It might also carry additional handy information like
"The `&' operator is never applied to this identifier, so it's
eligible to be put into a register" -- but all such decorations
are at the implementor's whim.
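For the curious, one way such a record might be laid out is sketched below. Every name in it (token_kind, linkage_kind, symbol_info, token, register_candidate) is hypothetical, and so are the fields; no particular compiler is being described, and a real one will keep whatever suits it.

```c
#include <stddef.h>

/* The five token categories named by the Standard. */
enum token_kind {
    TOK_KEYWORD, TOK_IDENTIFIER, TOK_CONSTANT,
    TOK_STRING_LITERAL, TOK_PUNCTUATOR
};

enum linkage_kind { LINK_NONE, LINK_INTERNAL, LINK_EXTERNAL };

/* Hypothetical decorations an implementation might hang on an identifier. */
struct symbol_info {
    int scope_depth;            /* assumed encoding: 0 = file scope */
    enum linkage_kind link;     /* internal, external, or none */
    int address_taken;          /* nonzero once `&' has been applied */
};

/* Hypothetical token record built from the source text. */
struct token {
    enum token_kind kind;
    const char *spelling;       /* characters that make up the token */
    size_t length;              /* number of characters in spelling */
    int line;                   /* source line, for diagnostics */
    struct symbol_info *symbol; /* identifiers only; NULL otherwise */
};

/* Returns 1 if the token is an identifier whose address is never
   taken, 0 otherwise.  A deliberately simplified test; a real
   compiler would weigh much more than this. */
int register_candidate(const struct token *t)
{
    return t->kind == TOK_IDENTIFIER
        && t->symbol != NULL
        && !t->symbol->address_taken;
}
```

Keeping the symbol information behind a pointer is just one choice: it lets keywords, constants and punctuators carry a NULL there instead of dragging unused fields around.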
Personally, I favor a sort of deep teal -- long on the blue,
and not too much green.
--
Er*********@sun.com