
What is a type?

P: n/a
I would like to add at the beginning of the C tutorial I am writing
a short blurb about what "types" are. I came up with the following text.

Please can you comment?
Did I miss something?
Is there something wrong in there?
--------------------------------------------------------------------
Types
A type is a definition for a sequence of storage bits. It gives the
meaning of the data stored in memory. If we say that the object a is an
int, it means that the bits stored at that location are to be understood
as a natural number that is built by consecutive additions of powers of
two. If we say that the type of a is a double, it means that the bits
are to be understood as the IEEE 754 standard sequences of bits
representing a double precision floating point value.

Types can be primitive types (i.e. built-in types) or composite types,
i.e. types built from several primitive types.

Functions have a type too. The type of a function is determined by the
type of its return value, and all its arguments. The type of a function
is its interface with the outside world: its inputs (arguments) and its
outputs (return value).
Types in C can be incomplete, i.e. they can exist as types but nothing
is known about them, neither their size nor their bit-layout. They are
useful for encapsulating data into entities that are known only to
certain parts of the program.
Each type can have an associated pointer type: for int we have int
pointer, for double we have double pointer, etc. We can have also
pointers that point to an unspecified object. They are written as void
*, i.e. pointers to void.

The primitive types in lcc-win32 are:

Type                     Size in lcc-win32 (bytes)  Standard?
bool                     1                          Available in C99
char (signed/unsigned)   1                          yes
short (signed/unsigned)  2                          yes
int (signed/unsigned)    4                          yes
long (signed/unsigned)   4                          yes
long long                8                          Available in C99
float                    4                          yes
double                   8                          yes
long double              12                         Available in C99
complex types            16                         May be absent in some implementations
qfloat                   56                         Specific to lcc-win32
bignum                   variable                   Specific to lcc-win32
Nov 14 '05 #1
51 Replies


P: n/a
jacob navia wrote:

I would like to add at the beginning of the C tutorial I am writing
a short blurb about what "types" are.
I came up with the following text.

Please can you comment?

"primitive types" and "composite types"
seem similar to the terms "basic types" and "aggregate types".
I think by "built-in types", you mean basic types.
It's not clear whether or not composite types include unions.

You mention object types, incomplete types and function types,
but I would emphasize that those three major categories
are the top layer in the type hierarchy.

I would say that function types,
as well as being determined by the return type,
are determined by the "parameters" rather than the "arguments".
If a function has a parameter of type int,
you can call it with an argument of type char.
Did I miss something?

Alignment requirements are according to type.

Is there something wrong in there?

C doesn't mandate IEEE floating point.

--
pete
Nov 14 '05 #2

P: n/a
pete wrote:

jacob navia wrote:

Did I miss something?

Types
A type is a definition for a sequence of storage bits.
It gives the meaning of the data stored in memory.


I think that's closer to the definition of "object type".

Object types also apply to constant expressions.

--
pete
Nov 14 '05 #3

P: n/a
pete wrote:
jacob navia wrote:

Did I miss something?

Alignment requirements are according to type.


I am not sure that this should be at the beginning of the tutorial. I
mention it when I speak about structures later.

Is there something wrong in there?

C doesn't mandate IEEE floating point.


Right. Will add that.
Nov 14 '05 #4

P: n/a
In article <news:41***********************@news.wanadoo.fr>
jacob navia <ja***@jacob.remcomp.fr> wrote:
Did I miss something?
Is there something wrong in there?
In addition to what others wrote:
long double 12 Available in C99
complex types 16 May be absent in some
implementations


complex types are "available in C99", in three flavors: complex
float, complex double, and complex long double. It sounds like
your "complex"es are only available in the "double" variety
(the size being 16 in lcc-win32 -- one should find 8, 16, and 24
given the sizes of float, double, and long double).
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #5

P: n/a
Chris Torek wrote:
In article <news:41***********************@news.wanadoo.fr>
jacob navia <ja***@jacob.remcomp.fr> wrote:
Did I miss something?
Is there something wrong in there?

In addition to what others wrote:

long double 12 Available in C99
complex types 16 May be absent in some
implementations

complex types are "available in C99", in three flavors: complex
float, complex double, and complex long double. It sounds like
your "complex"es are only available in the "double" variety
(the size being 16 in lcc-win32 -- one should find 8, 16, and 24
given the sizes of float, double, and long double).


Yes, in principle, but lcc-win32 implements (as of today) only
one kind of complex numbers.

As far as I understood the standard, complex types are not
mandatory for a conforming implementation and many small C
implementations do not provide them at all.

Now, in a tutorial, I can't say everything at once. I explain this
later when I speak about complex types.
Nov 14 '05 #6

P: n/a
Chris Torek wrote:

I re-read the complex number implementation and I have a typo
in the table. It should be 32, not 16. I implement complexes
only as long double _Complex, and I align them at 16 bytes.

Since this is the highest precision, the results should not be
affected. A problem could arise when reading, say, float _Complex
from a binary file written by another implementation.

Thanks for your answer.

jacob
Nov 14 '05 #7

P: n/a
jacob navia <ja***@jacob.remcomp.fr> writes:
I would like to add at the beginning of the C tutorial I am writing
a short blurb about what "types" are. I came up with the following text.

Please can you comment?
Did I miss something?
Is there something wrong in there?
--------------------------------------------------------------------


Hi Jacob,

As a tutorial introduction I think what you wrote is pretty good. I'm
not sure what you're assuming about the backgrounds or experience of
your readers, so it's hard to assess whether it's too much or too
little.

There is an important piece that has been glossed over, namely the
distinction between program type and representation type. I wrote the
text below as another tack on how to explain what a type is. On
re-reading your text, I think I'm assuming a greater programming
background than your text was. In spite of that, you may find this
explanation helpful. Enjoy.

======================================================================

The word 'type' is used to mean one of two notions, related but
distinct, that have to do with how different kinds of values are
stored or operated on within a computer program.

The first notion has to do with the representation of a variable or
function in program memory. For variables: how many units of memory
does it occupy; what do different bit configurations mean in terms of
the set of values the bits are supposed to represent; which sets of
bits occupy which successive memory units; constraints on which sets
of memory units it can occupy (often called "alignment") - properties
like these define the representation ("type") of a variable or
intermediate value. For functions: what calling conventions need to
be observed when calling the function - where should its parameters
go, where will the return result go, what sort of register saving will
be done; what assumptions are made about the representations for the
arguments, and what representation can be expected for the result -
these properties, and perhaps a few other similar ones, define how a
function is to be represented in program memory (at least, from the
point of view of wanting to call the function, because the only thing
that can be "done" with a function is call it [1]).

The second notion has to do with compile time properties on some
syntactic program elements - declarators, variable names, expressions,
and function definitions are some examples - that specify or constrain
the representations of various run-time program elements - variable
values, intermediate values, or compiled functions. These "types" are
indicated in C by the familiar 'int', 'char', etc, and array, pointer
and function types that are used in C programs.

Historically these two notions were considered to be synonymous and
the word "type" was used for both. Further reflection will show that
they are different. For example, on a machine with eight bit bytes, a
variable of type 'char' may have an eight-bit signed representation,
or it may have an eight-bit unsigned representation. An external
variable declared 'extern int a[];' may be represented by 10 integers
or by 100 integers. A "program type" may also carry more information
than a "representation type": for example, a pointer declared 'const
int *p' will (on most machines) have exactly the same representation
as another pointer declared without the 'const'; or, in the case of
an 'enum' type, the representation will be exactly the same as one of
the integral types, yet the presence of the 'enum' type indicator in
the program text allows more thorough checking (of some programs) at
compile time, even though the (ANSI-standard) compiler itself allows
enum's and int's to be pretty much freely mixed.

It's important that the assumptions about what the correspondence is
between program types and representation types be made consistently
across an implementation. When compiling a program with a compiler
that represents 'int' as 32 bits, it usually will be disastrous to
call a library function that was compiled with a compiler that
represented 'int' with only 16 bits. The program type - int - is the
same in both cases, but the representation type differs between the
two compilers. But, different implementations can and do make
different choices for the mapping of program types to representation
types, sometimes even on the same machine architecture.

Because of these different choices that are made in different
implementations, experienced C developers try to minimize any
dependencies in their programs on the mapping between program types
and representation types. This goal is furthered by use of standardly
defined mechanisms like 'sizeof', 'CHAR_BIT', and so forth, so that
specific representation-type choices will not (insofar as is possible)
influence program behavior.

----------------

[1] In C, functions may also be "operated on" by taking their address
for a pointer-to-function. Depending on specifics of the particular
machine architecture, the requirements for this operation might also
be included in a function's representation.

Nov 14 '05 #8

P: n/a
I think that you have an important point where type *attributes* are
concerned.

In your example of the difference between int and const int, (the same
with volatile int) the volatile/const qualifiers are type attributes.
The same should be said for functions with _stdcall calling convention
for instance. The basic type of
int _stdcall foo(int);
is the same as
int foo(int);
but these two types differ in the attribute _stdcall.

For a beginner, it looks to me easier to understand the concept of
type attributes that exist only at compile time.

Did I understand you correctly?

jacob
Nov 14 '05 #9

P: n/a
jacob navia <ja***@jacob.remcomp.fr> writes:
I think that you have an important point where type *attributes* are
concerned.

In your example of the difference between int and const int, (the same
with volatile int) the volatile/const qualifiers are type attributes.
The same should be said for functions with _stdcall calling convention
for instance. The basic type of
int _stdcall foo(int);
is the same as
int foo(int);
but these two types differ in the attribute _stdcall.

For a beginner, it looks to me easier to understand the concept of
type attributes that exist only at compile time.

Did I understand you correctly?

jacob


It's true that type qualifiers provide the most ready
examples of the program/representation type distinction,
but not the only ones. Consider: on my platform (gcc
on x86 Linux), both 'int' and 'long' are four byte signed
quantities. The program types are clearly different,
yet the representation types are exactly the same. Also,
in some implementations, the two types

typedef int T1;
typedef struct { int x; } T2;

have exactly the same representation type (again, signed
integers of some length); passing one when the other is
expected wouldn't cause any runtime problems. Yet clearly
these different program types can be distinguished by the
compiler.

The reverse situation - where a single program type
corresponds to multiple representation types - normally
happens in C only for "incomplete [program] types". At
least, I can't think of other cases right off the top of my
head. But certainly it comes up at least in the case
of incomplete types.

So I see the "program type/representation type" distinction
as more fundamental than the distinction of qualified types
vs unqualified types.

How - and how much - one should weave all of these notions
into a tutorial intended for novices - that's not an easy
question to answer. On some machines, for example, pointers
to read-only areas have a different representation than
pointers that can be used for writing (often with a
"write-protected" or "write-enabled" bit); on such machines
the const/non-const distinction _does_ show up in the
representation type. This kind of fine point is almost
certainly too much for the tutorial that I think you're
writing. But I do think that the distinction between
program types and representation types is essential for
people, even novices, to have explained, and to understand
at least at some level.

I hope this explanation helps with your writing.
Nov 14 '05 #10

P: n/a
In <41***********************@news.wanadoo.fr> jacob navia <ja***@jacob.remcomp.fr> writes:
As far as I understood the standard, complex types are not
mandatory for a conforming implementation and many small C
implementations do not provide them at all.


Complex types are optional *only* for freestanding C99 implementations.

For hosted implementations:

11 There are three complex types, designated as float _Complex,
double _Complex, and long double _Complex. The real floating
and complex types are collectively called the floating types.

12 For each floating type there is a corresponding real type,
which is always a real floating type. For real floating types,
it is the same type. For complex types, it is the type given by
deleting the keyword _Complex from the type name.

13 Each complex type has the same representation and alignment
requirements as an array type containing exactly two elements
of the corresponding real type; the first element is equal to
the real part, and the second element to the imaginary part,
of the complex number.

Then again, we know your compiler is not conforming to *any* C standard...

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union
Nov 14 '05 #11

P: n/a
jacob navia wrote:
I would like to add at the beginning of the C tutorial I am writing
a short blurb about what "types" are. I came up with the following text.

Please can you comment?
Did I miss something?
Is there something wrong in there?
--------------------------------------------------------------------
Types
A type is a definition for a sequence of storage bits. It gives the
meaning of the data stored in memory. If we say that the object a is an
int, it means that the bits stored at that location are to be understood
as a natural number that is built by consecutive additions of powers of
two. If we say that the type of a is a double, it means that the bits
are to be understood as the IEEE 754 standard sequences of bits
representing a double precision floating point value. [...]


One problem with this explanation is that it relies
on the idea of "a sequence of storage bits," which would
seem to imply that a type exists only in connection with
memory. However, values have types even if they're not
memory-resident. For example, in `x * 2' the `2' has
type `int' even if the compiler uses "add x,x" or maybe
"shl x,1" to calculate the value, thus expunging all traces
of "two-ness" from the code.

To a beginner, a memory-centric explanation of "type"
may be helpful: it has a comforting solidity in what the
novice may perceive as a sea of abstraction. But I think
the approach has several drawbacks. It's inaccurate (as
shown above), it doesn't cover incomplete types (what's
the "sequence of storage bits" for a `void'?), and it takes
a bit of a stretch to get it to cover function types.

The worst feature of the memory-centric approach may be
that it encourages people to think about the representations
of values rather than about the values themselves. As a
class, C programmers seem all too susceptible to this
temptation (how often have you seen 0xFF referred to as a
negative value?), and anything one can do to *dis*courage
the practice is a blow for Truth, Justice, and the Amer--
er, Standard Way.

The challenge, of course, is to devise an explanation
that is both correct and comprehensible. IMHO, you've gone
for the short-term benefit of easy comprehension at the cost
of the long-term drawback of a mental model that's askew
from the truth of the language.

--
Er*********@sun.com

Nov 14 '05 #12

P: n/a
Eric Sosman wrote:

One problem with this explanation is that it relies
on the idea of "a sequence of storage bits," which would
seem to imply that a type exists only in connection with
memory.

Types are associated with objects, and objects must exist
in memory somewhere.

However, values have types even if they're not
memory-resident. For example, in `x * 2' the `2' has
type `int' even if the compiler uses "add x,x" or maybe
"shl x,1" to calculate the value, thus expunging all traces
of "two-ness" from the code.

A compiler that does constant folding (the general case)
doesn't destroy any types. It eliminates the objects (the
constants) and with the objects, their types disappear too.
I see no contradiction. The two in x*2 is eliminated and
with it its type. But until it is eliminated, the type exists
as a way of describing the bits in the machine representation
of two.

Note that types in this context are just descriptions of machine
representations, as I said in my proposal.

To a beginner, a memory-centric explanation of "type"
may be helpful: it has a comforting solidity in what the
novice may perceive as a sea of abstraction. But I think
the approach has several drawbacks. It's inaccurate (as
shown above), it doesn't cover incomplete types (what's
the "sequence of storage bits" for a `void'?),

void means "non", i.e. no type, and no corresponding object.
int fn(void)
means that fn has no objects declared as arguments.

and it takes
a bit of a stretch to get it to cover function types.

Why?

I define in my proposal the type of a function as the union
of the type of the return value (output) and arguments (inputs)
of the function.

This is clear and quite evident, at least in the lcc compiler
function types are treated that way.

The worst feature of the memory-centric approach may be
that it encourages people to think about the representations
of values rather than about the values themselves.

Types are descriptions of memory objects. Note that all we can
do in a machine is to abstract from a real number or from any
real world object *some* characteristics and *represent* it
in the machine.

When I write:

typedef struct tagPerson { char *name; int age; } Person;

I mean that I abstract from a real world person that has billions
of different characteristics, genetic code, eye color,
bank account level, number of fingers, etc. I make a machine
"type" where only two characteristics out of the billions
are considered: the name and the age.

As a
class, C programmers seem all too susceptible to this
temptation (how often have you seen 0xFF referred to as a
negative value?), and anything one can do to *dis*courage
the practice is a blow for Truth, Justice, and the Amer--
er, Standard Way.

It depends on the context where 0xff is used. It can be
a constant (255) or understood as negative because the highest
bit is set, or anything else...

The challenge, of course, is to devise an explanation
that is both correct and comprehensible. IMHO, you've gone
for the short-term benefit of easy comprehension at the cost
of the long-term drawback of a mental model that's askew
from the truth of the language.


Well please make a counter-proposal... How would you speak
about values without using some definition of type?

jacob
Nov 14 '05 #13

P: n/a
jacob navia wrote:
Eric Sosman wrote:
One problem with this explanation is that it relies
on the idea of "a sequence of storage bits," which would
seem to imply that a type exists only in connection with
memory.

Types are associated with objects, and objects must exist
in memory somewhere.


No and no, I think. Consider

#include <stdio.h>
int main(void) {
    struct nonsuch { double x; int y; };
    printf ("sizeof(struct nonsuch) = %d\n",
            (int)sizeof(struct nonsuch));
    return 0;
}

I make two claims about this program: First, that it
defines and uses the type `struct nonsuch', and second,
that no `struct nonsuch' object exists in memory -- nor
anywhere else, for that matter.

However, values have types even if they're not
memory-resident. For example, in `x * 2' the `2' has
type `int' even if the compiler uses "add x,x" or maybe
"shl x,1" to calculate the value, thus expunging all traces
of "two-ness" from the code.


A compiler that does constant folding (the general case)
doesn't destroy any types. It eliminates the objects (the
constants) and with the objects, their types disappear too.
I see no contradiction. The two in x*2 is eliminated and
with it its type. But until it is eliminated, the type exists
as a way of describing the bits in the machine representation
of two.


No, `2' doesn't lose its `int'-ness because of whatever
trickery the compiler employs in code generation. To
demonstrate this, let's try another toy program (I've
switched to `double' to make the demonstration clearer):

#include <stdio.h>
int main(void) {
    char x;
    printf ("%d ?= %d\n", (int)sizeof(x),
            (int)sizeof(x*2.0));
    return 0;
}

Most machines will print something like "1 ?= 8", showing
that the two `sizeof' operands have different sizes. Why
do they have different sizes? Because they have different
types. Why do they have different types? Because in the
second instance the presence of a `double' operand causes
the expression to have type `double' as well. Note that
the `2.0' need not exist in the generated code at all,
because it's never even evaluated -- yet it has a type
nonetheless, and the influence of that type is seen in
the output.

Note that types in this context are just descriptions of machine
representations, as I said in my proposal.

Yes, you said that. I've given some reasons why I
think it's an unwise claim.

To a beginner, a memory-centric explanation of "type"
may be helpful: it has a comforting solidity in what the
novice may perceive as a sea of abstraction. But I think
the approach has several drawbacks. It's inaccurate (as
shown above), it doesn't cover incomplete types (what's
the "sequence of storage bits" for a `void'?),


void means "non", i.e. no type, and no corresponding object.
int fn(void)
means that fn has no objects declared as arguments.


`void' does *not* mean "no type," not ever. Like some
other C keywords and symbols (c.f. `static'), its meaning
is context-dependent:

- In the particular context you cite, it means "this
function takes no arguments."

- In all other contexts, it means "an incomplete type
that cannot be completed."

In neither case does it mean "no type" or "non."

and it takes
a bit of a stretch to get it to cover function types.


Why?


All right, then, what's the "sequence of storage bits"
that represents the type-generic sqrt() function? What's
the "sequence of storage bits" that is the representation
of a function that's been inlined in five places and been
optimized differently in each of them? Or to go really
purist and ivory-tower on you, where does the C Standard
require that functions occupy memory at all?

[...]
The worst feature of the memory-centric approach may be
that it encourages people to think about the representations
of values rather than about the values themselves.


Types are descriptions of memory objects.


"Are not!"

"Are so!"

"Are not!"

"Are so!"

.... all right, you win. My fingers are getting tired.

[...] As a
class, C programmers seem all too susceptible to this
temptation (how often have you seen 0xFF referred to as a
negative value?), and anything one can do to *dis*courage
the practice is a blow for Truth, Justice, and the Amer--
er, Standard Way.


It depends on the context where 0xff is used. It can be
a constant (255) or understood as negative because the highest
bit is set, or anything else...


If the context is "in C source code," 0xFF is a positive
value of type `int'. Period, end of sentence. Allow me to
suggest that if this isn't clear to you, you're not in a
position to be explaining types to anyone.
The challenge, of course, is to devise an explanation
that is both correct and comprehensible. IMHO, you've gone
for the short-term benefit of easy comprehension at the cost
of the long-term drawback of a mental model that's askew
from the truth of the language.


Well please make a counter-proposal... How would you speak
about values without using some definition of type?


You've turned it around, or maybe I've written less
clearly than I like to think I do. In C, every value has a
type and it would be folly to say otherwise, or to omit the
notion of a value's type when discussing its nature. But
it is *not* the case that every type has a value (c.f. `void')
or that every value of a type must exist in memory (c.f.
the eliminated constants, non-evaluated `sizeof' operands,
and so on). It's the latter that I think you're claiming and
that I beg to differ with.

As for making a counter-proposal -- well, I'm not an
accomplished author of textbooks. Good explanations sail
between Scylla and Charybdis: Perfect correctness may be
perfectly incomprehensible, while a facile exposition may
convey misinformation. That's why I admire and enjoy
K&R, Knuth, and the like: they're correct (usually) and
expressive (usually), and neither the correctness nor the
expression suffers on behalf of the other. It takes
artistry to steer such a boat, to stay clear of both the
rocks and the whirlpool. If you can manage to find the
right course, you'll have made a valuable contribution.

--
Er*********@sun.com

Nov 14 '05 #14

P: n/a
Eric Sosman <er*********@sun.com> writes:
jacob navia wrote:
I would like to add at the beginning of the C tutorial I am writing
a short blurb about what "types" are. I came up with the following text.

Please can you comment?
Did I miss something?
Is there something wrong in there?
--------------------------------------------------------------------
Types
A type is a definition for a sequence of storage bits. It gives the
meaning of the data stored in memory. If we say that the object a is an
int, it means that the bits stored at that location are to be understood
as a natural number that is built by consecutive additions of powers of
two. If we say that the type of a is a double, it means that the bits
are to be understood as the IEEE 754 standard sequences of bits
representing a double precision floating point value. [...]

One problem with this explanation is that it relies
on the idea of "a sequence of storage bits," which would
seem to imply that a type exists only in connection with
memory. However, values have types even if they're not
memory-resident. For example, in `x * 2' the `2' has
type `int' even if the compiler uses "add x,x" or maybe
"shl x,1" to calculate the value, thus expunging all traces
of "two-ness" from the code.


Syntactic elements have [program] types. It's the syntactic element
'2' that has type int, not some runtime value 2, which as you point
out may not exist. Values and other runtime entities have a
representation type impressed on them by the code accessing the value
or entity at that particular time. For linguistic convenience we
sometimes say "this object is of type int" but what's meant is that
the memory is being interpreted as having the representation type that
corresponds to the program type int for that implementation.

In case it isn't clear, I'm basically agreeing with what you're
saying, just trying to give more precise language to say it.

To a beginner, a memory-centric explanation of "type"
may be helpful: it has a comforting solidity in what the
novice may perceive as a sea of abstraction. But I think
the approach has several drawbacks. It's inaccurate (as
shown above), it doesn't cover incomplete types (what's
the "sequence of storage bits" for a `void'?), and it takes
a bit of a stretch to get it to cover function types.

The worst feature of the memory-centric approach may be
that it encourages people to think about the representations
of values rather than about the values themselves. As a
class, C programmers seem all too susceptible to this
temptation (how often have you seen 0xFF referred to as a
negative value?), and anything one can do to *dis*courage
the practice is a blow for Truth, Justice, and the Amer--
er, Standard Way.
It's important to understand both program types and representation
types. The problems mentioned above, and lots of others having
to do with the differences between different implementations,
can be discussed and explained using the relationship between
program types and representation types. Incidentally, note
that program types are defined by the C standard, but representation
types are determined (sometimes with standard-imposed constraints)
by the machine architecture and compiler at hand.

The challenge, of course, is to devise an explanation
that is both correct and comprehensible.


Agreed. I'd be interested to see what some of
the regulars would suggest on this topic.
Nov 14 '05 #15

P: n/a
jacob navia <ja***@jacob.remcomp.fr> writes:
Eric Sosman wrote:
>
One problem with this explanation is that it relies
on the idea of "a sequence of storage bits," which would
seem to imply that a type exists only in connection with
memory.
Types are associated with objects, and objects must exist
in memory somewhere.


An object, or a function, is treated as though it has a certain
representation type at any particular point in (execution) time.
But the memory exists independently of the representation type
that is used to access it at any particular point. Also values
can come into existence that prior to coming into existence
weren't stored in any object - consider

int a[10];
* (char *) &a[3] = 0;

The address value &a[3] springs into existence having an address
representation type, but very likely wasn't previously stored in any
object.

However, values have types even if they're not
memory-resident. For example, in `x * 2' the `2' has
type `int' even if the compiler uses "add x,x" or maybe
"shl x,1" to calculate the value, thus expunging all traces
of "two-ness" from the code.


A compiler that does constant folding (the general case)
doesn't destroy any types. It eliminates the objects (the
constants) and with the objects, their types disappear too.
I see no contradiction. The two in x*2 is eliminated and
with it its type. But until it is eliminated the type exists
as a way of describing the bits in the machine representation
of two.


Objects are run-time entities; they don't exist during the
compilation process. Instead, the compiler manipulates syntactic and
tree elements that correspond to objects (and functions!) that might,
or might not, exist at run-time. Perhaps a small distinction, but a
significant one. Precise language is important.

Note that types in this context are just descriptions of machine
representations, as I said in my proposal.


There are program types and representation types, and it is
important to understand both. So, it is important to explain
both.

To a beginner, a memory-centric explanation of "type"
may be helpful: it has a comforting solidity in what the
novice may perceive as a sea of abstraction. But I think
the approach has several drawbacks. It's inaccurate (as
shown above), it doesn't cover incomplete types (what's
the "sequence of storage bits" for a `void'?),


void means "none", i.e. no type, and no corresponding object.
int fn(void)
means that fn has no objects declared as arguments.


The type 'void' is not "no type", but an incomplete [program] type
that can't be completed. It's right that there is no representation
type corresponding to program type 'void'; but, the type 'void' is
still a [program] type.

The worst feature of the memory-centric approach may be
that it encourages people to think about the representations
of values rather than about the values themselves.


Types are descriptions of memory objects. Note that all we can
do in a machine is to abstract from a real number or from any
real world object *some* characteristics and *represent* it
in the machine.


Program types are (sometimes somewhat abstract) specifications for how
various syntactic entities can be combined. On a particular
implementation, they also specify a mapping to a corresponding
representation type (or perhaps several representation types).

Nov 14 '05 #16

P: n/a
Tim Rentsch wrote:
It's important to understand both program types and representation
types.


Does Dennis M. Ritchie understand the difference?
Aren't program types and representation types,
something that you just made up?

--
pete
Nov 14 '05 #17

P: n/a
On Tue, 05 Oct 2004 10:58:41 GMT, pete <pf*****@mindspring.com> wrote:
Tim Rentsch wrote:
It's important to understand both program types and representation
types.
Does Dennis M. Ritchie understand the difference?


Yes.
Aren't program types and representation types,
something that you just made up?


No.

HTH. HAND.

Nov 14 '05 #18

P: n/a
Tim Rentsch wrote:
Syntactic elements have [program] types. It's the syntactic element
'2' that has type int, not some runtime value 2, which as you point
out may not exist. Values and other runtime entities have a
representation type impressed on them by the code accessing the value
or entity at that particular time. For linguistic convenience we
sometimes say "this object is of type int" but what's meant is that
the memory is being interpreted as having the representation type that
corresponds to the program type int for that implementation.

If I understand you correctly, you distinguish between program
types that correspond to the abstract types defined in the C standard,
and representation types that are the product of applying those abstract
types to a specific machine architecture.

You agree that types are definitions of how to interpret a sequence of
bits in memory:

You wrote: For linguistic convenience we
sometimes say "this object is of type int" but what's meant is that
the memory is being interpreted as having the representation type that
corresponds to the program type int for that implementation.


Using your terminology, my type definition would correspond to the
representation types.

1) Abstract type: int, as defined by the C standard.

2) Concrete representation type: int as a sequence of 32 bits as
implemented by the lcc-win32 compiler.

The standard constrains the possible representations of int (it should
have at least 16 bits for instance), and lcc-win32 implements that
abstract type by choosing a machine word length for the "int" concrete
representation.

Did I understand you correctly?

Thanks for your contribution, you make an interesting point here.

I think wording similar to the above could well be within the reach
of a beginner.
Nov 14 '05 #19

P: n/a
pete <pf*****@mindspring.com> writes:
Tim Rentsch wrote:
It's important to understand both program types and representation
types.


Does Dennis M. Ritchie understand the difference?
Aren't program types and representation types,
something that you just made up?


I'm confident he understands the two ideas. I expect the names would
also make sense to him, perhaps without previous explanation, but
almost certainly after getting the descriptions posted recently in the
NG.

I admit to having chosen these particular names myself; the
concepts though have a much longer pedigree, appearing in the
literature starting in the early 1980's.

To be clear, the ideas are what's important (IMO) to understand. I
think the names are fairly good and reasonably evocative (if I do say
so myself), but the two distinct ideas are the important thing.
Nov 14 '05 #20

P: n/a
jacob navia <ja***@jacob.remcomp.fr> writes:
Tim Rentsch wrote:
Syntactic elements have [program] types. It's the syntactic element
'2' that has type int, not some runtime value 2, which as you point
out may not exist. Values and other runtime entities have a
representation type impressed on them by the code accessing the value
or entity at that particular time. For linguistic convenience we
sometimes say "this object is of type int" but what's meant is that
the memory is being interpreted as having the representation type that
corresponds to the program type int for that implementation.

If I understand you correctly, you distinguish between program
types that correspond to the abstract types defined in the C standard,
and representation types that are the product of applying those abstract
types to a specific machine architecture.


Yes, with two minor corrections. The term "abstract type" is generally
used to mean a different concept; any of the terms "program type", "C type",
or "standard C type" would be more appropriate. Also, representation types
exist independently; they aren't the "product of applying" the C types.
A more accurate way of saying it would be that, on a particular machine,
a compiler chooses a representation type that will correspond to each
C program type. There are also representation types that don't correspond
to any C program type (although most of these won't ever be used by
the compiler).

You agree that types are definitions of how to interpret a sequence of
bits in memory:
You left out "representation" in that sentence. Representation types
do determine how to interpret memory values, in the case of data
objects; it's a little dangerous to call memory values "a sequence
of bits", since part of the representation type would determine
in what order the memory units are processed. And, functions also
have a representation type, but in the case of functions the representation
type determines how they should be called, not how the memory storing
the function object code will be interpreted.
You wrote:
> For linguistic convenience we
> sometimes say "this object is of type int" but what's meant is that
> the memory is being interpreted as having the representation type that
> corresponds to the program type int for that implementation.
Using your terminology, my type definition would correspond to the
representation types.


I think that's right. That's a little bit dangerous, since it
encourages people to think in terms of the representations, which
will change from machine to machine, or from compiler to compiler.
Or even, in some cases, from compilation to compilation. That's why
it's also important to explain program types.
1) Abstract type: int, as defined by the C standard.
Again, the term "abstract type" shouldn't be used, since it's
standardly used in the literature to mean something very different.

2) Concrete representation type: int as a sequence of 32 bits as
implemented by the lcc-win32 compiler.
Similarly, the word "concrete" here should not be used, since it
means something different in the literature. The type 'int' is
a concrete type, as the term is standardly used. You might try
"architecture/compiler specific representation".

The standard constrains the possible representations of int (it should
have at least 16 bits for instance), and lcc-win32 implements that
abstract type by choosing a machine word length for the "int" concrete
representation.

Did I understand you correctly?
I believe so (reiterating my comments about not using "abstract" and
"concrete" for these purposes).

Thanks for your contribution, you make an interesting point here.

I think wording similar to the above could well be within the reach
of a beginner.


You are most welcome. I think so too.
Nov 14 '05 #21

P: n/a
Tim Rentsch <tx*@alumnus.caltech.edu> writes:
pete <pf*****@mindspring.com> writes:
Tim Rentsch wrote:
> It's important to understand both program types and representation
> types.


Does Dennis M. Ritchie understand the difference?
Aren't program types and representation types,
something that you just made up?


I'm confident he understands the two ideas. I expect the names would
also make sense to him, perhaps without previous explanation, but
almost certainly after getting the descriptions posted recently in the
NG.

I admit to having chosen these particular names myself; the
concepts though have a much longer pedigree, appearing in the
literature starting in the early 1980's.

To be clear, the ideas are what's important (IMO) to understand. I
think the names are fairly good and reasonably evocative (if I do say
so myself), but the two distinct ideas are the important thing.


Personally, I find the term "representation type" confusing. It's
certainly important to distinguish between a "type" (as used in the
standard and in the abstract machine) and "representation" (as applied
to the hardware), but I don't find it useful to apply the term "type"
to the latter. FWIW, YMMV.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #22

P: n/a
Keith Thompson <ks***@mib.org> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> writes:
pete <pf*****@mindspring.com> writes:
Tim Rentsch wrote:

> It's important to understand both program types and representation
> types.

Does Dennis M. Ritchie understand the difference?
Aren't program types and representation types,
something that you just made up?


I'm confident he understands the two ideas. I expect the names would
also make sense to him, perhaps without previous explanation, but
almost certainly after getting the descriptions posted recently in the
NG.

I admit to having chosen these particular names myself; the
concepts though have a much longer pedigree, appearing in the
literature starting in the early 1980's.

To be clear, the ideas are what's important (IMO) to understand. I
think the names are fairly good and reasonably evocative (if I do say
so myself), but the two distinct ideas are the important thing.


Personally, I find the term "representation type" confusing. It's
certainly important to distinguish between a "type" (as used in the
standard and in the abstract machine) and "representation" (as applied
to the hardware), but I don't find it useful to apply the term "type"
to the latter. FWIW, YMMV.


I chose the term "representation type" because historically the word
"type" was used to mean both concepts, and I thought it might help
people who are used to thinking the two concepts are synonymous.
I agree though that a different term might be better. Anyone
have any suggestions?

Nov 14 '05 #23

P: n/a
Tim Rentsch <tx*@alumnus.caltech.edu> writes:
Keith Thompson <ks***@mib.org> writes:

[...]
Personally, I find the term "representation type" confusing. It's
certainly important to distinguish between a "type" (as used in the
standard and in the abstract machine) and "representation" (as applied
to the hardware), but I don't find it useful to apply the term "type"
to the latter. FWIW, YMMV.


I chose the term "representation type" because historically the word
"type" was used to mean both concepts, and I thought it might help
people who are used to thinking the two concepts are synonymous.
I agree though that a different term might be better. Anyone
have any suggestions?


What about just "representation"?

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #24

P: n/a
Tim Rentsch wrote:
So I see the "program type/representation type" distinction
as more fundamental than the distinction of qualified types
vs unqualified types.


You are right. I think the best example are enumeration types
that only exist at run time, and where the "underlying" type
is int. The types are different but the underlying representation
is the same.

The same holds for
int a;
and
struct { int a;} a;

Two types with exactly the same representation.

It will be a challenge to explain all these subtleties
in a tutorial, but somehow I find it necessary.

I think it can be done by proposing a rough approach
at the beginning, and refining it later.

But I am against writing yet another C for dummies. I
want to try to explain the complexity of C without
making people believe that they are learning yet another
version of BASIC or some other beginner's language.

C programmers have to know more details of the actual
hardware because C allows you to use that hardware
directly.

Hardware is represented in C as a sequential space of
addresses, where the data and the preprogrammed (compiled)
instruction sequences are stored; those instructions act
on the data and maybe on further inputs.

Building types and procedures is called "programming".

A type can be several things, starting with the simple
ones, passive data. This data is stored in integers,
the only thing that machines understand: bits.

A *type* for this kind of data means a coded algorithm
description for the usage of the data. Text is stored
using an alphabet, the most common alphabet being the
"ASCII" format. An alphabet is a common convention for
writing letters as integers. We say that 'A' will be 65
and be done with it. Other alphabets can be (and are) used
of course, C is not tied to ASCII.

The type char means then, that the data stored in
consecutive addresses is to be understood as a sentence
like:
"Please enter the amount"
that should be shown at default centered coordinates
with some button underneath and an edit field.

Integers can be used to store text, or colors if you
like: you make an alphabet and assign integers to the whole
color spectrum. The integers are interpreted by the screen
hardware as colors to be shown. Images can be stored as
integers that encode intensity levels and direct the
hardware to display the image.

Integers can store audio, as the many files floating
around will prove... Bach, Ravel, and many others can be
encoded in integers that represent a waveform at an
implicit frequency.

Integers can be used to encode other integers, so you
get formats like mp3 that take the voluminous sequence
of integers produced by the sampler and spit out integers
again, but far fewer of them.

The type of the integers changes. It will be of no use
to the mp3 decoder to try to understand a photograph as
a song. Or maybe we should hear what comes out of it,
who knows...

In any case the type of the mp3 data is a song, not
a photograph, and if you display it as text (you can
do that one day to "see" a song) it will not be
meaningful either, the type is wrong.

All machines handle basically nothing else but bits.
To find our way we define types of data, i.e. we
ascribe a specific meaning to each bit of what we are
processing: a song, a photograph, some text, a number,
whatever.

Integers aren't everything: as everyone knows, 0.5 exists,
and it is not an integer. Well, that objection doesn't hold.

We can approximate real numbers by using two integers,
the mantissa and the exponent, and we can figure out
clever ways of adding those integer pairs (or floating
point numbers, see later in this tutorial).

Integers can be arbitrarily big: with today's hard disk
capacity, storing an integer of 200 GB is possible.

Smaller integers can be handled with far fewer problems,
however, and for most applications double precision is
already quite good.

Note that we "approximate" real numbers, never really
touching them. There is a quantization loss in the encoding,
and a whole series of problems implicit in the digital
nature of the encoding.

Integers that occupy a single storage unit are called "chars",
and they are used to encode the alphabet, letting the machine
store text.

In C you can at any moment change your mind and start
interpreting the same bits in another way. You make a
cast operation, i.e. you apply to some address a new
type.

Several simple types can be combined into aggregates,
i.e. a related bunch of data like integers, character
strings, real numbers, etc. These aggregates can have
relationships between them, represented by pointers to
other aggregates. These are composite types (structures
or unions).

The data is handled in procedures, i.e. sequences of
instructions that receive some inputs, and produce some
output or modification of the program state. The type
of a procedure is strictly defined by the type of its
inputs and the type of the output (return value).

There is yet another kind of type, where you make
a distinction between two objects that have the same
representation. You can build, for instance, an "enumerated"
type, that is in fact an integer, but encodes a special
meaning.

There are even types that you can't figure out at all:
opaque types. This type is, for instance:

struct unknown *bn;

and struct "unknown" is nowhere defined. Or even worse:

void *bn;

A pointer that points to an unknown object.
You can do only one thing with this pointer:
pass it around.

Usually you receive an opaque pointer from a library
that wants to hide the details of how they do their
stuff from you. This is good for you, since you are
using the library precisely because you do not want to
know a lot about it and just use it.

This also allows the library writers to change all their
internal stuff without needing a change in all the
customer base. Opaque types are like firewalls. They limit the
growth of the interdependency between the several
parts that make up a whole program.

Well, it will be something along these lines. Thanks for
the feedback.
Nov 14 '05 #25

P: n/a
jacob navia wrote:
You are right. I think the best example are enumeration types
that only exist at run time, and where the "underlying" type
is int. The types are different but the underlying representation
is the same.


That should obviously be compile time, not run time. What a blunder.
Enumerations exist only at compile time. At run time the
underlying type is used. The circuit doesn't have any
idea of enumerations.

I pressed the send button too soon.
Nov 14 '05 #26

P: n/a
Keith Thompson <ks***@mib.org> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> writes:
Keith Thompson <ks***@mib.org> writes:

[...]
Personally, I find the term "representation type" confusing. It's
certainly important to distinguish between a "type" (as used in the
standard and in the abstract machine) and "representation" (as applied
to the hardware), but I don't find it useful to apply the term "type"
to the latter. FWIW, YMMV.


I chose the term "representation type" because historically the word
"type" was used to mean both concepts, and I thought it might help
people who are used to thinking the two concepts are synonymous.
I agree though that a different term might be better. Anyone
have any suggestions?


What about just "representation"?


If we use the term "representation" by itself, there's an ambiguity
about whether the notion under discussion is generic or specific.
Individual values have a 'representation'; storage that holds any of
a set of values has (or is accessed using) a 'representation type'.
("What's the representation for a NULL pointer?") Also, using
"representation" by itself doesn't work very well for functions; we
aren't interested in the particular bits that make up a function's
object code, but we are interested in what calling conventions are
necessary to call it. An unadorned "representation" tends to evoke
bit patterns more than it does calling sequences.

Certainly there is some precedent for using "representation" in these
kinds of discussions - "two's complement representation", for example.
For informal discussions it's probably fine. For more precise
descriptions, however, talking about the relationship between "types"
at compile-time and at run-time, "representation type" seems
both more accurate and more evocative.

[Still good to have gotten the suggestion - thank you.]

Other ideas? Surely there must be some...
Nov 14 '05 #27

P: n/a
jacob navia <ja***@jacob.remcomp.fr> writes:
Tim Rentsch wrote:
So I see the "program type/representation type" distinction
as more fundamental than the distinction of qualified types
vs unqualified types.
You are right. I think the best example are enumeration types
that only exist at run time, and where the "underlying" type
is int. The types are different but the underlying representation
is the same.


(As you acknowledged later, that should be "only exist at compilation
time".)

This makes it sound like an enumeration type and type "int" are
fundamentally different things, that an enumeration type exists at
compilation time, but type "int" exists at run time. This is
misleading. The type "int", like an enumeration type, exists only at
compilation time. Rather than saying that the underlying type of an
enumeration type is int, it's more accurate to say that the
enumeration type and int have the same representation. (That
representation might be a machine word, for example.)

And, of course, the representation of an enumeration type may or may
not be the same as the representation of type int; it's up to the
implementation to choose an underlying representation that can hold
all the specified values.

As a rule of thumb, anything having to do with the C language exists
only in your source program or at compilation time, not at run time.
(That's not completely true, since a lot of the names overlap.)

In my opinion, it's best to use the term "type" only for things that
are types in C, not for entities like machine words that exist at run
time.

[...]
Hardware is represented in C as a sequential space of
addresses, where the data and the preprogrammed (compiled)
instruction sequences are stored; those instructions act
on the data and maybe on further inputs.
That assumes a particular runtime model, one not required by the C
standard. There isn't necessarily a single sequential address space.
Data and code could be in separate address spaces; for that matter,
each object (declared or created by malloc()) could be in a distinct
address space. The existence of pointer arithmetic implies a
sequential address space within a single object, but not across
objects. A function address could be anything that allows the
function to be called; it could easily be an index into a system table
rather than a machine-level address.

[...]
In C you can at any moment change your mind and start
interpreting the same bits in another way. You make a
cast operation, i.e. you apply to some address a new
type.
A cast operator specifies a type conversion. Not all such conversions
are simple reinterpretations of the bits. Conversions between integer
and floating-point types almost certainly do more than just copying
the bits; conversions between pointer types may or may not do so.
What you're talking about is type punning; converting addresses is one
of several ways to achieve that.

[...]
There are even types that you can't figure out at all:
opaque types. This type is, for instance:

struct unknown *bn;

and struct "unknown" is nowhere defined. Or even worse:

void *bn;

A pointer that points to an unknown object.
You can do only one thing with this pointer:
pass it around.


These are called incomplete types; it's best to keep your terminology
consistent with standard usage.

Incomplete types and opaque types are two different things, and an
opaque type needn't be an incomplete type. For example, the type FILE
in <stdio.h> is opaque as far as the programmer is concerned (the
standard says nothing about its contents), but I can see what's in it
(in many implementations) by viewing the appropriate header file.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #28

P: n/a
Tim Rentsch <tx*@alumnus.caltech.edu> writes:
Keith Thompson <ks***@mib.org> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> writes:
> Keith Thompson <ks***@mib.org> writes: [...]
>> Personally, I find the term "representation type" confusing. It's
>> certainly important to distinguish between a "type" (as used in the
>> standard and in the abstract machine) and "representation" (as applied
>> to the hardware), but I don't find it useful to apply the term "type"
>> to the latter. FWIW, YMMV.
>
> I chose the term "representation type" because historically the word
> "type" was used to mean both concepts, and I thought it might help
> people who are used to thinking the two concepts are synonymous.
> I agree though that a different term might be better. Anyone
> have any suggestions?


What about just "representation"?


If we use the term "representation" by itself, there's an ambiguity
about whether the notion under discussion is generic or specific.
Individual values have a 'representation'; storage that holds any of
a set of values has (or is accessed using) a 'representation type'.
("What's the representation for a NULL pointer?")


In my opinion, what you're calling a "representation type" isn't a
type at all. You can talk about the representation of a type (32-bit
two's-complement), or the representation of a particular value
(hexadecimal DEADBEEF); as long as you're careful, I don't see much of
a problem using the same word for both. If you want to distinguish,
you might consider using a term like "type representation".
Also, using
"representation" by itself doesn't work very well for functions; we
aren't interested in the particular bits that make up a function's
object code, but we are interested in what calling conventions are
necessary to call it. An unadorned "representation" tends to evoke
bit patterns more than it does calling sequences.


I wouldn't talk about the "representation" of a function at all,
either of its object code or of its calling convention. Data items
have representations.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #29

P: n/a
Keith Thompson <ks***@mib.org> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> writes:
Keith Thompson <ks***@mib.org> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> writes:
> Keith Thompson <ks***@mib.org> writes:
[...]
>> Personally, I find the term "representation type" confusing. It's
>> certainly important to distinguish between a "type" (as used in the
>> standard and in the abstract machine) and "representation" (as applied
>> to the hardware), but I don't find it useful to apply the term "type"
>> to the latter. FWIW, YMMV.
>
> I chose the term "representation type" because historically the word
> "type" was used to mean both concepts, and I thought it might help
> people who are used to thinking the two concepts are synonymous.
> I agree though that a different term might be better. Anyone
> have any suggestions?

What about just "representation"?
If we use the term "representation" by itself, there's an ambiguity
about whether the notion under discussion is generic or specific.
Individual values have a 'representation'; storage that holds any of
a set of values has (or is accessed using) a 'representation type'.
("What's the representation for a NULL pointer?")


In my opinion, what you're calling a "representation type" isn't a
type at all.


Wouldn't you say pointers that store just an address and pointers that
store an address and a length are different types of pointers?
Wouldn't you say that a 'cdecl' function and a 'stdcall' function are
different types of functions? Wouldn't you say a number stored in
host order and a number stored in network order are different types of
numbers (even if on the host in question values in the two orderings
always had the same representations)? It makes just as much sense to
say that there are different types of representations as it does to
say that there are different types of variables.

You can talk about the representation of a type (32-bit
two's-complement), or the representation of a particular value
(hexadecimal DEADBEEF); as long as you're careful, I don't see much of
a problem using the same word for both.
It's applicable to one but not the other. The word representation
means "likeness or image"; unless there is something stored somewhere
in the running program, such as a byte with a '4' in it and a rule like
"'4' means int", the phrase "the representation of a type" is a misuse
of language. That's a pretty good indicator that this path isn't
the right one to go down.

If you want to distinguish,
you might consider using a term like "type representation".


I don't mean "the representation of a type"; what I mean is "the type
of representation". It seems like "representation type" is a better
term for that.

Suggestions from other quarters have been "machine type",
"representational type", "implementation type", and "representation
schema". Are any of those less confusing or less misleading
than "representation type"?

Also, using
"representation" by itself doesn't work very well for functions; we
aren't interested in the particular bits that make up a function's
object code, but we are interested in what calling conventions are
necessary to call it. An unadorned "representation" tends to evoke
bit patterns more than it does calling sequences.


I wouldn't talk about the "representation" of a function at all,
either of its object code or of its calling convention. Data items
have representations.


That you wouldn't use the same word for functions is a good indication
that it's not really the right term here. Both "data values" and
"function values" have differing patterns of implementation. What
we're trying to find is a term that captures and expresses the idea of
an "implementational pattern" - the same term should apply equally to
differences in function implementation as it does to differences in
data implementation.
Nov 14 '05 #30

P: n/a
Tim Rentsch <tx*@alumnus.caltech.edu> writes:
Keith Thompson <ks***@mib.org> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> writes:
> Keith Thompson <ks***@mib.org> writes: [...]
In my opinion, what you're calling a "representation type" isn't a
type at all.
Wouldn't you say pointers that store just an address and pointers that
store an address and a length are different types of pointers?
Wouldn't you say that a 'cdecl' function and a 'stdcall' function are
different types of functions? Wouldn't you say a number stored in
host order and a number stored in network order are different types of
numbers (even if on the host in question values in the two orderings
always had the same representations)? It makes just as much sense to
say that there are different types of representations as it does to
say that there are different types of variables.


I wouldn't call those things "types" at all. A type is something that
exists in your program. Using the term "type", in a very similar
context, to refer to a different concept can only cause confusion.

On an implementation where int and long have the same representation
(say, both are 32-bit two's-complement), they are distinct types. A
pointer to int and a pointer to long are distinct types. I might call
pointers that store just an address and pointers that store an address
and a length different *kinds* of pointers.
You can talk about the representation of a type (32-bit
two's-complement), or the representation of a particular value
(hexadecimal DEADBEEF); as long as you're careful, I don't see much of
a problem using the same word for both.


It's applicable to one but not the other. The word representation
means "likeness or image"; unless there is something stored somewhere
in the running program, such as a byte with a '4' in it and a rule like
"'4' means int", the phrase "the representation of a type" is a misuse
of language. That's a pretty good indicator that this path isn't
the right one to go down.


I disagree; it's perfectly appropriate to refer to the representation
of a type. The standard does this.

[...] I don't mean "the representation of a type"; what I mean is "the type
of representation". It seems like "representation type" is a better
term for that.

Suggestions from other quarters have been "machine type",
"representational type", "implementation type", and "representation
schema". Are any of those less confusing or less misleading
than "representation type"?


Just about anything that doesn't use the word "type" for something
that isn't a C type would be better than "representation type".
> Also, using
> "representation" by itself doesn't work very well for functions; we
> aren't interested in the particular bits that make up a function's
> object code, but we are interested in what calling conventions are
> necessary to call it. An unadorned "representation" tends to evoke
> bit patterns more than it does calling sequences.


I wouldn't talk about the "representation" of a function at all,
either of its object code or of its calling convention. Data items
have representations.


That you wouldn't use the same word for functions is a good indication
that it's not really the right term here. Both "data values" and
"function values" have differing patterns of implementation. What
we're trying to find is a term that captures and expresses the idea of
an "implementational pattern" - the same term should apply equally to
differences in function implementation as it does to differences in
data implementation.


That I wouldn't use the same word for functions indicates that
functions and data items are very different things. On the C level,
the word "type" applies to both. On the implementation level, there
is no term that applies to both (except perhaps something vague like
"entity").

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #31

P: n/a
Keith Thompson wrote:
jacob navia <ja***@jacob.remcomp.fr> writes:
Tim Rentsch wrote:
So I see the "program type/representation type" distinction
as more fundamental than the distinction of qualified types
vs unqualified types.
You are right. I think the best example are enumeration types
that only exist at run time, and where the "underlying" type
is int. The types are different but the underlying representation
is the same.

(As you acknowledged later, that should be "only exist at compilation
time".)

This makes it sound like an enumeration type and type "int" are
fundamentally different things, that an enumeration type exists at
compilation time, but type "int" exists at run time. This is
misleading. The type "int", like an enumeration type, exists only at
compilation time.


Surely not. A glance at the instruction set of a common circuit
for instance (x86) will reveal that the circuit supports integer
operations in hardware. The type int is perfectly supported by
almost all CPUs around.

Types (and type associated concepts) exist in the hardware itself.

C allows for a clear mapping of those hardware types into language
types, but it is obvious that in all processors the type int is
supported...

Floating point can exist at run time too, obviously, and many
CPUs support the type double, float and long double.

These types exist in hardware.

Rather than saying that the underlying type of an
enumeration type is int, it's more accurate to say that the
enumeration type and int have the same representation. (That
representation might be a machine word, for example.)

Enumerated types can be used to encode sets for instance. But these
concepts are not in any instruction generated by the compiler.

The programmer writes:

if (current.flags & (digit|letter)) {
}
And the machine understands:
if (current.flags & 36)
And, of course, the representation of an enumeration type may or may
not be the same as the representation of type int; it's up to the
implementation to choose an underlying representation that can hold
all the specified values.

It would be surprising if it were floating point, however ...

As a rule of thumb, anything having to do with the C language exists
only in your source program or at compilation time, not at run time.
(That's not completely true, since a lot of the names overlap.)

The task of the compiler is just to translate the instructions written
by the programmer as faithfully as possible into machine instructions
that do exactly what was written.

Most types (fortunately) *exist* at run time in a quite real manner.
C exists at run time, sorry. For instance, when you specify:

float d = (float) a;
and a was a double, the machine will shed precision by writing the
number into memory as a float and re-reading it.

The whole edifice of a programming language is the faithful copy
of program concepts into run-time objects.

In my opinion, it's best to use the term "type" only for things that
are types in C, not for entities like machine words that exist at run
time.

I have to disagree here. The machine should follow exactly the type
description specified in the source program.
[...]

Hardware is represented in C as a sequential space of
addresses, where the data and the preprogrammed
(compiled) instruction sequences are stored, which act on those
data and maybe further inputs.

That assumes a particular runtime model, one not required by the C
standard. There isn't necessarily a single sequential address space.


Sometimes. I remember the segmented model, and there are many CPUs that
have disjoint data/program areas, and even disjoint data areas of
different types (EPROM, RAM, disks, etc.). Within these segments of
memory addresses, a linear sequence exists. When I get a 120GB disk
I can write from byte zero to byte 120GB - formatting overhead.
When I get a 128MB memory disk in RAM/USB the sequence is linear
again.

You yourself are written in a linear sequence of base-pairs, written
atom after atom in your DNA.
Data and code could be in separate address spaces; for that matter,
each object (declared or created by malloc()) could be in a distinct
address space. The existence of pointer arithmetic implies a
sequential address space within a single object, but not across
objects.
What is an object?
Is a character in a character string an object? Can we
imagine a character string where each character resides
in a different address space?

Weird isn't it? It wouldn't be handy.
A function address could be anything that allows the
function to be called; it could easily be an index into a system table
rather than a machine-level address.
In C
(FnTable[index])(arg1,arg2)
is different from
fnptr(arg1,arg2)

A function expression must resolve to a machine address. This way
it can be passed around as an integer very efficiently. Most of
the power of C comes from this facility, functions as simple integers.

An efficient way of passing a *lot* of context.

[...]

In C you can at any moment change your mind and start
interpreting the same bits in another way. You make a
cast operation, i.e. you apply to some address a new
type.

A cast operator specifies a type conversion. Not all such conversions
are simple reinterpretations of the bits. Conversions between integer
and floating-point types almost certainly do more than just copying
the bits; conversions between pointer types may or may not do so.
What you're talking about is type punning; converting addresses is one
of several ways to achieve that.


True, I was speaking about re-interpreting because I wanted to emphasize
that memory is interpreted by the program. With this I am introducing
the discussion about strongly typed/weakly typed languages, which I hope
to come to later on. This facility of re-interpreting memory is absent or
much more difficult in several other languages. Like everything, this can
be handy if well used, or a nightmare if abused.
[...]

There are even types that you can't figure out at all:
opaque types. This type is, for instance:

struct unknown *bn;

and struct "unknown" is nowhere defined. Or even worse:

void *bn;

A pointer that points to an unknown object.
You can do only one thing with this pointer:
pass it around.

These are called incomplete types; it's best to keep your terminology
consistent with standard usage.


struct unknown * is incomplete, void * not. To a beginner, an
expression like void * must be utterly strange. I will explain
this more later on.
Incomplete types and opaque types are two different things, and an
opaque type needn't be an incomplete type. For example, the type FILE
in <stdio.h> is opaque as far as the programmer is concerned (the
standard says nothing about its contents), but I can see what's in it
(in many implementations) by viewing the appropriate header file.


If you can see the contents, then it's not opaque. Of course, any
non-opaque structure can be converted into an opaque one if you refuse
to look into it, but this would be playing with words. Normally, since
it is not specified in the standard it is better not to mess with it,
I agree with that, but with real opaque structures like void * that
is no longer possible. You can't use them, they enforce themselves
by definition.

Thanks for your feedback.

Nov 14 '05 #32

P: n/a
I think you make too sharp a separation between the language
and the concrete run time. I have a different opinion, but I
answered you two threads above this one.

In a few words again:

The crux of the matter is that the language is implemented at
run time by the compiler, that translates the types specified
in the program into run time types. The run time types exist
and they are the types specified in the program text.

C programs exist at run time with all their type machinery
active and running as specified.
Nov 14 '05 #33

P: n/a
jacob navia <ja***@jacob.remcomp.fr> writes:
Keith Thompson wrote:
jacob navia <ja***@jacob.remcomp.fr> writes: [...]
This makes it sound like an enumeration type and type "int" are
fundamentally different things, that an enumeration type exists at
compilation time, but type "int" exists at run time. This is
misleading. The type "int", like an enumeration type, exists only at
compilation time.
Surely not. A glance at the instruction set of a common circuit
for instance (x86) will reveal that the circuit supports integer
operations in hardware. The type int is perfectly supported by
almost all CPUs around.

Types (and type associated concepts) exist in the hardware itself.

C allows for a clear mapping of those hardware types into language
types, but it is obvious that in all processors the type int is
supported...


You probably won't find the term "int" in a CPU reference manual.
"int" is a C type, not a machine-level concept. The corresponding
CPU-level concept is probably something like a "word".

Yes, of course the CPU supports integer operations (note the
relatively generic term "integer" rather than the C-specific term
"int"). It happens that C's type "int" is very likely to be mapped
almost directly to a machine "word", but they're still two different
concepts that exist in two different contexts. In C, "int" and "long"
are two distinct types, even if they're both the same size; in the
CPU, that distinction no longer exists.
Floating point can exist at run time too, obviously, and many
CPUs support the type double, float and long double.

These types exist in hardware.
They exist *as types* only in a C program, either in source or during
compilation. They are mapped onto operations in the hardware. (And
some CPUs don't directly support floating point, but emulate it in
software.)

[...]
As a rule of thumb, anything having to do with the C language exists
only in your source program or at compilation time, not at run time.
(That's not completely true, since a lot of the names overlap.)


The task of the compiler is just to translate the instructions written
by the programmer as faithfully as possible into machine instructions
that do exactly what was written.


The task of the compiler is to *map* what the programmer wrote into
machine instructions. The nature of that mapping varies from one
compiler to another, and from one CPU to another. Types are a
high-level language concept. Pretending that the CPU-level concepts
are the "types" in the same sense as int and long is misleading.
Most types (fortunately) *exist* at run time in a quite real manner.
C exists at run time, sorry. For instance, when you specify:

float d = (float) a;
and a was a double, the machine will shed precision by writing the
number into memory as a float and re-reading it.
Unless float and double are the same size (which is perfectly legal).

Or the CPU manual might refer to single-precision and double-precision
reals. Yes, the C types are most likely mapped directly to certain
machine-level representations, but the *types* float and double exist
only on the C side of that mapping.

[...]
In my opinion, it's best to use the term "type" only for things that
are types in C, not for entities like machine words that exist at run
time.


I have to disagree here. The machine should follow exactly the type
description specified in the source program.


If it followed it exactly, wouldn't there be distinct machine-level
"types" for int and long?
Hardware is represented in C as a sequential space of
addresses, where the data and the preprogrammed
(compiled) instruction sequences are stored, which act on those
data and maybe further inputs.

That assumes a particular runtime model, one not required by the C
standard. There isn't necessarily a single sequential address space.


Sometimes.

[snip]

Yes, sometimes there is, sometimes there isn't. I'm not sure what
point you're trying to make; my point is that asserting that "Hardware
is represented in C as a sequential space of addresses" is misleading.

[...]
What is an object?
The term is defined in the standard.
Is a character in a character string an object?
Yes, and it happens to be a component of another object.
Can we
imagine a character string where each character resides
in a different address space?
No. Or rather, we can imagine it, but it wouldn't be legal in C.

To use a concrete example:

char s[6] = "hello";

s[1] and s[2] are both objects of type char. s is an object of type
char[6]. Since the object s has to be in a locally linear address
space (which it may or may not share with other objects), it follows
that s[1] and s[2] must be in the same address space. Thus it's legal,
for example, to compute (&s[2] - &s[1]).

Given:

char t[6] = "fubar";

s and t could be in distinct address spaces, and computing (&t[2] -
&s[1]) invokes undefined behavior.
A function address could be anything that allows the
function to be called; it could easily be an index into a system table
rather than a machine-level address.


In C
(FnTable[index])(arg1,arg2)
is different from
fnptr(arg1,arg2)

A function expression must resolve to a machine address. This way
it can be passed around as an integer very efficiently. Most of
the power of C comes from this facility, functions as simple integers.

An efficient way of passing a *lot* of context.


I meant that a function pointer could be implemented as an index into
a system table. On many implementations, a function pointer is
implemented as a machine address, which looks like an integer index
into the entire address space (either of the machine or of the current
process). But an implementation that represented function pointers as
indices could be conforming.

The C implementation on the AS/400 represents function pointers as
some kind of large descriptor, not as a machine address.

All this is equally true for object pointers. Any pointer can be
represented in nearly any way the implementation chooses, as long as
the semantics are implemented consistently. Implementing pointers as
machine addresses happens to result in more efficient code on most
machines, but the C standard specifically doesn't require it.
In C you can at any moment change your mind and start
interpreting the same bits in another way. You make a
cast operation, i.e. you apply to some address a new
type.

A cast operator specifies a type conversion. Not all such conversions
are simple reinterpretations of the bits. Conversions between integer
and floating-point types almost certainly do more than just copying
the bits; conversions between pointer types may or may not do so.
What you're talking about is type punning; converting addresses is one
of several ways to achieve that.


True, I was speaking about re-interpreting because I wanted to emphasize
that memory is interpreted by the program. With this I am introducing
the discussion about strongly typed/weakly typed languages, which I hope
to come to later on. This facility of re-interpreting memory is absent or
much more difficult in several other languages. Like everything, this can
be handy if well used, or a nightmare if abused.


My objection was to the reference to a cast operator. Not all casts
just re-interpret their operands, and not all type punning is done via
cast operators.
There are even types that you can't figure out at all:
opaque types. This type is, for instance:

struct unknown *bn;

and struct "unknown" is nowhere defined. Or even worse:

void *bn;

A pointer that points to an unknown object.
You can do only one thing with this pointer:
pass it around.

These are called incomplete types; it's best to keep your terminology
consistent with standard usage.


struct unknown * is incomplete, void * not. To a beginner, an
expression like void * must be utterly strange. I will explain
this more later on.


Neither "struct unknown *" nor "void *" is an incomplete type; they're
pointers, possibly to incomplete types. "struct unknown" is an
incomplete type if there's no definition for the complete type.
"void" is an incomplete type that cannot be completed.
Incomplete types and opaque types are two different things, and an
opaque type needn't be an incomplete type. For example, the type FILE
in <stdio.h> is opaque as far as the programmer is concerned (the
standard says nothing about its contents), but I can see what's in it
(in many implementations) by viewing the appropriate header file.


If you can see the contents, then it's not opaque. Of course, any
non-opaque structure can be converted into an opaque one if you refuse
to look into it, but this would be playing with words. Normally, since
it is not specified in the standard it is better not to mess with it,
I agree with that, but with real opaque structures like void * that
is no longer possible. You can't use them, they enforce themselves
by definition.


The type FILE is an odd case. It's intended to act like an opaque
type, in the sense that the user isn't supposed to look inside it --
and a conforming implementation probably could make it a genuine
incomplete type, hiding the actual definition inside the library
implementation.
Thanks for your feedback.


You're welcome.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #34

P: n/a
Keith Thompson wrote:
.... snip ...
The type FILE is an odd case. It's intended to act like an opaque
type, in the sense that the user isn't supposed to look inside it --
and a conforming implementation probably could make it a genuine
incomplete type, hiding the actual definition inside the library
implementation.


Unfortunately making it truly opaque would often prevent the very
useful implementation of getc and putc as macros. So the only
solution left is to yell stridently at programmers who read the
definition of FILE.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #35

P: n/a
CBFalconer <cb********@yahoo.com> writes:
Keith Thompson wrote:
The type FILE is an odd case. It's intended to act like an opaque
type, in the sense that the user isn't supposed to look inside it --
and a conforming implementation probably could make it a genuine
incomplete type, hiding the actual definition inside the library
implementation.


Unfortunately making it truly opaque would often prevent the very
useful implementation of getc and putc as macros. So the only
solution left is to yell stridently at programmers who read the
definition of FILE.


Good point, I had forgotten about that.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #36

P: n/a
CBFalconer <cb********@yahoo.com> wrote:
Keith Thompson wrote:
The type FILE is an odd case. It's intended to act like an opaque
type, in the sense that the user isn't supposed to look inside it --
and a conforming implementation probably could make it a genuine
incomplete type, hiding the actual definition inside the library
implementation.


Unfortunately making it truly opaque would often prevent the very
useful implementation of getc and putc as macros. So the only
solution left is to yell stridently at programmers who read the
definition of FILE.


Not necessarily. The standard headers needn't be available as files; and
even when an implementor wants to make most of his <stdio.h> legible to
the user, nothing need stop him from having it contain something like

#include <FILE_magic>
#define FILE __FILE_magic_FILE
#define getc __FILE_magic_getc

Richard
Nov 14 '05 #37

P: n/a
In <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org> writes:
The type FILE is an odd case. It's intended to act like an opaque
type, in the sense that the user isn't supposed to look inside it --
and a conforming implementation probably could make it a genuine
incomplete type, hiding the actual definition inside the library
implementation.


Nope, it cannot.

2 The types declared are size_t (described in 7.17);

FILE

which is an object type ...

An incomplete type is not an object type, so your implementation would be
non-conforming.

I see no good reason for this requirement in the standard (private copies
of FILE objects generated by the standard C library are useless), but the
requirement is there and cannot be ignored by conforming implementations.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union
Nov 14 '05 #38

P: n/a
Richard Bos wrote:
CBFalconer <cb********@yahoo.com> wrote:
Keith Thompson wrote:
The type FILE is an odd case. It's intended to act like an opaque
type, in the sense that the user isn't supposed to look inside it --
and a conforming implementation probably could make it a genuine
incomplete type, hiding the actual definition inside the library
implementation.


Unfortunately making it truly opaque would often prevent the very
useful implementation of getc and putc as macros. So the only
solution left is to yell stridently at programmers who read the
definition of FILE.


Not necessarily. The standard headers needn't be available as files;
and even when an implementor wants to make most of his <stdio.h>
legible to the user, nothing need stop him from having it contain
something like

#include <FILE_magic>
#define FILE __FILE_magic_FILE
#define getc __FILE_magic_getc


How does this prevent the snooper from reading FILE_magic? If you
wire those definitions into the compiler, and eliminate the
existence of FILE_magic, then you have given up the flexibility of
revising the actual FILE implementation, causing attendant future
pain.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #39

P: n/a
Dan Pop wrote:
In <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org> writes:

The type FILE is an odd case. It's intended to act like an opaque
type, in the sense that the user isn't supposed to look inside it --
and a conforming implementation probably could make it a genuine
incomplete type, hiding the actual definition inside the library
implementation.

Nope, it cannot.

2 The types declared are size_t (described in 7.17);

FILE

which is an object type ...

An incomplete type is not an object type, so your implementation would be
non-conforming.


A substantial amount of hiding is possible even so,
at the cost of one extra indirection level:

/* <stdio.h> */
typedef struct __file_magic *FILE;
...

`FILE' is now an object type (to wit, a pointer), and that
much is revealed as required. The nature of what it points
to, though, remains hidden.

--
Er*********@sun.com

Nov 14 '05 #40

P: n/a
In <ck**********@news1brm.Central.Sun.COM> Eric Sosman <er*********@sun.com> writes:
Dan Pop wrote:
In <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org> writes:

The type FILE is an odd case. It's intended to act like an opaque
type, in the sense that the user isn't supposed to look inside it --
and a conforming implementation probably could make it a genuine
incomplete type, hiding the actual definition inside the library
implementation.

Nope, it cannot.

2 The types declared are size_t (described in 7.17);

FILE

which is an object type ...

An incomplete type is not an object type, so your implementation would be
non-conforming.


A substantial amount of hiding is possible even so,
at the cost of one extra indirection level:

/* <stdio.h> */
typedef struct __file_magic *FILE;
...

`FILE' is now an object type (to wit, a pointer), and that
much is revealed as required. The nature of what it points
to, though, remains hidden.


Another way of hiding the internals is:

typedef unsigned char FILE[__file_size];

while the implementation operates with a completely different definition
of FILE, having the property that sizeof(__internal_FILE) == __file_size.

Even if the two types may not have the same alignment requirements, no
correct C program is going to be affected, because all valid FILE objects
are created by the implementation and are, therefore, guaranteed to be
correctly aligned.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union
Nov 14 '05 #41

P: n/a
Da*****@cern.ch (Dan Pop) writes:
In <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org> writes:
The type FILE is an odd case. It's intended to act like an opaque
type, in the sense that the user isn't supposed to look inside it --
and a conforming implementation probably could make it a genuine
incomplete type, hiding the actual definition inside the library
implementation.
Nope, it cannot.

2 The types declared are size_t (described in 7.17);

FILE

which is an object type ...

An incomplete type is not an object type, so your implementation would be
non-conforming.


You're right, my mistake.
I see no good reason for this requirement in the standard (private copies
of FILE objects generated by the standard C library are useless), but the
requirement is there and cannot be ignored by conforming implementations.


It's already been pointed out elsethread that the common
implementation of getc and putc as macros requires visibility to the
internals of the FILE type; there's no good way to make those
internals visible for macro expansion without making them potentially
visible to the programmer.

On the other hand (as someone else pointed out), this could probably
be handled with some sort of compiler magic.

A strictly conforming program can depend on FILE being properly
declared as an object type (by declaring a FILE object); a *sensible*
program cannot.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #42

P: n/a
In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.org> wrote:
Da*****@cern.ch (Dan Pop) writes:
In <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org> writes:

I see no good reason for this requirement in the standard (private copies
of FILE objects generated by the standard C library are useless), but the
requirement is there and cannot be ignored by conforming implementations.
It's already been pointed out elsethread that the common
implementation of getc and putc as macros requires visibility to the
internals of the FILE type; there's no good way to make those
internals visible for macro expansion without making them potentially
visible to the programmer.


One could, however, imagine an implementation deciding to reveal *part*
of the implementation (enough for the macros it cares about), and leaving
the rest undefined:

typedef struct __FILE_TAG {
char *_ptr;
int _cnt;
int __pad[__FILE_PAD];
} FILE;

<OT>sizeof (FILE) has to be correct mainly for those implementations
stuck with the "_iob" array, where stdout (for example) is &_iob[1]</OT>
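
A rough sketch of how such a partially revealed type could still support a getc-style macro. Everything here is an assumption for illustration (`MY_FILE`, `MY_FILE_PAD`, `my_getc`, `my_fill`, `my_open_buffer`); it is not taken from any real <stdio.h>, and the refill routine simply reports end-of-file.

```c
#define MY_FILE_PAD 6               /* hypothetical: room for hidden members */

typedef struct my_file_tag {
    unsigned char *ptr;             /* next byte to hand out */
    int cnt;                        /* bytes left in the buffer */
    int pad[MY_FILE_PAD];           /* opaque remainder, private to the library */
} MY_FILE;

/* Slow path, normally inside the library; here it only reports EOF (-1). */
int my_fill(MY_FILE *fp)
{
    (void)fp;
    return -1;
}

/* getc-style macro: it touches only the two revealed members.
   Like the real getc macro, it may evaluate its argument more than once. */
#define my_getc(fp) \
    ((fp)->cnt > 0 ? ((fp)->cnt--, (int)*(fp)->ptr++) : my_fill(fp))

/* Illustration only: point a stream at an in-memory buffer. */
void my_open_buffer(MY_FILE *fp, unsigned char *buf, int len)
{
    fp->ptr = buf;
    fp->cnt = len;
}
```

The padding keeps `sizeof (MY_FILE)` correct for code that indexes an array of these objects, while the hidden members remain nameless to the user.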
On the other hand (as someone else pointed out), this could probably
be handled with some sort of compiler magic.

A strictly conforming program can depend on FILE being properly
declared as an object type (by declaring a FILE object); a *sensible*
program cannot.


Of course.

- jonathan
Nov 14 '05 #43

P: n/a
Dan Pop wrote:
Eric Sosman <er*********@sun.com> writes:
Dan Pop wrote:
Keith Thompson <ks***@mib.org> writes:

The type FILE is an odd case. It's intended to act like an opaque
type, in the sense that the user isn't supposed to look inside it --
and a conforming implementation probably could make it a genuine
incomplete type, hiding the actual definition inside the library
implementation.

Nope, it cannot.

2 The types declared are size_t (described in 7.17);

FILE

which is an object type ...

An incomplete type is not an object type, so your implementation
would be non-conforming.


A substantial amount of hiding is possible even so,
at the cost of one extra indirection level:

/* <stdio.h> */
typedef struct __file_magic *FILE;
...

`FILE' is now an object type (to wit, a pointer), and that
much is revealed as required. The nature of what it points
to, though, remains hidden.
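
A minimal sketch of the pointer-typedef idea, using made-up names (`HANDLE`, `handle_magic`, `handle_open`, and so on) instead of the reserved `FILE` identifier. In a real implementation the struct definition would live only inside the library; it appears in the same block here so that the example compiles on its own.

```c
#include <stdlib.h>

/* Public header side: HANDLE is a complete object type (a pointer),
   while the struct it points to stays incomplete for users. */
struct handle_magic;
typedef struct handle_magic *HANDLE;

/* Library side: the only place where the layout is known. */
struct handle_magic {
    int descriptor;
};

/* Create a handle; returns NULL if allocation fails. */
HANDLE handle_open(int descriptor)
{
    HANDLE h = malloc(sizeof *h);
    if (h != NULL)
        h->descriptor = descriptor;
    return h;
}

int handle_descriptor(HANDLE h) { return h->descriptor; }

void handle_close(HANDLE h) { free(h); }
```

With FILE defined this way, user code would still write `FILE *fp`, which is the extra level of indirection mentioned above.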


Another way of hiding the internals is:

typedef unsigned char FILE[__file_size];

while the implementation operates with a completely different
definition of FILE, having the property that
sizeof(__internal_FILE) == __file_size.

Even if the two types may not have the same alignment requirements,
no correct C program is going to be affected, because all valid
FILE objects are created by the implementation and are, therefore,
guaranteed to be correctly aligned.


And how do you then implement getc and putc macros, which have to
be expanded in the user's code, without incorporating magic numbers
and worse? Conceded, you don't have to implement them as macros.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #44

P: n/a
In <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org> writes:
Da*****@cern.ch (Dan Pop) writes:
I see no good reason for this requirement in the standard (private copies
of FILE objects generated by the standard C library are useless), but the
requirement is there and cannot be ignored by conforming implementations.
It's already been pointed out elsethread that the common


By someone with chronic problems when it comes to engaging his brain...
implementation of getc and putc as macros requires visibility to the
internals of the FILE type; there's no good way to make those
internals visible for macro expansion without making them potentially
visible to the programmer.


The decision should be left open to the implementor, rather than being
imposed on him by the standard. If the implementor doesn't provide *any*
<stdio.h> routine as a macro, why should he provide FILE as an object
type?

The (bogus) argument you're invoking works against the standard
requiring FILE to be an incomplete type, but, since this is not
the case and no one was advocating a change in this direction...

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union
Nov 14 '05 #45

P: n/a
In <41***************@yahoo.com> CBFalconer <cb********@yahoo.com> writes:
Dan Pop wrote:
Eric Sosman <er*********@sun.com> writes:
Dan Pop wrote:
Keith Thompson <ks***@mib.org> writes:

> The type FILE is an odd case. It's intended to act like an opaque
> type, in the sense that the user isn't supposed to look inside it --
> and a conforming implementation probably could make it a genuine
> incomplete type, hiding the actual definition inside the library
> implementation.

Nope, it cannot.

2 The types declared are size_t (described in 7.17);

FILE

which is an object type ...

An incomplete type is not an object type, so your implementation
would be non-conforming.

A substantial amount of hiding is possible even so,
at the cost of one extra indirection level:

/* <stdio.h> */
typedef struct __file_magic *FILE;
...

`FILE' is now an object type (to wit, a pointer), and that
much is revealed as required. The nature of what it points
to, though, remains hidden.


Another way of hiding the internals is:

typedef unsigned char FILE[__file_size];

while the implementation operates with a completely different
definition of FILE, having the property that
sizeof(__internal_FILE) == __file_size.

Even if the two types may not have the same alignment requirements,
no correct C program is going to be affected, because all valid
FILE objects are created by the implementation and are, therefore,
guaranteed to be correctly aligned.


And how do you then implement getc and putc macros, which have to
be expanded in the user's code, without incorporating magic numbers
and worse? Conceded, you don't have to implement them as macros.


Where does the "without incorporating magic numbers" requirement come
from? The standard headers are already required to define plenty of
magic numbers, like FILENAME_MAX, HUGE_VAL and the whole of <limits.h>
and <float.h>.

If I am the implementor, I *know* the internal structure of the byte
array, even if I don't explicitly reveal it in <stdio.h>.

For example, if I want to implement the (nonstandard) fileno function
as a macro, I can do:

#define __handle_offset 24
#define fileno(fp) (*(int *)&(*(fp))[__handle_offset])

It's certainly less readable than

#define fileno(fp) ((fp)->handle)

but who says that standard headers must contain nice C code?
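
A hedged sketch of the byte-array approach. All names (`OPAQUE_FILE`, `internal_file`, `HANDLE_OFFSET`, `opaque_fileno`, `opaque_init`) are hypothetical. A real header would carry a literal such as 24 for the offset; the sketch computes it with `offsetof` only so that it compiles on any platform, and it uses `memcpy` rather than a pointer cast so that user-level code stays within defined behavior.

```c
#include <stddef.h>
#include <string.h>

/* Library side: the real layout (hypothetical members). */
struct internal_file {
    char *buf;
    int   count;
    int   handle;
};

/* Header side: only a size and an offset are revealed. In a shipped
   header both would be plain numbers chosen by the implementor. */
#define OPAQUE_FILE_SIZE sizeof(struct internal_file)
#define HANDLE_OFFSET    offsetof(struct internal_file, handle)

typedef unsigned char OPAQUE_FILE[OPAQUE_FILE_SIZE];

/* Read the handle out of the byte array at the known offset. */
int opaque_fileno(OPAQUE_FILE *fp)
{
    int h;
    memcpy(&h, &(*fp)[HANDLE_OFFSET], sizeof h);
    return h;
}

/* Library-side initialization of an opaque object. */
void opaque_init(OPAQUE_FILE *fp, int handle)
{
    struct internal_file in = { NULL, 0, handle };
    memcpy(*fp, &in, sizeof in);
}
```

The trade-off is the one under discussion: the user sees nothing but bytes, and the implementor must keep the published numbers in step with the private struct.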

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union
Nov 14 '05 #46

P: n/a
Da*****@cern.ch (Dan Pop) writes:
In <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org> writes:
Da*****@cern.ch (Dan Pop) writes:
I see no good reason for this requirement in the standard (private copies
of FILE objects generated by the standard C library are useless), but the
requirement is there and cannot be ignored by conforming implementations.
It's already been pointed out elsethread that the common


By someone with chronic problems when it comes to engaging his brain...


Yeah, whatever...
implementation of getc and putc as macros requires visibility to the
internals of the FILE type; there's no good way to make those
internals visible for macro expansion without making them potentially
visible to the programmer.


The decision should be left open to the implementor, rather than being
imposed on him by the standard. If the implementor doesn't provide *any*
<stdio.h> routine as a macro, why should he provide FILE as an object
type?


I agree.
The (bogus) argument you're invoking works against the standard
requiring FILE to be an incomplete type, but, since this is not
the case and no one was advocating a change in this direction...


I was merely speculating about why the standard requires FILE to be an
object type. Many, perhaps most, implementations do implement getc
and putc as macros; making FILE an object type makes that easier to
do. But I agree that the standard's requirement is superfluous; as
long as an implementation is *allowed* to make FILE an object type, I
can't think of a good reason for the standard to *require* it to be an
object type.

And yes, you make a valid point that I missed earlier, so don't bother
telling me to engage anything.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #47

P: n/a
Da*****@cern.ch (Dan Pop) writes:
In <41***************@yahoo.com> CBFalconer <cb********@yahoo.com> writes: [...]
And how do you then implement getc and putc macros, which have to
be expanded in the user's code, without incorporating magic numbers
and worse? Conceded, you don't have to implement them as macros.


Where does the "without incorporating magic numbers" requirement come
from? The standard headers are already required to define plenty of
magic numbers, like FILENAME_MAX, HUGE_VAL and the whole of <limits.h>
and <float.h>.

If I am the implementor, I *know* the internal structure of the byte
array, even if I don't explicitly reveal it in <stdio.h>.


Maybe. If I'm writing an implementation of <stdio.h>, I'm allowed to
assume that the offset of a certain member is, say, 24, but I'd rather
not have to change all my magic numbers if the compiler changes the
way it lays out struct members. The compiler and the library could
be, and commonly are, implemented by two entirely different groups.
For example, if I want to implement the (nonstandard) fileno function
as a macro, I can do:

#define __handle_offset 24
#define fileno(fp) (*(int *)&(*(fp))[__handle_offset])

It's certainly less readable than

#define fileno(fp) ((fp)->handle)

but who says that standard headers must contain nice C code?


They certainly don't have to, and what you suggest is perfectly legal,
but it's nice if the code is at least maintainable if not legible.

Another way to handle this might be to define an auxiliary struct type
for the actual implementation of the FILE type, and make FILE itself
something like a character array. For example:

struct __FILE {
char *_internal_foo;
int _internal_bar;
};

typedef char FILE[sizeof(struct __FILE)];

If getc is implemented as a macro, and it needs to refer to the
_internal_foo member, it can use

((struct __FILE*)stream)->_internal_foo

There are no magic numbers, but a user program can't mess things up by
referring directly to stdin->_internal_foo.
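
The auxiliary-struct idea above can be sketched as follows, with hypothetical names (`hidden_file`, `SHOWN_FILE`, `SHOWN_BAR`, `shown_open`, `shown_close`) standing in for the reserved identifiers. The sketch assumes, as a library would, that every valid object is created by the implementation; it uses `malloc` so the storage is suitably aligned for the real struct.

```c
#include <stdlib.h>

/* The real layout, visible in the header but under a different name. */
struct hidden_file {
    char *internal_foo;
    int   internal_bar;
};

/* What the user sees: an array of char with the right size. */
typedef char SHOWN_FILE[sizeof(struct hidden_file)];

/* Macro access needs a cast, but no magic numbers. */
#define SHOWN_BAR(stream) (((struct hidden_file *)(stream))->internal_bar)

/* Library-side creation; returns NULL if allocation fails. */
SHOWN_FILE *shown_open(int bar)
{
    struct hidden_file *real = malloc(sizeof *real);
    if (real == NULL)
        return NULL;
    real->internal_foo = NULL;
    real->internal_bar = bar;
    return (SHOWN_FILE *)real;
}

void shown_close(SHOWN_FILE *stream) { free(stream); }
```

An expression such as `stream->internal_bar` does not compile against `SHOWN_FILE *`, so accidental member access is blocked, while a deliberate cast still gets through.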

(If the standard didn't require FILE to be an object type, the
implementation could do the same thing with "typedef void FILE;".)

On the other hand, this probably doesn't buy you all that much. Any
programmer can still write horribly non-portable code that manipulates
the internals of the FILE structure by examining the header file
(assuming it's a file) and figuring out what casts he needs to use.
Even with the internals of the FILE structure completely visible, as
they are in many (most?) implementations, most programmers are wise
enough (or ignorant enough) not to do this kind of thing.

None of this stuff (whether FILE is an object type, and what's in it
if it is) should be of any concern to any programmers who aren't
actually implementing the standard library.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #48

P: n/a
Dan Pop wrote:
CBFalconer <cb********@yahoo.com> writes:
Dan Pop wrote:
Eric Sosman <er*********@sun.com> writes:
Dan Pop wrote:
> Keith Thompson <ks***@mib.org> writes:
>
>> The type FILE is an odd case. It's intended to act like an opaque
>> type, in the sense that the user isn't supposed to look inside it --
>> and a conforming implementation probably could make it a genuine
>> incomplete type, hiding the actual definition inside the library
>> implementation.
>
> Nope, it cannot.
>
> 2 The types declared are size_t (described in 7.17);
>
> FILE
>
> which is an object type ...
>
> An incomplete type is not an object type, so your implementation
> would be non-conforming.

A substantial amount of hiding is possible even so,
at the cost of one extra indirection level:

/* <stdio.h> */
typedef struct __file_magic *FILE;
...

`FILE' is now an object type (to wit, a pointer), and that
much is revealed as required. The nature of what it points
to, though, remains hidden.

Another way of hiding the internals is:

typedef unsigned char FILE[__file_size];

while the implementation operates with a completely different
definition of FILE, having the property that
sizeof(__internal_FILE) == __file_size.

Even if the two types may not have the same alignment requirements,
no correct C program is going to be affected, because all valid
FILE objects are created by the implementation and are, therefore,
guaranteed to be correctly aligned.


And how do you then implement getc and putc macros, which have to
be expanded in the user's code, without incorporating magic numbers
and worse? Conceded, you don't have to implement them as macros.


Where does the "without incorporating magic numbers" requirement come
from? The standard headers are already required to define plenty of
magic numbers, like FILENAME_MAX, HUGE_VAL and the whole of <limits.h>
and <float.h>.


.... snip about snarky ways to implement it ...

The whole point of making definitions and using them, defining
structures, etc. is to make it easy for the implementor to change
his mind without destroying the user's code. Once you start playing
weird games you make the whole system more error prone. We already
have a well known vendor of windowed OSs that does that, with
abysmal results.

So "without incorporating magic numbers" is not a requirement, but
just good practice.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #49

P: n/a
Keith Thompson <ks***@mib.org> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> writes:
Keith Thompson <ks***@mib.org> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> writes:
> Keith Thompson <ks***@mib.org> writes:
[...]
In my opinion, what you're calling a "representation type" isn't a
type at all.

Wouldn't you say pointers that store just an address and pointers that
store an address and a length are different types of pointers?
Wouldn't you say that a 'cdecl' function and a 'stdcall' function are
different types of functions? Wouldn't you say a number stored in
host order and a number stored in network order are different types of
numbers (even if on the host in question values in the two orderings
always had the same representations)? It makes just as much sense to
say that there are different types of representations as it does to
say that there are different types of variables.


I wouldn't call those things "types" at all.


Yes, so you said. I was hoping you would do more in the way of
motivating, clarifying or explaining rather than just restating your
opinion.

A type is something that exists in your program.

A machine-language-level program is still a program. Other programming
languages have types. Is there something special about machine language
that the notion of types can't be used when discussing machine language
programs?

Using the term "type", in a very similar
context, to refer to a different concept can only cause confusion.

Here you're being careless with language. First I have been careful
to distinguish two different terms, "representation type" and "program
type"; analogously, consider "Java type" and "C type". Second, even
disregarding that the two terms are distinct, the presence of the word
"type" in both _could_ lead to confusion, but certainly it's not the
case that it can _only_ cause confusion - it can also lead to
clarification and deeper understanding, at least for some people, as
this thread has shown.

On an implementation where int and long have the same representation
(say, both are 32-bit two's-complement), they are distinct types. A
pointer to int and a pointer to long are distinct types. I might call
pointers that store just an address and pointers that store an address
and a length different *kinds* of pointers.

Different kinds of pointers, different types of pointers, different
forms of pointers... I did say I was looking for terminology. What
makes one of these terms better than another? And why?
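
The point quoted above, that int and long remain distinct C types even where they share a representation, can be checked mechanically. This sketch assumes a C11 compiler, since `_Generic` postdates this discussion; the macro and function names are invented for the example. A generic selection may not list two compatible types, so the fact that it compiles at all shows that the language treats int and long as different types regardless of their sizes.

```c
#include <string.h>

/* Selects a string according to the *type* of the expression,
   not according to its size or bit pattern. */
#define TYPE_NAME(x) _Generic((x), \
    int:     "int",                \
    long:    "long",               \
    default: "other")

const char *name_of_int(void)  { return TYPE_NAME(0);  }
const char *name_of_long(void) { return TYPE_NAME(0L); }

/* Nonzero when the two types share a size on this implementation;
   the answers above do not depend on it. */
int same_size(void) { return sizeof(int) == sizeof(long); }
```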

Incidentally, historically the early uses of the word "type" in
programming languages were basically as synonyms for "kind".

You can talk about the representation of a type (32-bit
two's-complement), or the representation of a particular value
(hexadecimal DEADBEEF); as long as you're careful, I don't see much of
a problem using the same word for both.


It's applicable to one but not the other. The word representation
means "likeness or image"; unless there is something stored somewhere
in the running program, such as a byte with a '4' in it and a rule like
"'4' means int", the phrase "the representation of a type" is a misuse
of language. That's a pretty good indicator that this path isn't
the right one to go down.


I disagree; it's perfectly appropriate to refer to the representation
of a type. The standard does this.


I have no doubt that the standard uses the phrase "representation of a
type", or something like it. I'm just as confident that the phrase is
used as linguistic shorthand for a collective noun meaning "all of the
representations of values of the type", or perhaps "the manner in which
values of the type are represented." The evidence that this is so is
that the phrase is used only in connection with object types - it is
not used for function types or (some?) incomplete types. Right?

[...]
I don't mean "the representation of a type"; what I mean is "the type
of representation". It seems like "representation type" is a better
term for that.

Suggestions from other quarters have been "machine type",
"representational type", "implementation type", and "representation
schema". Are any of those less confusing or less misleading
than "representation type"?


Just about anything that doesn't use the word "type" for something
that isn't a C type would be better than "representation type".


Reading between the lines (and ignoring the hyperbole), it seems like
you think the word "type" should be used for, and only for, things
that are types in the C programming language. Surely most readers of
the newsgroup realize that the word type also has meaning in other
contexts; even if, given the context of comp.lang.c, we would expect
an unqualified use of "type" to be taken to mean a C source language
type, it seems reasonable to expect a qualified use of "type" to be
read according to context - "Ada parameter type", "polymorphic type",
"FORTRAN type arrays", etc. The comment given apparently disregards
this, makes no particular suggestion, and doesn't really respond to
the question posed.

> Also, using
> "representation" by itself doesn't work very well for functions; we
> aren't interested in the particular bits that make up a function's
> object code, but we are interested in what calling conventions are
> necessary to call it. An unadorned "representation" tends to evoke
> bit patterns more than it does calling sequences.

I wouldn't talk about the "representation" of a function at all,
either of its object code or of its calling convention. Data items
have representations.


That you wouldn't use the same word for functions is a good indication
that it's not really the right term here. Both "data values" and
"function values" have differing patterns of implementation. What
we're trying to find is a term that captures and expresses the idea of
an "implementational pattern" - the same term should apply equally to
differences in function implementation as it does to differences in
data implementation.


That I wouldn't use the same word for functions indicates that
functions and data items are very different things. On the C level,
the word "type" applies to both. On the implementation level, there
is no term that applies to both (except perhaps something vague like
"entity").


Granted, functions and data items are different things. If you want
to refer to them as "entities", that's fine with me (and in fact
"entity" is not bad as a collective term). So the question becomes,
what term should we use to mean distinct categories -- or kinds, or
forms, or types -- of the various implementation (run-time) entities
(with the understanding that the term is analogous to the notion of
"type" in C source code)?
Nov 14 '05 #50

51 Replies
