Bytes | Software Development & Data Engineering Community

Making Fatal Hidden Assumptions

We often find hidden, and totally unnecessary, assumptions being
made in code. The following leans heavily on one particular
example, which happens to be in C. However, similar things can (and
do) occur in any language.

These assumptions are generally made because of familiarity with
the language. As a non-code example, consider the idea that the
faulty code is written by blackguards bent on fouling the
language. The term "blackguards" is not in favor these days, and for
good reason. However, the older you are, the more likely you are
to have used it since childhood, and to use it again, barring
specific thought on the subject. The same type of thing applies to
writing code.

I hope, with this little monograph, to encourage people to examine
some hidden assumptions they are making in their code. As ever, in
dealing with C, the reference standard is the ISO C standard.
Versions can be found in text and PDF format by searching for N869
and N1124. [1] The latter does not have a text version, but is
more up-to-date.

We will always have innocent-appearing code with these kinds of
assumptions built in. However, it would be wise to annotate such
code to make the assumptions explicit, which can avoid a great deal
of agony when the code is reused under other systems.

In the following example, the code is as downloaded from the
referenced URL, and the comments are entirely mine, including the
'every 5' line-number references.

/* Making fatal hidden assumptions */
/* Paul Hsieh's version of strlen.
http://www.azillionmonkeys.com/qed/asmexample.html

Some sneaky hidden assumptions here:
1. p = s - 1 is valid. Not guaranteed. Careless coding.
2. cast (int) p is meaningful. Not guaranteed.
3. Use of 2's complement arithmetic.
4. ints have no trap representations or hidden bits.
5. 4 == sizeof(int) && 8 == CHAR_BIT.
6. size_t is actually int.
7. sizeof(int) is a power of 2.
8. int alignment depends on a zeroed bit field.

Since strlen is normally supplied by the system, the system
designer can guarantee all but item 1. Otherwise this is
not portable. Item 1 can probably be beaten by suitable
code reorganization to avoid the initial p = s - 1. This
is a serious bug which, for example, can cause segfaults
on many systems. It is most likely to foul when (int)s
has the value 0, and is meaningful.

He fails to make the valid assumption: 1 == sizeof(char).
*/

#define hasNulByte(x) ((x - 0x01010101) & ~x & 0x80808080)
#define SW (sizeof (int) / sizeof (char))

int xstrlen (const char *s) {
   const char *p;                      /* 5 */
   int d;

   p = s - 1;
   do {
      p++;                             /* 10 */
      if ((((int) p) & (SW - 1)) == 0) {
         do {
            d = *((int *) p);
            p += SW;
         } while (!hasNulByte (d));    /* 15 */
         p -= SW;
      }
   } while (*p != 0);
   return p - s;
}                                      /* 20 */

Let us start with line 1! The constants appear to require that
sizeof(int) be 4, and that CHAR_BIT be precisely 8. I haven't
really looked too closely, and it is possible that the ~x term
allows for larger sizeof(int), but nothing allows for larger
CHAR_BIT. A further hidden assumption is that there are no trap
values in the representation of an int. Its functioning is
doubtful when sizeof(int) is less than 4. At the least it will
force promotion to long, which will seriously affect the speed.

This is an ingenious and speedy way of detecting a zero byte within
an int, provided the preconditions are met. There is nothing wrong
with it, PROVIDED we know when it is valid.

In line 2 we have the confusing use of sizeof(char), which is 1 by
definition. This just serves to obscure the fact that SW is
actually just sizeof(int). No hidden assumptions have been made
here, but the usage helps to conceal later assumptions.

Line 4. Since this is intended to replace the system's strlen()
function, it would seem advantageous to use the appropriate
signature for the function. In particular strlen returns a size_t,
not an int. size_t is always unsigned.

In line 8 we come to a biggie. The standard specifically does not
guarantee the action of a pointer below an object. The only real
purpose of this statement is to compensate for the initial
increment in line 10. This can be avoided by rearrangement of the
code, which will then let the routine function where the
assumptions are valid. This is the only real error in the code
that I see.

In line 11 we have several hidden assumptions. The first is that
the cast of a pointer to an int is valid. This is never
guaranteed. A pointer can be much larger than an int, and may have
all sorts of non-integer like information embedded, such as segment
id. If sizeof(int) is less than 4 the validity of this is even
less likely.

Then we come to the purpose of the statement, which is to discover
if the pointer is suitably aligned for an int. It does this by
bit-anding with SW-1, which is the concealed sizeof(int)-1. This
won't be very useful if sizeof(int) is, say, 3 or any other
non-power-of-two. In addition, it assumes that an aligned pointer
will have those bits zero. While this last is very likely in
today's systems, it is still an assumption. The system designer is
entitled to assume this, but user code is not.

Line 13 again uses the unwarranted cast of a pointer to an int.
This enables the use of the already suspicious macro hasNulByte in
line 15.

If all these assumptions are correct, line 19 finally calculates a
pointer difference (which is valid, and of type ptrdiff_t, but
here will always fit into a size_t). It then does a concealed cast
of this into an int, which could cause undefined or implementation
defined behaviour if the value exceeds what will fit into an int.
This one is also unnecessary, since it is trivial to define the
return type as size_t and guarantee success.

I haven't even mentioned the assumption of 2's complement
arithmetic, which I believe to be embedded in the hasNulByte
macro. I haven't bothered to think this out.

Would you believe that so many hidden assumptions can be embedded
in such innocent looking code? The sneaky thing is that the code
appears trivially correct at first glance. This is the stuff that
Heisenbugs are made of. Yet use of such code is fairly safe if we
are aware of those hidden assumptions.

I have cross-posted this without setting follow-ups, because I
believe that discussion will be valid in all the newsgroups posted.

[1] The draft C standards can be found at:
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/>

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Mar 6 '06
msg

On the other hand, when Seymour Cray started his own company, those
machines were 2's complement.
The Cray blood-line starting at least with "Little Character" (prototype
for the 160) was 1's complement, implemented with subtraction as the
basis of arithmetic (the so-called 'adder pyramid'). Even the CDC
3000 series which were mostly others' designs retained 1's complement
arithmetic. The 6000 and 7000 series PPUs were essentially 160s also.
I should think it safe to say one could find 1's complement in Cray
designs from at least 1957 through the early 1980s.
And he shifted from 60 to 64 bit
words, but still retained octal notation (he did not like hexadecimal
at all).


Nor did he have truck with integrated circuits until absolutely necessary.

Michael Grigoni
Cybertheque Museum

Mar 9 '06 #101
On Thu, 09 Mar 2006 00:15:17 +0000, Christian Bau wrote:
I just tried the following program (CodeWarrior 10 on MacOS X):


Same for gcc4 on MacOS X. However this slight permutation of your
program (only the comparison line has changed):

#include <stdio.h>

#define SIZE (50*1000000L)
typedef struct {
    char a [SIZE];
} bigstruct;

static bigstruct bigarray [8];

int main(void)
{
    printf("%lx\n", (unsigned long) &bigarray [0]);
    printf("%lx\n", (unsigned long) &bigarray [9]);
    printf("%lx\n", (unsigned long) &bigarray [-1]);

    if (&bigarray [-1] - &bigarray [0] < 0)
        printf ("Everything is fine\n");
    else
        printf ("The C Standard is right: &bigarray [-1] is broken\n");

    return 0;
}

produces:
3080
1ad2a500
fd054000
Everything is fine

So what we see is that (a) pointer comparisons use direct unsigned integer
comparison, instead of checking the sign of the pointer difference---since
pointer comparisons only make sense in the context of an individual object,
I'd argue that the compiler is doing the wrong thing here, and the
comparison should instead have been done in the context of a pointer
difference; and (b) your printf string about "&bigarray[-1] is broken" is
wrong, since that's not what the code showed at all. What it showed is
that &bigarray[-1] could be formed, that &bigarray[0] was one element to
the right of it, and that hell did not freeze over (nor was any trap
taken), since you did not attempt to access any memory there.

Cheers,

--
Andrew

Mar 9 '06 #102
On Wed, 08 Mar 2006 18:07:45 -0700, Al Balmer wrote:
On Thu, 09 Mar 2006 09:13:24 +1100, Andrew Reilly
<an*************@areilly.bpc-users.org> wrote:
C is not a higher-level language. It's a universal assembler. Pick
another one.
Nice parrot. I think the original author of that phrase meant it as a
joke.


Most jokes contain at least a kernel of truth.
I spent 25 years writing assembler. C is a higher-level language.


Yeah, me too. Still do, regularly, on processors that will never have a C
compiler. C is as close to a universal assembler as we've got at the
moment. It doesn't stick its neck out too far, although a more
deliberately designed universal assembler would be a really good thing.
(It's on my list of things to do...)

If you actually *want* a higher level language, there are better ones
to choose from than C.

--
Andrew

Mar 9 '06 #103
Andrew Reilly <an*************@areilly.bpc-users.org> writes:
On Thu, 09 Mar 2006 00:00:34 +0000, Christian Bau wrote:
Question: If the C Standard guarantees that for any array a, &a [-1]
should be valid, should it also guarantee that &a [-1] != NULL


Probably, since NULL has been given the guarantee that it's unique in some
sense. In an embedded environment, or assembly language, the construct
could of course produce NULL (for whatever value you pick for NULL), and
NULL would not be special. I don't know that insisting on the existence of
a unique and special NULL pointer value is one of the standard's crowning
achievements, either. It's convenient for lots of things, but it's just
not the way simple hardware works, particularly at the limits.


How exactly do you get from NULL (more precisely, a null pointer
value) being "unique in some sense" to a guarantee that &a[-1], which
doesn't point to any object, is unequal to NULL?

The standard guarantees that a null pointer "is guaranteed to compare
unequal to a pointer to any object or function". &a[-1] is not a
pointer to any object or function, so the standard doesn't guarantee
that &a[-1] != NULL.

Plausibly, if a null pointer is represented as all-bits-zero, and
pointer arithmetic works like integer arithmetic, an object of size N
could easily happen to be allocated at address N; then pointer
arithmetic could yield a null pointer value. (In standard C, this is
one of the infinitely many possible results of undefined behavior.)

What restrictions would you be willing to impose, and/or what code
would you be willing to break, in order to make such a guarantee?

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Mar 9 '06 #104
In article <pa****************************@areilly.bpc-users.org>
Andrew Reilly <an*************@areilly.bpc-users.org> wrote:
So what we see is that (a) pointer comparisons use direct unsigned integer
comparison, instead of checking the sign of the pointer difference ...
While the data are consistent with this conclusion, there are other
ways to arrive at the same output. But this is certainly allowed.

It is perhaps worth pointing out that in Ancient C (as in "whatever
Dennis' compiler did"), before the "unsigned" keyword even existed,
the way you got unsigned arithmetic and comparisons was to use
"char *". That is:

int a, b;
char *c, *d;
...
if (a < b) /* signed compare */
...
c = a; /* no cast needed because this was Ancient C */
d = b; /* (we could even do things like 077440->rkcsr!) */
if (c < d) /* unsigned compare */
...
I'd argue that the compiler is doing the wrong thing here ...


It sounded to me as though you liked what Dennis' original compilers
did, and wished that era still existed. In this respect, it does:
and now you argue that this is somehow "wrong".
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Mar 9 '06 #105
Chris Torek wrote:
Keith Thompson said:
Um, I always thought that "within" and "outside" were two different
things.

In article <du**********@nwrdmz03.dmz.ncs.ea.ibs-infra.bt.com>
Richard Heathfield <in*****@invalid.invalid> wrote:
Ask Jack to lend you his bottle. You'll soon change your mind.

To clarify a bit ...

A mathematician named Klein
Thought the Moebius band was divine
Said he, "If you glue
The edges of two
You'll get a weird bottle like mine!"

:-)

(A Moebius band has only one side. It is a two-dimensional object
that exists only in a 3-dimensional [or higher] space. A Klein
bottle can only be made in a 4-dimensional [or higher] space, and
is a 3-D object with only one side. The concept can be carried on
indefinitely, but a Klein bottle is hard enough to contemplate
already.)


See

http://www.kleinbottle.com/

--
Eric Sosman
es*****@acm-dot-org.invalid
Mar 9 '06 #106
Let me get this correct.

If I went something like

#include <stdio.h>
int main(void) {
    int *p;
    int arr[2];
    p = arr + 4;

    return 0;
}

This would be undefined behavior because I'm writing two past the array
instead of one. Right?

Chad

Mar 9 '06 #107

On Wed, 8 Mar 2006, Chad wrote:

Let me get this correct.
If I went something like

#include <stdio.h>
int main(void) {
int *p;
int arr[2];
p = arr + 4;
return 0;
}

This would be undefined behavior because I'm writing two past the array
instead of one. Right?


Wrong. It would be undefined behavior because you're constructing
a pointer that points /three/ elements past the end of the array.
("Writing" has nothing to do with it.) But yes, it's undefined
behavior in C (and C++).

-Arthur
Mar 9 '06 #108
"Chad" <cd*****@gmail.com> writes:
Let me get this correct.

If I went something like

#include <stdio.h>
int main(void) {

int *p;
int arr[2];
p = arr + 4;

return 0;
}

This would be undefined behavior because I'm writing two past the array
instead of one. Right?


You're not writing past the array, but yes, it's undefined behavior.

Given the above declarations, and adding "int i;":

p = arr + 1; /* ok */
i = *p; /* ok, accesses 2nd element of 2-element array */

p = arr + 2; /* ok, points just past end of array */
i = *p; /* undefined behavior */

p = arr + 3; /* undefined behavior, points too far past end of array */

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Mar 9 '06 #109
Eric Sosman said:
http://www.kleinbottle.com/


"These elegant bottles make great gifts, fantastic classroom displays, and
inferior mouse-traps."

Now /that/ is good advertising.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Mar 9 '06 #110

David Holland wrote:
On 2006-03-07, James Dow Allen <jd*********@yahoo.com> wrote:
> [...] but I'm sincerely curious whether anyone knows of an *actual*
> environment where p == s will ever be false after (p = s-1; p++).


The problem is that evaluating s-1 might cause an underflow and a
trap, and then you won't even reach the comparison. You don't
necessarily have to dereference an invalid pointer to get a trap.

You might hit this behavior on any segmented architecture (e.g.,
80286, or 80386+ with segments on) ...


I'm certainly no x86 expert. Can you show or point to the output
of any C compiler which causes an "underflow trap" in this case?

At the risk of repetition, I'm *not* asking whether a past or future
compiler might or may trap (or trash my hard disk); I'd just be curious
to see one (1) actual instance where the computation (without dereference)
p=s-1 causes a trap.

James

Mar 9 '06 #111
Andrew Reilly wrote:
.... snip ...
I reckon I'll just go with the undefined flow, in the interests of
efficient, clean code on the architectures that I target. I'll
make sure that I supply a document specifying how the compilers
must behave for all of the undefined behaviours that I'm relying
on, OK? I have no interest in trying to make my code work on
architectures for which they don't hold.


That's just fine with me, and is the attitude I wanted to trigger.
As long as you recognize and document those assumptions, all is
probably well. In the process you may well find you don't need at
least some of the assumptions, and improve your code portability
thereby.

In the original sample code, it is necessary to deduce when an
integer pointer can be used in order to achieve the goals of the
routine. Thus it is necessary to make some of those assumptions.
Once documented, people know when the code won't work.

Mar 9 '06 #112
msg wrote:
On the other hand, when Seymour Cray started his own company,
those machines were 2's complement.


The Cray blood-line starting at least with "Little Character"
(prototype for the 160) was 1's complement, implemented with
subtraction as the basis of arithmetic (the so-called 'adder
pyramid'). Even the CDC 3000 series which were mostly others'
designs retained 1's complement arithmetic. The 6000 and 7000
series PPUs were essentially 160s also. I should think it safe
to say one could find 1's complement in Cray designs from at
least 1957 through the early 1980s.


Please don't remove attributions for material you quote.

The reason to use a subtractor is that it guarantees that -0
never appears in the results. This allows using that value for
such things as traps, uninitialized, etc. It also simplifies
operand sign and zero detection. The same thing applies to decimal
machines using 9s complement, and I realized it too late to take
proper advantage in the firstpc, as shown on my website.

Mar 9 '06 #113
Al Balmer wrote:
Chris Torek <no****@torek.net> wrote:
Richard Heathfield <in*****@invalid.invalid> wrote:
Keith Thompson said: Um, I always thought that "within" and "outside" were two
different things.

Ask Jack to lend you his bottle. You'll soon change your mind.


To clarify a bit ...

A mathematician named Klein
Thought the Moebius band was divine
Said he, "If you glue
The edges of two
You'll get a weird bottle like mine!"
:-)

(A Moebius band has only one side. It is a two-dimensional object
that exists only in a 3-dimensional [or higher] space. A Klein
bottle can only be made in a 4-dimensional [or higher] space, and
is a 3-D object with only one side. The concept can be carried on
indefinitely, but a Klein bottle is hard enough to contemplate
already.)


But that was Felix. Who's Jack?


Jack Klein, a noted contributor, especially in c.a.e.

Mar 9 '06 #114
"James Dow Allen" <jd*********@yahoo.com> wrote:
David Holland wrote:
On 2006-03-07, James Dow Allen <jd*********@yahoo.com> wrote:
> [...] but I'm sincerely curious whether anyone knows of an *actual*
> environment where p == s will ever be false after (p = s-1; p++).


The problem is that evaluating s-1 might cause an underflow and a
trap, and then you won't even reach the comparison. You don't
necessarily have to dereference an invalid pointer to get a trap.

You might hit this behavior on any segmented architecture (e.g.,
80286, or 80386+ with segments on) ...


I'm certainly no x86 expert. Can you show or point to the output
of any C compiler which causes an "underflow trap" in this case?

At the risk of repetition, I'm *not* asking whether a past or future
compiler might or may trap (or trash my hard disk); I'd just be curious
to see one (1) actual instance where the computation (without dereference)
p=s-1 causes a trap.


I don't know of any case where an escaped pet grizzly bear has eaten
anyone in the Netherlands, but I'm still not short-sighted enough to use
that as an argument to allow grizzly bears as pets.

Richard
Mar 9 '06 #115
Andrew Reilly <an*************@areilly.bpc-users.org> wrote:
On Wed, 08 Mar 2006 13:14:39 -0800, Andrey Tarasevich wrote:
This is a perfectly reasonable approach for a higher level language.


C is not a higher-level language. It's a universal assembler. Pick
another one.


All that statement means is that the person who utters it knows
diddly-squat about either C _or_ assembler.

Richard
Mar 9 '06 #116
In article <pa****************************@areilly.bpc-users.org>,
Andrew Reilly <an*************@areilly.bpc-users.org> wrote:
On Thu, 09 Mar 2006 00:15:17 +0000, Christian Bau wrote:
I just tried the following program (CodeWarrior 10 on MacOS X):


Same for gcc4 on MacOS X. However this slight permutation of your
program (only the comparison line has changed):

#include <stdio.h>

#define SIZE (50*1000000L)
typedef struct {
    char a [SIZE];
} bigstruct;

static bigstruct bigarray [8];

int main(void)
{
    printf("%lx\n", (unsigned long) &bigarray [0]);
    printf("%lx\n", (unsigned long) &bigarray [9]);
    printf("%lx\n", (unsigned long) &bigarray [-1]);

    if (&bigarray [-1] - &bigarray [0] < 0)
        printf ("Everything is fine\n");
    else
        printf ("The C Standard is right: &bigarray [-1] is broken\n");

    return 0;
}

produces:
3080
1ad2a500
fd054000
Everything is fine

So what we see is that (a) pointer comparisons use direct unsigned integer
comparison, instead of checking the sign of the pointer difference---since
pointer comparisons only make sense in the context of an individual object,
I'd argue that the compiler is doing the wrong thing here, and the
comparison should instead have been done in the context of a pointer
difference; and (b) your printf string about "&bigarray[-1] is broken" is
wrong, since that's not what the code showed at all. What it showed is
that &bigarray[-1] could be formed, that &bigarray[0] was one element to
the right of it, and that hell did not freeze over (nor was any trap
taken), since you did not attempt to access any memory there.


We didn't see anything. The code involved undefined behavior.

Now try the same with array indices -2, -3, -4 etc. and tell us when
the program first says your code is broken.

Or try this one on a 32 bit PowerPC or x86 system:

double* p;
double* q;

q = p + 0x2000000;
if (p == q)
    printf ("It is broken!!!");
if (q - p == 0)
    printf ("It is broken!!!");
Mar 9 '06 #117
In article <44***************@yahoo.com> cb********@maineline.net writes:
....
The reason to use a subtractor is that it guarantees that -0
never appears in the results. This allows using that value for
such things as traps, uninitialized, etc.


This was however not done on any of the 1's complement machines I have
worked with. The +0 preferent machines (CDC) just did not generate it
in general. The -0 preferent machines I used (Electrologica) in general
did not generate +0. But the number not generated was not handled as
special in any way.

I have seen only one machine that used some particular bit pattern in
integers in a special way. The Gould. 2's complement but what would
now be regarded as the most negative bit pattern was a trap representation
on the Gould.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Mar 9 '06 #118

Andrew Reilly wrote:
On Wed, 08 Mar 2006 18:07:45 -0700, Al Balmer wrote:
On Thu, 09 Mar 2006 09:13:24 +1100, Andrew Reilly
<an*************@areilly.bpc-users.org> wrote:
C is not a higher-level language. It's a universal assembler. Pick
another one.
Nice parrot. I think the original author of that phrase meant it as a
joke.


Most jokes contain at least a kernel of truth.


Funny, I always called it a glorified assembler.

It fills a niche that few true high-level languages can.
I spent 25 years writing assembler. C is a higher-level language.
Yeah, me too. Still do, regularly, on processors that will never have a C
compiler. C is as close to a universal assembler as we've got at the
moment. It doesn't stick its neck out too far, although a more
deliberately designed universal assembler would be a really good thing.
(It's on my list of things to do...)


Would a better universal assembler be more like assembler or more like
high level languages? I really think C hit very close to the optimal
balance.

If you actually *want* a higher level language, there are better ones
to choose from than C.


Good programmers definitely have to be multilingual.
ed

Mar 9 '06 #119
"Ed Prochak" <ed*******@gmail.com> writes:
Andrew Reilly wrote:

[...]
Yeah, me too. Still do, regularly, on processors that will never have a C
compiler. C is as close to a universal assembler as we've got at the
moment. It doesn't stick its neck out too far, although a more
deliberately designed universal assembler would be a really good thing.
(It's on my list of things to do...)


Would a better universal assembler be more like assembler or more like
high level languages? I really think C hit very close to the optimal
balance.


Whether C is a "universal assembler" is an entirely separate question
from whether C is "good", or "better" than something else, or close to
some optimal balance.

As I understand the term, an assembly language is a symbolic language
in which the elements of the language map one-to-one (or nearly so)
onto machine-level instructions. Most assembly languages are, of
course, machine-specific, since they directly specify the actual
instructions. One could imagine a more generic assembler that uses
some kind of pseudo-instructions that can be translated more or less
one-to-one to actual machine instructions. C, though it's closer to
the machine than some languages, is not an assembler in this sense; in
a C program, you specify what you want the machine to do, not what
instructions it should use to do it.

<OT>Forth might be an interesting data point in this discussion, but
if you're going to go into that, please drop comp.lang.c from the
newsgroups.</OT>

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Mar 9 '06 #120
On 9 Mar 2006 12:34:20 -0800, "Ed Prochak" <ed*******@gmail.com>
wrote:
Andrew Reilly wrote:

Yeah, me too. Still do, regularly, on processors that will never have a C
compiler. C is as close to a universal assembler as we've got at the
moment. It doesn't stick its neck out too far, although a more
deliberately designed universal assembler would be a really good thing.
(It's on my list of things to do...)


Would a better universal assembler be more like assembler or more like
high level languages? I really think C hit very close to the optimal
balance.


I don't see much room for a "universal" assembler between C and a
traditional assembler, since the instruction sets can vary quite a
lot.

However, there exists a group of intermediate languages that try to
solve some problems in traditional assembler. One problem is the
management of local labels for branch targets; the other is that with
usually one opcode per source line the program can be quite long. Both
these things make it hard to get a general view of what is going on, since
only a small fraction of the operations will fit into the editor
screen.

To handle the label management, some kind of conditional and repeat
blocks can be easily created. Allowing simple assignment statements
with assembly-language operands makes it easier to put multiple
instructions on a single line.

Such languages have been implemented either as a preprocessor to an
ordinary assembler or directly as a macro set at least on PDP-11 and
VAX. The source code for PDP-11 might look like this:

   IF R5 EQ #7
      R3 = Base(R2) + (R0)+ - R1 + #4
   ELSE
      R3 = #5
      SETF              ; "Special" machine instruction
   END_IF

would generate

      CMP   R5,#7
      BNE   9$          ; Branch on opposite condition to else part
      MOV   Base(R2),R3
      ADD   (R0)+,R3
      SUB   R1,R3
      ADD   #4,R3
      BR    19$         ; Jump over else part
9$:                     ; Else part
      MOV   #5,R3
      SETF              ; Copied directly from source
19$:

The assignment statement was always evaluated left to right; no
parentheses were available to change the evaluation order. The
conditional expression consisted of one or two addressing-mode
expressions and a relational operator that translated to a conditional
branch instruction.

There is no point in trying to invent new constructions for each
machine instruction, so there is no problem in inserting any
"special" machine instruction in the source file (in this case SETF),
which is copied directly to the pure assembler file.

For a different processor, the operands would of course be different,
but the control structures would be nearly identical.

Paul

Mar 10 '06 #121

Keith Thompson wrote:
"Ed Prochak" <ed*******@gmail.com> writes:
Andrew Reilly wrote: [...]
Yeah, me too. Still do, regularly, on processors that will never have a C
compiler. C is as close to a universal assembler as we've got at the
moment. It doesn't stick its neck out too far, although a more
deliberately designed universal assembler would be a really good thing.
(It's on my list of things to do...)


Would a better universal assembler be more like assembler or more like
high level languages? I really think C hit very close to the optimal
balance.


Whether C is a "universal assembler" is an entirely separate question
from whether C is "good", or "better" than something else, or close to
some optimal balance.

As I understand the term, an assembly language is a symbolic language
in which the elements of the language map one-to-one (or nearly so)
onto machine-level instructions.


The one-to-one mapping is broken for a macro assembler. Often there are
macros defined for subroutine entry and exit, loops and other things.
The only difference is that everyone defines their own macros, while in
C everybody uses the same "macros".
Most assembly languages are, of
course, machine-specific, since they directly specify the actual
instructions.
x++; in most machines can map to a single instruction
which is different from the instruction for ++x;
One could imagine a more generic assembler that uses
some kind of pseudo-instructions that can be translated more or less
one-to-one to actual machine instructions. C, though it's closer to
the machine than some languages, is not an assembler in this sense; in
a C program, you specify what you want the machine to do, not what
instructions it should use to do it.
I would agree that if an assembler must be a one-to-one mapping from
source line to opcode, then C doesn't fit. I just don't agree with that
definition of assembler.

<OT>Forth might be an interesting data point in this discussion, but
if you're going to go into that, please drop comp.lang.c from the
newsgroups.</OT>


Forth is definitely a contender.

nice discussion.
ed

Mar 10 '06 #122
On 10 Mar 2006 06:44:22 -0800, "Ed Prochak" <ed*******@gmail.com>
wrote:

Keith Thompson wrote:
"Ed Prochak" <ed*******@gmail.com> writes:
> Andrew Reilly wrote: [...]
>> Yeah, me too. Still do, regularly, on processors that will never have a C

<snip>
The one-to-one mapping is broken for a macro assembler. Often there are
macros defined for subroutine entry and exit, loops and other things.
The only difference is that everyone defines their own macros, while in
C everybody uses the same "macros".
Not really. Many assemblers have predefined macros for various things,
and C programmers write macros using preprocessor directives.
Most assembly languages are, of
course, machine-specific, since they directly specify the actual
instructions.
x++; in most machines can map to a single instruction
which is different from the instruction for ++x;


Huh? As a standalone statement, if x is an integer type, I'd expect
both to be mapped to the machine's equivalent of INC x. If it's
embedded in a larger statement, or it's a pointer, it's likely that
several instructions will be generated, and a compiler (including a C
compiler) will do things that a macro assembler won't do.
One could imagine a more generic assembler that uses
some kind of pseudo-instructions that can be translated more or less
one-to-one to actual machine instructions.

Proposed decades ago, and there has been some implementation.
C, though it's closer to
the machine than some languages, is not an assembler in this sense; in
a C program, you specify what you want the machine to do, not what
instructions it should use to do it.


I would agree that if an assembler must be a one-to-one mapping from
source line to opcode, then C doesn't fit. I just don't agree with that
definition of assembler.


Nor does anyone else, since the invention of macros. However, C
doesn't fit any widely accepted definition of assembler. You can have
your own definition of assembler, as long as you don't expect folks to
know what you're talking about.

--
Al Balmer
Sun City, AZ
Mar 10 '06 #123

Richard G. Riley wrote:
"Ed"posted the following on 2006-03-10:
[]
The one-to-one mapping is broken for a macro assembler. Often there are
macros defined for subroutine entry and exit, loops and other things.
The only difference is that everyone defines their own macros, while in
C everybody uses the same "macros".


But a mnemonic representing an instruction is still just that. The
macro part is nothing more than rolling up of things for brevity. A
subsequent disassembly will reveal all the hidden gore.


But it will NOT display the original macro. There is no 1 to 1 mapping
from source to code.
Most assembly languages are, of
course, machine-specific, since they directly specify the actual
instructions.
x++; in most machines can map to a single instruction
which is different from the instruction for ++x;


How do you see them being different at the assembler level? They are
not, are they? It's just when you do the (pseudo-ASM) INC_REGx or ADDL 02,REGx or
whatever that matters, isn't it?


Actually I still think in PDP assembler at times (my first
assembler programming).
so y=x++; really does map to a single instruction which both moves the
value to y and increments x (which had to be held in a register IIRC)
e.g. if we have

y=++x;

then the pseduo assembler is
INC x
move y,x
MOV R1,x
A: MOV y,R1++
## I may have the syntax wrong, it's been a LONG time
where as

y=x++

is
move y,x
INC x
MOV R1,x
B: MOV y,++R1

The opcodes for those two instructions (lines A and B) are different in
PDP assembler.

Not taking into account expression return values that are CPU
equivalent. Admittedly I haven't dabbled in the instruction sets of
the newer post-80386 CPUs, so please slap me down or better explain the
above if it's not right: it's interesting.
I haven't played much in the intel realm since about the 286, and I
haven't done much assembly at all for about 10 years. Even the last
embedded project I worked on with a tiny 8-bit micro had a C compiler,
so I did nearly nothing in assembler. C makes it so much easier. I've
had the opinion of C as assembler since I first learned it (about
1983).

Some other languages do so much more for you that you might be scared
to look at the disassembly. e.g. languages that do array bounds
checking for you will generate much more code for a[y]=x; than does C.
You can picture the assembly code for C in your head without much
difficulty. The same doesn't hold true for some other languages.
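The extra code a bounds-checked language implicitly generates for `a[y] = x;` can be spelled out by hand in C (an illustrative sketch; the function name and the choice of `assert` as the trap mechanism are mine, not any particular language's runtime):

```c
#include <assert.h>
#include <stddef.h>

/* What "a[y] = x;" costs in a bounds-checked language, written out
   explicitly.  A real runtime would raise an exception rather than
   call assert(). */
static void checked_store(int *a, size_t len, size_t y, int x)
{
    assert(y < len);  /* the hidden range check the other language inserts */
    a[y] = x;         /* the single store that C alone would generate */
}
```

In plain C only the final store is emitted, which is why the generated code is so easy to picture.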
One could imagine a more generic assembler that uses
some kind of pseudo-instructions that can be translated more or less
one-to-one to actual machine instructions. C, though it's closer to
the machine than some languages, is not an assembler in this sense; in
a C program, you specify what you want the machine to do, not what
instructions it should use to do it.
I would agree that if an assembler must be a one-to-one mapping from
source line to opcode, then C doesn't fit. I just don't agree with that
definition of assembler.


Sorry for coming late, but how do you see an assembler? In common
parlance it has always been (in my world) a program for converting
instruction set mnemonics into equivalent opcodes which run natively
on the target CPU.


I told you, a macro assembler does not work that way. One macro might
expand not just to multiple mnemonics, but to different mnemonics
depending on parameters. It is not 1 to 1 from source to assembly
mnemonics (let alone opcodes). A macro assembler can abstract just a
little or quite a lot away from the target machine. Depends on how you
use it. So while "an assembler is [...] a program for converting
instruction set mnemonics into equivalent opcodes which run natively
on the target CPU", there's nothing about that conversion being
one-to-one (mnemonic to opcode).

even without macros, the one-to-one doesn't work if in the instruction
set the opcode for moving registers differs from moving memory, so
MOV R2,R1
differs from
MOV B,A
where R1 and R2 are register identifiers and A and B are memory
locations. Yet we talk about the MOVe mnemonic as if both were the same
operation.

C's assignment operator maps about as closely to those opcodes as that
MOV mnemonic does. That's why I say it's a glorified assembler. You
have about as good an idea of what code is generated as you do with a
good assembler (as long as we can ignore the compiler's optimizer).

--
"A desk is a dangerous place from which to view the world" - LeCarre.


Nice quote.
ed

Mar 10 '06 #124
Al Balmer wrote:
It doesn't have to make sure. It's free to segfault. You write funny
code, you pay the penalty (or your customers do.) Modern hardware does
a lot of speculation. It can preload or even precompute both branches
of a conditional, for example.


Hmm, so if I'm decrementing a divisor, and branching off somewhere
else before the actual divide instruction if the would-be divisor is
zero, and your precomputation of both branches traps a division by zero
that a literal execution of my program would never perform... whose
fault is that?

I suspect that exception handling in speculative execution is a problem
that has been looked into.

Mar 10 '06 #125
On 10 Mar 2006 11:42:24 -0800, cs********@hotmail.com wrote:
Al Balmer wrote:
It doesn't have to make sure. It's free to segfault. You write funny
code, you pay the penalty (or your customers do.) Modern hardware does
a lot of speculation. It can preload or even precompute both branches
of a conditional, for example.
Hmm, so if I'm decrementing a divisor, and branching off somewhere
else before the actual divide instruction if the would-be divisor is
zero, and your precomputation of both branches traps a division by zero
that a literal execution of my program would never perform... whose
fault is that?

Not something to worry about, though you'd have to ask an expert why
:-) I suspect that this stuff is below the level of exception
triggers.
I suspect that exception handling in speculative execution is a problem
that has been looked into.


--
Al Balmer
Sun City, AZ
Mar 10 '06 #126

In article <11**********************@z34g2000cwc.googlegroups .com>, "Ed Prochak" <ed*******@gmail.com> writes:
Keith Thompson wrote:
Most assembly languages are, of
course, machine-specific, since they directly specify the actual
instructions.


x++; in most machines can map to a single instruction


Sure, if "most machines" excludes load/store architectures, and
machines which cannot operate directly on an object of the size of
whatever x happens to be, and all the cases where "x" is a pointer to
an object of a size other than the machine's addressing granularity...

I suppose you could argue that "can" in your claim is intended to be
weak - that, for "most machines" (with a conforming C implementation,
presumably), there exists at least one C program containing the
statement "x++;", and a conforming C implementation which will
translate that statement to a single machine instruction.

But that's a very small claim. All machines "can" map that statement
to multiple instructions as well; many "can" map it to zero
instructions in that sense (taking advantage of auto-increment modes
or the like). What can happen says very little about what will.

The presence in C of syntactic sugar for certain simple operations
like "x++" doesn't support the claim that C is somehow akin to
assembler in any case. One distinguishing feature of assembler is
a *lack* of syntactic sugar. (Macros aren't a counterexample
because they're purely lexical constructs; in principle they're
completely separate from code generation.)

C isn't assembler because:

- It doesn't impose a strict mapping between (preprocessed) source
and generated code. The "as if" clause allows the implementation
to have the generated code differ significantly from a strict
interpretation of the source acting on the virtual machine.

- It has generalized constructs (expressions) which can result in
the implementation generating arbitrarily complex code.

--
Michael Wojcik mi************@microfocus.com

Any average educated person can turn out competent verse. -- W. H. Auden
Mar 10 '06 #127
On 10 Mar 2006 20:15:45 GMT, mw*****@newsguy.com (Michael Wojcik)
wrote:
The presence in C of syntactic sugar for certain simple operations
like "x++" doesn't support the claim that C is somehow akin to
assembler in any case. One distinguishing feature of assembler is
a *lack* of syntactic sugar. (Macros aren't a counterexample
because they're purely lexical constructs; in principle they're
completely separate from code generation.)


Mnemonics and symbolic addresses in assemblers are just syntactic
sugar built on the binary machine code :-).

Entering machine codes in hex or octal is also syntactic sugar.

Paul

Mar 10 '06 #128
In article <11**********************@i39g2000cwa.googlegroups .com>,
<cs********@hotmail.com> wrote:
Hmm, so if I'm decrementing a divisor, and branching off somewhere
else before the actual divide instruction if the would-be divisor is
zero, and your precomputation of both branches traps a division by zero
that a literal execution of my program would never perform... whose
fault is that? I suspect that exception handling in speculative execution is a problem
that has been looked into.


[Getting off-topic for comp.lang.c...]

Yes. For example on the MIPS architecture, an exception state is
inserted into the flow, but the exception itself is not taken
unless the exception "graduates"; the exception is suppressed if
the conditional results turn out to be such that it was not needed.

In the MIPS IV instruction set, divide can be done as
"multiply by the reciprical", and it is not uncommon to schedule
the reciprical operation ahead of time, before the code has had
time to check whether the denominator is 0. The non-zeroness
is speculated so as to get a "head start" on the time-consuming
division operation.

If I recall correctly, a fair bit of the multi-instruction pipelining
on MIPS is taken up with controls to handle speculation properly.
--
Prototypes are supertypes of their clones. -- maplesoft
Mar 10 '06 #129
"Ed Prochak" <ed*******@gmail.com> writes:
Keith Thompson wrote:

[...]
Whether C is a "universal assembler" is an entirely separate question
from whether C is "good", or "better" than something else, or close to
some optimal balance.

As I understand the term, an assembly language is a symbolic language
in which the elements of the language map one-to-one (or nearly so)
onto machine-level instructions.


The one-to-one mapping is broken for a macro assembler. Often there are
macros defined for subroutine entry and exit, loops and other things.
The only difference is that everyone defines their own macros, while in
C everybody uses the same "macros".


There's a continuum from raw machine language to very high-level
languages. Macro assembler is only a very small step up from
non-macro assembler. C is a *much* bigger step up from that. Some C
constructs may happen to map to single instructions for *some*
compiler/CPU combinations; they might map to multiple instructions, or
even none, for others. An assignment statement might copy a single
scalar value (integer, floating-point, or pointer) -- or it might copy
an entire structure; the C code looks the same, but the machine code
is radically different.

Using entirely arbitrary units of high-level-ness, I'd call machine
language close to 0, assembly language 10, macro assembler 15, and C
about 50. It might be useful to have something around 35 or so.
(This is, of course, mostly meaningless.)

Assembly language is usually untyped; types are specified by which
instruction you use, not by the types of the operands. C, by
contrast, associates types with variables. It often figures out how
to implement an operation based on the types of its operands, and many
operations are disallowed (assigning a floating-point value to a
pointer, for example).

I know the old joke that C combines the power of assembly language
with the flexibility of assembly language. I even think it's funny.
But it's not realistic, at least for C programmers who care about
writing good portable code.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Mar 10 '06 #130
Randy Howard wrote:
Hypothetical hardware that traps on *speculative* loads isn't broken by
design? I'd love to see the initialization sequences, or the task
switching code that has to make sure that all pointer values are valid
before they're loaded. No, scratch that. I've got better things to do.


This is a lot of whining about a specific problem that can
easily be remedied just by changing the loop construction. The
whole debate is pretty pointless in that context, unless you
have some religious reason to insist upon the method in the
original.


Have to remember, though, that the C program in question is really a
back-translation approximating an assembly language original. If
the compiler builds the undefined pointer operation in the logical way,
it will be essentially the same as the hand-written assembly language
code.

To then claim that speculative execution may cause an exception on the
result is to imply that the assembly language author, who has a pretty
good idea what assumptions he is making, must now add "speculative
loading of something I wasn't going to fetch" to the list of concerns.

Or were you thinking it was the compiler rather than processor logic
which was going to do the speculating?

Some pipelining tricks like the MIPS branch delay slot, are explicitly
part of the programming model, and you do have to manually handle them
when working with low level assembly code. But for the x86,
speculation is not...

Mar 11 '06 #131
On 10 Mar 2006 17:05:01 -0800, cs********@hotmail.com wrote:
Have to remember, though, that the C program in question is really a
back-translation approximating an assembly language original


Why? We've long since stopped discussing that program.

--
Al Balmer
Sun City, AZ
Mar 11 '06 #132
In article <pa***************************@bsb.me.uk>,
Ben Bacarisse <be********@bsb.me.uk> wrote:
<SNIP>

I found Eric Sosman's "if (buffer + space_required > buffer_end) ..."
example more convincing, because I have seen that in programs that are
intended to be portable -- I am pretty sure I have written such things
myself in my younger days. Have you other more general examples of
dangerous assumptions that can sneak into code? A list of the "top 10
things you might be assuming" would be very interesting.
I find it not an example of implicit assumptions, just of bad coding.

There is no reason not to use the almost as readable, problemless

if ( buffer_end - buffer < space_required )

(It mentally reads as "if the number of elements that still
fit in the buffer is less than the number of elements we require")
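The practical difference between the two tests is that `buffer + space_required` may form a pointer past the end of the object, which is undefined behavior in C, while the subtraction form never leaves the array. A minimal sketch (identifier names follow the quoted example):

```c
#include <stddef.h>

/* Nonzero if 'space_required' more elements fit.  buffer and
   buffer_end delimit one array, so buffer <= buffer_end always holds
   and the subtraction is a well-defined pointer difference.  By
   contrast, buffer + space_required could be formed beyond one past
   the end of the object, which the standard leaves undefined. */
static int fits(const char *buffer, const char *buffer_end,
                size_t space_required)
{
    return (size_t)(buffer_end - buffer) >= space_required;
}
```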
Ben.


Groetjes Albert

--
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- like all pyramid schemes -- ultimately falters.
al****@spenarnc.xs4all.nl http://home.hccnet.nl/a.w.m.van.der.horst
Mar 11 '06 #133
On 2006-03-11, Mark L Pappin <ml*@acm.org> wrote:
Andrew Reilly <an*************@areilly.bpc-users.org> writes:
On Wed, 08 Mar 2006 18:07:45 -0700, Al Balmer wrote:
I spent 25 years writing assembler.

Yeah, me too. Still do, regularly, on processors that will never
have a C compiler.


It's a little OT in c.l.c, but would you mind telling us just what
processors those are, that you can make such a guarantee? What
characteristics do they have that means they'll never have a C
compiler?

(A few I can recall having been proposed are: tiny amounts of storage,
Harvard architecture, and lack of programmer-accessible stack.
Funnily enough, these are characteristics possessed by chips for which
I compile C code every day.)


"tiny amounts of storage" may preclude a conforming hosted
implementation [which must support an object of 65535 bytes, and, of
course, a decent-sized library]
Mar 11 '06 #134
Albert van der Horst wrote:
I found Eric Sosman's "if (buffer + space_required > buffer_end) ..."
There is no reason not to use the almost as readable, problemless

if ( buffer_end - buffer < space_required )


If the computation in one version can be reduced to a constant by the
compiler, that would be a reason for using that version.

I can imagine a number of situations in which "bad coding" is the
result of a programmer with a mental idea of how to accomplish
something efficiently, trying to render that approach in C as if it
were assembly language. This is doubly likely on small systems...

The problem of course is that the compiler has its own ideas about how
to be efficient.

And the standards committee may have very different ideas from the
would-be hand optimizing programmer about how you are supposed to
instruct the compiler in what you want!

Mar 12 '06 #135
Jordan Abel wrote:
On 2006-03-11, Mark L Pappin <ml*@acm.org> wrote:
Andrew Reilly <an*************@areilly.bpc-users.org> writes:
On Wed, 08 Mar 2006 18:07:45 -0700, Al Balmer wrote: I spent 25 years writing assembler.

Yeah, me too. Still do, regularly, on processors that will never
have a C compiler.


It's a little OT in c.l.c, but would you mind telling us just what
processors those are, that you can make such a guarantee? What
characteristics do they have that means they'll never have a C
compiler?

(A few I can recall having been proposed are: tiny amounts of
storage, Harvard architecture, and lack of programmer-accessible
stack. Funnily enough, these are characteristics possessed by
chips for which I compile C code every day.)


"tiny amounts of storage" may preclude a conforming hosted
implementation [which must support an object of 65535 bytes, and,
of course, a decent-sized library]


These machines may well have C compilers, just not conforming
ones. The areas of non-conformance are likely to be:

object size available
floating point arithmetic
recursion depth (1 meaning no recursion)
availability of long and long-long.
availability of standard library

Once more, many programs can be written that are valid and portable
C, without requiring these abilities. The thing that needs to be
documented is use of possible non-standard substitutes for standard
features.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
Mar 12 '06 #136
On 2006-03-11, CBFalconer <cb********@yahoo.com> wrote:
Jordan Abel wrote:
On 2006-03-11, Mark L Pappin <ml*@acm.org> wrote:
Andrew Reilly <an*************@areilly.bpc-users.org> writes:
On Wed, 08 Mar 2006 18:07:45 -0700, Al Balmer wrote:
> I spent 25 years writing assembler.

Yeah, me too. Still do, regularly, on processors that will never
have a C compiler.

It's a little OT in c.l.c, but would you mind telling us just what
processors those are, that you can make such a guarantee? What
characteristics do they have that means they'll never have a C
compiler?

(A few I can recall having been proposed are: tiny amounts of
storage, Harvard architecture, and lack of programmer-accessible
stack. Funnily enough, these are characteristics possessed by
chips for which I compile C code every day.)


"tiny amounts of storage" may preclude a conforming hosted
implementation [which must support an object of 65535 bytes, and,
of course, a decent-sized library]


These machines may well have C compilers, just not conforming
ones. The areas of non-conformance are likely to be:

object size available
floating point arithmetic
recursion depth (1 meaning no recursion)

availability of long and long-long.

Or even the range of int itself. [People have claimed that "c"
implementations exist with 8-bit int]

availability of standard library

An implementation can conform, as a freestanding implementation, with
VERY little of the standard library

I'd question how much of the other stuff can be gone and still
considered "c", though.

Once more, many programs can be written that are valid and portable
C, without requiring these abilities. The thing that needs to be
documented is use of possible non-standard substitutes for standard
features.

Mar 12 '06 #137

cs********@hotmail.com wrote:
Albert van der Horst wrote:
I found Eric Sosman's "if (buffer + space_required > buffer_end) ..."
There is no reason not to use the almost as readable, problemless

if ( buffer_end - buffer < space_required )


If the computation in one version can be reduced to a constant by the
compiler, that would be a reason for using that version.

I can imagine a number of situations in which "bad coding" is the
result of a programmer with a mental idea of how to accomplish
something efficiently, trying to render that approach in C as if it
were assembly language.


This reminds me of C--: http://www.cminusminus.org/
This is doubly likely on small systems...

The problem of course is that the compiler has its own ideas about how
to be efficient.

And the standards committee may have very different ideas from the
would-be hand optimizing programmer about how you are supposed to
instruct the compiler in what you want!


Mar 12 '06 #138
In comp.arch.embedded Jordan Abel <ra*******@gmail.com> wrote:
On 2006-03-11, CBFalconer <cb********@yahoo.com> wrote:
Jordan Abel wrote:
"tiny amounts of storage" may preclude a conforming hosted
implementation [which must support an object of 65535 bytes, and,
of course, a decent-sized library]

The "decent-sized" library for small embedded systems is easier to
meet than you may think. Be sure to look up "freestanding
implementation" in the library section of the C standard.
These machines may well have C compilers, just not conforming
ones. The areas of non-conformance are likely to be:
Following the only truly formal definition, "C compilers, just not
conforming ones" don't, of course, exist any more than there
are cars on the highway with "wheels, just not round ones."

[...] availability of long and long-long.

Long long is, for many such compilers, still a non-issue, because they
never claimed to have implemented C99. C89==C is still a widely
accepted assumption.
Or even the range of int itself. [People have claimed that "c"
implementations exist with 8-bit int]
Some people would probably also not shy away from claiming there are
18-wheeler trucks built with only two wheels. Even all things
considered, such people are blatantly wrong (because they've been lied
to by, or are, marketroids). There's no excuse for violating a strict
requirement of the standard just to match users' likely interpretation
of one of the helpful suggestions, like "int should be the natural
integer type of the target CPU".
I'd question how much of the other stuff can be gone and still
considered "c", though.


The rule of thumb should be one of practicality. Try hard to fit as
much of the language as you can on the CPU, but stay reasonable.
I.e. all features that lie in the intersection between the standard's
requirements and the platform's feature set, should be implemented
strictly by the C standard. For the rest, stay as close to the
standard as you can bear. And above all, *document* all such
deviations prominently.

In case of doubt, do what every clever politician would do: refuse to
decide, so the users can take the decision and the responsibility onto
their own shoulders. Implement both a "standard as standard can"
mode, and a "as much standard as we think makes sense" mode. E.g. on
8-bit or smaller CPUs C's default integer promotion rules can turn
into a serious liability; offering a flag to turn them on or off makes
sense.

--
Hans-Bernhard Broeker (br*****@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Mar 12 '06 #139
On 12 Mar 2006 12:43:56 +1000, Mark L Pappin <ml*@acm.org> wrote:
Jordan Abel <ra*******@gmail.com> writes:
On 2006-03-11, Mark L Pappin <ml*@acm.org> wrote:
Andrew Reilly <an*************@areilly.bpc-users.org> writes:
On Wed, 08 Mar 2006 18:07:45 -0700, Al Balmer wrote:
> I spent 25 years writing assembler.

Yeah, me too. Still do, regularly, on processors that will never
have a C compiler.

It's a little OT in c.l.c, but would you mind telling us just what
processors those are, that you can make such a guarantee? What
characteristics do they have that means they'll never have a C
compiler?

(A few I can recall having been proposed are: tiny amounts of storage,
Harvard architecture, and lack of programmer-accessible stack.
Funnily enough, these are characteristics possessed by chips for which
I compile C code every day.)


"tiny amounts of storage" may preclude a conforming hosted
implementation [which must support an object of 65535 bytes, and, of
course, a decent-sized library]


Got me there. Freestanding only, with maybe 16 bytes of RAM and 256
words of ROM - no 'malloc()' here, and 'printf()' can be problematic.

A freestanding C implementation does, however, still have a C
compiler. I'm curious which processors Andrew Reilly claims "will
never have a C compiler", and why he makes that claim.


I assume Dr. Reilly's referring to various DSP devices. These often
have features such as saturating arithmetic and bit-reversed
addressing.

Regards,
Allan
Mar 12 '06 #140
On 2006-03-12, Allan Herriman <al***********@hotmail.com> wrote:
On 12 Mar 2006 12:43:56 +1000, Mark L Pappin <ml*@acm.org> wrote:
Jordan Abel <ra*******@gmail.com> writes:
On 2006-03-11, Mark L Pappin <ml*@acm.org> wrote:
Andrew Reilly <an*************@areilly.bpc-users.org> writes:
> On Wed, 08 Mar 2006 18:07:45 -0700, Al Balmer wrote:
>> I spent 25 years writing assembler.

> Yeah, me too. Still do, regularly, on processors that will never
> have a C compiler.

It's a little OT in c.l.c, but would you mind telling us just what
processors those are, that you can make such a guarantee? What
characteristics do they have that means they'll never have a C
compiler?

(A few I can recall having been proposed are: tiny amounts of storage,
Harvard architecture, and lack of programmer-accessible stack.
Funnily enough, these are characteristics possessed by chips for which
I compile C code every day.)

"tiny amounts of storage" may preclude a conforming hosted
implementation [which must support an object of 65535 bytes, and, of
course, a decent-sized library]
Got me there. Freestanding only, with maybe 16 bytes of RAM and 256
words of ROM - no 'malloc()' here, and 'printf()' can be problematic.

A freestanding C implementation does, however, still have a C
compiler. I'm curious which processors Andrew Reilly claims "will
never have a C compiler", and why he makes that claim.


I assume Dr. Reilly's referring to various DSP devices. These often
have features such as saturating arithmetic


Overflow's undefined, so what's wrong here?
and bit-reversed addressing.
I don't know what that is, so I have no idea how it would affect the
ability for there to be a conforming implementation.

Regards,
Allan

Mar 12 '06 #141

Mark L Pappin wrote:
A freestanding C implementation does, however, still have a C
compiler. I'm curious which processors Andrew Reilly claims "will
never have a C compiler", and why he makes that claim.


I don't think anyone will ever write a C Compiler for the
Atmel MARC4 [ http://www.atmel.com/products/MARC4/ ].

--
Guy Macon
<http://www.guymacon.com/>

Mar 12 '06 #142
On Fri, 10 Mar 2006 11:23:37 +0200, Paul Keinanen wrote:
On 9 Mar 2006 12:34:20 -0800, "Ed Prochak" <ed*******@gmail.com>
wrote:
Andrew Reilly wrote:

Yeah, me too. Still do, regularly, on processors that will never have a C
compiler. C is as close to a universal assembler as we've got at the
moment. It doesn't stick its neck out too far, although a more
deliberately designed universal assembler would be a really good thing.
(It's on my list of things to do...)


Would a better universal assembler be more like assembler or more like
high level languages? I really think C hit very close to the optimal
balance.


I don't see much room for a "universal" assembler between C and a
traditional assembler, since the instruction sets can vary quite a
lot.


I think that a useful "universal assembler" would be something that had
the basic set of operators and types, all of which were well defined for a
particular machine model (flat data memory map, 2's complement arithmetic,
etc.) It could have expressions, as long as the operator precedence was
rigorous enough so that you could absolutely know what the order of
evaluation would be, at coding time.

The two or three most painful things about assembly language programming
are register allocation and making up control-flow symbol names (in
assemblers that don't already have nice structured control-flow
macros/pseudo-ops). Both of these can be included in a "universal
assembler", if you forgo some pure control for convenience: conventional
control structures, subroutine calls that follow common conventions. The
machine instruction sets of Java's JVM and C#'s CLR (?) avoid the register
name issue by being stack-based (and muck up the memory model by being
object-centric). Tao's VM is more nearly a plain 32-bit RISC model, but
with an infinite number of registers, which are managed by the "assembler".
(The third painful thing is instruction scheduling, in super-scalar or
VLIW machines of various sorts. That would probably want to be subsumed
by the language "compiler" too.)

A data model, a set of operators, control flow, a syntax for building
abstractions and domain-specific sub-languages. That could almost be C
right there, except that there are too many holes in the data model and
operator function, both to support old/strange hardware, and to
allow/support compiler optimization transformations. Java has tightened
up the model, but it's not a model of a "bare processor", it's a model of
an "object machine". I'd like the same kind of low-level language
definition, but with objects only built using the language's
meta-programming/macro features, rather than being the only way to do
things.

Just dreaming...

Cheers,

--
Andrew

Mar 12 '06 #143
On Sun, 12 Mar 2006 00:34:53 +1000, Mark L Pappin wrote:
Andrew Reilly <an*************@areilly.bpc-users.org> writes:
On Wed, 08 Mar 2006 18:07:45 -0700, Al Balmer wrote:
I spent 25 years writing assembler.
Yeah, me too. Still do, regularly, on processors that will never
have a C compiler.


It's a little OT in c.l.c, but would you mind telling us just what
processors those are, that you can make such a guarantee? What
characteristics do they have that means they'll never have a C
compiler?


I'm thinking mainly of deeply embedded DSP processors, like those of the
TI TAS3000 family, or Analog Devices Sigma DSPs, or any of several
similar-scale engines from several Japanese manufacturers.

Small memory, sure. Strange word lengths (not really that much of a
problem for C, admittedly). Some of these things don't have pointers in
the usual sense, let alone subroutine call stacks. Their arithmetic
usually doesn't match C's (integer only, usually with saturation on
overflow, frequently with different word lengths for data, coefficient and
result).
(A few I can recall having been proposed are: tiny amounts of storage,
Harvard architecture, and lack of programmer-accessible stack. Funnily
enough, these are characteristics possessed by chips for which I compile
C code every day.)


Apart from the reasons that I mentioned, the biggest one is simply utility
and man-power. No-one is building C compilers for these things because
no-one could or would use one if it existed: the hardware is tuned to do a
particular class of (fairly simple) thing, and that's easy enough to code
up in assembler. Easier than figuring out how to write a C compiler for
it, anyway.

Cheers,

--
Andrew

Mar 12 '06 #144
Jordan Abel wrote:
and, of course, a decent-sized library]


Off topic? Yes. But it bothers me when we confuse the
language with the supporting libraries.
--
Michael N. Moran (h) 770 516 7918
5009 Old Field Ct. (c) 678 521 5460
Kennesaw, GA, USA 30144 http://mnmoran.org

"So often times it happens, that we live our lives in chains
and we never even know we have the key."
The Eagles, "Already Gone"

The Beatles were wrong: 1 & 1 & 1 is 1

Mar 12 '06 #145

Dik T. Winter wrote:
In article <44***************@yahoo.com> cb********@maineline.net writes:
...
> The reason to use a subtractor is that that guarantees than -0
> never appears in the results. This allows using that value for
> such things as traps, uninitialized, etc.


This was however not done on any of the 1's complement machines I have
worked with. The +0-preferring machines (CDC) just did not generate it
in general...


The CDC 6400 and 6600 used "complement recomplement arithmetic."
The only way to get -0 as the result of integer arithmetic was to start
with -0, i.e. (-0)+(-0) = -0 and (-0)-(+0) = -0.

The normal way to copy a B-register was to write, e.g. SB6, B5
(IIRC) which generated the same machine opcode as SB6, B5+B0.
(B0 was an always-zero register.) Hence SB6, B5 was not guaranteed
to copy B5 exactly! Instead SB6, B5-B0 should be coded.

Since there were fast tests for negative and zero, using both +0 and -0
as flags for testing was a micro-optimization sometimes useful for
speed.

James Dow Allen

Mar 12 '06 #146
On 2006-03-09, James Dow Allen <jd*********@yahoo.com> wrote:
David Holland wrote:
On 2006-03-07, James Dow Allen <jd*********@yahoo.com> wrote:
[...] but I'm sincerely curious whether anyone knows of an *actual*
environment where p == s will ever be false after (p = s-1; p++).


The problem is that evaluating s-1 might cause an underflow and a
trap, and then you won't even reach the comparison. You don't
necessarily have to dereference an invalid pointer to get a trap.

You might hit this behavior on any segmented architecture (e.g.,
80286, or 80386+ with segments on) ...


I'm certainly no x86 expert. Can you show or point to the output
of any C compiler which causes an "underflow trap" in this case?


Have you tried bounds-checking gcc?

I don't think I've ever myself seen a compiler that targeted 286
protected mode. Maybe some of the early DOS-extender compilers did,
before everyone switched to 386+. If you can find one and set it to
generate code for some kind of "huge" memory model (supporting
individual objects more than 64K in size) I'd expect it to trap if you
picked a suitable location for `s' to point to.

That assumes you can find a 286 to run it on, too.

Otherwise, I don't know of any, but I'm hardly an expert on strange
platforms.

(Note: Coherent was a 286 protected mode platform, but it only
supported the "small" memory model... and it had a K&R-only compiler,
so it's not a viable example.)

--
- David A. Holland
(the above address works if unscrambled but isn't checked often)
Mar 13 '06 #147
"Michael N. Moran" <mi**@mnmoran.org> wrote:
Jordan Abel wrote:
and, of course, a decent-sized library]


Off topic? Yes. But, I it bothers me when we confuse the
language with the supporting libraries.


As long as we're talking about C, they are part of the same Standard.
You can get a freestanding implementation which is allowed not to
implement much of the Standard, but that doesn't make those parts any
less C.

Richard
Mar 13 '06 #148
Jordan Abel <ra*******@gmail.com> wrote:
On 2006-03-12, Allan Herriman <al***********@hotmail.com> wrote:
On 12 Mar 2006 12:43:56 +1000, Mark L Pappin <ml*@acm.org> wrote:
A freestanding C implementation does, however, still have a C
compiler. I'm curious which processors Andrew Reilly claims "will
never have a C compiler", and why he makes that claim.


I assume Dr. Reilly's referring to various DSP devices. These often
have features such as saturating arithmetic


Overflow's undefined, so what's wrong here?


If the saturation also occurs for unsigned integers, you're going to
have a pain of a time implementing C's wraparound-on-unsigned-overflow
behaviour.

Richard
Mar 13 '06 #149
On 12 Mar 2006 07:14:52 GMT, Jordan Abel <ra*******@gmail.com> wrote:
On 2006-03-12, Allan Herriman <al***********@hotmail.com> wrote:

[snippage]
I assume Dr. Reilly's referring to various DSP devices. These often
have features such as saturating arithmetic


Overflow's undefined, so what's wrong here?
and bit-reversed addressing.


I don't know what that is, so I have no idea how it would affect the
ability for there to be a conforming implementation


http://www.google.com.au/search?q=bi...sed+addressing
(First hit)

Bit reversed addressing is just another addressing mode that's simple
to access from assembly language. Do you think that this could be
generated by a C compiler?

Bit reversed addressing is used in the calculation of an FFT (and
almost nowhere else).

Regards,
Allan
Mar 13 '06 #150
