
Making Fatal Hidden Assumptions

We often find hidden, and totally unnecessary, assumptions being
made in code. The following leans heavily on one particular
example, which happens to be in C. However similar things can (and
do) occur in any language.

These assumptions are generally made because of familiarity with
the language. As a non-code example, consider the idea that the
faulty code is written by blackguards bent on fouling the
language. The term blackguards is not in favor these days, and for
good reason. However, the older you are, the more likely you are
to have used it since childhood, and to use it again, barring
specific thought on the subject. The same type of thing applies to
writing code.

I hope, with this little monograph, to encourage people to examine
some hidden assumptions they are making in their code. As ever, in
dealing with C, the reference standard is the ISO C standard.
Versions can be found in text and PDF formats by searching for N869
and N1124. [1] The latter has no text version, but is more
up-to-date.

We will always have innocent-appearing code with these kinds of
assumptions built in. However, it would be wise to annotate such
code to make the assumptions explicit, which can avoid a great deal
of agony when the code is reused on other systems.

In the following example, the code is as downloaded from the
referenced URL, and the comments are entirely mine, including the
'every 5' line-number references.

/* Making fatal hidden assumptions */
/* Paul Hsieh's version of strlen.
http://www.azillionmonkeys.com/qed/asmexample.html

Some sneaky hidden assumptions here:
1. p = s - 1 is valid. Not guaranteed. Careless coding.
2. cast (int) p is meaningful. Not guaranteed.
3. Use of 2's complement arithmetic.
4. ints have no trap representations or hidden bits.
5. 4 == sizeof(int) && 8 == CHAR_BIT.
6. size_t is actually int.
7. sizeof(int) is a power of 2.
8. int alignment depends on a zeroed bit field.

Since strlen is normally supplied by the system, the system
designer can guarantee all but item 1. Otherwise this is
not portable. Item 1 can probably be beaten by suitable
code reorganization to avoid the initial p = s - 1. This
is a serious bug which, for example, can cause segfaults
on many systems. It is most likely to foul when (int)s
has the value 0, and is meaningful.

He fails to make the valid assumption: 1 == sizeof(char).
*/

#define hasNulByte(x) ((x - 0x01010101) & ~x & 0x80808080)
#define SW (sizeof (int) / sizeof (char))

int xstrlen (const char *s) {
   const char *p;                      /* 5 */
   int d;

   p = s - 1;
   do {
      p++;                             /* 10 */
      if ((((int) p) & (SW - 1)) == 0) {
         do {
            d = *((int *) p);
            p += SW;
         } while (!hasNulByte (d));    /* 15 */
         p -= SW;
      }
   } while (*p != 0);
   return p - s;
}                                      /* 20 */

Let us start with line 1! The constants appear to require that
sizeof(int) be 4, and that CHAR_BIT be precisely 8. I haven't
really looked too closely, and it is possible that the ~x term
allows for larger sizeof(int), but nothing allows for larger
CHAR_BIT. A further hidden assumption is that there are no trap
values in the representation of an int. Its functioning is
doubtful when sizeof(int) is less than 4. At the least it will
force promotion to long, which will seriously affect the speed.

This is an ingenious and speedy way of detecting a zero byte within
an int, provided the preconditions are met. There is nothing wrong
with it, PROVIDED we know when it is valid.

In line 2 we have the confusing use of sizeof(char), which is 1 by
definition. This just serves to obscure the fact that SW is
actually sizeof(int) later. No hidden assumptions have been made
here, but the usage helps to conceal later assumptions.

Line 4. Since this is intended to replace the system's strlen()
function, it would seem advantageous to use the appropriate
signature for the function. In particular strlen returns a size_t,
not an int. size_t is always unsigned.

In line 8 we come to a biggie. The standard specifically does not
guarantee the action of a pointer below an object. The only real
purpose of this statement is to compensate for the initial
increment in line 10. This can be avoided by rearrangement of the
code, which will then let the routine function where the
assumptions are valid. This is the only real error in the code
that I see.

In line 11 we have several hidden assumptions. The first is that
the cast of a pointer to an int is valid. This is never
guaranteed. A pointer can be much larger than an int, and may have
all sorts of non-integer like information embedded, such as segment
id. If sizeof(int) is less than 4 the validity of this is even
less likely.

Then we come to the purpose of the statement, which is to discover
if the pointer is suitably aligned for an int. It does this by
bit-anding with SW-1, which is the concealed sizeof(int)-1. This
won't be very useful if sizeof(int) is, say, 3 or any other
non-power-of-two. In addition, it assumes that an aligned pointer
will have those bits zero. While this last is very likely in
today's systems, it is still an assumption. The system designer is
entitled to assume this, but user code is not.

Line 13 again uses the unwarranted cast of a pointer to an int.
This enables the use of the already suspicious macro hasNulByte in
line 15.

If all these assumptions are correct, line 19 finally calculates a
pointer difference (which is valid, and of type ptrdiff_t; here it
is non-negative, so it would fit into a size_t). It then does a
concealed cast of this into an int, which could cause undefined or
implementation-defined behaviour if the value exceeds what will fit
into an int.
This one is also unnecessary, since it is trivial to define the
return type as size_t and guarantee success.

I haven't even mentioned the assumption of 2's complement
arithmetic, which I believe to be embedded in the hasNulByte
macro. I haven't bothered to think this out.

Would you believe that so many hidden assumptions can be embedded
in such innocent-looking code? The sneaky thing is that the code
appears trivially correct at first glance. This is the stuff that
Heisenbugs are made of. Yet use of such code is fairly safe if we
are aware of those hidden assumptions.

I have cross-posted this without setting follow-ups, because I
believe that discussion will be valid in all the newsgroups posted.

[1] The draft C standards can be found at:
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/>

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Mar 6 '06

Dik T. Winter wrote:
In article <dv**********@canopus.cc.umanitoba.ca> ro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes:
> In article <Iw********@cwi.nl>, Dik T. Winter <Di********@cwi.nl> wrote:
> >In article <11**********************@z34g2000cwc.googlegroups.com> "Ed Prochak" <ed*******@gmail.com> writes:

>
> > > I would agree that if an assembler must be a one-to-one mapping from
> > > source line to opcode, then C doesn't fit. I just don't agree with that
> > > definition of assembler.

>
> >On the other hand for every machine instruction there should be an
> >construct in the assembler to get that instruction. With that in
> >mind C doesn't fit either.

>
> Over the years, there have been notable cases of "hidden" machine
> instructions -- undocumented instructions, quite possibly with no
> assembler construct (at least not in any publically available
> assembler.)


Indeed. But even when we look at the published instructions C falls
short of providing a construct for every one. Where is the C construct
to do a multply step available in quite a few early RISC machines?
Note also that in assembler you can access the special bits indicating
overflow and whatever (if they are available on the machine). How to
do that in C?


Well you cannot, but those processors did not even exist when C was
created, so those features didn't make it. To some degree, C is more of
a PDP assembler. But I wonder if there is a way to write it in C that
the compiler can recognize. You would only care IF you are targeting
such a specific RISC processor, in which case your thinking shades
closer to the approach an assembler programmer takes than to the one
an HLL programmer takes.

See the difference? It is not so much that C gives you absolute control
of the hardware, but approaching many programming tasks in C from the
view of an assembly programmer makes your code better. Then when the
need is more abstract, C still works for higher-level programming.

I never said C was not an HLL.
Ed

great discussion BTW.

Mar 14 '06 #171

Dik T. Winter wrote:
In article <11*********************@j33g2000cwa.googlegroups.com> "Ed Prochak" <ed*******@gmail.com> writes:
...
> C is an assembler because
>
> -- It doesn't impose strict data type checking, especially between
> integers and pointers.
> (While there has been some discussion about cases where conversions
> back and forth between them can fail, for most machines it works. Good
> thing too or some OS's would be written in some other language.)
It does impose restrictions. You have to put in a cast.


not if you can live with a WARNING message.
> -- datatype sizes are dependent on the underlying hardware. While a lot
> of modern hardware has formed around the common 8bit char, and
> multiples of 16 for int types (and recent C standards have started to
> impose these standards), C still supports machines that used 9bit char
> and 18bit and 36bit integers. This was the most frustrating thing for
> me when I first learned C. It forces precisely some of the hidden
> assumptions of this topic.
The same is true for Pascal.


guess I'm getting forgetful in my old age (haven't touched PASCAL in
over 10 years). I thought PASCAL defined fixed ranges for the datatypes
like integers. I guess I didn't port enough PASCAL applications to see
the difference. (I could have sworn you'd get an error on X=32767+2;)
> -- C allows for easy "compilation" in that you could do it in one pass
> of the source code (well two counting the preprocessor run). The
> original C compiler was written in C so that bootstrapping onto a new
> machine required only a simple easily written initial compiler to
> compile the real compiler.
What is valid for C is also valid for Pascal. But not all is valid.
Code generation, for instance, is *not* part of the compiler. Without
a back-end that translates the code generated by the compiler to actual
machine instructions, you are still nowhere. So your simple easily written
initial compiler was not an easily written initial compiler. You have
to consider what you use as assembler as backend. In contrast the very
first Pascal compiler was really single-pass and generated machine code
on-the-fly, without backend. Everything ready to run. (BTW, on that
machine the linking stage was almost never pre-performed, it took only
a few milli-seconds.)


Yes PASCAL and P-code, you have a point there, but I'm not sure it is
in your favor. Due to P-code, PASCAL is abstracted even above the
native assembler for the target platform. so we have
C -> native assembler -> program on native hardware
Pascal -> program in p-code -> runs in p-code interpreter
So you have even less reason to think of the native hardware when
programming in PASCAL. This makes it more abstract and a higher HLL
than is C.
> -- original versions of the C compiler did not have passes like
> data-flow optimizers. So optimization was left to the programmer. Hence
> things like x++ and register storage became part of the language.
> Perhaps they are not needed now, but dropping these features from the
> language will nearly make it a differrent language. I do not know of
> any other HLL that has register, but about every assembler allows
> access to the registers under programmer control.
In C "register" is only a suggestion, it is not necessary to follow it.


The point is why even include this feature? It is because programming
you tend to think closer to the hardware than you do in PASCAL. Even
when I was doing some embedded graphics features for a product in
PASCAL, I don't think the CPU architecture ever entered my thoughts.
On the other hand, the very first Pascal compiler already did optimisation,
not as a separate pass, but as part of the single pass it had. You
could tweak quite a bit with it when you had access to the source of the
compiler.


So the PASCAL compiler was more advanced than the C compiler of the
time. Do you think PASCAL being a more abstract HLL than C might
have had an effect here? (More likely, though, it was that PASCAL
predated C, at least in widespread use.)

The difference is, IMHO, that PASCAL is a more abstract HLL, letting
the programmer think more about the application, while C is an HLL
with features that force/allow the programmer to consider the
underlying processor. (In the context of this topic, "force" is the
word.)

Ed

Mar 14 '06 #172

"Ed Prochak" <ed*******@gmail.com> wrote in message
news:11**********************@i39g2000cwa.googlegroups.com...

Keith Thompson wrote:
"Ed Prochak" <ed*******@gmail.com> writes:
[...]
C is an assembler because
I should have phrased this: C is LIKE an assembler.
Or like this: C has low level features similar to an assembler.
How about a bitsliced machine that uses only 6bit integers?
I thought those died out. Were any of those CPUs actually used in a
computer sufficiently advanced to compile C? As I recall, they were only
used as custom DSPs in the pre-DSP era, or as custom D/A convertors, etc...
Forgive my memory, but is it PL/1 or ADA that lets the programmer define
what integer type he wants. Syntax was something like
INTEGER*12 X
Probably ADA, I don't recall that in PL/1.
a big characteristic of assembler is that it is a simple language.
C is also a very simple language. Other HLLs are simple too, but the
simplicity combined with other characteristics suggest to me an
assembler feel to the language.
Again, C has low level features. I always use structured code when I
program in C. But, C allows coding in many unstructured ways (rumor: to
allow program porting of Fortran to C). But, it is a high level language.
I don't have to keep track of what data is in what register, or stack, or
memory, like when I coded in 6502 or when I code in IA-32. I don't need to
move data around between registers, stack or memory; it's done for me in
C. I just need the name of the data or a named pointer to the data. I
don't need to set up prolog/epilog code. I don't need to calculate offsets
for branching instructions. etc...
No, I was talking about the original motivation for the design of the
language. It was designed to exploit the register increment on DEC
processors. In the right context (e.g. y=x++;), the increment doesn't
even become a separate instruction, as I mentioned in another post.
Common C myth, but untrue:
http://cm.bell-labs.com/cm/cs/who/dmr/chist.html

"Thompson went a step further by inventing the ++ and -- operators, which
increment or decrement; their prefix or postfix position determines whether
the alteration occurs before or after noting the value of the operand. They
were not in the earliest versions of B, but appeared along the way. People
often guess that they were created to use the auto-increment and
auto-decrement address modes provided by the DEC PDP-11 on which C and Unix
first became popular. This is historically impossible, since there was no
PDP-11 when B was developed. The PDP-7, however, did have a few
`auto-increment' memory cells, with the property that an indirect memory
reference through them incremented the cell. This feature probably suggested
such operators to Thompson;"
C doesn't allow access to specific registers (at least not portably).


But other HLL's don't even have register storage.


True. Many HLL's don't have pointers either which is a key attraction, for
me, to any language.

BTW, I've heard one of the Pascal standards added pointers...
Here's what the C standard says about the "register" specifier:

A declaration of an identifier for an object with storage-class
specifier register suggests that access to the object be as fast
as possible. The extent to which such suggestions are effective is
implementation-defined.


I know that it is just a suggestion. The point is: why was it included
in the language at all? Initially it gave the programmer more control.

And there are a few restrictions; for example, you can't take the
address of a register-qualified object.


Which makes sense to an assembler programmer, but not to a typical HLL
programmer.


True. This wouldn't or shouldn't make any sense to someone who doesn't
understand assembly.
Let's put it this way: there is a gradient scale, from pure digits of
machine language (e.g., programming opcodes in binary is closer to the
hardware than using octal or hex)
at the lowest end, moving up past assembler to higher and higher
levels of abstraction away from the hardware. On that scale, I put C
much closer to assembler than any other HLL I know. Here's some samples:
PERL, BASH, SQL
C++, JAVA
PASCAL, FORTRAN, COBOL
C
assembler
HEX opcodes
binary opcodes
digital voltages in the real hardware.


Based on my experiences, I'd list like so:

C, PL/1, FORTH
BASIC
PASCAL,FORTRAN
C (lowlevel), FORTH (lowlevel)
IA-32, 6502 assembler
HEX opcodes

My ranking of FORTRAN is highly debatable. It is strong in math, but
seriously primitive in a number of major programming areas, like string
processing. Yes, PASCAL is less useful than BASIC. BASIC had stronger, by
comparison, string processing abilities. Also, I don't see how you can
place Java above C, since it is a stripped-down, pointer-safe version of C.
PASCAL, until they added pointers, was basically a stripped-down,
pointer-safe version of PL/1.
Rod Pemberton
Mar 14 '06 #173

Richard Bos wrote:
mw*****@newsguy.com (Michael Wojcik) wrote:
"Ed Prochak" <ed*******@gmail.com> writes:
Keith Thompson wrote:

> Most assembly languages are, of
> course, machine-specific, since they directly specify the actual
> instructions.

x++; in most machines can map to a single instruction


Sure, if "most machines" excludes load/store architectures, and
machines which cannot operate directly on an object of the size of
whatever x happens to be, and all the cases where "x" is a pointer to
an object of a size other than the machine's addressing granularity...

The presence in C of syntactic sugar for certain simple operations
like "x++" doesn't support the claim that C is somehow akin to
assembler in any case. One distinguishing feature of assembler is
a *lack* of syntactic sugar.


One can make a case that in at least one aspect, C is _less_ like
assembler than many other high-level languages: for. Most languages only
support for loops looking like FOR n=1 TO 13 STEP 2: NEXT or for n:=10
downto 1 do. Such loops are often easily caught in one, or a few, simple
machine instructions. Now try the same with for (node=root; node;
node=node->next) or for (i=1, j=10; x[i] && y[j]; i++, j--).

Richard


True, but
for( <p1>; <p2>; <p3> )
looks like a MACRO to me. It's up to the programmer to optimize
specific instances when you program in assembly. That's why there's
also while().

The only difference is the loop block. In a macro assembler, IOW, you
still have to code that
goto top_of_for_loop
at the end of the loop, either literally or in some endfor macro.

Ed

Mar 14 '06 #174
On 14 Mar 2006 10:52:28 -0800, "Ed Prochak" <ed*******@gmail.com>
wrote:
Incorrect. Attempting to assign an integer value to a pointer object,
or vice versa, is a constraint violation, requiring a diagnostic.


a Warning.


The difference between "error" and "warning" is usually unimportant,
and for some compilers, seems arbitrary.

Violations should be fixed, no matter how nicely the compiler tells
you about them.

--
Al Balmer
Sun City, AZ
Mar 14 '06 #175
"Ed Prochak" <ed*******@gmail.com> writes:
Keith Thompson wrote:
"Ed Prochak" <ed*******@gmail.com> writes:
[...]
> C is an assembler because
I should have phrased this: C is LIKE an assembler.
And a raven is like a writing desk.
<http://www.straightdope.com/classics/a5_266.html>

"C is an assembler" and "C is like an assembler" are two *very*
different statements. The latter is obviously true, given a
sufficiently loose interpretation of "like".
> -- It doesn't impose strict data type checking, especially between
> integers and pointers.
> (While there has been some discussion about cases where conversions
> back and forth between them can fail, for most machines it works. Good
> thing too or some OS's would be written in some other language.)


Incorrect. Attempting to assign an integer value to a pointer object,
or vice versa, is a constraint violation, requiring a diagnostic.


a Warning.


The C standard doesn't distinguish between different kinds of
diagnostics, and it doesn't require any program to be rejected by the
compiler (unless it has a "#error" directive). This allows for
language extensions; an implementation is free to interpret an
otherwise illegal construct as it likes, as long as it produces some
kind of diagnostic in conforming mode. It also doesn't require the
diagnostic to have any particular form, or to be clearly associated
with the point at which the error occurred. (Those are
quality-of-implementation issues.)

This looseness of requirements for diagnostics isn't a point of
similarity between C and assemblers; on the contrary, in every
assembler I've seen, misspelling the name of an opcode or using
incorrect punctuation for an addressing mode results in an immediate
error message and failure of the assembler.
Integer and pointers can be converted back and forth only using a cast
(an explicit conversion operator). The result of such a conversion is
implementation-defined.

Even if this were correct, it certainly wouldn't make C an assembler.


To some degree you are right. It's actually pointer manipulation that
makes it closer to assembler.


C provides certain operations on certain types. Pointer arithmetic
happens to be something that can be done in most or all assemblers and
in C, but C places restrictions on pointer arithmetic that you won't
find in any assembler. For example, you can subtract one pointer from
another, but only if they're pointers to the same type; in a typical
assembler, pointer values don't even have types. Pointer arithmetic
is allowed only within the bounds of a single object (though
violations of this needn't be diagnosed; they cause undefined
behavior); pointer arithmetic in an assembler gives you whatever
result makes sense given the underlying address representation. C
says nothing about how pointers are represented, and arithmetic on
pointers is not defined in terms of ordinary integer arithmetic; in an
assembler, the representation of a pointer is exposed, and you'd
probably use the ordinary integer operations to perform pointer
arithmetic.
> -- datatype sizes are dependent on the underlying hardware. While a lot
> of modern hardware has formed around the common 8bit char, and
> multiples of 16 for int types (and recent C standards have started to
> impose these standards), C still supports machines that used 9bit char
> and 18bit and 36bit integers. This was the most frustrating thing for
> me when I first learned C. It forces precisely some of the hidden
> assumptions of this topic.


I don't know what "recent C standards" you're referring to. C
requires CHAR_BIT to be at least 8; it can be larger. short and int
must be at least 16 bits, and long must be at least 32 bits. A
conforming implementation, even with the most current standard, could
have 9-bit char, 18-bit short, 36-bit int, and 72-bit long.


How about a bitsliced machine that uses only 6bit integers?


What about it? A conforming C implementation on such a machine must
have CHAR_BIT>=8, INT_MAX>=32767, LONG_MAX>=2147483647, and so forth.
The compiler may have to do some extra work to implement this. (You
could certainly provide a non-conforming C implementation that
provides a 6-bit type called "int"; the C standard obviously places no
constraints on non-conforming implementations. I'd recommend calling
the resulting language something other than C, to avoid confusion.)

But this is a common feature of many high-level languages. Ada, for
example has an implementation-defined set of integer types, similar to
what C provides; I've never heard anyone claim that Ada is an
assembler.


Forgive my memory, but is it PL/1 or ADA that lets the programmer define
what integer type he wants. Syntax was something like
INTEGER*12 X
defined X as a 12-bit integer. (Note that such syntax is portable in
that on two different processors, you still know that the range of X is
-2048 to +2047.)
The point is a 16bit integer in ADA is always a 16bit integer and
writing
x=32768 +10
will always overflow in ADA, but it is dependent on the compiler and
processor in C. It can overflow, or it can succeed.


I'm not familiar with PL/I.

Ada (not ADA) has a predefined type called Integer. It can have other
predefined integer types such as Short_Integer, Long_Integer,
Long_Long_Integer, and so forth. There are specific requirements on
the ranges of these types, quite similar to C's requirements for int,
short, long, etc. There's also a syntax for declaring a user-defined
type with a specified range:
type Integer_32 is range -2**31 .. 2**31-1;
This type will be implemented as one of the predefined integer types,
selected by the compiler to cover the requested range.

C99 has something similar, but not as elaborate: a set of typedefs in
<stdint.h> such as int32_t, int_least32_t, and so forth. Each of these
is implemented as one of the predefined integer types.
But my point on this was, you need to know your target processor in C
more than in a language like ADA. This puts a burden on the C
programmer closer to that of an assembler programmer on the same
machine than to that of an ADA programmer.


You can get just as "close to the metal" in Ada as you can in C. Or,
in both languages, you can write portable code that will work properly
regardless of the underlying hardware, as long as there's a conforming
implementation. C is lower-level than Ada, so there's a greater
bias in C to relatively low-level constructs and system dependencies,
but it's only a matter of degree. In this sense, C and Ada are far
more similar to each other than either is to any assembler I've ever
seen.

[...]
You're talking about an implementation, not the language.


a big characteristic of assembler is that it is a simple language.
C is also a very simple language. Other HLLs are simple too, but the
simplicity combined with other characteristics suggest to me an
assembler feel to the language.


If you're just saying there's an "assembler feel", I won't argue with
you -- except to say that, with the right mindset, you can write
portable code in C without thinking much about the underlying
hardware.

[...]
Again, you're talking about an implementation, not the language.


No, I was talking about the original motivation for the design of the
language. It was designed to exploit the register increment on DEC
processors. In the right context (e.g. y=x++;), the increment doesn't
even become a separate instruction, as I mentioned in another post.


The PDP-11 has predecrement and postincrement modes; it doesn't have
preincrement or postdecrement. And yet C provides all 4 combinations,
with no implied preference for the ones that happen to be
implementable as PDP-11 addressing modes. In any case, C's ancestry
goes back to the PDP-7, and to earlier languages (B and BCPL) that
predate the PDP-11.

[...]
Here's what the C standard says about the "register" specifier:

A declaration of an identifier for an object with storage-class
specifier register suggests that access to the object be as fast
as possible. The extent to which such suggestions are effective is
implementation-defined.


I know that it is just a suggestion. The point is: why was it included
in the language at all? Initially it gave the programmer more control.


Sure, but giving the programmer more control is hardly synonymous with
assembly language.
And there are a few restrictions; for example, you can't take the
address of a register-qualified object.


Which makes sense to an assembler programmer, but not to a typical HLL
programmer.


Sure, it's a low-level feature.
> So IMHO, C is a nice generic assembler. It fits nicely in the narrow
> world between hardware and applications. The fact that it is a decent
> application development language is a bonus. I like C, I use it often.
> Just realize it is a HLL with an assembler side too.


You've given a few examples that purport to demonstrate that C is an
assembler.

Try giving a definition of the word "assembler". If the definition
applies to C (the language, not any particular implementation) , I'd
say it's likely to be a poor definition, but I'm willing to be
surprised.


Let's put it this way: there is a gradient scale, from pure digits of
machine language (e.g., programming opcodes in binary is closer to the
hardware than using octal or hex)
at the lowest end, moving up past assembler to higher and higher
levels of abstraction away from the hardware. On that scale, I put C
much closer to assembler than any other HLL I know. Here's some samples:
PERL, BASH, SQL
C++, JAVA
PASCAL, FORTRAN, COBOL
C
assembler
HEX opcodes
binary opcodes
digital voltages in the real hardware.


That seems like a reasonable scale (I might put Forth somewhere below
C). But you don't indicate the relative distances between the levels.
C is certainly closer to assembler than Pascal is, but I'd say that C
and Pascal are much closer to each other than either is to assembler.

You can write system-specific non-portable code in any language. In
assembler, you can *only* write system-specific non-portable code. In
C and everything above it, it's possible to write portable code that
will behave as specified on any system with a conforming
implementation, and a conforming implementation is possible on a very
wide variety of hardware. Based on that distinction, there's a
sizable gap between assembler and C.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Mar 14 '06 #176
"Rod Pemberton" <do*********@sorry.bitbuck.cmm> writes:
[...]
BTW, I've heard one of the Pascal standards added pointers...


<OT>Pascal has always had pointers.</OT>

Mar 14 '06 #177
"Ed Prochak" <ed*******@gmail.com> writes:
Dik T. Winter wrote:
In article <11*********************@j33g2000cwa.googlegroups.com>
"Ed Prochak" <ed*******@gmail.com> writes: ...
> C is an assembler because
>
> -- It doesn't impose strict data type checking, especially between
> integers and pointers.
> (While there has been some discussion about cases where conversions
> back and forth between them can fail, for most machines it works. Good
> thing too or some OS's would be written in some other language.)


It does impose restrictions. You have to put in a cast.


not if you can live with a WARNING message.


Assigning an integer value to a pointer object, or vice versa, without
an explicit conversion (cast operator) is a constraint violation. The
standard requires a diagnostic; it doesn't distinguish between
warnings and error messages. Once the diagnostic has been issued, the
compiler is free to reject the program. If the compiler chooses to
generate an executable anyway, the behavior is undefined (unless the
implementation specifically documents it).

An assignment without a cast, assuming the compiler accepts it (after
the required diagnostic), isn't even required to behave the same way as
the corresponding assignment with a cast -- though it's likely to do
so in real life.

C compilers commonly don't reject programs that violate this
particular constraint, because it's a common construct in pre-standard
C, but that's an attribute of the compiler not of the language as it's
now defined.

[...]

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Mar 14 '06 #178
On Tue, 14 Mar 2006 11:52:28 -0800, Ed Prochak wrote:
Keith Thompson wrote:
C doesn't allow access to specific registers (at least not portably).


But other HLLs don't even have register storage.

Here's what the C standard says about the "register" specifier:

A declaration of an identifier for an object with storage-class
specifier register suggests that access to the object be as fast
as possible. The extent to which such suggestions are effective is
implementation-defined.


I know that it is just a suggestion. The point is: why was it included
in the language at all? Initially it gave the programmer more control.


Anecdote:

About ten years ago I did a project involving an AT&T/WE DSP32C processor
that had a very original-feeling AT&T K&R C compiler. This compiler did
essentially no "optimization", that I could see. It didn't even do
automatic register spill or fill (other than saves and restores at
subroutine entry and exit, of course): normal "auto" local variables
existed entirely in the stack frame, and had to be accessed from there on
every use, and "register" local variables existed entirely in registers:
specify too many in any context and the code wouldn't compile.

A very different (and somewhat more laborious) experience than
programming with a modern compiler of, say, gcc vintage, but it was
actually pretty easy to get quite efficient code this way. That compiler
really was very much like a macro assembler with expression parsing.

[The C code that resulted was very much DSP32C-specific C code.
That's why a "universal assembler" would want a more abstract notion of
register variables that corresponds quite closely to that of modern C.]

Cheers,

--
Andrew

Mar 15 '06 #179
>> In article <11*********************@j33g2000cwa.googlegroups.com> "Ed
Prochak" <ed*******@gmail.com> writes:
[C] doesn't impose strict data type checking, especially between
integers and pointers.
Dik T. Winter wrote:
It does impose restrictions. You have to put in a cast.

In article <11*********************@i40g2000cwc.googlegroups.com>
Ed Prochak <ed*******@gmail.com> wrote:
not if you can live with a WARNING message.


Funny, I get an "error" message:

% cat t.c
void *f(void) {
return 42;
}
% strictcc -O -c t.c
t.c: In function `f':
t.c:2: error: return makes pointer from integer without a cast
%

Is my compiler not a "C compiler"?

(I really do have a "strictcc" command, too. Some might regard it
as cheating, but it works.)

(Real C compilers really do differ as to which diagnostics are
"warnings" and which are "errors". In comp.lang.c in the last week
or two, we have seen some that "error out" on:

int *p;
float x;
...
p = x;

and some that accept it with a "warning".)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Mar 15 '06 #180
