
Making Fatal Hidden Assumptions

We often find hidden, and totally unnecessary, assumptions being
made in code. The following leans heavily on one particular
example, which happens to be in C. However similar things can (and
do) occur in any language.

These assumptions are generally made because of familiarity with
the language. As a non-code example, consider the idea that the
faulty code is written by blackguards bent on fouling the
language. The term blackguards is not in favor these days, and for
good reason. However, the older you are, the more likely you are
to have used it since childhood, and to use it again, barring
specific thought on the subject. The same type of thing applies to
writing code.

I hope, with this little monograph, to encourage people to examine
some hidden assumptions they are making in their code. As ever, in
dealing with C, the reference standard is the ISO C standard.
Versions can be found in text and pdf format, by searching for N869
and N1124. [1] The latter does not have a text version, but is
more up-to-date.

We will always have innocent appearing code with these kinds of
assumptions built-in. However it would be wise to annotate such
code to make the assumptions explicit, which can avoid a great deal
of agony when the code is reused under other systems.
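One way to make such assumptions explicit is to have the preprocessor refuse to compile the code where they do not hold. The sketch below is C89-compatible. The particular checks (8-bit bytes, a 32-bit unsigned int) and the name WORD_BYTES are illustrative assumptions. They were chosen to match the strlen example discussed in this post and are not taken from any referenced code.

```c
#include <limits.h>

/* Refuse to compile where the word-at-a-time assumptions are false,
   instead of silently producing wrong answers on the new system. */
#if CHAR_BIT != 8
#error "this code assumes 8-bit bytes (CHAR_BIT == 8)"
#endif

#if UINT_MAX != 0xFFFFFFFF
#error "this code assumes unsigned int has exactly 32 value bits"
#endif

/* Hypothetical name: bytes per unsigned int, meaningful only once the
   checks above have passed (and assuming no padding bits). */
enum { WORD_BYTES = 4 };
```

sizeof cannot appear in an #if, so these checks use the <limits.h> macros instead. C11 and later also provide _Static_assert for conditions that involve sizeof.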

In the following example, the code is as downloaded from the
referenced URL, and the comments are entirely mine, including the
'every 5' line-number references.

/* Making fatal hidden assumptions */
/* Paul Hsieh's version of strlen.
http://www.azillionmonkeys.com/qed/asmexample.html

Some sneaky hidden assumptions here:
1. p = s - 1 is valid. Not guaranteed. Careless coding.
2. cast (int) p is meaningful. Not guaranteed.
3. Use of 2's complement arithmetic.
4. ints have no trap representations or hidden bits.
5. 4 == sizeof(int) && 8 == CHAR_BIT.
6. size_t is actually int.
7. sizeof(int) is a power of 2.
8. int alignment depends on a zeroed bit field.

Since strlen is normally supplied by the system, the system
designer can guarantee all but item 1. Otherwise this is
not portable. Item 1 can probably be beaten by suitable
code reorganization to avoid the initial p = s - 1. This
is a serious bug which, for example, can cause segfaults
on many systems. It is most likely to foul when (int)s
has the value 0, and is meaningful.

He fails to make the valid assumption: 1 == sizeof(char).
*/

#define hasNulByte(x) ((x - 0x01010101) & ~x & 0x80808080)
#define SW (sizeof (int) / sizeof (char))

int xstrlen (const char *s) {
    const char *p;                          /* 5 */
    int d;

    p = s - 1;
    do {
        p++;                                /* 10 */
        if ((((int) p) & (SW - 1)) == 0) {
            do {
                d = *((int *) p);
                p += SW;
            } while (!hasNulByte (d));      /* 15 */
            p -= SW;
        }
    } while (*p != 0);
    return p - s;
}                                           /* 20 */

Let us start with line 1! The constants appear to require that
sizeof(int) be 4, and that CHAR_BIT be precisely 8. I haven't
really looked too closely, and it is possible that the ~x term
allows for larger sizeof(int), but nothing allows for larger
CHAR_BIT. A further hidden assumption is that there are no trap
values in the representation of an int. Its functioning is
doubtful when sizeof(int) is less than 4. At the least it will
force promotion to long, which will seriously affect the speed.

This is an ingenious and speedy way of detecting a zero byte within
an int, provided the preconditions are met. There is nothing wrong
with it, PROVIDED we know when it is valid.
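For comparison, here is a minimal sketch of the same zero-byte test from line 1, with its preconditions moved into the types. It uses uint32_t, so the arithmetic is unsigned and wraps modulo 2^32 by definition. That removes any reliance on 2's complement or on int being free of trap representations. The function name is hypothetical. Note that uint32_t is itself optional in C99: it exists only where the implementation has an exact 32-bit type with no padding bits. When the word is read from char data, CHAR_BIT == 8 is still assumed.

```c
#include <stdint.h>

/* Nonzero iff some byte of the 32-bit word x is zero.  Assumes int is
   no wider than 32 bits, so x is not promoted to signed int. */
int has_zero_byte32(uint32_t x)
{
    /* Subtracting 1 from each byte sets the high bit of any zero byte;
       '& ~x' discards bytes whose high bit was already set.  The result
       is nonzero exactly when x contains a zero byte. */
    uint32_t t = (x - 0x01010101u) & ~x & 0x80808080u;
    return t != 0;
}
```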

In line 2 we have the confusing use of sizeof(char), which is 1 by
definition. This just serves to obscure the fact that SW is
actually sizeof(int) later. No hidden assumptions have been made
here, but the usage helps to conceal later assumptions.

Line 4. Since this is intended to replace the system's strlen()
function, it would seem advantageous to use the appropriate
signature for the function. In particular strlen returns a size_t,
not an int. size_t is always unsigned.
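A minimal sketch of the signature point is a byte-at-a-time version with the standard size_t return type. It is slower than the word-at-a-time trick, but it needs none of the assumptions listed above. The name plain_strlen is hypothetical. Identifiers beginning with str and a lowercase letter are reserved, so the sketch avoids them.

```c
#include <stddef.h>

/* Same contract as the library strlen, including the size_t return type.
   Makes no assumption about int width, alignment or representation. */
size_t plain_strlen(const char *s)
{
    const char *p = s;          /* starts at s, never at s - 1 */

    while (*p != '\0')
        p++;
    /* p - s is a non-negative ptrdiff_t, so the conversion is safe. */
    return (size_t)(p - s);
}
```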

In line 8 we come to a biggie. The standard specifically does not
guarantee the action of a pointer below an object. The only real
purpose of this statement is to compensate for the initial
increment in line 10. This can be avoided by rearrangement of the
code, which will then let the routine function where the
assumptions are valid. This is the only real error in the code
that I see.
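The rearrangement mentioned above might look like the following sketch, with the hypothetical name xstrlen2. It keeps the word-at-a-time structure but starts at s itself and returns size_t. It reads words with memcpy rather than through an int * cast, and it tests alignment via uintptr_t.

It still rests on assumptions that only a system implementer may make:

- uintptr_t must exist (it is optional in C99).
- CHAR_BIT must be 8.
- Like the original, it reads the whole aligned word containing the terminator, which may include bytes past the end of the array.

ISO C does not promise that this last read is harmless. It is generally safe in practice only because, on common systems, an aligned word never straddles a memory-protection boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Word-at-a-time strlen rearranged so that s - 1 is never formed.
   Assumes CHAR_BIT == 8, that uintptr_t exists, that int is no wider
   than 32 bits, and that reading the rest of an aligned word past the
   terminator is harmless (true on common systems, not promised by ISO C). */
size_t xstrlen2(const char *s)
{
    const char *p = s;

    /* Byte by byte until p is aligned for a 32-bit word. */
    while (((uintptr_t)p & (sizeof(uint32_t) - 1)) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Whole words until one contains a zero byte. */
    for (;;) {
        uint32_t w;

        memcpy(&w, p, sizeof w);   /* no int * cast, no aliasing problem */
        if (((w - 0x01010101u) & ~w & 0x80808080u) != 0)
            break;
        p += sizeof w;
    }

    /* The terminator is within the current word; find it exactly. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```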

In line 11 we have several hidden assumptions. The first is that
the cast of a pointer to an int is valid. This is never
guaranteed. A pointer can be much larger than an int, and may have
all sorts of non-integer like information embedded, such as segment
id. If sizeof(int) is less than 4 the validity of this is even
less likely.

Then we come to the purpose of the statement, which is to discover
if the pointer is suitably aligned for an int. It does this by
bit-anding with SW-1, which is the concealed sizeof(int)-1. This
won't be very useful if sizeof(int) is, say, 3 or any other
non-power-of-two. In addition, it assumes that an aligned pointer
will have those bits zero. While this last is very likely in
today's systems, it is still an assumption. The system designer is
entitled to assume this, but user code is not.
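A sketch of an alignment test that at least states its assumptions, rather than casting to int, converts through uintptr_t instead. uintptr_t can hold a converted object pointer wherever the type exists, though it is optional in C99. The test still assumes that align is a power of two and that the platform reflects alignment in the low bits of the converted integer. ISO C guarantees neither, since it leaves the pointer-to-integer mapping implementation-defined. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p appears aligned to align bytes.  Assumes align is a power
   of two and that alignment shows up as zero low-order bits of the
   uintptr_t value, which ISO C leaves implementation-defined. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}
```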

Line 13 again uses the unwarranted cast of a pointer to an int.
This enables the use of the already suspicious macro hasNulByte in
line 15.

If all these assumptions are correct, line 19 finally calculates a
pointer difference (which is valid, and of type ptrdiff_t; since it is
never negative here, it will always fit into a size_t). It then does a
concealed conversion of this into an int, which gives an
implementation-defined result (or raises an implementation-defined
signal) if the value exceeds what will fit into an int.
This one is also unnecessary, since it is trivial to define the
return type as size_t and guarantee success.

I haven't even mentioned the assumption of 2's complement
arithmetic, which I believe to be embedded in the hasNulByte
macro. I haven't bothered to think this out.

Would you believe that so many hidden assumptions can be embedded
in such innocent looking code? The sneaky thing is that the code
appears trivially correct at first glance. This is the stuff that
Heisenbugs are made of. Yet use of such code is fairly safe if we
are aware of those hidden assumptions.

I have cross-posted this without setting follow-ups, because I
believe that discussion will be valid in all the newsgroups posted.

[1] The draft C standards can be found at:
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/>

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Mar 6 '06
* Richard Bos:
"Alf P. Steinbach" <al***@start.no > wrote:
* Richard Bos -> someone:
*Shrug* If you go about redefining terms to support your delusion that C
is a portable assembler, that's your problem, not that of C.

Happily I haven't followed this debate, so I don't know what you folks,
good or not, think "portable assembler" means, nor how you can think a
language is a compiler (or whatever).

C was designed as a portable assembly language,


Complete nonsense.


This is a fallacy known as "Appeal to Ridicule". To quote one source:

<q>
This sort of "reasoning" is fallacious because mocking a claim does not
show that it is false. This is especially clear in the following
example: "1+1=2! That's the most ridiculous thing I have ever heard!"
</q>
Read what dmr himself writes about this:
<http://cm.bell-labs.com/cm/cs/who/dmr/chist.html>.
Why should I read that yet again? Quote what you think is relevant.

C was designed as a systems programming language for (then) a single,
new OS, _not_ a portable assembler. In fact, that article explicitly
states that a. one of the reasons that drove the evolution of C's
ancestors, and later C itself, was _not_ to have to use assembler and b.
portability was not yet a concern in the first days.


All that is correct and irrelevant. Why do or did you think it should
or would be relevant? Perhaps you're just objecting as best you can to
terminology that seems frightening to you, and even appears new to you;
perhaps <url: http://cr.yp.to/qhasm/20050129-portable.txt> can help?

You can find more information about the C language at e.g. Wikipedia,
<url: http://en.wikipedia.org/wiki/C_programming_language>.


Ah, well, yes, if you get your information from Wikipedia, where any
little fool with a complete misunderstanding of his subject can, and
last time I looked at that page in fact _had_, turned an article into a
mix of some seeds of fact, some serious misconceptions, and quite a bit
of complete bollocks, then I'm afraid I cannot take your opinion on any
subject seriously, certainly not on C.


The above is several fallacies lumped into one spaghetti fallacy. But
the main top-level fallacy is known as "Poisoning the Well". Don't you
think you should perhaps restrict your usage of fallacious reasoning?

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Mar 17 '06 #241
Keith Thompson wrote:
"Alf P. Steinbach" <al***@start.no > writes:
* Keith Thompson:
"Alf P. Steinbach" <al***@start.no > writes:
* Keith Thompson:
> "Alf P. Steinbach" <al***@start.no > writes:
>> C was designed as a portable assembly language, the most successful so
>> far, so if the term has any practical meaning, then C is that meaning.
> Show us a definition of the term "assembly language". By
> "definition", I mean something that unambiguously allows you to
> determine whether something is an assembly language or not. If you do
> so, I predict that either (1) your definition won't apply to C, or (2)
> your definition won't match any common understanding of the term
> "assembly language".
Sorry, that's bullshit.

However, show me a definition of bullshit (by "definition", I mean
something that unambiguously allows you to determine whether something
is bullshit or not) and I'll readily retract that statement. ;-)
If you're not interested in having this discussion, just say so.

First: we have now, satisfactorily to me, demonstrated that your
request on behalf of "us" for a really clear-cut definition of a
vaguely defined part of a term as used in contexts other than in that
term, had no bearing whatsoever on anything, other than as emotional
rhetoric.
If you can't define what a term means, then it is meaningless and so it
is pointless to use that term with reference to anything.
Even just the second word of that question, "us", almost blew up my
bs-meter!
Well, you don't need to be told what you mean (I assume) so it is
everyone else that needs to know in order to understand what you are
trying to say.
Then to your question, which AFAICS has nothing to do with the quoted
material:

I don't know what "the" discussion is, except it seems that some
people are being offended by having C likened to assembly language. I
No one has expressed any offence at the term that I've seen, they have
just disagreed.
think that inferiority complex could be an interesting background to
various discussions, if we just take pains to introduce that angle,
Suggesting that the people who disagree with you have an inferiority
complex, on the other hand, is insulting.
e.g. how C does not (seemingly) hold any advantage as a .NET language,
and will similarly lose terrain and become more of a specialist's
niche language when other such platforms come about, as they must.
But, you want to only discuss how the terminology evolved, or who
invented it, if it was? I'm no historian. But, one does not need to
be a historian to understand what it means: the meaning can be derived
logically.
All completely irrelevant to the discussion at hand.
Apparently I have failed utterly to make myself clear. I'll try one
last time.

You claim that C is, in some sense, an assembly language
(specifically, a "portable assembly language"). I'm assuming here
that any "portable assembly language" is an "assembly language".

I claim that C is not an "assembly language", though it may bear some
resemblance to one.


I'll give some attributes that I consider essential to an assembler, and
whether or not C has them:
- The ability to write Interrupt Service Routines
- Not possible in C since there is no mechanism to specify that a
routine is an ISR and some processors use different calling
conventions for ISRs.
- The ability to access the status flags of the processor, including
such flags as overflow flags
- Not possible in C; if a signed integer calculation overflows you
invoke undefined behaviour
- The ability to access IO ports on the processor
- Not possible in C. You can access memory mapped devices, but not IO
mapped devices.

I could go on.
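Because standard C exposes no overflow flag, portable code has to test for signed overflow before the operation rather than inspect a flag afterwards. A minimal sketch, using the hypothetical name checked_add:

```c
#include <limits.h>

/* Stores a + b in *sum and returns 1 if the sum is representable as an
   int; returns 0 and leaves *sum untouched otherwise.  The check happens
   before the addition, so the undefined overflow never occurs. */
int checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}
```

Unsigned arithmetic, by contrast, is defined to wrap. An unsigned sum can therefore be checked after the fact by comparing it with either operand.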

<snip>
Consider that there is an inherent conflict between "portable" and
traditional meanings of the term "assembly language".

[...]
Something must therefore give, rather radically, if the term "portable
assembly" is to be meaningful, if we are to make sense of what someone
means when he or she cries "portable assembly!".


I agree. It does not follow from this, though, that the term
"portable assembly language" is necessarily meaningful, or that it
applies to C even if it is meaningful.

If someone cries "portable assembly", I can either figure out what
they mean, or I can decide they're crying nonsense. I'm content to do
either or both.

So, finally, what do you mean by "assembly language"?


Indeed, since C does not come close to any definition I would use. Also,
what do you mean by "portable assembler"?

BTW, I've done embedded programming, including writing interrupt service
routines, accessing IO ports (as opposed to memory mapped devices),
accessing memory mapped devices etc all in a few different extended
versions of Pascal. So from my perspective Pascal is about as close to
assembler as C is.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc
Mar 17 '06 #242
Alf P. Steinbach wrote:
* Keith Thompson:

You claim that C is, in some sense, an assembly language
Nope, as you know I haven't claimed that. Whenever you're referring to
a claim, be sure to quote that claim. Especially when your previous
attempt at making the claim-making stick has been shown as fallacious.

(specifically, a "portable assembly language").
I'm assuming here that any "portable assembly language" is an
"assembly language".


That assumption does not hold.


Then tell us what you *do* mean.
However, that was explained twice, in detail, in the posting you now
replied to, and yet here you are making the same nonsense assumption.
You have not explained what you mean by Portable Assembly Language
anywhere that I can see.
OK, example.

Example 1.

Cheery Unstoppable Keith: "How's your cycle, by the way?"
Puzzled Maria: "Fine, just fine?"
Cheery Unstoppable Keith: "So you don't share it with others?"
Puzzled Maria: "That happened five years ago, when we all
lived together. You want to borrow it?"
Cheery Unstoppable Keith: "Huh, you think I'm transsexual!?!"
Puzzled Maria: "/What/?"
Cheery Unstoppable Keith: "Don't try that on me! I know what you're
thinking! What you're /implying/!"
Puzzled Maria: "Uh oh... Keith, say, you'd like a beer or
something? I'll go fetch it. Right back!"

In this highly educational example, note that the "MotorCycle" that
Puzzled Maria is thinking of, is not a "Cycle" that Cheery Unstoppable
Keith is thinking of. Just chopping off the word "Motor" does not make
a "MotorCycle" into a "Cycle". Well, OK, it does, but not /that/ kind.

I have numbered this example in case more examples will be needed; then
the numbers may make it easier to refer unambiguously to the examples.
A closer example would be:
You: "Cats are ambulatory rocks"
Keith: "That doesn't make any sense. Cats are not any form of rock. What
do you mean by ambulatory rock?"
You: "I didn't say cats are rocks."
[snip]
There should be some definition of the phrase "assembly language".
Try <url: http://en.wikipedia.org/wiki/Assembly_language>; although not
a reference nor of reference-quality, it's good enough for any novice
level in-practice discussion or just learning what the term means very
roughly.


Many of us have been in the computer industry for many years. I spent
many years programming in various forms of assembly language. I also
posted elsewhere in this thread a number of things that you can easily
do in one of those that you can't even come close to in C. No one
contradicted my assertions there that this meant C was nowhere near
being any form of assembly language.
If C is an assembly language, then C should meet the requirements of that
definition. This line of reasoning seems obvious and unremarkable to
me; do you disagree?


No, I don't disagree; it's fine on its own, and it's irrelevant.


So you consider "portable assembly language" to have nothing to do with
"assembly language?" In that case I'll define "portable assembly
language" as the language spoken by workers in a mobile factory. Now we
can all go home since C clearly does not meet that definition.

Alternatively define what you mean by the term.
[...]
Consider that there is an inherent conflict between "portable" and
traditional meanings of the term "assembly language".

[...]
Something must therefore give, rather radically, if the term "portable
assembly" is to be meaningful, if we are to make sense of what someone
means when he or she cries "portable assembly!".


I agree. It does not follow from this, though, that the term
"portable assembly language" is necessarily meaningful, or that it
applies to C even if it is meaningful.

If someone cries "portable assembly", I can either figure out what
they mean, or I can decide they're crying nonsense. I'm content to do
either or both.


Yes, you can either assume some meaning, in which case you'll end up
with essentially my analysis (perhaps differing in some detail), and


Or we will end up with a completely different result, as we seem to have
done.
will be able to understand e.g. that Wikipedia article referred to earlier
in the thread, which used that (to you) troublesome term.
The Wiki article only says some people use the term, it makes no claim
as to either its correctness or meaning. Therefore, in this respect, the
Wiki article is correct because people do use the term with reference
to C. That does not make such people correct or tell us what they mean
by it.
Or, you can say, I regard this term as meaningless, I refuse to consider
any meaning, I want to be forever ignorant except as enlightened by some
of my chosen Holy Books (e.g. the C and perhaps C++ standards).


Flooglewobbit.

That term means you are talking complete rubbish. You can either accept
that or forever remain ignorant except as enlightened by some of your
chosen holy books.

At least, unlike you, I'm telling you what the term means.
So, finally, what do you mean by "assembly language"?


I think that should be clear since I've now used the term, but one
important difference from the Wikipedia definition is that I do regard
e.g. "CLR assembly language" (CIL, Common Intermediate Language) as an
assembly language, i.e. I include assembly languages for virtual
machines among the assembly languages, and I do not exclude assembly
languages that have some OO features, such as TASM had and CIL has.

However, note well again for the umpteenth time (some time it must sink
in, I hope) that that's irrelevant: I only answer it because you ask.


So in your world the definition of what a computer is has no relevance
to the meaning of the term "portable computer?"
--
Flash Gordon, living in interesting times.
Mar 17 '06 #243
Flash Gordon wrote:

I'll give some attributes that I consider essential to an assembler, and
whether or not C has them:
- The ability to write Interrupt Service Routines
- Not possible in C since there is no mechanism to specify that a
routine is an ISR and some processors use different calling
conventions for ISRs.
Every C compiler I've used for the last 2exp(N) years (apart from
perhaps the ones for specific OSs like Windows, where I've simply not
looked) has had this facility. It's not C; it's an extension, and it's
very annoying that they are all different, but it's there.
- The ability to access IO ports on the processor
- Not possible in C. You can access memory mapped devices, but not IO
mapped devices.


Same as above.

Plus necessary constructs to direct unchanging objects to ROM, again all
compilers different.
But you're right (whoever said it), it's nowt like an assembler. It's
just a quick and dirty middle-level language, and to use it you really
need to know the hardware you're operating on in many cases.

It seems that the little embedded stuff is considered simply below the
radar for the standards committees. Perhaps that's a good thing.

Paul Burke

Mar 17 '06 #244
* Flash Gordon:
Alf P. Steinbach wrote:
* Keith Thompson:

You claim that C is, in some sense, an assembly language
Nope, as you know I haven't claimed that. Whenever you're referring
to a claim, be sure to quote that claim. Especially when your
previous attempt at making the claim-making stick has been shown as
fallacious.

(specifically, a "portable assembly language").
I'm assuming here that any "portable assembly language" is an
"assembly language".


That assumption does not hold.


Then tell us what you *do* mean.


If you can quote whatever passage is mystifying to you, I'll try to
clear that up, but I can't clear up an unlimited amount of things: be
specific.
So in your world the definition of what a computer is has no relevance
to the meaning of the term "portable computer?"


Huh?

Mar 17 '06 #245
* Flash Gordon:
[snipped anonymous babble]


What is the question?

Mar 17 '06 #246
Flash Gordon wrote:
If you can't define what a term means, then it is meaningless
I don't think that's true, if only because you have to start with
/some/ terms that don't have definitions.

Meaning is prior to definition.
and so it is pointless to use that term with reference to anything.


Clarity and/in communication, that's the ticket.

--
Chris "sparqling" Dollin
"Who do you serve, and who do you trust?"
Mar 17 '06 #247
Paul Burke wrote:
Flash Gordon wrote:
I'll give some attributes that I consider essential to an assembler,
and whether or not C has them:
- The ability to write Interrupt Service Routines
- Not possible in C since there is no mechanism to specify that a
routine is an ISR and some processors use different calling
conventions for ISRs.
Every C compiler I've used for the last 2exp(N) years (apart from
perhaps the ones for specific OSs like Windows, where I've simply not
looked) has had this facility. It's not C; it's an extension, and it's
very annoying that they are all different, but it's there.
- The ability to access IO ports on the processor
- Not possible in C. You can access memory mapped devices, but not IO
mapped devices.


Same as above.

Plus necessary constructs to direct unchanging objects to ROM, again all
compilers different.


You snipped the part where I pointed out that you can do all this in the
various forms of extended Pascal I've used, so if you allow extensions
Pascal is just as low a level language as C ;-)
But you're right (whoever said it), it's nowt like an assembler. It's
just a quick and dirty middle-level language, and to use it you really
need to know the hardware you're operating on in many cases.
Agreed. Despite my comment above, I would say that Pascal was a higher
level language than C.
It seems that the little embedded stuff is considered simply below the
radar for the standards committees. Perhaps that's a good thing.


Well, those odd little corners in my experience are only a small
fraction of the code, so I don't see it as a big issue.
--
Flash Gordon, living in interesting times.
Mar 17 '06 #248
Alf P. Steinbach wrote:
* Flash Gordon:
Alf P. Steinbach wrote:
* Keith Thompson:

You claim that C is, in some sense, an assembly language

Nope, as you know I haven't claimed that. Whenever you're referring
to a claim, be sure to quote that claim. Especially when your
previous attempt at making the claim-making stick has been shown as
fallacious.
(specifically, a "portable assembly language").
I'm assuming here that any "portable assembly language" is an
"assembly language".

That assumption does not hold.


Then tell us what you *do* mean.


If you can quote whatever passage is mystifying to you, I'll try to
clear that up, but I can't clear up an unlimited amount of things: be
specific.


What do you mean by "portable assembly language"? A question you have
already been asked.
So in your world the definition of what a computer is has no relevance
to the meaning of the term "portable computer?"


Huh?


You claim the definition of "assembly language" has no bearing on the
definition of "portable assembly language". By this argument the
definition of "computer" should not have anything to do with the
definition of "portable computer."
--
Flash Gordon, living in interesting times.
Mar 17 '06 #249
Alf P. Steinbach wrote:
* Flash Gordon:
> [snipped anonymous babble]

I don't know why you call it anonymous, since I am not posting
anonymously. Or if I am, it is about the worst attempt ever at being
anonymous.
What is the question?


I wasn't asking a question, I was refuting points you raised and
pointing out why C is not any form of assembler, including a portable
assembler. It's not my fault if you don't understand a lot of the things
I spent many years using assembler for, or the various things you cannot
do in C that you can easily do in assembler.
--
Flash Gordon, living in interesting times.
Mar 17 '06 #250
