
Is C99 the final C?

I was just thinking about this, specifically wondering if there are any
features that the C specification currently lacks, and which may be
included in some future standardization.

Of course, I speak only of features in the spirit of C; something like
object-orientation, though a nice feature, does not belong in C.
Something like being able to #define a #define would be very handy,
though, e.g.:

#define DECLARE_FOO(bar) #define FOO_bar_SOMETHING \
#define FOO_bar_SOMETHING_ELSE

I'm not sure whether the features of cpp are even included in the C
standard though (and GCC has definitely taken quite a nonstandard approach
with regards to certain token expansions and whatnot), but that's one area
of improvement I see.
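
The closest standard-C approximation I know of is the X-macro trick: define
the list once, then re-expand it under different definitions of X. A rough
sketch (FOO_FIELDS and X are names I just made up, nothing standard):

#include <stdio.h>

#define FOO_FIELDS \
    X(SOMETHING)   \
    X(SOMETHING_ELSE)

/* First expansion: declare enumeration constants. */
#define X(name) FOO_##name,
enum foo { FOO_FIELDS FOO_COUNT };
#undef X

int main(void)
{
    /* Second expansion: print each constant's name and value. */
#define X(name) printf("FOO_" #name " = %d\n", (int)FOO_##name);
    FOO_FIELDS
#undef X
    return 0;
}

That covers a lot of the same ground, but you still can't generate a brand
new #define from inside a macro expansion.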

I would also like to see something along the lines of C++ templating,
except without the really kludgy implementation that the C++ folks decided
to go to ( and without the OOP ).

.... Mike pauses for the sound of a thousand *plonks*

Templates save a lot of time when it comes to commonly-used data
structures, and as they are entirely implemented at compile-time and don't
include, by their definition, OOP (although they can be well suited to
it), I think they would be a nice addition and in the spirit of C.

Your thoughts? I'm sure there's some vitriol coming my way but I'm
prepared 8)

--
Mike's Patented Blocklist; compile with gcc:

i=0;o(a){printf("%u",i>>8*a&255);if(a){printf(".");o(--a);}}
main(){do{o(3);puts("");}while(++i);}

Nov 13 '05
In <bq**********@news.tudelft.nl> Sidney Cadot <si****@jigsaw.nl> writes:
Ok. I'll give you 10:1 odds; there will be a (near-perfect) C99 compiler
by the end of this decade.


By then, C99 is supposed to be obsoleted by C0x ;-)

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #101
"Dan Pop" <Da*****@cern.ch> wrote in message
news:bq**********@sunnews.cern.ch...
In <bq**********@news.tudelft.nl> Sidney Cadot <si****@jigsaw.nl> writes:

Ok. I'll give you 10:1 odds; there will be a (near-perfect) C99 compiler
by the end of this decade.


By then, C99 is supposed to be obsoleted by C0x ;-)


Not sure how to read this, even with the emoticon. The C committee has
agreed to *reaffirm* the C Standard for the next several years, rather
than begin work on a major revision. I'd say that the odds of there
ever being an official C0x to replace C99 are pretty small.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Nov 13 '05 #102
In <Zm****************@nwrddc01.gnilink.net> "P.J. Plauger" <pj*@dinkumware.com> writes:
"Dan Pop" <Da*****@cern.ch> wrote in message
news:bq**********@sunnews.cern.ch...
In <bq**********@news.tudelft.nl> Sidney Cadot <si****@jigsaw.nl> writes:

>Ok. I'll give you 10:1 odds; there will be a (near-perfect) C99 compiler
>by the end of this decade.


By then, C99 is supposed to be obsoleted by C0x ;-)


Not sure how to read this, even with the emoticon. The C committee has
agreed to *reaffirm* the C Standard for the next several years, rather
than begin work on a major revision. I'd say that the odds of there
ever being an official C0x to replace C99 are pretty small.


In comp.std.c committee members keep mentioning C0x as the next C
standard.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #103
In article <ln************@nuthaus.mib.org>, ks***@mib.org says...
qe*@pobox.com (Paul Hsieh) writes:
Sidney Cadot <si****@jigsaw.nl> wrote: [...]
* support for a "packed" attribute to structs, guaranteeing that no
padding occurs.


Indeed, this is something I use on the x86 all the time. The problem
is that on platforms like UltraSparc or Alpha, this will either
inevitably lead to BUS errors, or extremely slow performing code.

If instead, the preprocessor were a lot more functional, then you
could simply extract packed offsets from a list of declarations and
literally plug them in as offsets into a char[] and do the slow memcpy
operations yourself.


Obviously an implementation of packed structures is useless if it
leads to bus errors.

There's ample precedent in other languages (Pascal and Ada at least)
for packed structures. [...] You can't sensibly take the address of
packed_obj.i. A function that takes an "int*" argument will likely die if
you give it a misaligned pointer (unless you want to allow _Packed as an
attribute for function arguments). The simplest approach would be to forbid
taking the address of a member of a packed structure (think of the members
as fat bit fields). [...]


Then what would be the point of even calling it a "struct"? This is what I am
saying -- it leads to bus errors because of the rest of the language concepts
like taking the address of any value that is stored in a memory location.
Another possibility (ugly but perhaps useful) is to make the address of a
member of a packed field yield a void*.
No -- the problem is with the BUS error itself. The C language doesn't need
*EVEN MORE* ways of creating UB with otherwise acceptable syntax. This is more
than just ugly; it's very, very anti-intuitive.

The right answer is to give such pointers a special attribute, like,
"_Unaligned" (or simply reuse "_Packed".) The compiler would then enforce type
safety in the following way: A non-"_Unaligned" pointer may not accept the
value of an "_Unaligned" pointer, but the other way around is not true.
Certain functions like memcpy and memmove would then be declared with these
"_Unaligned" decorators. But programmers could go ahead and use the decorator
themselves so that unaligned accesses could be propagated arbitrarily down a
call stack in a well-defined and programmer controlled way. This would
precisely encapsulate what the programmer is trying to do without allowing the
compiler to produce unexpected BUS errors. Attempts to address an unaligned
pointer will be caught at compile time -- the perfect solution.
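
Until something like that exists, the only strictly portable way to touch a
misaligned field is to copy it through bytes. A minimal sketch of the memcpy
idiom (the function name and offset parameter are mine):

#include <string.h>

/* Read a possibly misaligned int stored at byte offset off inside a
   packed record kept in a raw byte buffer. memcpy imposes no alignment
   requirement, so this is safe even on UltraSparc or Alpha. */
int read_packed_int(const unsigned char *rec, size_t off)
{
    int v;
    memcpy(&v, rec + off, sizeof v);
    return v;
}

A good compiler turns this into a plain load on machines that tolerate
misaligned access, and into byte loads on machines that don't.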
* upgraded status of enum types (they are currently quite
interchangeable with ints); deprecation of implicit casts from
int to enum (perhaps supported by a mandatory compiler warning).


I agree. Enums, as far as I can tell, are almost useless from a
compiler assisted code integrity point of view because of the
automatic coercion between ints and enums. It's almost not worth
bothering to ever use an enum for any reason because of it.


I don't think enums can be repaired without breaking tons of existing
code. And they are useful as currently defined for defining names for
a number of distinct integer values. If you want Pascal-like
enumeration types, you'd need a new construct -- but I think having
two distinct kinds of enumeration types would be too ugly for new
users.


How about another decorator? Like: enum _Strict ____ {...}; ? Basically the
language would not auto-convert such enums to ints at all without an explicit
cast. Once again, under programmer control.
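
For what it's worth, you can fake part of that today by wrapping the value
in a one-member struct, since structs never convert implicitly. A crude
sketch (all names mine):

enum color_raw { RED, GREEN, BLUE };
struct color { enum color_raw v; };  /* no implicit int conversion */

static void paint(struct color c) { (void)c; }

int main(void)
{
    struct color c = { GREEN };
    paint(c);        /* fine */
    /* paint(1); */  /* constraint violation, unlike a bare enum */
    return 0;
}

Clunky, but it gives a taste of the checking an enum _Strict could provide
directly.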
* a clear statement concerning the minimal level of active function
calls invocations that an implementation needs to support.
Currently, recursive programs will stackfault at a certain point,
and this situation is not handled satisfactorily in the standard
(it is not addressed at all, that is), as far as I can tell.


That doesn't seem possible. The amount of "stack" that an
implementation might use for a given function is clearly not easy to
define. Better to just leave this loose.


Agreed. The limit on call depth is typically determined by the amount
of available memory, something a compiler implementer can't say much
about. You could sensibly add a call depth clause to the Translation
Limits section (C99 5.2.4.1); that would require the implementation to
handle at least one program with a call depth of N, but wouldn't really
guarantee anything in general.


Well the problem with this is that then the *LINKER* would have to be
augmented to analyze the max relevant stack size of all functions in an object
and then assign the final stack according to a formula (N * maxstacksz) that
makes this work. It also kind of makes use of alloca impossible.
[...]
* a #define'd constant in stdio.h that gives the maximal number of
characters that a "%p" format specifier can emit. Likewise, for
other format specifiers such as "%d" and the like.

* a printf format specifier for printing numbers in base-2.
Ah -- the kludge request. Rather than adding format specifiers one at
a time, why not instead add in a way of being able to plug in
programmer-defined format specifiers? I think people in general would
like to use printf for printing out more than just the base types in a
collection of just a few formats defined at the whims of some 70s UNIX
hackers. Why not be able to print out your data structures, or
relevant parts of them as you see fit?


Well, you can do that with the "%s" specifier, as long as you've
defined a function that returns an image string for a value of your
type (with all the complications of functions returning dynamic
strings).


Who is going to free the memory allocated for this string? If it's static, then
what happens when you try to printf two such items -- or just try to use it in
a multitasking environment in general?
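
The usual dodge is to make the caller own the storage, which is reentrant
and leaves nothing to free. For instance, a base-2 image function for use
with "%s" might look like this (a sketch; the names are mine):

#include <limits.h>
#include <stdio.h>

/* Render v in binary into a caller-supplied buffer of at least
   sizeof(unsigned)*CHAR_BIT + 1 bytes; returns a pointer to the first
   digit. Reentrant, and nothing is ever malloc'ed. */
static char *to_base2(unsigned v, char *buf)
{
    char *p = buf + sizeof v * CHAR_BIT;
    *p = '\0';
    do { *--p = (char)('0' + (v & 1)); } while ((v >>= 1) != 0);
    return p;
}

int main(void)
{
    char buf[sizeof(unsigned) * CHAR_BIT + 1];
    printf("42 = %s\n", to_base2(42, buf));  /* prints: 42 = 101010 */
    return 0;
}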
* 'min' and 'max' operators (following gcc: ?< and ?>)


As I mentioned above, you might as well have operator overloading instead.


Most languages that provide operator overloading restrict it to
existing operator symbols.


Yeah well most languages have real string primitives and built-in array range
checking too. Somehow I don't think what has been done in *other languages*
has any serious bearing on what should be done in C. To reiterate my proposal:
A whole *GRAMMAR* of symbols for operators could be added all of which have no
default definition, but which *can be* defined by the programmer, with
semantics similar to C's function declaration.
[...] If you want "min" and "max" for int, there
aren't any spare operator symbols you can use. If you want to allow
overloading for arbitrary symbols (which some languages do), you'll
need to decide how and whether the user can define precedence for the
new operators.
Good point, but something as simple as "lowest precedence" and increasing in
the order in which they are declared seems fine enough. Or maybe inverted --
just play with those combinations to see what makes sense in practice. If
that's not good enough, then make the precedence level relative to another
operator at the time of declaration. For example:

int _Operator ?< after + (int x, int y) { /* max */
if (x > y) return x;
return y;
}

int _Operator ?> same ?< (int x, int y) { /* min */
if (x < y) return x;
return y;
}
[...]
Personally, I don't think it would be a good idea to have templates
in C, not even simple ones. This is bound to have quite complicated
semantics that I would not like to internalize.
Right -- this would just be making C into C++. Why not instead
dramatically improve the functionality of the preprocessor so that the
macro-like cobblings we put together in place of templates are
actually good for something? I've posted elsewhere about this, so I
won't go into details.


Hmm. I'm not sure that making the preprocessor *more* powerful is
such a good idea. It's too easy to abuse as it is [...]


A *LOT* of C is easy to abuse. If you're worried about a programmer you
work with abusing the preprocessor, then that's an issue between you and
that programmer.
If you can improve the preprocessor without making it even more
dangerous, that's great. (I don't think I've see your proposal.)


My proposal is to add preprocessor-only scoped variables:

#define $c 1

The idea is that "$c" could never show up as such a symbol in the C source
after preprocessing is done. And in cases where such a $___ variable has not
been defined you could insert an instance specific generated variable such as:

$c -> __PREPROCINST_MD5_09839fe8d98798fe8978de98799cfe01_c

so as to kind of put it into its own kind of "name-space" that is not really
"accessible" to the programmer. Where the MD5 obfuscation would come from a
source like: <filename><date,time><MD5(source)><the $varname> in an effort to
probabilistically avoid collisions across files (trust me, this is not as much
voodoo as you might think) in case some bozo turns this into a global
declaration.

The purpose is to allow for even more useful things like:

#for $c in #range(0,5)
printf ("Even: %d Odd: %d\n", 2*$c, 2*$c+1);
#endfor

/* #range(0,5) just expands to 0,1,2,3,4 and the #for loop works, kind of
python-like, just as you would expect. */

#define genStruct(name,#VARARGS) struct tag##name { #VARARGS };\
#for $c in #VARARGS
# define offsetof_##$c offsetof (tag##name, $c)
#endfor

/* Here the "\" is required to attach the #for, which then itself
has an implicit multi-line characteristic, so that the lines
up until the #endfor are sucked into the #define genStruct.
Also, #define's executed inside of a #for are repeatedly
executed for each iteration */

#define swap(type,x,y) { \
type $tmp = x; \
x = y; \
y = $tmp; \
}

/* In this case, without prior definition, $tmp is given an
obfuscated name by the time it reaches the C source code. */
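
In today's C the same macro is usually written with a do/while wrapper and
an uglified temporary name, crossing your fingers that it never collides
with an argument:

#define SWAP(type, x, y) do {  \
        type swap_tmp_ = (x);  \
        (x) = (y);             \
        (y) = swap_tmp_;       \
    } while (0)

SWAP(int, a, b) works fine -- right up until someone calls it on a variable
actually named swap_tmp_, which is exactly the hygiene problem the $tmp
scheme would solve.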

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #104
"Dan Pop" <Da*****@cern.ch> wrote in message
news:bq**********@sunnews.cern.ch...
By then, C99 is supposed to be obsoleted by C0x ;-)


Not sure how to read this, even with the emoticon. The C committee has
agreed to *reaffirm* the C Standard for the next several years, rather
than begin work on a major revision. I'd say that the odds of there
ever being an official C0x to replace C99 are pretty small.


In comp.std.c committee members keep mentioning C0x as the next C
standard.


Ah, peer pressure. Even the crustiest of us succumb sometimes.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Nov 13 '05 #105
Sidney Cadot <si****@jigsaw.nl> writes:
Paul Hsieh wrote: [...] I used "%x" as an example of a format specifier that isn't defined ('x'
being a placeholder for any letter that hasn't been taken by the
standard). The statement is that there'd be only about 15 letters left
for this kind of thing (including 'x' by the way -- it's not a hex
specifier). Sorry for the confusion, I should've been clearer.


What do you mean when you say that "%x" is not a hex specifier?
That's either confusing or wrong.

printf("foo = %x\n", foo);

[...]
Yes I'm sure the same trick works for chars and shorts. So how do
you widen a long long multiply?!?!? What compiler trick are you
going to hope for to capture this? What you show here is just
some trivial *SMALL* multiply, that relies on the whims of the
optimizer.


Well, I'd show you, but it's impossible _in principle_. Given that you
are multiplying two expressions of the widest type supported by your
compiler, where would it store the result?


In a struct, or in an array of two intmax_t's or uintmax_t's.
(Wasn't that a common trick in primordial C, before the introduction
of "long"?)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
(Note new e-mail address)
Nov 13 '05 #106
qe*@pobox.com (Paul Hsieh) writes:
In article <ln************@nuthaus.mib.org>, ks***@mib.org says... [...]
There's ample precedent in other languages (Pascal and Ada at
least) for packed structures. [...] You can't sensibly take the
address of packed_obj.i. A function that takes an "int*" argument
will likely die if you give it a misaligned pointer (unless you
want to allow _Packed as an attribute for function arguments).
The simplest approach would be to forbid taking the address of a
member of a packed structure (think of the members as fat bit
fields). [...]


Then what would be the point of even calling it a "struct"? This is
what I am saying -- it leads to bus errors because of the rest of
the language concepts like taking the address of any value that is
stored in a memory location.


Surely it's no worse than calling a struct with bit fields a "struct".
Another possibility (ugly but perhaps useful) is to make the address of a
member of a packed field yield a void*.


No -- the problem is with the BUS error itself. The C language
doesn't need *EVEN MORE* ways of creating UB with otherwise
acceptable syntax. This is more than just ugly; it's very, very
anti-intuitive.


My thought was that making &struct_obj.packed_member yield a void*
would force the programmer to be careful about how he uses it. I
momentarily forgot about the implicit conversion from void* to any
object pointer type. (Which was dumb; I've even been participating in
the permanent floating "don't cast the result of malloc()" flameware.)
The right answer is to give such pointers a special attribute, like,
"_Unaligned" (or simply reuse "_Packed".)

[snip]

Yeah, that makes more sense than my idea.

BTW, I'm not necessarily arguing that packed structures would be worth
adding to the language, just thinking about how to do it right if it's
going to be done at all.

[...]
Agreed. The limit on call depth is typically determined by the
amount of available memory, something a compiler implementer can't
say much about. You could sensibly add a call depth clause to the
Translation Limits section (C99 5.2.4.1); that would require the
implementation to handle at least one program with a call
depth of N, but wouldn't really guarantee anything in general.


Well the problem with this is that then the *LINKER* would have to
be augmented to analyze the max relevant stack size of all
functions in an object and then assign the final stack according to
a formula (N * maxstacksz) that makes this work. It also kind of
makes use of alloca impossible.


No, adding a call depth requirement to C99 5.2.4.1 wouldn't require
any kind of analysis. There would be no guarantee that a given depth
will be supported in all cases, merely that the implementation has to
translate and execute at least one program that hits the limit (along
with all the others). If a requirement for a call depth of at least,
say, 100 were added to 5.2.4.1, it could probably be met by all
existing implementations with no changes.

[...]
Well, you can do that with the "%s" specifier, as long as you've
defined a function that returns an image string for a value of your
type (with all the complications of functions returning dynamic
strings).


Who is going to free the memory allocated for this string? If it's
static, then what happens when you try to printf two such items --
or just try to use it in a multitasking environment in general?


That's the same problem you have with any function that returns a
string. There are numerous solutions; programmers reinvent them all
the time.

If you can come up with a specification for an enhanced printf that
can produce arbitrary user-defined output for arbitrary user-defined
types, we can discuss whether it's better than "%s" with an image
function.
[...] If you want "min" and "max" for int, there
aren't any spare operator symbols you can use. If you want to allow
overloading for arbitrary symbols (which some languages do), you'll
need to decide how and whether the user can define precedence for the
new operators.


Good point, but something as simple as "lowest precedence" and
increasing in the order in which they are declared seems fine
enough. Or maybe inverted -- just play with those combinations to
see what makes sense in practice. If that's not good enough, then
make the precedence level relative to another operator at the time
of declaration. For example:
int _Operator ?< after + (int x, int y) { /* max */
if (x > y) return x;
return y;
}

int _Operator ?> same ?< (int x, int y) { /* min */
if (x < y) return x;
return y;
}


I think "increasing in the order in which they are declared" would be
very bad; you could quietly change the semantics of an expression by
reordering the declarations of the operators it uses (e.g., by
changing the order of #include directives).

For most C operators, the common rule for legible code is
"parenthesize, parenthesize, parenthesize". For user-defined
operators (if I thought they were a good idea), I'd probably advocate
not defining their precedence at all; if you want to write "x + y @ z",
you *have* to use parentheses. The next best thing might be to say
that all user-defined operators have the same precedence, perhaps just
above assignment.

People already complain that C looks like line noise; I don't think
assigning meanings to more arbitrary sequences of punctuation marks
solves anything. (And I would have thought that "?<" should be min,
and "?>" should be max.)

In the particular case of "min" and "max", I'd much rather just call
them "min" and "max". If you're going to have operator overloading,
you probably want function overloading as well. Even if you insist on
operator syntax rather than function call syntax, "a max b" is at
least as legible as "a $< b".

For most of the things that I'd want to see as operators, the existing
operator symbols are more than sufficient; for anything else, just use
identifiers.

[snip]

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
(Note new e-mail address)
Nov 13 '05 #107
Keith Thompson wrote:
I used "%x" as an example of a format specifier that isn't defined ('x'
being a placeholder for any letter that hasn't been taken by the
standard). The statement is that there'd be only about 15 letters left
for this kind of thing (including 'x' by the way -- it's not a hex
specifier). Sorry for the confusion, I should've been clearer.
What do you mean when you say that "%x" is not a hex specifier?
That's either confusing or wrong.

printf("foo = %x\n", foo);
It is wrong. A very stupid lapse on my side.
[...]
Yes I'm sure the same trick works for chars and shorts. So how do
you widen a long long multiply?!?!? What compiler trick are you
going to hope for to capture this? What you show here is just
some trivial *SMALL* multiply, that relies on the whims of the
optimizer.
Well, I'd show you, but it's impossible _in principle_. Given that you
are multiplying two expressions of the widest type supported by your
compiler, where would it store the result?

In a struct, or in an array of two intmax_t's or uintmax_t's.
(Wasn't that a common trick in primordial C, before the introduction
of "long"?)


Yes, that would work, and could be useful. For the signed case, a struct
containing an intmax_t for the most significant part and a uintmax_t
for the least significant part might be a more proper choice?

Best regards, Sidney

Nov 13 '05 #108
Paul Hsieh wrote:
Good point, but something as simple as "lowest precedence" and increasing in
the order in which they are declared seems fine enough. Or maybe inverted --
just play with those combinations to see what makes sense in practice. If
that's not good enough, then make the precedence level relative to another
operator at the time of declaration. For example: int _Operator ?< after + (int x, int y) { /* max */
if (x > y) return x;
return y;
}

int _Operator ?> same ?< (int x, int y) { /* min */
if (x < y) return x;
return y;
}


That looks like a decent first stab at a proposed syntax. Some questions
though:

- What would be the constraints on acceptable operator names?

- How are you going to define left, right, or lack of associativity?

- In what way will you handle the possible introduction of ambiguity in
the parser that gets to parse the new tokens?

- What if I want a ?< to work both on int and on double types?

- does (and if so, how) does your syntax allow the introduction of unary
prefix operators (such as !), binary infix operators that may have
compile-time identifiers as a parameter (such as ->), n-ary operators
(such as the ternary a?b:c or your proposed quaternary carry/add
operator), and operators that exist both in unary and binary form (+, -)?

My gut feeling is that this would effectively force the compiler to
maintain a dynamic parser on-the-fly while scanning through the source,
which would be wildly complex. You mentioned that actual languages exist
that do this sort of thing; are they also compiled languages like C, or
are they interpreted languages of the functional variety?

Best regards,

Sidney

Nov 13 '05 #109
Sidney Cadot <si****@jigsaw.nl> writes:
[...]
It is wrong. A very stupid lapse on my side.


Hey, this is Usenet! You're not supposed to admit mistakes here. The
least you could do is sneakily change the subject and start being
personally abusive. 8-)}

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
(Note new e-mail address)
Nov 13 '05 #110
Sidney Cadot <si****@jigsaw.nl> wrote:
Paul Hsieh wrote:
In article <bq**********@news.tudelft.nl>, si****@jigsaw.nl says...
Paul Hsieh wrote:
Sidney Cadot <si****@jigsaw.nl> wrote:
>[...] I for one would be happy if more compilers would
>fully start to support C99, [...]I don't think that day will ever come. In its totality C99 is almost
completely worthless in real world environments. Vendors will be
smart to pick up restrict and a few of the goodies in C99 and just stop
there.
Want to take a bet...?
Sure. Vendors are waiting to see what the C++ people do, because they
are well aware of the irreconcilable conflicts that have arisen. Bjarne
and crew are going to be forced to take the new stuff from C99 in the bits and
pieces that don't cause any conflict or aren't otherwise stupid for other
reasons. The Vendors are going to look at this and decide that the
subset of C99 that the C++ people chose will be the least problematic
solution and just go with that.


Ok. I'll give you 10:1 odds; there will be a (near-perfect) C99 compiler
by the end of this decade.


A single vendor?!?! Ooooh ... try not to set your standards too high.
Obviously, it's well known that the gnu C++ people are basically converging
towards C99 compliance and are most of the way there already. That's not my
point. My point is: will Sun, Microsoft, Intel, MetroWerks, etc. join the
fray so that C99 is ubiquitous to the point of obsoleting all previous C's for
all practical purposes for the majority of developers? Maybe the Comeau guy
will join the fray to serve the needs of the "perfect spec compliance" market
that he seems to be interested in.

If not, then projects that have a claim of real portability will never
embrace C99 (like LUA, or Python, or the JPEG reference implementation, for
example.) Even the average developers will forgo the C99 features for fear
that someone will try to compile their stuff on an old compiler.

Look, nobody uses K&R-style function declarations anymore. The reason is
because the ANSI standard obsoleted them, and everyone picked up the ANSI
standard. That only happened because *EVERYONE* moved forward and picked up
the ANSI standard. One vendor is irrelevant.
If instead, the preprocessor were a lot more functional, then you
could simply extract packed offsets from a list of declarations and
literally plug them in as offsets into a char[] and do the slow memcpy
operations yourself.

This would violate the division between preprocessor and compiler too
much (the preprocessor would have to understand quite a lot of C semantics).


No, that's not what I am proposing. I am saying that you should not use
structs at all, but you can use the contents of them as a list of comma
seperated entries. With a more beefed up preprocessor one could find the
offset of a packed char array that corresponds to the nth element of the list
as a sum of sizeof()'s and you'd be off to the races.


Perhaps I'm missing something here, but wouldn't it be easier to use the
offsetof() macro?


It would be, but only if you have the packed structure mechanism. Other
people have posted indicating that in fact _Packed is more common than I
thought, so perhaps my suggestion is not necessary.
C is a language suitable for
and highly encouraging of writing extremely unsound and poor code. Fixing it
would require a major overhaul of the language and library.


That's true. I don't quite see how this relates to the preceding
statement though.


I'm saying that trying to fix C's intrinsic problems shouldn't start or end
with some kind of resolution of call stack issues. Anyone who understands
machine architecture will not be surprised about call stack depth limitations.
There are far more pressing problems in the language that one would like to
fix.
There's a lot more that you can do as well. Such as a tryexpand()
function which works like realloc except that it performs no action
except returning with some sort of error status if the block cannot be
resized without moving its base pointer. Further, one would like to
be able to manage *multiple* heaps, and have a freeall() function --
it would make the problem of memory leaks much more manageable for
many applications. It would almost make some cases enormously faster.
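
Roughly, the interface I have in mind would look something like this (all
names made up -- nothing like it exists in any standard library):

#include <stddef.h>  /* size_t */

typedef struct heap heap_t;  /* opaque handle to a private heap */

/* Resize in place only: returns ptr on success, NULL if the block
   would have to move (in which case nothing is changed). */
void *tryexpand(void *ptr, size_t size);

heap_t *heap_create(void);                 /* an independent arena */
void *heap_alloc(heap_t *h, size_t size);
void heap_freeall(heap_t *h);              /* release everything at once */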

But this is perhaps territory that the Standard should steer clear of,
more like something a well-written and dedicated third-party library
could provide.
But a third party library can't do this portably.


I don't see why not?


Explain to me how you implement malloc() in a *multithreaded* environment
portably. You could claim that C doesn't support multithreading, but I highly
doubt your going to convince any vendor that they should shut off their
multithreading support based on this argument. By dictating its existence in
the library, it would put the responsibility of making it work right in the
hands of the vendor without affecting the C standard's stance on not
acknowledging the need for multithreading.
It's actually useful functionality that you just can't get from the C
language, and there's no way to reliably map such functionality to the C
language itself. One is forced to know the details of the underlying
platform to implement such things. It's something that really *should* be
in the language.


Well, it looks to me like you're proposing a feature-rich heap
manager. I honestly don't see why this couldn't be implemented portably
without platform-specific knowledge. Could you elaborate?


See my multithreading comment above. Also, efficient heaps are usually
written with a flat view of memory in mind. This is kind of impossible in
non-flat memory architectures (like segmented architectures.)
[...] I want this more for reasons of orthogonality in design than anything
else.
You want orthogonality in the C language? You must be joking ...
My proposal allows the programmer to decide what is or is not useful to them.


I'm all for that.


Well, I'm a programmer, and I don't care about binary output -- how does your
proposal help me decide what I think is useful to me?
I think people in general would like to use printf for printing out
more than just the base types in a collection of just a few formats
defined at the whims of some 70s UNIX hackers. Why not be able to
print out your data structures, or relevant parts of them as you see
fit?
I don't think it's too bad an idea (although I have never gotten round
to trying the mechanism gcc provides for this). In any case, this kind
of thing is so much more naturally done in a OOP-supporting language
like C++. Without being belligerent: why not use that if you want this
kind of thing?


Well, when I am programming in C++ I will use it. But I'm not going to move
all the way to using C++ just for this single purpose by itself.
I used "%x" as an example of a format specifier that isn't defined ('x'
being a placeholder for any letter that hasn't been taken by the
standard). The statement is that there'd be only about 15 letters left
for this kind of thing (including 'x' by the way -- it's not a hex
specifier). Sorry for the confusion, I should've been clearer.
Well what's wrong with %@, %*, %_, %^, etc?
>* I think I would like to see a real string-type as a first-class
> citizen in C, implemented as a native type. But this would open
> up too big a can of worms, I am afraid, and a good case can be
> made that this violates the principles of C too much (being a
> low-level language and all).

The problem is that real string handling requires memory handling.
The other primitive types in C are flat structures that are fixed
width. You either need something like C++'s constructor/destructor
semantics or automatic garbage collection otherwise you're going to
have some trouble with memory leaking.

A very simple reference-counting implementation would suffice. [...]


This would complexify the compiler to no end. It's also hard to account for a
reference that was arrived at via something like "memcpy".


A first-class citizen string wouldn't be a pointer; neither would you
necessarily be able to get its address (although you should be able to
get the address of the characters it contains).


But a string has variable length. If you allow strings to be mutable, then
the actual sequence of characters has to be put into some kind of dynamic
storage somewhere. Either way, the base part of the string would in some way
have to be storable into, say, a struct. But you can copy a struct via
memcpy or whatever. But this then requires a count increment since there is
now an additional copy of the string. So how is memcpy supposed to know that
its contents contain a string that it needs to increase the ref count for?
Similarly, memset needs to know how to *decrease* such a ref count.

If you allow the base of the string itself to move (like those morons did in
the Safe C String Library) then a simple things like:

string *a, b;

a = (string *) malloc (sizeof (string));
*a = b;
b = b + b + b; /* triple up b, presumably relocating the base */
/* But now *a is undefined */

are just broken.

Look, the semantics of C just don't easily allow for a useful string primitive
that doesn't have impact on the memory model (i.e., leak if you aren't
careful.) Even the Better String Library (http://bstring.sf.net/) concedes
that the programmer has to diligently call bdestroy() to clean up after
themselves, otherwise you'll just leak.
2) because it is quite unrelated (I don't get the 'instead')
I'm saying that you could have &&&, |||, but just don't define what they
actually do. Require that the programmer define what they do. C doesn't have
type-specific functions, and if one were to add in operator overloading in a
consistent way, then that would mean that an operator overload would have to
accept only its defined type.


Ok, so the language should have a big bunch of operators, ready for the
taking. Incidentally, Mathematica supports this, if you want it badly.


Hey, it's not me -- apparently it's people like you who want more operators.
My point is that no matter what operators get added to the C language, you'll
never satisfy everyone's appetites. People will just want more and more,
though almost nobody will want all of what could be added.

My solution solves the problem once and for all. You have all the operators
you want, with whatever semantics you want.
For this to be useful without losing the
operators that already exist in C, the right answer is to *ADD* operators. In
fact I would suggest that one simply defined a grammar for such operators, and
allow *ALL* such operators to be definable.


This seems to me a bad idea for a multitude of reasons. First, it would
complicate most stages of the compiler considerably. Second, a
maintenance nightmare ensues: while the standard operators of C are
basically burnt into my soul, I'd have to get used to the Fantasy
Operator Of The Month every time I take on a new project, originally
programmed by someone else.


Yes, but if instead of actual operator overloading you only allow redefinition
of these new operators, there will not be any of the *surprise* factor. If
you see one of these new operators, you can just view it like you view an
unfamiliar function -- you'll look up its definition obviously.
There's a good reason that we use things like '+' and '*' pervasively,
in many situations; they are short, and easily absorbed in many
contexts. Self-defined operator tokens (consisting, of course, of
'atomic' operators like '+', '=', '<' ...) will lead to unreadable code,
I think; perhaps something akin to a complicated 'sed' script.
And allowing people to define their own functions with whatever names they
like doesn't lead to unreadable code? It's just the same thing. What makes
your code readable is adherence to an agreed upon coding standard that exists
outside of what the language defines.
3) because operator overloading is mostly a bad idea, IMHO
Well, Bjarne Stroustrup has made a recent impassioned request to *REMOVE*
features from C++.


Do you have a reference? That's bound to be a fun read, and he probably
missed a few candidates.


It was just in the notes to some meeting Bjarne had in the last year or so to
discuss the next C++ standard. His quote was something like this: while
adding a feature for C++ can have value, removing one would have even more
value. Maybe someone who is following the C++ standardization threads can
find a reference -- I just spent a few minutes on google and couldn't find it.
I highly doubt that operator overloading is one that has
been made or would be taken seriously. I.e., I don't think a credible
population of people who have been exposed to it would consider it a bad idea.


I can only speak for myself; I have been exposed, and think it's a bad
idea. When used very sparsely, it has its uses. However, introducing
new user-definable operators as you propose would be folly; the only way
operator overloading works in practice is if you maintain some sort of
link to the intuitive meaning of an operator. User defined operators
lack this by definition.


But so do user definable function names. Yet, functionally they are almost
the same.
While
this is sometimes a useful shorthand, I am sure that different
applications have different list cutesy compactions that would be
worth while instead of the one above.

... I'd like to see them. &&& is a bit silly (it's fully equivalent to
"a ? b : 0") but ||| (or ?: in gcc) is actually quite useful.
But there are no end of little cheesy operators that one could add. For
example, a <> b to swap a and b, a <<< b to rotate a by b bits, @ a to find the
highest bit of a, etc., etc., etc.


"<>" would be a bad choice, since it is easy to confuse for "not equal
to". I've programmed a bit in IDL for a while, which has my dear "min"
and "max" operators.... It's a pity they are denoted "<" and ">",
leading to heaps of misery by confusion.

<<< and @ are nice though. I would be almost in favour of adding them,
were it not for the fact that this would drive C dangerously close in
the direction of APL.


You missed the "etc., etc., etc." part. I could keep coming up with them
until the cows come home: a! for factorial, a ^< b for "a choose b" (you want
language support for this because of overflow concerns of using the direct
definition) <-> a for endian swapping, $% a for the fractional part of a
floating point number, a +>> b for the average (there is another overflow
issue), etc., etc.
All of these are good, in some cases. And I think that there would be no
end to the number of useful operators that one might like to add to a
program. I think your proposal is DOA because you cannot make a credible
case as to why your operator in particular has any value over any of a
number of other operators that you might like to add. Adding operator
overloading, however, would be a real extension and would in a sense
address *all* these issues.


Again I wonder, seriously: wouldn't you be better of using C++ ?


No because I want *MORE* operators -- not just the ability to redefine the
ones I've got (and therefore lose some.)
>* 'min' and 'max' operators (following gcc: ?< and ?>)

As I mentioned above, you might as well have operator overloading instead.
Sure, but you're talking about something that goes a lot further than
run-of-the-mill operator overloading. I think the simple way would be
to just introduce these min and max operators and be done with it.

"min" and "max" are perhaps less important than "+" and "*", but they
are probably the most-used operations that are not available right now
as operators. If we are going to extend C with new operators, they would
be the most natural choice I think.


WATCOM C/C++ defined the macros min(a,b) and max(a,b) in some header files.
Why wouldn't the language just accept this? Is it because you want variable
length parameters? -- Well in that case, does my preprocessor extension
proposal start to look like it's making more sense?
Now I would ask you: which existing operator would you like to overload
for, say, integers, to mean "min" and "max" ?
How about a <==> b for max and a >==< b for min? I personally don't care that
much.


Those are not existing operators, as you know. They would have to be
defined in your curious "operator definition" scheme.

I find the idea freaky, yet interesting. I think C is not the place for
this (really, it would be too easy to compete in the IOCCC) but perhaps
in another language... Just to follow your argument for a bit, what
would an "operator definition" declaration look like for, say, the "?<"
min operator in your hypothetical extended C?


This is what I've posted elsewhere:

int _Operator ?< after + (int a, int b) {
if (a > b) return a;
return b;
}
Yes I'm sure the same trick works for chars and shorts. So how do you
widen a long long multiply?!?!? What compiler trick are you going to
hope for to capture this? What you show here is just some trivial
*SMALL* multiply, that relies on the whims of the optimizer.


Well, I'd show you, but it's impossible _in principle_. Given that you
are multiplying two expressions of the widest type supported by your
compiler, where would it store the result?


In two values of the widest type -- just like how just about every
microprocessor which has a multiply does it:

high *% low = a * b;
PowerPC, Alpha, Itanium, UltraSPARC and AMD64 all have widening multiplies that
take two 64 bit operands and returns a 128 bit result in a pair of 64 bit
operands. They all invest a *LOT* of transistors to do this *ONE* operation.
They all *KNOW* you can't finagle any C/C++ compiler to produce the operation,
yet they still do it -- it's *THAT* important (hint: SSL, and therefore *ALL* of
e-commerce, uses it.)
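
Done by hand in portable C, the 64x64 -> 128 case decomposes into four
32-bit half-multiplies plus carry bookkeeping. A sketch (assuming
<stdint.h> provides uint64_t; mul64x64 is just my name for it):

#include <stdint.h>

/* Unsigned 64x64 -> 128 multiply built from 32-bit halves. *hi and
   *lo together hold what "high *% low = a * b" would produce directly. */
void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t al = a & 0xffffffffu, ah = a >> 32;
    uint64_t bl = b & 0xffffffffu, bh = b >> 32;
    uint64_t t0 = al * bl;
    uint64_t t1 = ah * bl + (t0 >> 32);
    uint64_t t2 = al * bh + (t1 & 0xffffffffu);

    *lo = (t0 & 0xffffffffu) | (t2 << 32);
    *hi = ah * bh + (t1 >> 32) + (t2 >> 32);
}

This is exactly the kind of routine you'd hope a compiler would collapse
into the machine's single widening-multiply instruction -- but nothing
obliges it to.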


Well, I don't know if these dozen-or-so big-number 'powermod' operations
that are needed to establish an SSL connection are such a big deal as
you make it.


It's not me -- it's Intel, IBM, Motorola, Sun and AMD who seem to be obsessed
with these instructions. Of course Amazon, Yahoo and Ebay and most banks are
kind of obsessed with them too, even if they don't know it.
Probably because most languages have been written on top of C or C++.
And what about a simple carry capturing addition?

Many languages exist where this is possible; they are called
"assembly". There is no way that you could come up with a well-defined
semantics for this.
carry +< var = a + b;


It looks cute, I'll give you that. Could you please provide semantics?
It may be a lot less self evident than you think.


How about:

- carry is set to either 1 or 0, depending on whether or not a + b overflows
(just follow the 2s complement rules if one of a or b is negative.)

- var is set to the result of the addition; the remainder if a carry occurs.

- The whole expression (if you put the whole thing in parentheses) returns
the result of carry.

+< would not be an operator in and of itself -- the whole syntax is required.
For example: c +< v = a * b would just be a syntax error. The "cuteness" was
stolen from an idea I saw in some ML syntax. Obviously +< - would also be
useful.
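
For unsigned operands, mind you, the carry half can already be written
portably today; it's the well-known idiom that this syntax would merely
spell directly (a sketch, my function name):

#include <stdint.h>

/* Unsigned add with carry out: the sum wraps mod 2^32, and the result
   is smaller than an operand exactly when a carry occurred. */
unsigned add_with_carry(uint32_t a, uint32_t b, uint32_t *var)
{
    *var = a + b;
    return *var < a;   /* 1 iff the addition carried out */
}

The signed case is where the language would really have to step in, since
signed overflow is undefined behaviour in C.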
Did you know that a PowerPC processor doesn't have a shift-right where
you can capture the carry bit in one instruction? Silly but no less true.
What has this got to do with anything? Capturing carries coming out of
shifts doesn't show up in any significant algorithms that I am aware of
that are significantly faster than using what we have already.


Ah, I see you've never implemented a non-table-driven CRC or a binary
greatest common divisor algorithm.


You can find a binary gcd algorithm that I wrote here:

http://www.pobox.com/~qed/32bprim.c

You will notice how I don't use or care about carries coming out of a right
shift. There wouldn't be enough of a savings to matter.
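
For reference, the core of a binary gcd in plain C -- shifts, subtraction
and comparisons only, no shifted-out carry needed anywhere (a sketch, not
the exact code from that file):

/* Binary (Stein) gcd: strip common factors of two, then reduce by
   subtraction, halving v whenever it goes even. */
unsigned bgcd(unsigned u, unsigned v)
{
    unsigned shift = 0;
    if (u == 0) return v;
    if (v == 0) return u;
    while (((u | v) & 1) == 0) { u >>= 1; v >>= 1; shift++; }
    while ((u & 1) == 0) u >>= 1;
    do {
        while ((v & 1) == 0) v >>= 1;
        if (u > v) { unsigned t = u; u = v; v = t; }
        v -= u;
    } while (v != 0);
    return u << shift;
}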
[...] They are both hard at work when you establish an SSL connection.
The specific operations I am citing make a *HUGE* difference and have billion
dollar price tags associated with them.


These numbers you made up from thin air, no? otherwise, I'd welcome a
reference.


Widening multiplies cost transistors on the CPU. The hardware algorithms are
variations of your basic public school multiply algorithm -- so it takes n^2
transistors to perform the complete operation, where n is the largest bit
word that the machine accepts for the multiplier. If the multiply were not
widened they could save half of those transistors. So multiply those extra
transistors by the number of CPUs shipped with a widening multiply (PPC,
x86s, Alphas, UltraSparcs, ... etc) and you easily end up in the billion
dollar range.
I understand the need for the C language standard to be applicable to as
many platforms as possible. But unlike some right shift detail that you
are talking about, the widening multiply hardware actually *IS* deployed
everywhere.


Sure is. Several good big-number libraries are available that have
processor-dependent machine code to do just this.


And that's the problem. They have to be hand written in assembly. Consider
just the SWOX Gnu multiprecision library. When the Itanium was introduced,
Intel promised that it would be great for e-commerce. The problem is that
the SWOX guys were having a hard time with IA64 assembly language (as
apparently lots of people are.) So they projected performance results for
the Itanium without having code available to do what they claim. So people
who wanted to consider using an Itanium system based on its performance for
e-commerce were stuck -- they had no code, and had to believe Intel's claims,
or SWOX's as to what the performance would be.

OTOH, if instead, the C language had exposed a carry propagating add, and a
widening multiply in the language, then it would just be up to the Intel
*compiler* people to figure out how to make sure the widening multiply was
used optimally, and the SWOX/GMP people would just do a recompile for baseline
results at least.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #111
Sidney Cadot <si****@jigsaw.nl> wrote in message news:<bq**********@news.tudelft.nl>...
goose wrote:
no, but if a and b are not macros which expand to a() and b(), then

a ? b : 0

is identical to

a ? b : a

<nit>
the example code above did not actually state whether or not
macros were used, or if there were any side-effects.

it *did* however, use a and b, and not a() and b().
Well, I used a and b as stand-ins for "any two expressions".


(using indentation to make it easier to read)

<nit> :-)
this should be here
<nit> By the way, got any funny looks talking to people today?

(You forgot the </nit>)
then </nit> should be here

Best regards,

Sidney


</nit>

</nit> (closing <nit> of previous post :-)

goose,
the guys here at the office are already used
to me quoting parts of the standard at them.
Nov 13 '05 #112
Sidney Cadot <si****@jigsaw.nl> wrote:
Paul Hsieh wrote:
Good point, but something as simple as "lowest precedence" and increasing
in the order in which they are declared seems fine enough. Or maybe
inverted -- just play with those combinations to see what makes sense in
practice. If that's not good enough, then make the precedence level
relative to another operator at the time of declaration. For example:
int _Operator ?< after + (int x, int y) { /* max */
if (x > y) return x;
return y;
}

int _Operator ?> same ?< (int x, int y) { /* min */
if (x < y) return x;
return y;
}


That looks like a decent first stab at a proposed syntax. Some question
though:

- What would be the constraints on acceptable operator names?


As I said it would be defined in some sort of grammar. I am not a
parsable language expert but basically it has to allow for things
like:

<<<, ?<, ===, ^^, |||, &&&, etc

but disallow things like:

+++, ---, ~~, !!, **, etc

due to parsing ambiguities with existing semantics.

Perhaps as an operator gains unary attributes, tacking things onto the
end of it becomes a syntax error as well.
- In what way will you handle the possible introduction of ambiguity in
the parser that gets to parse the new tokens?
By defining a good grammar.
- What if I want a ?< to work both on int and on double types?
specific type overloading, as is done in C++?
- How are you going to define left, right, or lack of associativity?

- does (and if so, how) does your syntax allow the introduction of unary
prefix operators (such as !), binary infix operators that may have
compile-time identifiers as a parameter (such as ->), n-ary operators
(such as the ternary a?b:c or your proposed quaternary carry/add
operator), and operators that exist both in unary and binary form (+, -)?
The easiest way to handle these issues is to allow an operator to inherit
the attributes of a previously defined operator. So we add in the
"like" clause:

int _Operator ?< after + like + (int x, int y) { /* max */
if (x > y) return x;
return y;
}

I would propose that -> or . not be considered operators as they have
a really different nature to them that one could not map to function
definition semantics in a sensible way. So as to the question of
whether or not ? : or the quaternary operators that I proposed -- the
idea would be to first *ADD IN* the quaternary base operators that I
have proposed, then just use the "like" clause as I've described
above. So to reduce ambiguity again, I would also not consider the
dereferencing meaning of * or the address taking meaning of & to be
considered operators either.

int _Operator ?> : after ? : like ? : (int x, int y, int z) {
if (x >= 0) return y;
return z;
}
My gut feeling is that this would effectively force the compiler to
maintain a dynamic parser on-the-fly while scanning through the source,
which would be wildly complex.
Yes it complexifies the parser quite a bit. I don't dispute that.
[...] You mentioned that actual languages exist
that do this sort of thing; are they also compiled languages like C, or
are they interpreted languages of the functional variety?


No, actually I am unaware of any language that has infinite operators.
Someone mentioned mathematica, but I am unaware of its language
semantics. I mentioned ML as the inspiration of quaternary operators,
but that's about it.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #113
Keith Thompson <ks***@mib.org> wrote in message news:<ln************@nuthaus.mib.org>...
qe*@pobox.com (Paul Hsieh) writes:
In article <ln************@nuthaus.mib.org>, ks***@mib.org says...
There's ample precedent in other languages (Pascal and Ada at
least) for packed structures. [...] You can't sensibly take the
address of packed_obj.i. [...] The simplest approach would be to
forbid taking the address of a member of a packed structure (think of
the members as fat bit fields). [...]
Then what would be the point of even calling it a "struct"? This is
what I am saying -- it leads to bus errors because of the rest of
the language concepts like taking the address of any value that is
stored in a memory location.


Surely it's no worse than calling a struct with bit fields a "struct".


Yeah, but this is one of the *weakness* of the language.
Well, you can do that with the "%s" specifier, as long as you've
defined a function that returns an image string for a value of your
type (with all the complications of functions returning dynamic
strings).


Who is going to free the memory allocated for this string? If it's
static, then what happens when you try to printf two such items --
or just try to use it in a multitasking environment in general?


That's the same problem you have with any function that returns a
string. There are numerous solutions; programmers reinvent them all
the time.


Homey saywhat? You have to do the free *after* the printf. Yet you
still need to account for multitasking. You would have to LOCK before
the printf, then FREEALL / UNLOCK after the printf just to make it
work. Which can have a massive performance impact.
If you can come up with a specification for an enhanced printf that
can produce arbitrary user-defined output for arbitrary user-defined
types, we can discuss whether it's better than "%s" with an image
function.


Well, in this case I don't have to. As I've said there are snprintf
format extension mechanism in existence already out there. Just pick
up one of them.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #114
In <79**************************@posting.google.com> qe*@pobox.com (Paul Hsieh) writes:
A single vendor?!?! Ooooh ... try not to set your standards too high.
Obviously, it's well known that the gnu C++ people are basically converging
towards C99 compliance and are most of the way there already.


OTOH, the fact that they didn't make any progress in this area for
the last two years doesn't sound very encouraging.

http://gcc.gnu.org/c99status.html seems to be quite frozen.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #115
Paul Hsieh wrote:
Keith Thompson <ks***@mib.org> wrote:
.... snip stuff about multitasking ...
That's the same problem you have with any function that returns
a string. There are numerous solutions; programmers reinvent
them all the time.


Homey saywhat? You have to do the free *after* the printf. Yet
you still need to account for multitasking. You would have to
LOCK before the printf, then FREEALL / UNLOCK after the printf
just to make it work. Which can have a massive performance impact.


Nonsense. printf() simply has to make a private copy of its data
before returning. This is much easier in languages that use
references. Systems have been managing buffers for some time now.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 13 '05 #116

On Fri, 5 Dec 2003, Paul Hsieh wrote:

Sidney Cadot <si****@jigsaw.nl> wrote:
- What would be the constraints on acceptable operator names?


As I said it would be defined in some sort of grammar. I am not a
parsable language expert but basically it has to allow for things
like:

<<<, ?<, ===, ^^, |||, &&&, etc

but disallow things like:

+++, ---, ~~, !!, **, etc

due to parsing ambiguities with existing semantics.


Nitpick: &&& should be in the same category as !! or **, since it
currently also has useful semantics.
The basic rule for "new operator" candidacy is simple: If it ends
with a unary prefix operator, throw it out. If it begins with a
unary postfix operator, throw it out. If it contains the consecutive
characters "/*", "*/", or "//", throw it out. Everything else should
be acceptable, unless I'm missing a case.
- In what way will you handle the possible introduction of ambiguity in
the parser that gets to parse the new tokens?


By defining a good grammar.


The ambiguity won't be in the parser; it'll be in the lexer.
And the *semantics* of the code will affect the output of the
lexer, if you allow new operators to evolve on the fly. Which
will make this new language practically impossible to implement
using lex/yacc techniques anymore.
My gut feeling is that this would effectively force the compiler to
maintain a dynamic parser on-the-fly while scanning through the source,
which would be wildly complex.


Yes it complexifies the parser quite a bit. I don't dispute that.


Not the parser so much as the lexer. Is "a %^&%*!^ b" three tokens,
four, five, six, seven, eight, or nine? It depends on the semantics
of the code we've already translated.
Note that this is *NOT*, repeat *NOT*, an idea that will ever make
it into the C programming language, for this reason -- as expressed
in this subthread, it would break almost *every* C-parsing tool on
the market.
[...] You mentioned that actual languages exist
that do this sort of thing; are they also compiled languages like C, or
are they interpreted languages of the functional variety?


No, actually I am unaware of any language that has infinite operators.
Someone mentioned mathematica, but I am unaware of its language
semantics. I mentioned ML as the inspiration of quaternary operators,
but that's about it.


ISTR that I brought up the topic in comp.lang.misc a year ago or
thereabouts, but I forget if anything interesting got said except
that which I've repeated here.

-Arthur

Nov 13 '05 #117
CBFalconer <cb********@yahoo.com> wrote in message news:<3F***************@yahoo.com>...
Paul Hsieh wrote:
Keith Thompson <ks***@mib.org> wrote:

... snip stuff about multitasking ...
That's the same problem you have with any function that returns
a string. There are numerous solutions; programmers reinvent
them all the time.


Homey saywhat? You have to do the free *after* the printf. Yet
you still need to account for multitasking. You would have to
LOCK before the printf, then FREEALL / UNLOCK after the printf
just to make it work. Which can have a massive performance impact.


Nonsense. printf() simply has to make a private copy of its data
before returning. This is much easier in languages that use
references. Systems have been managing buffers for some time now.


Excuse me, but someone has to *free* the memory. Explain how this is
done without some kind of lock or handle grabbing operation prior to
the printf (in which case you might as well do you own seperate printf
for each -> string operation, and free each result by hand) and
without modifying printf.
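
For reference, the single-threaded shape of the pattern in question, with the
free done by hand after the printf (the helper name here is made up):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper returning a malloc'd string; the caller owns it. */
char *greeting(const char *name)
{
    const char *pre = "hello, ";
    char *s = malloc(strlen(pre) + strlen(name) + 1);
    if (s) {
        strcpy(s, pre);
        strcat(s, name);
    }
    return s;
}

int main(void)
{
    char *tmp = greeting("world");
    if (tmp) {
        printf("%s\n", tmp);
        free(tmp);    /* the free *after* the printf */
    }
    return 0;
}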

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #118
Keith Thompson wrote:
Sidney Cadot <si****@jigsaw.nl> writes:
[...]
It is wrong. A very stupid lapse on my side.

Hey, this is Usenet! You're not supposed to admit mistakes here. The
least you could do is sneakily change the subject and start being
personally abusive. 8-)}


Once more, you're right. A very stupid lapse on my side :-)

Best regards,

Sidney

Nov 13 '05 #119
Paul Hsieh wrote:
CBFalconer <cb********@yahoo.com> wrote in message news:<3F***************@yahoo.com>...
Paul Hsieh wrote:
Keith Thompson <ks***@mib.org> wrote:
... snip stuff about multitasking ...

That's the same problem you have with any function that returns
a string. There are numerous solutions; programmers reinvent
them all the time.

Homey saywhat? You have to do the free *after* the printf. Yet
you still need to account for multitasking. You would have to
LOCK before the printf, then FREEALL / UNLOCK after the printf
just to make it work. Which can have a massive performance impact.


Nonsense. printf() simply has to make a private copy of its data
before returning. This is much easier in languages that use
references. Systems have been managing buffers for some time now.


Excuse me, but someone has to *free* the memory. Explain how this is
done without some kind of lock or handle grabbing operation prior to
the printf (in which case you might as well do you own seperate printf
for each -> string operation, and free each result by hand) and
without modifying printf.


printf (or anything else) secures the necessary memory, copies
things into it, and can now return to its caller while letting a
separate task process the data. When done, it releases that
memory.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 13 '05 #120
Paul Hsieh wrote:
Sidney Cadot <si****@jigsaw.nl> wrote:
Sure. Vendors are waiting to see what the C++ people do, because they
are well aware of the irreconcilable conflicts that have arisen. Bjarne
and crew are going to be forced to take the new C99 stuff in the bits and
pieces that don't cause any conflict or aren't otherwise stupid for other
reasons. The Vendors are going to look at this and decide that the
subset of C99 that the C++ people chose will be the least problematic
solution and just go with that.


Ok. I'll give you 10:1 odds; there will be a (near-perfect) C99 compiler
by the end of this decade.

A single vendor?!?! Ooooh ... try not to set your standards too high.

One has to be conservative when engaging in bets.
Obviously, it's well known that the gnu C++ people are basically converging
towards C99 compliance and are most of the way there already. That's not my
point. My point is: will Sun, Microsoft, Intel, MetroWerks, etc. join the
fray so that C99 is ubiquitous to the point of obsoleting all previous C's for
all practical purposes for the majority of developers?
I think they will. Could take a couple of years though.
Maybe the Comeau guy
will join the fray to serve the needs of the "perfect spec compliance" market
that he seems to be interested in.

If not, then projects that have a claim of real portability will never
embrace C99 (like LUA, or Python, or the JPEG reference implementation, for
example.) Even the average developers will forgo the C99 features for fear
that someone will try compile their stuff on an old compiler.
Sure, there'll be market inertia, but this also happened with the
transition of K&R -> ANSI fifteen years ago.
Look, nobody uses K&R-style function declarations anymore. The reason is
because the ANSI standard obsoleted them, and everyone picked up the ANSI
standard. That only happened because *EVERYONE* moved forward and picked up
the ANSI standard. One vendor is irrelevant.
Ok. Can't speak for MW, but I think that by the end of 2007 we'll have
near-perfect C99 compilers from GNU, Sun, Microsoft, and Intel. Odds now
down to 5:1; you're in?
No, that's not what I am proposing. I am saying that you should not use
structs at all, but you can use the contents of them as a list of comma
separated entries. With a more beefed-up preprocessor one could find the
offset of a packed char array that corresponds to the nth element of the list
as a sum of sizeof()'s and you'd be off to the races.


Perhaps I'm missing something here, but wouldn't it be easier to use the
offsetof() macro?

It would be, but only if you have the packed structure mechanism. Other
people have posted indicating that in fact _Packed is more common than I
thought, so perhaps my suggestion is not necessary.
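
For readers following along, a quick sketch of what offsetof() reports -- the
packed-struct question is about what happens to this offset once padding is
removed:

#include <stddef.h>
#include <stdio.h>

struct rec { char tag; double value; };   /* example layout */

int main(void)
{
    /* byte offset of 'value' within struct rec, padding included */
    printf("%lu\n", (unsigned long)offsetof(struct rec, value));
    return 0;
}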
Ok. I agree with you on extending the capabilities of the preprocessor
in general, although I can't come up with actual things I miss in
everyday programming.
C is a language suitable for
and high encouraging of writing extremely unsound and poor code. Fixing it
would require a major overhaul of the language and library.


That's true. I don't quite see how this relates to the preceding
statement though.

I'm saying that trying to fix C's intrinsic problems shouldn't start or end
with some kind of resolution of call stack issues. Anyone who understands
machine architecture will not be surprised about call stack depth limitations.
It's the task of a standard to spell these out, I think.
There are far more pressing problems in the language that one would like to
fix.
Yes, but most things that relate to the "encouraging of writing
extremely unsound and poor code", as you describe C, would be better
fixed by using another language. A lot of the inherently unsafe things
in C are sometimes needed, when doing low-level stuff.
[powerful heap manager]
But a third party library can't do this portably.


I don't see why not?


Explain to me how you implement malloc() in a *multithreaded* environment
portably. You could claim that C doesn't support multithreading, but I highly
doubt you're going to convince any vendor that they should shut off their
multithreading support based on this argument.


Now you're shifting the goalposts.
By dictating its existence in
the library, it would put the responsibility of making it work right in the
hands of the vendor without affecting the C standards stance on not
acknowledging the need for multithreading.
Obviously, you cannot write a multithreading heap manager portably if
there is no portable standard on multithreading (doh). If you need this
you can always presume POSIX, and be on your way.
Its actual useful functionality that you just can't get from the C
language, and there's no way to reliably map such functionality to the C
language itself. One is forced to know the details of the underlying
platform to implement such things. Its something that really *should* be
in the language.
I disagree. POSIX is for things like this.
Well, it looks to me you're proposing to have a feature-rich heap
manager. I honestly don't see you this couldn't be implemented portably
without platform-specific knowledge. Could you elaborate?


See my multithreading comment above. Also, efficient heaps are usually
written with a flat view of memory in mind. This is kind of impossible in
non-flat memory architectures (like segmented architectures.)


What does the latter remark have to do with C's suitability for doing it?
[...] I want this more for reasons of orthogonality in design than anything
else. You want orthogonality in the C language? You must be joking ...
Not at all.
My proposal allows the programmer to decide what is or is not useful them.

I'm all for that.


Well, I'm a programmer, and I don't care about binary output -- how does your
proposal help me decide what I think is useful to me?


It already did, it seems - You just stated your decision, with regard to
binary output. Fine by me.
[...] . Without being belligerent: why not use that if you want this
kind of thing?


Well, when I am programming in C++ I will use it. But I'm not going to move
all the way to using C++ just for this single purpose by itself.

I used "%x" as an example of a format specifier that isn't defined ('x'
being a placeholder for any letter that hasn't been taken by the
standard). The statement is that there'd be only 15 about letters left
for this kind of thing (including 'x' by the way -- it's not a hex
specifier). Sorry for the confusion, I should've been clearer.


Well what's wrong with %@, %*, %_, %^, etc?


%* will clash with already legal format specifiers like %*d. All the
others are just plain ugly :-)
A first-class citizen string wouldn't be a pointer; neither would you
necessarily be able to get its address (although you should be able to
get the address of the characters it contains).


But a string has variable length. If you allow strings to be mutable, then
the actual sequence of characters has to be put into some kind of dynamic
storage somewhere. Either way, the base part of the string would in some way
have to be the storable into, say a struct. But you can copy a struct via
memcpy or however. But this then requires a count increment since there is
now an additional copy of the string. So how is memcpy supposed to know that
its contents contain a string that it needs to increase the ref count for?
Similarly, memset needs to know how to *decrease* such a ref count.
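
A small sketch of the trap being described here, with a made-up counted-string
type:

#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: 'refs' owners share 'data'. */
struct rcstr {
    size_t refs;
    char  *data;
};

void copy_trap(struct rcstr *a)
{
    struct rcstr b;
    memcpy(&b, a, sizeof b);   /* b.data now aliases a->data ...       */
    (void)b;                   /* ... but a->refs was never bumped, so */
}                              /* the shared count is silently wrong.  */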


It's not very practical, is it..... Hmmmm. Instead of thinking of
perverse ways of circumventing these rather fundamental problems, I'll
just concede the point. Good show :-)
I'm saying that you could have &&&, |||, but just don't define what they
actually do. Require that the programmer define what they do. C doesn't have
type-specific functions, and if one were to add in operator overloading in a
consistent way, then that would mean that an operator overload would have to
accept only its defined type.


Ok, so the language should have a big bunch of operators, ready for the
taking. Incidentally, Mathematica supports this, if you want it badly.


Hey, its not me -- apparently its people like you who wants more operators.


Just a dozen or so! But _standardized_ ones.

Seriously, though, your "operation introduction" idea is something
different than "operator overloading" alltogether. We should try not to
mix up these two.
My point is that no matter what operators get added to the C language, you'll
never satisfy everyone's appetites. People will just want more and more,
though almost nobody will want all of what could be added.

My solution solves the problem once and for all. You have all the operators
you want, with whatever semantics you want.
That's too much freedom for my taste. If I would want this kind of
thing, I would yank out lex and yacc and code my own language.
For this to be useful without losing the
operators that already exist in C, the right answer is to *ADD* operators. In
fact I would suggest that one simply defined a grammar for such operators, and
allow *ALL* such operators to be definable.


This seems to me a bad idea for a multitude of reasons. First, it would
complicate most stages of the compiler considerably. Second, a
maintenance nightmare ensues: while the standard operators of C are
basically burnt into my soul, I'd have to get used to the Fantasy
Operator Of The Month every time I take on a new project, originally
programmed by someone else.

Yes, but if instead of actual operator overloading you only allow redefinition
of these new operators, there will not be any of the *surprise* factor.
I don't know if you've ever experienced the mispleasure of having to
maintain code that's not written by yourself, but it's difficult enough
as it is. Adding new operators might be interesting from a theoretical
perspective, but it surely is a bad idea from a broader software
engineering perspective.
If you see one of these new operators, you can just view it like you view an
unfamiliar function -- you'll look up its definition obviously.
There is an important difference: functions have a "name" that has a
mnemonic function. Operators are just a bunch of pixels with no link to
anything else. It's only by a lot of repetition that you get used to
weird things like '<' and '>'. I don't know about you, But I used to do
Pascal before I switched to C. It took me quite some time before I got
used to "!=".
There's a good reason that we use things like '+' and '*' pervasively,
in many situations; they are short, and easily absorbed in many
contexts. Self-defined operator tokens (consisting, of course, of
'atomic' operators like '+', '=', '<' ...) will lead to unreadable code,
I think; perhaps something akin to a complicated 'sed' script.

And allowing people to define their own functions with whatever names they
like doesn't lead to unreadable code? It's just the same thing.
Nope. See above.
What makes your code readable is adherence to an agreed upon coding
standard that exists outside of what the language defines.
There are several such standards for identifier names. No such standard
exists for operator names, except: use familiar ones; preferably, steal
them from other languages. The common denominator of all the identifier
standards is: "use meaningful names". I maintain that there is no
parallel for operators; there's no such thing as a "meaningful"
operator, except when you have been drilled to know their meaning. Your
proposal is in direct collision with this rather important fact of how
the human mind seems to work.
3) because operator overloading is mostly a bad idea, IMHO

Well, Bjarne Stroustrup has made a recent impassioned request to *REMOVE*
features from C++.

Do you have a reference? That's bound to be a fun read, and he probably
missed a few candidates.


It was just in the notes to some meeting Bjarne had in the last year or so to
discuss the next C++ standard. His quote was something like that: while
adding a feature for C++ can have value, removing one would have even more
value. Maybe someone who is following the C++ standardization threads can
find a reference -- I just spent a few minutes on google and couldn't find it.


Ok. I appreciate the effort.
I highly doubt that operator overloading is one that has
been made or would be taken seriously. I.e., I don't think a credible
population of people who have been exposed to it would consider it a bad idea.


I can only speak for myself; I have been exposed, and think it's a bad
idea. When used very sparsely, it has it's uses. However, introducing
new user-definable operators as you propose would be folly; the only way
operator overloading works in practice is if you maintain some sort of
link to the intuitive meaning of an operator. User defined operators
lack this by definition.

But so do user definable function names. Yet, functionally they are almost
the same.
"names" refer to (often tangible) objects, whereas "operators" refer to
abstract ideas. I'm no psychologist, but I would guess they could back
up my claim that it's easier for us to handle names than symbols. For
one thing, I have yet to see the first 2-year old that utters "greater
than" as first words.
<<< and @ are nice though. I would be almost in favour of adding them,
were it not for the fact that this would drive C dangerously close in
the direction of APL.


You missed the "etc., etc., etc." part.


In a sense, I truly missed it. Your suggestions were rather interesting! ;-)
I could keep coming up with them
until the cows come home: a! for factorial, a ^< b for "a choose b" (you want
language supposed for this because of overflow concerns of using the direct
definition) <-> a for endian swapping, $% a for the fractional part of a
floating point number, a +>> b for the average (there is another overflow
issue), etc., etc.
Golly! You truly are good at this :-)
Again I wonder, seriously: wouldn't you be better off using C++?

No, because I want *MORE* operators -- not just the ability to redefine the
ones I've got (and therefore lose some.)
Ok. Your opinion on this is quite clear. I disagree for technical
(implementability) and psychological (names versus symbols) reasons. We
could just leave it at that.
[snipped a bit...] I find the idea freaky, yet interesting. I think C is not the place for
this (really, it would be too easy to compete in the IOCCC) but perhaps
in another language... Just to follow your argument for a bit, what
would an "operator definition" declaration look like for, say, the "?<"
min operator in your hypothetical extended C?


This is what I've posted elsewhere:

int _Operator ?< after + (int a, int b) {
if (a > b) return a;
return b;
}


I already saw that and reacted. Will come to that in another post.
Yes I'm sure the same trick works for chars and shorts. So how do you
widen a long long multiply?!?!? What compiler trick are you going to
hope for to capture this? What you show here is just some trivial
*SMALL* multiply, that relies on the whims of the optimizer.


Well, I'd show you, but it's impossible _in principle_. Given that you
are multiplying two expressions of the widest type supported by your
compiler, where would it store the result?


In two values of the widest type -- just like how just about every
microprocessor which has a multiply does it:

high *% low = a * b;


Hmmm. I hate to do this again, but could you provide semantics? Just to
keep things manageable, I'd be happy to see what happens if high, low,
a, and b are any possible combinations of bit-widths and signedness.
Could you clearly define the meaning of this?
PowerPC, Alpha, Itanium, UltraSPARC and AMD64 all have widening multiplies that
take two 64 bit operands and return a 128 bit result in a pair of 64 bit
operands. They all invest a *LOT* of transistors to do this *ONE* operation.
They all *KNOW* you can't finagle any C/C++ compiler to produce the operation,
yet they still do it -- its *THAT* important (hint: SSL, and therefore *ALL* of
e-commerce, uses it.)
Well, I don't know if these dozen-or-so big-number 'powermod' operations
that are needed to establish an SSL connection are such a big deal as
you make it.

It's not me -- it's Intel, IBM, Motorola, Sun and AMD who seem to be obsessed
with these instructions.
I don't see them working the committees to get these supported in
non-assembly languages. I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.
Of course Amazon, Yahoo and Ebay and most banks are
kind of obsessed with them too, even if they don't know it.
I think you would find that bignum operations are a small part of the
load on e-commerce servers. All RSA-based protocols just do a small
amount of bignum work at session-establishment time to agree to a key
for a shared-secret algorithm.
Many languages exists where this is possible, they are called
"assembly". There is no way that you could come up with a well-defined
semantics for this.

carry +< var = a + b;


It looks cute, I'll give you that. Could you please provide semantics?
It may be a lot less self-evident than you think.

How about:

- carry is set to either 1 or 0, depending on whether or not a + b overflows
(just follow the 2s complement rules of one of a or b is negative.)
Hang on, are we talking about "overflow" or "carry" here? These are two
different things with signed numbers.

What happens if a is signed and b is unsigned?
- var is set to the result of the addition; the remainder if a carry occurs.
What happens if the signedness of var, a, and b are not equal?

What happens if the bit-widths of var, a, and b are not equal?
- The whole expression (if you put the whole thing in parenthesese) returns
the result of carry.
..... So this would presume the actual expression is: "+< var = a + b" .
There's no need to introduce a mandatory "carry" variable, then.
In fact, if I were only interested in the carry, I'd be out of luck:
still need the 'var'. That's a bit ugly.

Basically, this is a C-esque syntax for a tuple assignment which
unfortunately is lacking in C:

(carry, value) = a+b
+< would not be an operator in and of itself -- the whole syntax is required.
For example: c +< v = a * b would just be a syntax error. The "cuteness" was
stolen from an idea I saw in some ML syntax. Obviously +< - would also be
useful.
I would think you don't need the "c" as well, to make a valid
expression. But I would still need to know what happens with all the
bit-widths and signedness issues.
Did you know that a PowerPC processor doesn't have a shift-right where
you can capture the carry bit in one instruction? Silly but no less true.

What has this got to do with anything? Capturing carries coming out of
shifts doesn't show up in any significant algorithms that I am aware of
that are significantly faster than using what we have already.


Ah, I see you've never implemented a non-table-driven CRC or a binary
greatest common divisor algorithm.


You can find a binary gcd algorithm that I wrote here:

http://www.pobox.com/~qed/32bprim.c


That's not the "binary GCD algorithm", that's just Knuths version that
avoids modulos. Below is a binary GCD.

unsigned bgcd(unsigned a, unsigned b)
{
    unsigned c, e;

    /* gcd(0,x) == x; returning early also keeps the loops below
       from spinning forever when one operand is zero. */
    if (a == 0) return b;
    if (b == 0) return a;

    /* Factor out the largest power of two common to a and b. */
    for (e = 0, c = a | b; c % 2 == 0; e++) c /= 2;
    a >>= e;
    b >>= e;

    /* Make both operands odd. */
    while (a % 2 == 0) a /= 2;
    while (b % 2 == 0) b /= 2;

    /* Subtract and shift until the operands meet at the odd gcd. */
    while (a != b)
    {
        if (a < b)
        {
            b -= a;
            do b /= 2; while (b % 2 == 0);
        }
        else
        {
            a -= b;
            do a /= 2; while (a % 2 == 0);
        }
    }
    return a << e;
}
You will notice how I don't use or care about carries coming out of a right
shift. There wouldn't be enough of a savings to matter.
Check bgcd().
The specific operations I am citing make a *HUGE* difference and have billion
dollar price tags associated with them.


These numbers you made up from thin air, no? Otherwise, I'd welcome a
reference.


Widening multiplies cost transistors on the CPU. The hardware algorithms are
variations of your basic public school multiply algorithm -- so it takes n^2
transistors to perform the complete operation, where n is the largest bit
word that the machine accepts for the multiplier. If the multiply were not
widened they could save half of those transistors. So multiply those extra
transistors by the number of CPUs shipped with a widening multiply (PPC,
x86s, Alphas, UltraSparcs, ... etc) and you easily end up in the billion
dollar range.


This is probably the most elaborate version of "yes, I made these
numbers up from thin air" I've ever came across :-)
I understand the need for the C language standard to be applicable to as
many platforms as possible. But unlike some right shift detail that you
are talking about, the widening multiply hardware actually *IS* deployed
everywhere.


Yup. And it is used too. From machine language.
Sure is. Several good big-number libraries are available that have
processor-dependent machine code to do just this.


And that's the problem. They have to be hand written in assembly. Consider
just the SWOX Gnu multiprecision library. When the Itanium was introduced,
Intel promised that it would be great for e-commerce.


Correction: the Intel marketing department promised that it would be
great for e-commerce.
The problem is that the SWOX guys were having a hard time with IA64 assembly
language (as apparently lots of people are.)
Yes, it's close to a VLIW architecture. Hard to code manually.
So they projected performance results for
the Itanium without having code available to do what they claim. So people
who wanted to consider using an Itanium system based on its performance for
e-commerce were stuck -- they had no code, and had to believe Intel's claims,
or SWOX's as to what the performance would be.
The only thing your example shows is that a marketing angle sometimes
doesn't rhyme well with technical realities.
OTOH, if instead, the C language had exposed a carry propogating add, and a
widening multiply in the language, then it would just be up to the Intel
*compiler* people to figure out how to make sure the widening multiply was
used optimally, and the SWOX/GMP people would just do a recompile for baseline
results at least.


I would guess that Intel, being both a compiler maker and the IA64
manufacturer, could have introduced a macro widemul(hr,lr,a,b) to do
this, and help the SWOX guys out a bit?

My guess is that they have problems with raw performance and/or compiler
technique. I have some experience with a VLIW compiler, and these things
need a compiler pass to do instruction to execution-pipeline allocation.
This is a very active area of research, and notoriously difficult. My
guess is that there are inherent problems of getting high performance
out of IA64 for this kind of algorithms. VLIW and VLIW-like
architectures can do wonders on high-troughput, low-branching type of
work, but they tend to break down on some very simple algorithms, if
there is a lot of branching.

I don't know SWOX; what do they use for bignum multiplication?
Karatsuba's algorithm?

Best regards, Sidney

Nov 13 '05 #121
Paul Hsieh wrote:
Sidney Cadot <si****@jigsaw.nl> wrote:
Paul Hsieh wrote:
Good point, but something as simple as "lowest precedence" and increasing
in the order in which they are declared seems fine enough. Or maybe
inverted -- just play with those combinations to see what makes sense in
practice. If that's not good enough, then make the precedence level
relative to another operator at the time of declaration. For example:

int _Operator ?< after + (int x, int y) { /* max */
if (x > y) return x;
return y;
}

int _Operator ?> same ?< (int x, int y) { /* min */
if (x < y) return x;
return y;
}
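
For comparison, what those two definitions compute, spelled as ordinary
functions in today's C:

static int imax(int x, int y) { return x > y ? x : y; }   /* the ?< above */
static int imin(int x, int y) { return x < y ? x : y; }   /* the ?> above */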


That looks like a decent first stab at a proposed syntax. Some questions
though:

- What would be the constraints on acceptable operator names?

As I said it would be defined in some sort of grammar. I am not a
parsable language expert but basically it has to allow for things
like:

<<<, ?<, ===, ^^, |||, &&&, etc

but disallow things like:

+++, ---, ~~, !!, **, etc

due to parsing ambiguities with existing semantics.


I'm sorry, but that's not much of an answer. We are now both stuck, not
knowing whether it is possible even in principle to do this. And the
onus is on you to show that it is, I would think.
Perhaps as an operator gains unary attributes, tacking things onto the
end of it becomes a syntax error as well.
That statement does not mean a lot without some definitions.
- In what way will you handle the possible introduction of ambiguity in
the parser, that gets to parse the new tokens?

By defining a good grammar.

I seriously doubt this is possible even in principle.

- What if I want a ?< to work both on int and on double types?

Specific type overloading, as is done in C++?
With or without implicit conversions?
- How are you going to define left, right, or lack of associativity?

- does (and if so, how) does your syntax allow the introduction of unary
prefix operators (such as !), binary infix operators that may have
compile-time identifiers as a parameter (such as ->), n-ary operators
(such as the ternary a?b:c or your proposed quaternary carry/add
operator), and operators that exist both in unary and binary form (+, -)?

The easiest way to handle these issues is to allow an operator to inherit
the attributes of a previously defined operator. So we add in the
"like" clause:

int _Operator ?< after + like + (int x, int y) { /* max */
if (x > y) return x;
return y;
}
What would the rules look like to know if such a thing yields a
non-ambiguous tokenizer/parser? This is just to assess whether it could
be made to work _even in theory_. (That's the domain this idea is bound
to anyway).
I would propose that -> or . not be considered operators as they have
a really different nature to them that one could not map to function
definition semantics in a sensible way.
Why not? What if I want an operator a@^&@b that gives me the address of
the field following a.b in struct a?
So as to the question of
whether or not ? : or the quaternary operators that I proposed -- the
idea would be to first *ADD IN* the quaternary base operators that I
have proposed, then just use the "like" clause as I've described
above.
So we're back to square one: more operators in the core language?
So to reduce ambiguity again, I would also not consider the
dereferencing meaning of * or the address taking meaning of & to be
considered operators either.
That's a pretty arbitrary limit.
int _Operator ?> : after ? : like ? : (int x, int y, int z) {
if (x >= 0) return y;
return z;
}
My gut feeling is that this would effectively force the compiler to
maintain a dynamic parser on-the-fly while scanning through the source,
which would be wildly complex.
Yes it complexifies the parser quite a bit. I don't dispute that.


....Beyond the complexity of a C++ compiler, for one thing. And quite a
bit beyond it as well.
[...] You mentioned that actual languages exist
that do this sort of thing; are they also compiled languages like C, or
are they interpreted languages of the functional variety?

No, actually I am unaware of any language that has infinite operators.
I think APL has a number of operators that could be described as "close
to infinite" ... :-)
Someone mentioned mathematica, but I am unaware of its language
semantics.
That was me. It provides a lot of free operators that you can assign
meaning to (but not an extensible set). But in Mathematica, all this is
just syntactic sugar. Your proposal looks more like syntactic vinegar to
me :-)
I mentioned ML as the inspiration of quaternary operators,
but that's about it.


Pity. That could yield some useful lessons.

Best regards, Sidney

Nov 13 '05 #122
Arthur J. O'Dwyer wrote:
On Fri, 5 Dec 2003, Paul Hsieh wrote:
Sidney Cadot <si****@jigsaw.nl> wrote:
- What would be the constraints on acceptable operator names?
As I said it would be defined in some sort of grammar. I am not a
parsable language expert but basically it has to allow for things
like:

<<<, ?<, ===, ^^, |||, &&&, etc

but disallow things like:

+++, ---, ~~, !!, **, etc

due to parsing ambiguities with existing semantics.

Nitpick: &&& should be in the same category as !! or **, since it
currently also has useful semantics.
The basic rule for "new operator" candidacy is simple: If it ends
with a unary prefix operator, throw it out. If it begins with a
unary postfix operator, throw it out. If it contains the consecutive
characters "/*", "*/", or "//", throw it out. Everything else should
be acceptable, unless I'm missing a case.


Yes, that would take care of tokens, I guess.
- In what way will you handle the possible introduction of ambiguity in
the parser, that gets to parse the new tokens?


By defining a good grammar.

The ambiguity won't be in the parser; it'll be in the lexer.


Until you consider the precedence and associativity of the new
operator. Surely, this will impact the parser as well.
And the *semantics* of the code will affect the output of the
lexer, if you allow new operators to evolve on the fly. Which
will make this new language practically impossible to implement
using lex/yacc techniques anymore.
On the contrary... In a way, the language Paul proposes already exists:
it's called bison input files, with a C grammar pre-installed.
My gut feeling is that this would effectively force the compiler to
maintain a dynamic parser on-the-fly while scanning through the source,
which would be wildly complex.


Yes it complexifies the parser quite a bit. I don't dispute that.

Not the parser so much as the lexer. Is "a %^&%*!^ b" three tokens,
four, five, six, seven, eight, or nine? It depends on the semantics
of the code we've already translated.


I would say the lexer is the least of the problems. It already has to
maintain dynamic tokens for typedefs, I guess it could handle this as
well. But the dynamic parser.... That would be a monster.

Best regards,

Sidney

Nov 13 '05 #123
aj*@nospam.andrew.cmu.edu says...
My gut feeling is that this would effectively force the compiler to
maintain a dynamic parser on-the-fly while scanning through the source,
which would be wildly complex.


Yes it complexifies the parser quite a bit. I don't dispute that.


Not the parser so much as the lexer. Is "a %^&%*!^ b" three tokens,
four, five, six, seven, eight, or nine? It depends on the semantics
of the code we've already translated.
Note that this is *NOT*, repeat *NOT*, an idea that will ever make
it into the C programming language, for this reason -- as expressed
in this subthread, it would break almost *every* C-parsing tool on
the market.


So would adding &&& and |||. Remember that my premise is that if one is
motivated to add one of those, why not add in something more general?

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #124
In article <3F***************@yahoo.com>, cb********@yahoo.com says...
Paul Hsieh wrote:
CBFalconer <cb********@yahoo.com> wrote in message news:<3F***************@yahoo.com>...
Paul Hsieh wrote:
> Keith Thompson <ks***@mib.org> wrote:
... snip stuff about multitasking ...
> > That's the same problem you have with any function that returns
> > a string. There are numerous solutions; programmers reinvent
> > them all the time.
>
> Homey saywhat? You have to do the free *after* the printf. Yet
> you still need to account for multitasking. You would have to
> LOCK before the printf, then FREEALL / UNLOCK after the printf
> just to make it work. Which can have a massive performance impact.

Nonsense. printf() simply has to make a private copy of its data
before returning. This is much easier in languages that use
references. Systems have been managing buffers for some time now.


Excuse me, but someone has to *free* the memory. Explain how this is
done without some kind of lock or handle grabbing operation prior to
the printf (in which case you might as well do you own seperate printf
for each -> string operation, and free each result by hand) and
without modifying printf.


printf (or anything else) secures the necessary memory, copies
things into it, and can now return to its caller while letting a
separate task process the data. When done, it releases that
memory.


Which part of "without modifying printf" did you miss? It own output string is
not at issue. What's at issue is that you have just created a function which
returns an string. If the allocation is static, then you have multitasking/re-
entrancy problem. If its dynamic then you have a memory leak issue to contend
with. If you use a ref counting system, you have to same problem as the memory
leak issue -- who decrements the count when printf is done?

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #125
Paul Hsieh wrote:
aj*@nospam.andrew.cmu.edu says...
My gut feeling is that this would effectively force the compiler to
maintain a dynamic parser on-the-fly while scanning through the source,
which would be wildly complex.

Yes it complexifies the parser quite a bit. I don't dispute that.


Not the parser so much as the lexer. Is "a %^&%*!^ b" three tokens,
four, five, six, seven, eight, or nine? It depends on the semantics
of the code we've already translated.
Note that this is *NOT*, repeat *NOT*, an idea that will ever make
it into the C programming language, for this reason -- as expressed
in this subthread, it would break almost *every* C-parsing tool on
the market.

So would adding &&& and |||. Remember that my premise is that if one is
motivated to add one of those, why not add in something more general?


Bad comparison. Support for &&& and ||| would be infinitely easier to
add than support for what you propose.

Regards,

Sidney

Nov 13 '05 #126
Paul Hsieh wrote:
CBFalconer <cb********@yahoo.com> wrote:

.... snip ...

printf (or anything else) secures the necessary memory, copies
things into it, and can now return to its caller while letting a
separate task process the data. When done, it releases that
memory.


Which part of "without modifying printf" did you miss? It own
output string is not at issue. What's at issue is that you have
just created a function which returns a string. If the allocation
is static, then you have a multitasking/re-entrancy problem. If it's
dynamic then you have a memory leak issue to contend with. If you
use a ref counting system, you have the same problem as the memory
leak issue -- who decrements the count when printf is done?


printf etc are system procedures. If we are building them we
should be free to make them work, or to use suitable lower level
functions that are called by it. ISO C is not a multitasking
system, so to run such programs correctly in a multi-tasking
system requires that the various interfaces be carefully and
correctly designed.

One of the principles to be observed is that data storage not be
released before it is used.

You seem to be arguing about nothing at all.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 13 '05 #127
In article <bq**********@news.tudelft.nl>, si****@jigsaw.nl says...
Paul Hsieh wrote:
Look, nobody uses K&R-style function declarations anymore. The reason is
because the ANSI standard obsoleted them, and everyone picked up the ANSI
standard. That only happened because *EVERYONE* moved forward and picked up
the ANSI standard. One vendor is irrelevant.
Ok. Can't speak for MW, but I think that by the end of 2007 we'll have
near-perfect C99 compilers from GNU, Sun, Microsoft, and Intel. Odds now
down to 5:1; you're in?


With the *and* in there? Sure. Since Microsoft alone will not do it
(otherwise why not back port MFC to C?), the GNU people may decide that the
last 10% just isn't worth it, Sun has two other languages to worry about that
take precedence in terms of development resources, and once C++0x emerges, C99
development by any of these vendors will almost certainly be halted. I'd have
to be wrong on all of these guys to lose this.
Ok. I agree with you on extending the capabilities of the preprocessor
in general, although I can't come up with actual things I miss in
everyday programming.
I run into this every now and then. For example, I was recently trying to
solve the following problem: Create a directed graph on n points, each with an
out degree of d (I am concerned with very small d, like 2), such that the path
length between any pair of points is minimized.

Turns out that this is a problem whose complexity grows super-exponentially
even for very small values of n. Despite my best efforts, I don't know the answer
for n=11, d=2 for example (I know it's either 3 or 4, but I can't prove that it's
not 3). More startling is the possibility that n=12,d=2 might have a smaller
latency (but I don't know that either)!

Anyhow, the point is that to have a chance to squeeze enough
computational juice out of my PC to solve this, I had to hard code huge amounts
of the code, and use a lot of bit twiddling tricks just for each special case.
I used the preprocessor macros as much as possible to make my code manageable,
but in order to change n, I actually have to *modify* the code by adding and
changing *n* lines of code. There's not much I can do about this, I would have
no chance to solve this otherwise.

With a more powerful preprocessor, or a code generator, I could actually make
this so that I could modify one #define, or even possibly make it run time
settable (though this would make the code much larger.)
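
For what it's worth, the stock workaround with today's preprocessor is the
"X macro" idiom, where a single list drives every copy of the generated
lines -- far short of what is being asked for, but it shows the shape of the
problem (names below are made up for illustration):

/* One list to edit when n changes ... */
#define NODE_LIST X(0) X(1) X(2) X(3)

/* ... drives each piece of per-node boilerplate. */
#define X(i) static unsigned reach_##i;
NODE_LIST   /* expands to: static unsigned reach_0; ... reach_3; */
#undef X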
I'm saying that trying to fix C's intrinsic problems shouldn't start or end
with some kind of resolution of call stack issues. Anyone who understands
machine architecture will not be surprised about call stack depth limitations.


It's the task of a standard to spell these out, I think.
There are far more pressing problems in the language that one would like to
fix.


Yes, but most things that relate to the "encouraging of writing
extremely unsound and poor code", as you describe C, would be better
fixed by using another language. A lot of the inherently unsafe things
in C are sometimes needed, when doing low-level stuff.


Why is UB in isgraph(-1) needed? Why is UB in gets() needed? Why is the fact
that fgets() skips over '\0' characters needed? Why is a non-portable right
shift on signed integers needed (especially considering the one on unsigned
*is* portable)?
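
To make the first of those concrete: the ctype functions are only defined for
arguments representable as unsigned char or for EOF, so portable code has to
cast -- a minimal sketch:

#include <ctype.h>

/* Counts the graphic (printable, non-space) characters in s. The cast
   is what keeps isgraph() defined when plain char is signed and *s is
   negative. */
int count_graph(const char *s)
{
    int n = 0;
    while (*s)
        n += isgraph((unsigned char)*s++) ? 1 : 0;
    return n;
}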
[powerful heap manager]
But a third party library can't do this portably.

I don't see why not?


Explain to me how you implement malloc() in a *multithreaded* environment
portably. You could claim that C doesn't support multithreading, but I highly
doubt your going to convince any vendor that they should shut off their
multithreading support based on this argument.


Now you're shifting the goalposts.


How so? The two are related. Writing a heap manager requires that you are
aware of multitasking considerations. If I want to extend the heap manager, I
have to solve them each one by one in different ways on different platforms.
And, of course, what sort of expectation of portability would an end user have
knowing that the library itself had to be carefully hand ported?

Compare this to the string library that I wrote (http://bstring.sf.net/). It's
totally portable. Although I don't have access to MAC OS X, and don't use gnu
C on Linux, in order to test to make sure, I know for a fact that end users
from both platforms have downloaded and are actively using it. I know it
works in 16 bit and 32 bit DOS/Windows environments, etc., etc. It's totally
portable, in the real-world sense of portable (semantically, as well as
syntactically.) The point is that the end users know that there is absolutely
0 risk of losing portability or having porting problems because of the use of
this library.

Third party tools for advanced heaps make little sense. It would only be
worth consideration if it were actively ported to many platforms -- which
increases its cost. I.e., for whatever set of platforms I am considering using
such a library, I am paying for the development cost of every platform that it
would be ported to in the price of using it. It's also highly useless for new
hardware platforms which are in development. Even having access to the source
is of lesser value if there are platform specifics in each instance that
requires munging just to port it.

The point is, if the C standard were simply to add this functionality straight
into the library, then it would be each compiler vendor's responsibility to add
this functionality into the compiler. And the functionality would be
inherently portable as a result. The requirement of multitasking support would
then be pushed back into the vendors lap -- i.e., the people who have
introduced this non-portable feature in the first place.
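
As an aside, some of that feature set -- though not the multithreading-aware
parts under dispute -- can be layered portably on top of malloc(); a crude
sketch, with all names invented here:

#include <stdlib.h>

/* Size-aware allocation: remember the request in a header.
   (Payload alignment is glossed over in this sketch.) */
struct xhdr { size_t size; };

void *xmalloc(size_t n)
{
    struct xhdr *h = malloc(sizeof *h + n);
    if (!h) return NULL;
    h->size = n;
    return h + 1;               /* caller sees only the payload */
}

size_t xsize(const void *p)     /* usable size of an xmalloc block */
{
    return ((const struct xhdr *)p - 1)->size;
}

void xfree(void *p)
{
    if (p) free((struct xhdr *)p - 1);
}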
By dictating its existence in
the library, it would put the responsibility of making it work right in the
hands of the vendor without affecting the C standards stance on not
acknowledging the need for multithreading.


Obviously, you cannot write a multithreading heap manager portably if
there is no portable standard on multithreading (doh). If you need this
you can always presume POSIX, and be on your way.


You do not understand. The *IMPLEMENTATION* needs to be multithreading aware.
From a programmer's point of view, the multithreading support is completely
transparent. This has no end user impact from a POSIX specification point of
view.
Its actual useful functionality that you just can't get from the C
language, and there's no way to reliably map such functionality to the C
language itself. One is forced to know the details of the underlying
platform to implement such things. Its something that really *should* be
in the language.
I disagree. POSIX is for things like this.
Well, it looks to me you're proposing to have a feature-rich heap
manager. I honestly don't see you this couldn't be implemented portably
without platform-specific knowledge. Could you elaborate?


See my multithreading comment above. Also, efficient heaps are usually
written with a flat view of memory in mind. This is kind of impossible in
non-flat memory architectures (like segmented architectures.)


What does the latter remark have to do with C's suitability for doing it?


I am saying that I don't disagree with you, but I think you are missing the
point. By simply adding the features/function into C, that would make it
de facto portable from the point of view that matters -- programmers of the C
language.

For most flat-memory architectures, its actually very straightforward to add in
all the features that I am requesting. I know this because I've written my own
heap manager, which of course uses platform specific behaviour for the one
platform I am interested in. (It only gets complicated for platforms with
unusual memory, like segmented architectures, which have correspondingly
complicated heap managers today.) This is in stark contrast with the
incredibly high bar of platform-specific complications set by trying to do this
outside of the language.
My proposal allows the programmer to decide what is or is not useful them.
I'm all for that.


Well, I'm a programmer, and I don't care about binary output -- how does your
proposal help me decide what I think is useful to me?


It already did, it seems - You just stated your decision, with regard to
binary output. Fine by me.


Not fine by me. Because some other programmer whose code I have to look at
will use it, and I won't have any idea what it is.
I'm saying that you could have &&&, |||, but just don't define what they
actually do. Require that the programmer define what they do. C doesn't have
type-specific functions, and if one were to add in operator overloading in a
consistent way, then that would mean that an operator overload would have to
accept only its defined type.

Ok, so the language should have a big bunch of operators, ready for the
taking. Incidentally, Mathematica supports this, if you want it badly.


Hey, its not me -- apparently its people like you who wants more operators.


Just a dozen or so! But _standardized_ ones.


And you don't see the littlest problem with this proposal? If you add a dozen,
you'd better be sure that they are the last dozen that anyone could possibly want
to add to the language. Their value add would have be worth the pain of having
everyone learn about 12 new symbols.
My point is that no matter what operators get added to the C language, you'll
never satisfy everyone's appetites. People will just want more and more,
though almost nobody will want all of what could be added.

My solution solves the problem once and for all. You have all the operators
you want, with whatever semantics you want.


That's too much freedom for my taste. If I would want this kind of
thing, I would yank out lex and yacc and code my own language.


Well how is that different from adding just &&& and |||? If you *REALLY
REALLY* want them, then why don't you yank out lex and yacc and code up a new
language?
I don't know if you've ever experienced the mispleasure of having to
maintain code that's not written by yourself, but it's difficult enough
as it is.
Do it all the time.
[...] Adding new operators might be interesting from a theoretical
perspective, but it surely is a bad idea from a broader software
engineering perspective.
It's no worse than trying to add in &&& or ||| today.
If you see one of these new operators, you can just view it like you view an
unfamiliar function -- you'll look up its definition obviously.


There is an important difference: functions have a "name" that has a
mnemonic function.


But the name may be misleading -- as is the case, more often than not, just
reflecting the thought of the original instance by the original programmer who
maybe cut and pasted it from somewhere else.
[...] Operators are just a bunch of pixels with no link to
anything else. It's only by a lot of repetition that you get used to
weird things like '<' and '>'. I don't know about you, But I used to do
Pascal before I switched to C. It took me quite some time before I got
used to "!=".
And how long will it take the rest of us to get used to your weird &&& or |||
operators?
What makes your code readable is adherence to an agreed upon coding
> standard that exists outside of what the language defines.


There are several such standards for identifier names. No such standard
exists for operator names, except: use familiar ones; preferably, steal
them from other languages.


Sounds like a reasonable convention to me. How about: All new operators must
be defined in a central module named ----. Or: Only these new operators may be
added as defined by ... yada, yada, yada. The coding standards are just
different.
[...] The common denominator of all the identifier
standards is: "use meaningful names". I maintain that there is no
parallel for operators; there's no such thing as a "meaningful"
operator, except when you have been drilled to know their meaning. Your
proposal is in direct collision with this rather important fact of how
the human mind seems to work.
Just like freeform variable names, there is the same encumbrance of the
programmers managing the meaning of the symbols for more generic operators.
You have not made a sufficient case to convince me that there is a real
difference between the two.
<<< and @ are nice though.
BTW -- odd how you actually thought that these were "nice". Do you think you
would have a hard time remembering them? Would it rankle in your brain because
they were odd and unfamiliar operators that are new to the language? Would
you have a tendency to write abusive code because of the existence of these new
operators?

Do you think perhaps its possible to use an arbitrarily extendable operator
mechanism in order to *clarify* or make code actually more maintainable?
Yes I'm sure the same trick works for chars and shorts. So how do you
widen a long long multiply?!?!? What compiler trick are you going to
hope for to capture this? What you show here is just some trivial
*SMALL* multiply, that relies on the whims of the optimizer.

Well, I'd show you, but it's impossible _in principle_. Given that you
are multiplying two expressions of the widest type supported by your
compiler, where would it store the result?


In two values of the widest type -- just like how just about every
microprocessor which has a multiply does it:

high *% low = a * b;


Hmmm. I hate to do this again, but could you provide semantics? Just to
keep things manageable, I'd be happy to see what happens if high, low,
a, and b are any possible combinations of bit-widths and signedness.
Could you clearly define the meaning of this?


a, b, high and low must be integers. The signedness of the result of a * b (as
if it were not widened) dictates the result signedness. Coercion will happen
as necessary when storing to high and low. The whole expression will have a
side effect of returning high.
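
For concreteness, here is roughly what that would compute for unsigned 32 bit
operands, as a sketch in today's C with a 64 bit intermediate -- the very
trick that stops working once the operands are already the widest type:

#include <stdint.h>

/* Hypothetical helper: the effect of "high *% low = a * b" for
   unsigned 32 bit a and b. */
void widemul32(uint32_t a, uint32_t b, uint32_t *high, uint32_t *low)
{
    uint64_t p = (uint64_t)a * b;   /* widening multiply */
    *high = (uint32_t)(p >> 32);
    *low  = (uint32_t)p;
}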
Well, I don't know if these dozen-or-so big-number 'powermod' operations
that are needed to establish an SSL connection are such a big deal as
you make it.
Its not me -- its Intel, IBM, Motorola, Sun and AMD who seem to be obsessed
with these instructions.


I don't see them working the committees to get these supported in
non-assembly languages.


That's because they don't care about how it goes down once it hits
software. Someone has to pay something for a platform specific library? -- who
cares, as long as they sell their hardware. These same people didn't get
together to define the C standard did they? Why would they bother with an
extension just for widening multiplies?

Instead the hardware people waste their time on trivial little specifications
like IEEE-754, which the C standards idiots don't bother to look at until 15
years later.
[...] I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.
The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.
Of course Amazon, Yahoo and Ebay and most banks are
kind of obsessed with them too, even if they don't know it.


I think you would find that bignum operations are a small part of the
load on e-commerce servers.


According to a paper by Intel, widening multiply accounts for something like
30% of the load on typical e-commerce transactions (typing in your credit card
over the net in a way that can't be snooped.) One single assembly instruction
(one *bundle* on Itanium) holds a whole server down for 30% of its computation,
versus the total 100K line e-commerce software required to do the rest.
That's why HW manufacturers are keen on the number of transistors they spend
making this one operation reasonably fast.
[...] All RSA-based protocols just do a small
amount of bignum work at session-establishment time to agree to a key
for a shared-secret algorithm.
This is only useful for much larger secure transactions like ssh, or an
encrypted phone call or something. E-commerce is a much smaller, one shot
transaction, where the RSA computation dominates.
>Many languages exists where this is possible, they are called
>"assembly". There is no way that you could come up with a well-defined
>semantics for this.

carry +< var = a + b;

It looks cute, I'll give you that. Could you please provide semantics?
It may be a lot less self evident than you think.
How about:

- carry is set to either 1 or 0, depending on whether or not a + b overflows
(just follow the 2s complement rules of one of a or b is negative.)


Hang on, are we talking about "overflow" or "carry" here? These are two
different things with signed numbers.

What happens if a is signed and b is unsigned?


My intent was for the operation to follow the semantics of the x86 ADC assembly
instruction. The point is that this instruction is known to be proper for
doing correct bignum additions.
- var is set to the result of the addition; the remainder if a carry occurs.


What happens if the signedness of var, a, and b are not equal?


It just behaves like the ADC x86 assembly instruction, the details of which I
will not regurgitate here.
What happens if the bit-widths of var, a, and b are not equal?
The bit-widths would be converted as if the (a + b) operation were happening in
isolation, to match C language semantics.
- The whole expression (if you put the whole thing in parenthesese) returns
the result of carry.


.... So this would presume the actual expression is: "+< var = a + b" .
There's no need to introduce a mandatory "carry" variable, then.


True. Is this a problem? Perhaps you would like it to return var as a side
effect instead to avoid this redundancy? I don't have that strong of a feeling
on it.
In fact, if I were only interested in the carry, I'd be out of luck:
still need the 'var'. That's a bit ugly.
You could just omit it as a degenerate form:

+< = a + b
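
For reference, the operation this syntax is reaching for, written as it must
be today in portable C (names are made up) -- an add-with-carry chain that a
compiler could otherwise map onto one ADC per word:

#include <stddef.h>
#include <stdint.h>

/* Adds b[] into a[] (n words each, least significant word first),
   returning the final carry out. */
uint32_t bignum_add(uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        uint32_t t = a[i] + carry;
        uint32_t c1 = (t < carry);   /* carry from adding the old carry */
        a[i] = t + b[i];
        carry = c1 + (a[i] < t);     /* carry from adding b[i] */
    }
    return carry;
}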
Basically, this is a C-esque syntax for a tuple assignment which
unfortunately is lacking in C:

(carry, value) = a+b
Yeah see, but the problem is that this encompasses existing C-language forms.
For all I know this might be legal C syntax already (I wouldn't know, I just
don't use the "," operator in this way) in which case we are kind of already
dead with backwards compatibility. There's also nothing in that syntax to
indicate some new thing was happening that is capturing the carry.
Ah, I see you've never implemented a non-table-driven CRC or a binary
greatest common divisor algorithm.


You can find a binary gcd algorithm that I wrote here:

http://www.pobox.com/~qed/32bprim.c


That's not the "binary GCD algorithm", that's just Knuth's version that
avoids modulos. Below is a binary GCD.


Sorry, a previous version that I never put out on the web used the binary
algorithm. I tested Knuth's as much faster and thus updated it, and forgot that
I had done this.
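
For reference, the binary version looks roughly like this (a from-memory
sketch, not the code that used to be on the web page) -- note that it needs
nothing but shifts, subtracts and comparisons:

   /* Binary GCD (Stein's algorithm) for unsigned operands. */
   unsigned int gcd_binary (unsigned int u, unsigned int v) {
       unsigned int shift, t;
       if (u == 0) return v;
       if (v == 0) return u;
       for (shift = 0; ((u | v) & 1u) == 0; shift++) {  /* common 2s */
           u >>= 1;
           v >>= 1;
       }
       while ((u & 1u) == 0) u >>= 1;         /* make u odd */
       do {
           while ((v & 1u) == 0) v >>= 1;     /* make v odd */
           if (u > v) { t = u; u = v; v = t; }
           v -= u;                            /* v is now even */
       } while (v != 0);
       return u << shift;
   }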
The specific operations I am citing make a *HUGE* difference and have billion
dollar price tags associated with them.

These numbers you made up from thin air, no? Otherwise, I'd welcome a
reference.


Widening multiplies cost transistors on the CPU. The hardware algorithms are
variations of your basic public school multiply algorithm -- so it takes n^2
transistors to perform the complete operation, where n is the largest bit
word that the machine accepts for the multiplier. If the multiply were not
widened they could save half of those transistors. So multiply those extra
transistors by the number of CPUs shipped with a widening multiply (PPC,
x86s, Alphas, UltraSparcs, ... etc) and you easily end up in the billion
dollar range.


This is probably the most elaborate version of "yes, I made these
numbers up from thin air" I've ever came across :-)


But I didn't. I used to work with one of these companies. People spend time
and consideration on this one instruction. You could just feel the impact that
this one instruction was going to have, and the considerations for the cost of
its implementation. I could easily see a quarter million just in design
effort, then some quarter million in testing, not to mention the cost of the
extra die area once it shipped -- and this is inside of *ONE* of these
companies, for *ONE* chip generation.

For example, inside of Intel, they decided that they were going to reuse their
floating point multiplier for their widening integer multiply for Itanium. But that
meant that the multiplier had to be able to do 128 bit multiplies (as opposed
to 82 bits, which is all the Itanium would have apparently needed) and
couldn't run a floating point and integer multiply at the same time. This has
non-trivial layout and design impact on the chip.
I understand the need for the C language standard to be applicable to as
many platforms as possible. But unlike some right shift detail that you
are talking about, the widening multiply hardware actually *IS* deployed
everywhere.
Yup. And it is used too. From machine language.


And *only* machine language. That's the point.
Sure is. Several good big-number libraries are available that have
processor-dependent machine code to do just this.


And that's the problem. They have to be hand written in assembly. Consider
just the SWOX Gnu multiprecision library. When the Itanium was introduced,
Intel promised that it would be great for e-commerce.


Correction: the Intel marketing department promised that it would be
great for e-commerce.


I'll be sure to go track down the guy who gave a 2-hour-long presentation
showing the guts of a clever 56/64 bit carry avoiding bignum multiply algorithm
that Intel was pushing for Itanium and SSE-2, that he's really just a marketing
guy. Intel's claims were real -- they had working code in-house.
So they projected performance results for
the Itanium without having code available to do what they claim. So people
who wanted to consider using an Itanium system based on its performance for
e-commerce were stuck -- they had no code, and had to believe Intel's claims,
or SWOX's as to what the performance would be.


The only thing your example shows is that a marketing angle sometimes
doesn't rhyme well with technical realities.


No, it shows that without having a proper path, technology can be bogged down
by inane weaknesses in standards. Intel had its *C* compilers ready to go for
the Itanium *LONG* before any of this happened. Even to this day, we are
*STILL* waiting for the code to make it into GMP:

http://www.swox.com/gmp/gmp-speed.html

The grey bars indicate where they *think* the performance will be (because they
can't get their hands on the platform, or because they are still hacking on the
code) and the pink bars are actual delivered performance. So is Itanium really
fast or really slow at this? From that chart its impossible to tell for sure.
OTOH, if instead, the C language had exposed a carry-propagating add, and a
widening multiply in the language, then it would just be up to the Intel
*compiler* people to figure out how to make sure the widening multiply was
used optimally, and the SWOX/GMP people would just do a recompile for baseline
results at least.


I would guess that Intel, being both a compiler maker and the IA64
manufacturer, could have introduced a macro widemul(hr,lr,a,b) to do
this, and help the SWOX guys out a bit?


They could have. But what kind of relationship do you think a proprietary
company like Intel has with a bunch of GPL geeks?
I don't know SWOX; what do they use for bignum multiplication?
Karatsuba's algorithm?


I think they have an option for that. But from my recollection of having
looked into this, by the time Karatsuba is useful, more advanced methods like
Toom-Cook or straight to FFTs become applicable as well.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #128

On Sat, 6 Dec 2003, Sidney Cadot wrote:

Paul Hsieh wrote:
aj*@nospam.andrew.cmu.edu says...
Not the parser so much as the lexer. Is "a %^&%*!^ b" three tokens,
four, five, six, seven, eight, or nine? It depends on the semantics
of the code we've already translated.
Note that this is *NOT*, repeat *NOT*, an idea that will ever make
it into the C programming language, for this reason -- as expressed
in this subthread, it would break almost *every* C-parsing tool on
the market.


So would adding &&& and |||. Remember that my premise is that if one is
motivated to add one of those, why not add in something more general?


Bad comparison. Support for &&& and ||| would be infinitely easier to
add than support for what you propose.


Good comparison. Support for &&& and ||| is exactly as likely to
break existing tools, and is exactly as likely to make it into a
future version of C. [There may not be a firm cause-and-effect there,
but it is correlated.]

Remember *my* premise: *If* one were motivated to add one of those,
why not add in wings to pigs? :-)

-Arthur
Nov 13 '05 #129
Arthur J. O'Dwyer wrote:
On Sat, 6 Dec 2003, Sidney Cadot wrote:
Paul Hsieh wrote:
aj*@nospam.andrew.cmu.edu says...
[...]

So would adding &&& and |||. Remember that my premise is that if one is
motivated to add one of those, why not add in something more general?
Bad comparison. Support for &&& and ||| would be infinitely easier to
add than support for what you propose.

Good comparison. Support for &&& and ||| is exactly as likely to
break existing tools, and is exactly as likely to make it into a
future version of C. [There may not be a firm cause-and-effect there,
but it is correlated.]
Yes, &&& and ||| would break existing tools, but there is precedent for
that (introduction of // comments). The not unimportant difference is
that adding support for &&& and ||| would be rather trivial, while
support for Paul's proposal is a complete nightmare.

As to the likelihood of either feature making it into the C standard: I
disagree. The chances of ||| and &&& being added are many orders of
magnitude greater than that of the "operator introduction" feature
championed by Paul. Still very close to zero, of course. One has to
approach probabilities of this magnitude logarithmically :-)
Remember *my* premise: *If* one were motivated to add one of those,
why not add in wings to pigs? :-)


This did not go down well at all at sci.bio.genetic-engineering.pigs. It
seems that lack of vision is not limited to c.l.c ;-)
Best regards,

Sidney

Nov 13 '05 #130
Paul Hsieh wrote:
Look, nobody uses K&R-style function declarations anymore. The reason is
because the ANSI standard obsoleted them, and everyone picked up the ANSI
standard. That only happened because *EVERYONE* moved forward and picked up
the ANSI standard. One vendor is irrelevant.
Ok. Can't speak for MW, but I think that by the end of 2007 we'll have
near-perfect C99 compilers from GNU, Sun, Microsoft, and Intel. Odds now
down to 5:1; you're in?


With the *and* in there? Sure. Since Microsoft alone will not do it
(otherwise why not back port MFC to C?), the GNU people may decide that the
last 10% just isn't worth it, Sun has two other languages to worry about that
take precedence in terms of development resources, and once C++0x emerges, C99
development by any of these vendors will almost certainly be halted. I'd have
to be wrong on all of these guys to lose this.


That's why the odds are down to 1:5 :-) Four years is quite a long time...
Ok. I agree with you on extending the capabilities of the preprocessor
in general, although I can't come up with actual things I miss in
everyday programming.

I run into this every now and then. For example, I was recently trying to
solve the following problem: Create a directed graph on n points, each with an
out degree of d (I am concerned with very small d, like 2), such that the path
length between any pair of points is minimized.

Turns out that this is a problem whose complexity grows super-exponentially
with very small values of n. Despite my best efforts, I don't know the answer
for n=11, d=2 for example (I know it's either 3 or 4, but I can't prove that it's
not 3). More startling is the possibility that n=12,d=2 might have a smaller
latency (but I don't know that either)!
I used to do programming contests in university (both as a participant
and as a judge later on), so I quite like this sort of problem. Care to
provide some background (via regular e-mail), I think I'd like to think
about this for a bit.
Anyhow, the point is that the only way to have a chance to squeeze enough
computational juice out of my PC to solve this was to hard-code huge amounts
of the code, and use a lot of bit-twiddling tricks just for each special case.
I used the preprocessor macros as much as possible to make my code manageable,
but in order to change n, I actually have to *modify* the code by adding and
changing *n* lines of code. There's not much I can do about this, I would have
no chance to solve this otherwise.
Ok, that's a case. But wouldn't it have been better, then, just to make a
program generator (written in C or something else)? Or perhaps use the
m4 macro processor?
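
For instance, something along these lines (purely an illustration -- the
table name and edge pattern here are invented, not your actual problem):

   #include <stdio.h>

   /* Emit the n hand-unrolled lines as C source; regenerate and
      recompile whenever n changes, instead of editing by hand. */
   int main (void) {
       int i, n = 11;
       printf ("static const unsigned adj[%d] = {\n", n);
       for (i = 0; i < n; i++)
           printf ("    (1u << %d) | (1u << %d),\n",
                   (i + 1) % n, (i + 2) % n);
       puts ("};");
       return 0;
   }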
I'm saying that trying to fix C's intrinsic problems shouldn't start or end
with some kind of resolution of call stack issues. Anyone who understands
machine architecture will not be surprised about call stack depth limitations.


It's the task of a standard to spell these out, I think.
There are far more pressing problems in the language that one would like to
fix.


Yes, but most things that relate to the "encouraging of writing
extremely unsound and poor code", as you describe C, would be better
fixed by using another language. A lot of the inherently unsafe things
in C are sometimes needed, when doing low-level stuff.

Why is UB in isgraph(-1) needed? Why is UB in gets() needed? Why is the fact
that fgets() skips over '\0' characters needed? Why is a non-portable right
shift on signed integers needed (especially considering the one on unsigned
*is* portable)?


These are side cases, that are at least mentioned in the standard. No
big deal, IMHO. This is much more of a big deal:

   int count_nodes(struct Tree *t)
   {
       return t ? count_nodes(t->left) + count_nodes(t->right)
                : 0;
   }

Things like this are ubiquitous in many kinds of code. Trees with a big
depth are not uncommon. How can I ever claim such a function is portable
if I have no guarantee whatsoever that it will work on some architectures?
[powerful heap manager]

>But a third party library can't do this portably.

I don't see why not?

Explain to me how you implement malloc() in a *multithreaded* environment
portably. You could claim that C doesn't support multithreading, but I highly
doubt you're going to convince any vendor that they should shut off their
multithreading support based on this argument.
Now you're shifting the goalposts.

How so?
Since in your original statement there was no mention of multithreading.
Allow me to refresh your memory:

"But a third party library can't do this portably. Its actual useful
functionality that you just can't get from the C language, and there's
no way to reliably map such functionality to the C language itself.
One is forced to know the details of the underlying platform to
implement such things. Its something that really *should* be in the
language." [Paul Hsieh, 4 posts up]

See?
The two are related. Writing a heap manager requires that you are
aware of multitasking considerations.
Nonsense. There's a lot of real hardware out there where multitasking is a
non-issue.
If I want to extend the heap manager, I have to solve them each one by one
in different ways on different platforms
Nonsense. Unless you need support for something inherently
platform-specific like multithreading.
And, of course, what sort of expectation of portability would an end user have
knowing that the library itself had to be carefully hand ported?
"None whatsoever", I suspect the answer to this rhetorical question
would be? I still don't see a problem writing a super-duper heap manager
on top of malloc().
Compare this to the string library that I wrote (http://bstring.sf.net/). It's
totally portable. Although I don't have access to Mac OS X, and don't use gnu
C on Linux, in order to test to make sure, I know for a fact that end users
from both platforms have downloaded it and are actively using it. I know it
works in 16 bit and 32 bit DOS/Windows environments, etc., etc. It's totally
portable, in the real-world sense of portable (semantically, as well as
syntactically.) The point is that the end users know that there is absolutely
0 risk of losing portability or having porting problems because of the use of
this library.

Third party tools for advanced heaps make little sense. It would only be
worth consideration if it were actively ported to many platforms -- which
increases its cost. I.e., for whatever set of platforms I am considering using
such a library, I am paying for the development cost of every playform that it
would be ported to in the price of using it. Its also highly useless for new
hardware platforms which are in development. Even having access to the source
is of lesser value if there are platform specifics in each instance that
requires munging just to port it. The point is, if the C standard were simply to add this functionality straight
into the library, then it would be each compiler vendor's responsibility to add
this functionality into the compiler.
Sure but then the compiler would be more expensive to make, and I would
have to pay more $$$ for a feature I don't need.
And the functionality would be
inherently portable as a result. The requirement of multitasking support would
then be pushed back into the vendor's lap -- i.e., the people who have
introduced this non-portable feature in the first place.
Adding multithreading support to a language is a can of worms. I don't
think you quite know what you're suggesting here. Many difficult
semantic issues to resolve. The fun thing about C is that it's simple.
By dictating its existence in
the library, it would put the responsibility of making it work right in the
hands of the vendor without affecting the C standard's stance on not
acknowledging the need for multithreading.
Huh? The C standard describes the library.
Obviously, you cannot write a multithreading heap manager portably if
there is no portable standard on multithreading (doh). If you need this
you can always presume POSIX, and be on your way.

You do not understand. The *IMPLEMENTATION* needs to be multithreading aware.
Perhaps I don't understand. The implementation... of what?

From a programmer's point of view, the multithreading support is completely
transparent.
If you know a way of doing that, you'll make big headlines in the
computer science world. I have yet to see a multithreading model that is
not either severely limited in what it can do or hard to use.
This has no end user impact from a POSIX specification point of view.
Sorry, I don't understand this remark.
See my multithreading comment above. Also, efficient heaps are usually
written with a flat view of memory in mind. This is kind of impossible in
non-flat memory architectures (like segmented architectures.)


What does the latter remark have to do with C's suitability for doing it?

I am saying that I don't disagree with you, but I think you are missing the
point. By simply adding the features/function into C, that would make it
defacto portable from the point of view that matters -- programmers of the C
language.
Well, let's see your proposal then. How would you weave threads in the C
language portably? No handwaving please, you will have to come up with
rock-solid semantics.
For most flat-memory architectures, it's actually very straightforward to add in
all the features that I am requesting. I know this because I've written my own
heap manager, which of course uses platform specific behaviour for the one
platform I am interested in. (It only gets complicated for platforms with
unusual memory, like segmented architectures, which have correspondingly
complicated heap managers today.) This is in stark contrast with the
incredibly high bar of platform-specific complications set by trying to do this
outside of the language.
I'm eagerly awaiting your proposal that shows how to do all this in a
platform independent way.
>My proposal allows the programmer to decide what is or is not useful to them.

I'm all for that.

Well, I'm a programmer, and I don't care about binary output -- how does your
proposal help me decide what I think is useful to me?


It already did, it seems - You just stated your decision, with regard to
binary output. Fine by me.

Not fine by me. Because some other programmer who's code I have to look at
will use it, and I won't have any idea what it is.


Don't worry, you can look it up in the man pages. Unlike your
user-defined operators... :-)
Hey, it's not me -- apparently it's people like you who want more operators.


Just a dozen or so! But _standardized_ ones.

And you don't see the littlest problem with this proposal? If you add a dozen,
you'd better be sure that they are the last dozen that anyone could possibly want
to add to the language. Their value add would have to be worth the pain of having
everyone learn about 12 new symbols.


Well, let's start with ?< ?> and ?: for now; ?: instead of |||, and I
drop the &&& requests since it is much less useful. It's already in gcc
so it's implementable. Give everybody 10 years to get used to it, then
we may add a couple more.
My point is that no matter what operators get added to the C language, you'll
never satisfy everyone's appetites. People will just want more and more,
though almost nobody will want all of what could be added.

My solution solves the problem once and for all. You have all the operators
you want, with whatever semantics you want.


That's too much freedom for my taste. If I would want this kind of
thing, I would yank out lex and yacc and code my own language.

Well how is that different from adding just &&& and |||? If you *REALLY
REALLY* want them, then why don't you yank out lex and yacc and code up a new
language?
Because I want to use them in C programs.
I don't know if you've ever experienced the mispleasure of having to
maintain code that's not written by yourself, but it's difficult enough
as it is.


Do it all the time.
[...] Adding new operators might be interesting from a theoretical
perspective, but it surely is a bad idea from a broader software
engineering perspective.


It's no worse than trying to add in &&& or ||| today.


I disagree. Let's leave it at that, shall we?
If you see one of these new operators, you can just view it like you view an
unfamiliar function -- you'll look up its definition obviously.


There is an important difference: functions have a "name" that has a
mnemonic function.


But the name may be misleading -- as is the case, more often than not, just
reflecting the thought of the original instance by the original programmer who
maybe cut and pasted it from somewhere else.


Yes, well, you can practice bad coding style in any language. Names at
least have a fighting chance of being self-explanatory, which is not
true for operators.
[...] Operators are just a bunch of pixels with no link to
anything else. It's only by a lot of repetition that you get used to
weird things like '<' and '>'. I don't know about you, but I used to do
Pascal before I switched to C. It took me quite some time before I got
used to "!=".

And how long will it take the rest of us to get used to your weird &&& or |||
operators?
A couple of weeks?
What makes your code readable is adherence to an agreed upon coding
standard that exists outside of what the language defines.


There are several such standards for identifier names. No such standard
exists for operator names, except: use familiar ones; preferably, steal
them from other languages.

Sounds like a reasonable convention to me. How about: All new operators must
be defined in a central module named ----. Or: Only these new operators may be
added as defined by ... yada, yada, yada.
More handwaving... Please humour me, and spell out a reasonable
convention for your operator-introduction.
The coding standards are just different.
I can't tell; I haven't seen a coding standard for operator introduction
yet.
[...] The common denominator of all the identifier
standards is: "use meaningful names". I maintain that there is no
parallel for operators; there's no such thing as a "meaningful"
operator, except when you have been drilled to know their meaning. Your
proposal is in direct collision with this rather important fact of how
the human mind seems to work.

Just like freeform variable names, there is the same encumbrance of the
programmers managing the meaning of the symbols for more generic operators.
You have not made a sufficient case to convince me that there is a real
difference between the two.
....Have you ever laid eyes on a moderately complicated APL program,
without knowing APL? Or perhaps a Forth program, without knowing Forth?
That would give you some clue as to what a program looks like when it
doesn't use a lot of names, but does use a lot of unknown symbols. The
effect is similar to what written Arabic looks like to people used to
the Roman alphabet, and vice versa.
<<< and @ are nice though.

BTW -- odd how you actually thought that these were "nice". Do you think you
would have a hard time remembering them? Would it rankle your brain because
they were odd and unfamiliar operators that are new to the language? Would
you have a tendency to write abusive code because of the existence of these new
operators?
The meanings you gave for these rang some bells with my programming
experience. It leads me to believe you have encountered similar
situations, which gives you some bonus points for credibility in my book
:-) Seriously, I don't think such operators would be a good idea, but
I'd welcome library functions (the old fashioned ones, with names) for them.
Do you think perhaps it's possible to use an arbitrarily extendable operator
mechanism in order to *clarify* or make code actually more maintainable?
A resounding 'no'. It's an interesting thought, but for me it falls
short on technical and psychological grounds, as I previously tried to
explain.
In two values of the widest type -- just like how just about every
microprocessor which has a multiply does it:

high *% low = a * b;


Hmmm. I hate to do this again, but could you provide semantics? Just to
keep things manageable, I'd be happy to see what happens if high, low,
a, and b are any possible combinations of bit-widths and signedness.
Could you clearly define the meaning of this?

a, b, high and low must be integers.
I'm with you so far ... ;-)
The signedness of the result of a * b (as if it were not widened) dictates the result signedness.
Ok.... This would fit, I'll give you that.
Coercion will happen as necessary when storing to high and low.
Ok. Let's call the "as if it were not widened" result X; X=a*b (in a
mathematical sense).

Now before coercion can begin, we'll have to split X; that's the crucial
step of the semantics; pseudo-code:

high = coerce( high_order_value(X) )
low = coerce( low_order_value (X) )

Now all we need is a good definition of high_order_value() and
low_order_value(), based on the signedness and bit-width of X, and
possibly the signedness and bit-width of high/low. How would they look?

The whole expression will have a side effect of returning high.

I'd formulate this as "have a value", otherwise it looks ok. I'm curious
how you are going to define the high_order_value and low_order_value
semantics.
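
For the unsigned case, C99's exact-width types at least let one write down a
candidate definition, up to the widest type available (a sketch of one
plausible reading, not necessarily yours):

   #include <stdint.h>

   /* One unsigned reading of "high *% low = a * b" for 32-bit operands. */
   void widemul32 (uint32_t a, uint32_t b, uint32_t *high, uint32_t *low) {
       uint64_t p = (uint64_t) a * b;   /* widen *before* multiplying */
       *high = (uint32_t) (p >> 32);    /* high_order_value(X) */
       *low  = (uint32_t) p;            /* low_order_value(X) */
   }

Of course this trick runs out exactly at the widest type the implementation
offers, which is where your proposal would have to take over.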
Well, I don't know if these dozen-or-so big-number 'powermod' operations
that are needed to establish an SSL connection are such a big deal as
you make it.

It's not me -- it's Intel, IBM, Motorola, Sun and AMD who seem to be obsessed
with these instructions.
Well they "provide" them, if that's what you mean.
I don't see them working the committees to get these supported in
non-assembly languages.

That's because they don't care about how it goes down once it hits
software. Someone has to pay something for a platform specific library? -- who
cares, as long as they sell their hardware. These same people didn't get
together to define the C standard did they? Why would they bother with an
extension just for widening multiplies?
Ok. So your point was, in the first place.... ?
Instead the hardware people waste their time on trivial little specifications
like IEEE-754, which the C standards idiots don't bother to look at until 15
years later.
You shouldn't trivialize IEEE-754, it's one hell of a standard, probably
one of the best I have ever seen. And as you are probably well aware,
there's a perfectly good reason why it's only now that the C standard
committee is looking at it (only now hardware supporting it is widely
available). I really don't see your point here.
[...] I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.

The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.
What's so bad about using third-party libraries?
Of course Amazon, Yahoo and Ebay and most banks are
kind of obsessed with them too, even if they don't know it.


I think you would find that bignum operations are a small part of the
load on e-commerce servers.


According to a paper by Intel, widening multiply accounts for something like
30% of the load on typical e-commerce transactions (typing in your credit card
over the net in a way that can't be snooped.)


Reference, please.

"load on typical e-commerce transaction" != "load on e-commerce server".

If the transaction is simply setup session/send credit-card
information/close session, this could be right. However, a jiffy ago we
were talking SSL.
One single assembly instruction (one *bundle* on Itanium) holds a whole server
down for 30% of its computation,
Again, you're equating server load with transaction load.
versus the total 100K line e-commerce software required to the do the rest.
That's why HW manufacturers are keen on the number of transistors they spend
making this one operation reasonably fast.
I'm happy they do, really. I love the speed of third-party bignum libraries
nowadays.
[...] All RSA-based protocols just do a small
amount of bignum work at session-establishment time to agree to a key
for a shared-secret algorithm.


This is only useful for much larger secure transactions like ssh, or an
encrypted phone call or something. E-commerce is a much smaller, one shot
transaction, where the RSA computation dominates.


I'm sure there are situations where this arises, although it's not
likely to happen when I order something with my credit card. More
inter-bank kind of stuff.

Now why would you not simply use a library for this?
>>Many languages exists where this is possible, they are called
>>"assembly". There is no way that you could come up with a well-defined
>>semantics for this.

>carry +< var = a + b;

It looks cute, I'll give you that. Could you please provide semantics?
It may be a lot less self evident than you think.

How about:

- carry is set to either 1 or 0, depending on whether or not a + b overflows
(just follow the 2s complement rules if one of a or b is negative.)


Hang on, are we talking about "overflow" or "carry" here? These are two
different things with signed numbers.

What happens if a is signed and b is unsigned?

My intent was for the operation to follow the semantics of the x86 ADC assembly
instruction.
Fun thing is: it handles both signed and unsigned numbers. Not-so-funny
is the fact that the x86 supports (as do most microprocessors) both an
overflow flag and a carry flag. So, armed with this, please elaborate on
the meaning.
The point is that this instruction is known to be proper for
doing correct bignum additions.
It sure is... There are libraries containing hand-written assembly that
do fine :-)
- var is set to the result of the addition; the remainder if a carry occurs.


What happens if the signedness of var, a, and b are not equal?

It just behaves like the ADC x86 assembly instruction, the details of which I
will not regurgitate here.
So on adding a signed plus unsigned.... The carry gets a value "as if"
the signed value was actually unsigned? On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.
What happens if the bit-widths of var, a, and b are not equal?

The bit-widths would be converted as if the (a + b) operation were happening in
isolation, to match C language semantics.
Ok, so if either is signed, they will both be signed, and you get back a
useless carry bit. Doesn't sound very good to me.
- The whole expression (if you put the whole thing in parentheses) returns
the result of carry.


.... So this would presume the actual expression is: "+< var = a + b" .
There's no need to introduce a mandatory "carry" variable, then.

True. Is this a problem? Perhaps you would like it to return var as a side
effect instead to avoid this redundancy? I don't have that strong of a feeling
on it.


No, carry/overflow will do fine, it's what you're most interested in,
most of the time.
In fact, if I were only interested in the carry, I'd be out of luck:
You could just omit it as a degenerate form:

+< = a + b
Two operators sitting next to each other. I think it's ugly, but there
you go.
Basically, this is a C-esque syntax for a tuple assignment which
unfortunately is lacking in C:

(carry, value) = a+b

Yeah see, but the problem is that this encompasses existing C-language forms.
For all I know this might be legal C syntax already (I wouldn't know, I just
don't use the "," operator in this way) in which case we are kind of already
dead with backwards compatibility.
Unfortunately, there is no tuple assignment in C. It was just a piece of
pseudo-code to elucidate the meaning.
There's also nothing in that syntax to
indicate some new thing was happening that is capturing the carry.
Suppose we had tuples, we could write (c,v) = a <+ b, where the <+
operator is defined to return a tuple expression. Now tuples, that would
be _really_ nice. Finally I could swap with (a,b)=(b,a).
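
Lacking tuples, the closest C idiom is returning a small struct (a sketch,
with made-up names):

   struct carry_sum { unsigned int value; unsigned int carry; };

   struct carry_sum add_wide (unsigned int a, unsigned int b) {
       struct carry_sum r;
       r.value = a + b;         /* unsigned wraparound is well defined */
       r.carry = r.value < a;   /* 1 iff the addition wrapped */
       return r;
   }

It is clumsier than a real tuple, but at least both results come back through
one well-marked channel.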
Ah, I see you've never implemented a non-table-driven CRC or a binary
greatest common divisor algorithm.

You can find a binary gcd algorithm that I wrote here:

http://www.pobox.com/~qed/32bprim.c


That's not the "binary GCD algorithm", that's just Knuth's version that
avoids modulos. Below is a binary GCD.


Sorry, a previous version that I never put out on the web used the binary
algorithm. I tested Knuth's as much faster and thus updated it, and forgot that
I had done this.


In assembly they are quite evenly matched :-)
>The specific operations I am citing make a *HUGE* difference and have billion
>dollar price tags associated with them.

These numbers you made up from thin air, no? Otherwise, I'd welcome a
reference.

Widening multiplies cost transistors on the CPU. The hardware algorithms are
variations of your basic public school multiply algorithm -- so it takes n^2
transistors to perform the complete operation, where n is the largest bit
word that the machine accepts for the multiplier. If the multiply were not
widened they could save half of those transistors. So multiply those extra
transistors by the number of CPUs shipped with a widening multiply (PPC,
x86s, Alphas, UltraSparcs, ... etc) and you easily end up in the billion
dollar range.


This is probably the most elaborate version of "yes, I made these
numbers up from thin air" I've ever came across :-) But I didn't.
Oh, but you did. Billion is 1,000,000,000 which is quite a huge number.
The only way we humans can handle this is by doing some calculations...
It doesn't help to just throw numbers of this magnitude around without a
reference.
I used to work with one of these companies. People spend time
and consideration on this one instruction. You could just feel the impact that
this one instruction was going to have, and the considerations for the cost of
its implementation.
Sure, it's an important instruction.
I could easily see a quarter million just in design
effort, then some quarter million in testing, not to mention the cost of the
extra die area once it shipped -- and this is inside of *ONE* of these
companies, for *ONE* chip generation.
We're still about 999,500,000 USD short of a billion.
For example, inside of Intel, they decided that they were going to reuse their
floating point multiplier for their widening integer multiply for Itanium. But that
meant that the multiplier had to be able to do 128 bit multiplies (as opposed
to 82 bits, which is all the Itanium would have apparently needed) and
couldn't run a floating point and integer multiply at the same time. This has
non-trivial layout and design impact on the chip.
Sure does. I estimate the cost of this design work would be in the range
of 10^6 dollars, but not 10^9 dollars.
>I understand the need for the C language standard to be applicable to as
>many platforms as possible. But unlike some right shift detail that you
>are talking about, the widening multiply hardware actually *IS* deployed
>everywhere.


Yup. And it is used too. From machine language.

And *only* machine language. That's the point.


....Which is fine, as far as I am concerned.
Sure is. Several good big-number libraries are available that have
processor-dependent machine code to do just this.

And that's the problem. They have to be hand written in assembly. Consider
just the SWOX Gnu multiprecision library. When the Itanium was introduced,
Intel promised that it would be great for e-commerce.


Correction: the Intel marketing department promised that it would be
great for e-commerce.

I'll be sure to go track down the guy who gave a 2-hour-long presentation
showing the guts of a clever 56/64 bit carry avoiding bignum multiply algorithm
that Intel was pushing for Itanium and SSE-2, that he's really just a marketing
guy. Intel's claims were real -- they had working code in-house.
Sounds like a technical guy to me, who was given the nice task of
thinking about all this. Probably even had a sheet about "the importance
for e-commerce" as a rationale for all this work. That's all perfectly
fine, except that I still fail to see why it is so important to code
this kind of high-performance code in C instead of assembly.
So they projected performance results for
the Itanium without having code available to do what they claim. So people
who wanted to consider using an Itanium system based on its performance for
e-commerce were stuck -- they had no code, and had to believe Intel's claims,
or SWOX's as to what the performance would be.
It sounds like Intel should've contributed some hand-written assembly to
SWOX, then.
The only thing your example shows is that a marketing angle sometimes
doesn't rhyme well with technical realities.

No, it shows that without having a proper path, technology can be bogged down
by inane weaknesses in standards. Intel had its *C* compilers ready to go for
the Itanium *LONG* before any of this happened. Even to this day, we are
*STILL* waiting for the code to make it into GMP:

http://www.swox.com/gmp/gmp-speed.html
The grey bars indicate where they *think* the performance will be (because they
can't get their hands on the platform, or because they are still hacking on the
code) and the pink bars are actual delivered performance. So is Itanium really
fast or really slow at this? From that chart it's impossible to tell for sure.
It sounds like Intel should contribute some hand-written assembly to
SWOX, then.
OTOH, if instead, the C language had exposed a carry-propagating add, and a
widening multiply in the language, then it would just be up to the Intel
*compiler* people to figure out how to make sure the widening multiply was
used optimally, and the SWOX/GMP people would just do a recompile for baseline
results at least.


I would guess that Intel, being both a compiler maker and the IA64
manufacturer, could have introduced a macro widemul(hr,lr,a,b) to do
this, and help the SWOX guys out a bit?

They could have. But what kind of relationship do you think a proprietary
company like Intel has with a bunch of GPL geeks?


Not a very good one, it seems. Perhaps some PHB should reconsider. These
"GPL geeks" have delivered the gcc compiler, and all. But perhaps they
are also trying to sell their own compiler. Oh, the complexities of
managing... As a technical guy, I say they should just contribute to GMP
and don't make a lot of fuss about it. Perhaps even spin it that they
are pro open-source.
I don't know SWOX; what do they use for bignum multiplication?
Karatsuba's algorithm?

I think they have an option for that. But from my recollection of having
looked into this, by the time Karatsuba is useful, more advanced methods like
Toom-Cook or straight to FFTs become applicable as well.


I think Karatsuba is best for the ranges dealt with by RSA.
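
The identity itself is small enough to show in miniature -- one split level,
on 16-bit halves so everything fits in 64 bits (just an illustration; real
bignum code applies this recursively to arrays of limbs):

   #include <stdint.h>

   /* Karatsuba: a*b with 3 multiplications instead of 4, where
      a = a1*2^16 + a0 and b = b1*2^16 + b0. */
   uint64_t karatsuba32 (uint32_t a, uint32_t b) {
       uint64_t a1 = a >> 16, a0 = a & 0xFFFFu;
       uint64_t b1 = b >> 16, b0 = b & 0xFFFFu;
       uint64_t hi  = a1 * b1;                         /* multiply #1 */
       uint64_t lo  = a0 * b0;                         /* multiply #2 */
       uint64_t mid = (a1 + a0) * (b1 + b0) - hi - lo; /* multiply #3 */
       return (hi << 32) + (mid << 16) + lo;
   }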

Best regards,

Sidney

Nov 13 '05 #131
Sidney Cadot <si****@jigsaw.nl> wrote:
Arthur J. O'Dwyer wrote:
On Sat, 6 Dec 2003, Sidney Cadot wrote:
Paul Hsieh wrote:
aj*@nospam.andrew.cmu.edu says...
[...]
So would adding &&& and |||. Remember that my premise is that if one is
motivated to add one of those, why not add in something more general?

Bad comparison. Support for &&& and ||| would be infinitely easier to
add than support for what you propose.
Good comparison. Support for &&& and ||| is exactly as likely to
break existing tools, and is exactly as likely to make it into a
future version of C. [There may not be a firm cause-and-effect there,
but it is correlated.]


Yes, &&& and ||| would break existing tools, but there is precedent for
that (introduction of // comments).


// was added in because so many compilers had already supported them
anyway. This was in keeping with the C standards committee "codifying
standard practices" nonsense. It was already there, all over the
place.
[...] The not unimportant difference is
that adding support for &&& and ||| would be rather trivial, while
support for Paul's proposal is a complete nightmare.

As to the likelihood of either feature making it into the C standard: I
disagree. The chances of ||| and &&& being added are many orders of
magnitude greater than that of the "operator introduction" feature
championed by Paul. Still very close to zero, of course. One has to
approach probabilities of this magnitude logarithmically :-)


Probably true, but your suggestion is also infinitely less ambitious,
and infinitely less useful. Your suggestion is like the complex
number of variable length macros thing in C99. They are interesting
additions which provide some kind of epsilon more functionality, but
which don't compensate with the fact that you are leaving all C89
compilers behind.

My suggestion is more like "restrict" in C99 which adds functionality
that just is not in the language in any way shape or form. Some
people would use it no matter what -- because it adds something so
totally powerful in terms of functionality that its worth losing the
compatibility with C89.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #132
In article <br**********@news.tudelft.nl>, si****@jigsaw.nl says...
Paul Hsieh wrote:
I run into this every now and then. For example, I was recently trying to
solve the following problem: Create a directed graph on n points, each with an
out degree of d (I am concerned with very small d, like 2), such that the path
length between any pair of points is minimized.

Turns out that this is a problem whose complexity grows super-exponentially
with very small values of n. Despite my best efforts, I don't know the answer
for n=11, d=2 for example (I know it's either 3 or 4, but I can't prove that it's
not 3). More startling is the possibility that n=12,d=2 might have a smaller
latency (but I don't know that either)!
I used to do programming contests in university (both as a participant
and as a judge later on), so I quite like this sort of problem. Care to
provide some background (via regular e-mail), I think I'd like to think
about this for a bit.


This isn't a rigged contest problem. This is a real world problem. Imagine a
*SERVERLESS* online game topology. How do you communicate "game-moves" in
realtime between players in a way that is most efficient and with the least
perceived lag, given a limited amount of bandwidth? Solve the above problem,
and you have a solution for this problem. (Actually for games you can crank d
a little higher than 2 -- the exact problem that I am looking at is different,
and I am loath to increase d above 2.)

You can think about this problem all you like. If you want to know if I am
full of it or not, see if you can get above n=10,d=2.
Anyhow, the point is that the only way to have a chance to squeeze enough
computational juice out of my PC to solve this was to hard-code huge amounts
of the code, and use a lot of bit-twiddling tricks just for each special case.
I used the preprocessor macros as much as possible to make my code manageable,
but in order to change n, I actually have to *modify* the code by adding and
changing *n* lines of code. There's not much I can do about this, I would have
no chance to solve this otherwise.


Ok, that's a case. But wouldn't it have been better, then, just to make a
program generator (written in C or something else)? Or perhaps use the
m4 macro processor?


Tell me how you do this in a way that's compatible with testing it against 3
different compilers (one of the *OTHER* techniques I use to go after
performance ...) two of which are very unfriendly to being commandline driven,
while still making sense from a debugging point of view?
I'm saying that trying to fix C's intrinsic problems shouldn't start or end
with some kind of resolution of call stack issues. Anyone who understands
machine architecture will not be surprised about call stack depth limitations.

It's the task of a standard to spell these out, I think.

There are far more pressing problems in the language that one would like to
fix.

Yes, but most things that relate to the "encouraging of writing
extremely unsound and poor code", as you describe C, would be better
fixed by using another language. A lot of the inherently unsafe things
in C are sometimes needed, when doing low-level stuff.


Why is UB in isgraph(-1) needed? Why is UB in gets() needed? Why is the fact
that fgets() skips over '\0' characters needed? Why is a non-portable right
shift on signed integers needed (especially considering the one on unsigned
*is* portable)?


These are side cases, that are at least mentioned in the standard. No
big deal, IMHO.


It's a big deal to me. I don't own any material on the C language that
specifies all of this. I actually go reference source code for a C library I
happen to have access to. But, of course, these sources and a real world
programmer's sense lead me to the very wrong conclusion that isgraph(-1) just
returns false.
[...] This is much more of a big deal:

   int count_nodes(struct Tree *t)
   {
       return t ? count_nodes(t->left) + count_nodes(t->right)
                : 0;
   }

Things like this are ubiquitous in many kinds of code.
Uh ... dude, that just looks like an expensive way of writing:

int count_nodes (struct Tree *t) {
return 0;
}

I don't suppose you did very well on these programming contests did you? Btw,
weren't you railing against trigraphs?
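
(Presumably what was meant was something that counts the node itself:

   int count_nodes (struct Tree *t) {
       return t ? 1 + count_nodes (t->left) + count_nodes (t->right)
                : 0;
   }

which at least returns something other than 0.)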
[...] Trees with a big depth are not uncommon.
Assuming you actually wanted to count the *NODES* of the tree, and were a
little concerned about performance, try this:

   static int count_nodes_inner (const struct Tree *t) {
       /* t is known to be non-NULL here */
       return (t->left  ? (1 + count_nodes_inner (t->left )) : 0) +
              (t->right ? (1 + count_nodes_inner (t->right)) : 0);
   }

   int count_nodes (const struct Tree *t) {
       return t ? 1 + count_nodes_inner (t) : 0;
   }

as it avoids the leaf node tail calls, which will roughly double the
performance.
[...] How can I ever claim such a function is portable if I have no
guarantee whatsoever that it will work on some architectures?
Huh? Are you talking about the stack thing again? Dude: malloc (65536) is
not portable by the same reasoning, and:

   int fib(int n) {
       if (n <= 2) return 1;
       return fib(n-1) + fib(n-2);
   }

isn't going to work in any relevant way in any language, for a call like
fib(1000000).
>[powerful heap manager]
>
>>But a third party library can't do this portably.
>
>I don't see why not?
You simply do not understand. Malloc and multithreading are intertwined in the
implementation. If you have multithreading, then you have to lock and unlock
your heap, or use other strategies to make sure one thread isn't in the middle
of freeing while another is mallocing. If you don't, it introduces something
called a "race condition". Without exact considerations malloc and free would
simply fail to function properly in a multithreaded environment.

The point is that any extensions of the heap's semantics end up having exactly
the same problem. For example, how would one write a "freeall()" on top of
malloc? (Ignoring the fact that writing it *ON TOP* of malloc/free loses all
the potential *MASSIVE* performance advantage) Well presumably you'd have some
hidden part at the head of each allocation creating a linked list of
outstanding mallocs that were removed upon frees, and the freeall() function
would just walk through them calling free one by one. In a multithreaded
environment, once again, you'd be in race condition city. Same problem for
trying to build separate heaps (virtual or otherwise.)
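
To make that concrete, here is roughly what the single-threaded layering
looks like (a sketch; the names are made up, and every list update below is
a race window the moment two threads allocate at once):

   #include <stdlib.h>

   struct hdr { struct hdr *next, *prev; };
   static struct hdr head = { &head, &head };

   void * tracked_malloc (size_t n) {
       struct hdr *h = malloc (sizeof *h + n);  /* alignment glossed over */
       if (h == NULL) return NULL;
       h->next = head.next;                     /* <- race window opens */
       h->prev = &head;
       head.next->prev = h;
       head.next = h;                           /* <- and closes */
       return h + 1;
   }

   void freeall (void) {
       struct hdr *p = head.next, *q;
       while (p != &head) {
           q = p->next;
           free (p);
           p = q;
       }
       head.next = head.prev = &head;
   }

(An individual tracked_free() would have to unlink first; omitted.)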

Now the C language standard says nothing about this of course, because C
doesn't say anything about multithreading at all. But that doesn't make the
problem go away -- most of the commonly used C compilers today include
multitasking extensions.

The point is that there is a *HUGE* difference between pushing such
functionality into the C standard, versus trying to obtain them from 3rd
parties. If it were put into the standard, then each vendor could modify their
heap implementation (a quite doable endeavour, since each vendor has already
solved the heap lock/unlock problem.) But via 3rd parties, the library vendor
has to learn and keep up to date with the OS primitives for each platform they
support.
Explain to me how you implement malloc() in a *multithreaded* environment
portably. You could claim that C doesn't support multithreading, but I highly
doubt you're going to convince any vendor that they should shut off their
multithreading support based on this argument.

Now you're shifting the goalposts.
How so?


Since in your original statement there was no mention of multithreading.


Mentioning the implementation of malloc implicitly brings multithreading into the
discussion. That is what you don't understand.
Allow me to refresh your memory:

"But a third party library can't do this portably.
Because of the multithreading issue ...
[...] Its actual useful
functionality that you just can't get from the C language, and there's
no way to reliably map such functionality to the C language itself.
One is forced to know the details of the underlying platform to
implement such things. Its something that really *should* be in the
language." [Paul Hsieh, 4 posts up]

See?
Yes -- the implicit answer to the "WHY" question that follows each statement,
is "because of the multithreading problem". If you were aware of the issues,
I'm sure this wouldn't be quite so bewildering to you.
The two are related. Writing a heap manager requires that you are
aware of multitasking considerations.


Nonsense. There's a lot of real hardware out there where multitasking is a
non-issue.


Obviously I meant for platforms that support multithreading -- and BTW it has
nothing to do with hardware. Multithreading is a software feature, not a
hardware one. (You're probably thinking of pre-emption, which is a totally
different thing.)
If I want to extend the heap manager, I have to solve them each one by one
> in different ways on different platforms


Nonsense. Unless you need support for something inherently
platform-specific like multithreading.


If the feature of multithreading is in the platform, then extensions to malloc
must take these into consideration. Do you understand the difference between this and
what you said? malloc/calloc/realloc/free, or any of the extensions I propose
have no API, modes, or anything that exposes anything about multithreading to
the end user. It's all in the internals.
[...] I still don't see a problem writing a super-duper heap manager on top
of malloc().
Well, that much is obvious ...
The point is, if the C standard were simply to add this functionality straight
into the library, then it would be each compiler vendor's responsibility to add
this functionality into the compiler.


Sure but then the compiler would be more expensive to make, and I would
have to pay more $$$ for a feature I don't need.


I have written such a malloc myself (it's much faster than typical heap
managers, and contains rich built-in debugging facilities) that I would be only
too happy to donate to the C standards committee on the condition that they
endorse my extensions. Its been well tested, and is easily portable to any
flat address implementations and whose multitasking mechanisms include one of
locks, mutexes (preferred) or semaphores.
And the functionality would be
inherently portable as a result. The requirement of multitasking support would
then be pushed back into the vendor's lap -- i.e., the people who have
introduced this non-portable feature in the first place.


Adding multithreading support to a language is a can of worms.


Which I am not suggesting. And it's not impossible -- look at the language
Erlang. The VxWorks or QNX or OSE APIs also suggest very good abstractions
for multithreading that should be reasonably applicable to any C-style
language.
[...] I don't
think you quite know what you're suggesting here. Many difficult
semantic issues to resolve. The fun thing about C is that it's simple.
I *DO* know what I am suggesting. This heap stuff is not just off the top of
my head. I've written applications where I've *needed* these features.
By dictating its existence in
the library, it would put the responsibility of making it work right in the
hands of the vendor without affecting the C standard's stance on not
acknowledging the need for multithreading.
Huh? The C standard describes the library.


Right, but the library doesn't specify anything about multitasking. It's only
implementations that need to deal with it.
Obviously, you cannot write a multithreading heap manager portably if
there is no portable standard on multithreading (doh). If you need this
you can always presume POSIX, and be on your way.
You do not understand. The *IMPLEMENTATION* needs to be multithreading aware.


Perhaps I don't understand. The implementation... of what?


malloc/free/calloc/realloc or any extensions on top of them.
What does the latter remark have to do with C's suitability for doing it?
I am saying that I don't disagree with you, but I think you are missing the
point. By simply adding the features/function into C, that would make it
defacto portable from the point of view that matters -- programmers of the C
language.


Well, let's see your proposal then. How would you weave threads in the C
language portably? No handwaving please, you will have to come up with
rock-solid semantics.


I'm not proposing anything about threads. I am talking about the heap.
Extending the heap means you have to be aware of multithreading if your
platform supports it.
For most flat-memory architectures, it's actually very straightforward to add in
all the features that I am requesting. I know this because I've written my own
heap manager, which of course uses platform specific behaviour for the one
platform I am interested in. (It only gets complicated for platforms with
unusual memory, like segmented architectures, which have correspondingly
complicated heap managers today.) This is in stark contrast with the
incredibly high bar of platform-specific complications set by trying to do this
outside of the language.


I'm eagerly awaiting your proposal that shows how to do all this in a
platform independent way.


The NULL heap is the system heap.

struct heap * newHeap (struct heap *);
- allocate a subheap from a given heap.
int destroyHeap (struct heap *);
- frees all allocated memory with respect to a given heap (including
sub-heaps), then destroy the heap.
int freeAllH (struct heap *);
- frees all allocated memory with respect to a given heap (including
sub-heaps), but don't destroy the heap.

void * mallocH (struct heap *, size_t);
- Like malloc with respect to a given heap.
void * callocH (struct heap *, size_t, size_t);
- Like calloc with respect to a given heap.
void * reallocH (struct heap *, void *, size_t);
- Like realloc with respect to a given heap.
void * expandH (struct heap *, void *, size_t);
- Like reallocH, but will return NULL instead of moving, and will never
move the pointer.
int freeH (struct heap *, void *);
- Like free with respect to a given heap.

int dgbIsAlloc (struct heap *, void *);
- Returns 1 if, to the best of the library's ability to determine, the
pointer is a properly allocated heap entry. Returns 0, if it can
determine the pointer to not be a properly allocated heap entry.

Notice how there is no mention of multithreading in anything there? Of course,
what is not seen, is that they in fact *ALL* require very exact considerations
for multithreading if the platform supports multithreading.

The value of this is that destroyHeap/freeAllH actually costs about the same as
a handful of frees (even if you've made tens of thousands of outstanding
mallocs.) And expandH can be used in cases where a complete copy is not
necessary in order to move an allocation to a resized allocation (this feature
would improve the performance of the "Better String Library" in many cases,
that I cannot in any way capture with C's current heap semantics.) Although
dgbIsAlloc is probabilistic (though it would never return 0 on a valid heap
pointer), in practice its exactness would be so close as to make it a reasonably
reliable mechanism to ensure the correctness of an allocated pointer for
debugging purposes.
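
To give a feel for it, typical usage would look like this (hypothetical, of
course, since none of these functions are standard C):

   void handle_request (void) {
       struct heap *h = newHeap (NULL);   /* sub-heap of the system heap */
       char *buf = mallocH (h, 4096);
       double *nums = mallocH (h, 100 * sizeof *nums);
       /* ... build arbitrarily complicated temporary structures ... */
       destroyHeap (h);   /* releases buf, nums and everything else at once */
   }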
>>My proposal allows the programmer to decide what is or is not useful to them.
>
>I'm all for that.

Well, I'm a programmer, and I don't care about binary output -- how does your
proposal help me decide what I think is useful to me?

It already did, it seems - You just stated your decision, with regard to
binary output. Fine by me.


Not fine by me. Because some other programmer who's code I have to look at
will use it, and I won't have any idea what it is.


Don't worry, you can look it up in the man pages. Unlike your
user-defined operators... :-)


You would have to look up "%b" ("%db"? -- whatever) in *UP TO DATE* man pages.
(Is *restrict* or the new *complex* type, from C99 properly documented in
*YOUR* current man pages?) My proposed plug-in mechanism would be in modules
or header files that would, one way or another, have to be in your project
somewhere. Presumably, the source would have to be available somewhere, so
there would be *NO QUESTION* as to what it does.

Same comment applies to &&&, ||| versus user defined operators.
Hey, its not me -- apparently its people like you who wants more operators.

Just a dozen or so! But _standardized_ ones.


And you don't see the littlest problem with this proposal? If you add a dozen,
you'd better be sure that they are last dozen that anyone could possibly want
to add to the language. Their value add would have be worth the pain of having
everyone learn about 12 new symbols.


Well, lets start with ?< ?> and ?: for now; ?: instead of |||, and I
drop the &&& requests since it is much less useful. It's already in gcc
so it's implementable. Give everybody 10 years to get used to it, then
we may add a couple more.


My claim is that so few people would use these operators, that the would be
defacto not supported. They would be less useful than bitfields.
My point is that no matter what operators get added to the C language, you'll
never satisfy everyone's appetites. People will just want more and more,
though almost nobody will want all of what could be added.

My solution solves the problem once and for all. You have all the operators
you want, with whatever semantics you want.

That's too much freedom for my taste. If I would want this kind of
thing, I would yank out lex and yacc and code my own language.
Well how is that different from adding just &&& and |||? If you *REALLY
REALLY* want them, then why don't you yank out lex and yacc and code up a new
language?


Because I want to use them in C programs.


But I want them *not* to be used in C programs -- unless they are fully
documented in the source. Always.
What makes your code readable is adherence to an agreed upon coding

> standard that exists outside of what the language defines.

There are several such standards for identifier names. No such standard
exists for operator names, except: use familiar ones; preferably, steal
them from other languages.
Sounds like a reasonable convention to me. How about: All new operators must
be defined in a central module named ----. Or: Only these new operators may be
added as defined by ... yada, yada, yada.


More handwaving... Please humour me, and spell out a reasonable
convention for your operator-introduction.
The coding standards are just different.


I can't tell; I haven't seen a coding standard for operator introduction
yet.


I just gave some. What's hand waving is saying something about stack limits --
you haven't even suggested a way in which that *COULD* be possible to put down
on a piece of paper. Its *OBVIOUS* what I am talking about above. You just
want something specific to have something arbitrary to disagree with. Just
pick a convention -- if something bothers you about certain operators, spell it
out in your convention.
Just like freeform variable names, there is the same incumberance of the
programmers managing the meaning of the symbols for more generic operators.
You have not made a sufficient case to convince me that there is a real
difference between the two.


...Have you ever laid eyes on a moderately complicated APL program,
without knowing APL?


No, but I have looked at the results of the annual obfuscated C code constest.
I have never suggested that you *COULDN'T* abuse the idea. But you can abuse
just about *ANYTHING* in C. You don't think adding in ||| or &&& would add to
the potential for noise in those contests? I gave a very wordy description of
how to define an operator -- its kind of hard to obfuscate something completely
while you have so many hints (a keyword like _Operator, a whole function-like
declaration, etc) as to what it is. You would have to hide it behind macros to
truly obfuscate it, but you can use macros for arbitrary obfuscation
regardless.
><<< and @ are nice though.
BTW -- odd how you actually thought that these were "nice". Do you think you
would have a hard to remembering them? Would it wrankle on your brain because
they were odd and unfamilliar operators that are new to the language? Would
you have a tendancy to write abusive code because of the existence of these new
operators?


The meanings you gave for these rang some bells with my programming
experience. It leads me to believe you have encountered similar
situations, which gives you some bonus points for credibility in my book
:-) Seriously, I don't think such operators would be a good idea, but
I'd welcome library functions (the old fashioned ones, with names) for them.


Well I'd never use &&& or ||| as you proposed them for any reason whatsoever.
At least with the <<< and @ operators, both add just enough useful
functionality, that I *would* consider using them, knowing I would be non-
portable. That's the point of all this -- its all just opinion. With
redefinable operators someone's arbitrary opinion doesn't enter into it -- you
can make it conform to whatever opinion you have. If you disagree with someone
else's opinion, you can even *change* it, subject only to your control over the
source.
Do you think perhaps its possible to use an arbitrarily extendable operator
mechanism in order to *clarify* or make code actually more maintainable?


A resounding 'no'. It's an interesting thought, but for me it falls
short on technical and psychological grounds, as I previously tried to
explain.


You haven't shown me that it falls on technical grounds. You've just said that
its hard to implement in the compiler -- but that's obvious.
>Well, I don't know if these dozen-or-so big-number 'powermod' operations
>that are needed to establish an SSL connection are such a big deal as
>you make it.

Its not me -- its Intel, IBM, Motorola, Sun and AMD who seem to be obsessed
with these instructions.
Well they "provide" them, if that's what you mean.


They *have* to.
I don't see them working the committees to get these supported in
non-assembly languages.
That's because they don't care about how the it goes down once it hits
software. Someone has to pay something for a platform specific library? -- who
cares, as long as they sell their hardware. These same people didn't get
together to define the C standard did they? Why would they bother with an
extension just for widening multiplies?


Ok. So your point was, in the first place.... ?


My point is that e-commerce is not a *hardware* problem. At least, since all
hardware supports it, its not a hardware problem. Its a software problem. The
point is the *C* does not offer a solution for e-commerce. The Java people can
point and laugh and say they are better for e-commerce than C, and ironically,
they would be correct (Java has a BigNum class).
Instead the hardware people waste their time on trivial little specifications
like IEEE-754, which the C standards idiots don't bother to look at until 15
years later.


You shouldn't trivialize IEEE-754, it's one hell of a standard, probably
one of the best I have ever seen.


You missed the implied sarcasm ...
And as you are probably well aware,
there's a perfectly good reason why its only now that the C standard
committee is looking at it (only now hardware supporting it is widely
available). I really don't see your point here.
Only NOW?!??! If by *now*, you mean 1991, then sure. The first IEEE-754
implementation came out in like 1984 or so (the 8087). Within a few years,
everyone else came on board -- Alpha (even though DEC was one of the vendors
originally most vigorously opposed to IEEE-754), Sparc, MIPS, PPC, they are all
IEEE-754, of course.
[...] I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.
The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.


What's so bad about using third-party libraries?


From a usability point of view, GMP is crap, RSA's bignum library is crap (and
its expensive). None of them are portable to platforms which are in
development (you know like might be a concern for things like *SMART CARDS*.)
Of course Amazon, Yahoo and Ebay and most banks are
kind of obsessed with them too, even if they don't know it.

I think you would find that bignum operations are a small part of the
load on e-commerce servers.


According to a paper by Intel, widening multiply accounts for something like
30% of the load on typical e-commerce transactions (typing in your credit card
over the net in a way that can't be snooped.)


Reference, please.


It was in a presentation at the Intel Developer Forum in 1999 or 2000. I only
have this presentation in print form.
"load on typical e-commerce transaction" != "load on e-commerce server".
I meant the server. Clients have cycles to burn, obviously the client
performance doesn't matter on anything less than Half-life 2.
If the transaction is simply setup session/send credit-card
information/close session, this could be right. However, a jiffy ago we
were talking SSL.
One single assembly instruction (one *bundle* on Itanium) holds a whole server
> down for 30% of its computation,
Again, you're equating server load with transaction load.


Uhh ... both sides have to do the bignum operation. You have to do it both
ways because of man-in-the-middle attacks. Server load is what we are
concerned with here, obviously (Itanium is *not* a client platform.)
Hang on, are we talking about "overflow" or "carry" here? These are two
different things with signed numbers.

What happens if a is signed and b is unsigned?
My intend was for the operation to follow the semantics of the x86 ADC assembly
instruction.


Fun thing is: it handles both signed and unsigned numbers. Not-so-funny
is the fact that the x86 supports (as do most microprocessors) both an
overflow flag and a carry flag. So, armed with this, please elaborate on
the meaning.


Yes, but I never mentioned/introduced overflow. I only considered carry. For
bignum implementation, OF is not relevant. Only the CF is.
The point is that this instruction is known to be proper for
doing correct bignum additions.


It sure is... There are libraries containing hand-written assembly that
do fine :-)


Go look at the original purpose of C again. Its supposed to be a portable
assembler. This is a job for C.
- var is set to the result of the addition; the remainder if a carry occurs.

What happens if the signedness of var, a, and b are not equal?
It just behaves like the ADC x86 assembly instruction, the details of which I
will not regurgitate here.


So on adding a signed plus unsigned.... The carry gets a value "as if"
the signed value was actually unsigned?


Yes. Note that x86 does not have a version of ADC that considers the sign of
its operands in any special way that is reflected in the CF.
[...] On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.


It might to you, but the carry is not related to the sign of the operands.
(Your understanding of x86 seems to be shakey.)
What happens if the bit-widths of var, a, and b are not equal?
The bit-widths would be converted as if the (a + b) operation were happening in
isolation, to match C language semantics.


Ok, so if either is signed, they will both be signed,


No ... that is not the rule used in C. unsigned supersedes signed.
Widening multpilies cost transistor on the CPU. The hardware algorithms are
variations of your basic public school multiply algorithm -- so it takes n^2
transistors to perform the complete operation, where n is the largest bit
word that the machine accepts for the multiplier. If the multiply were not
widened they could save half of those transistors. So multiply those extra
transistors by the number of CPUs shipped with a widening multipliy (PPC,
x86s, Alphas, UltraSparcs, ... etc) and you easily end up in the billion
dollar range.
Sure, it's an important instruction.


Important enough to be ubiquitous and available in all modern CPUs.
I could easily see a quarter million just in design
effort, then some quarter million in testing, not to mention the cost of the
extra die area once it shipped -- and this is inside of *ONE* of these
companies, for *ONE* chip generation.


We're still about 999,500,000 USD short of a billion.


They are willing to spend a half million in development, because they know its
going to cost them a hundred million in extra silicon, and want to make sure
its money well spent (Whoops! I am practically giving away which chip company
I am talking about). You, of course, have missed this and the obvious implied
multiplier over all the other chip architectures, over all generations that
have such a feature. A billion dollars looks like a conservative, but
obviously reasonable order of magnitude guess as to the overall cost of
implementing widening multiplies.
For example, inside of Intel, they decided that they were going to reuse their
floating point multiplier for their widening integer add for Itanium. But that
meant that the multiplier had to be able to do 128 bit multiplies (as opposed
to 82 bits, which is all the Itanium would have apparently needed) and
couldn't run a floating point and integer multiply at the same time. This has
non-trivial layout and design impact on the chip.


Sure does. I estimate the cost of this design work would be in the range
of 10^6 dollars, but not 10^9 dollars.


That's *design* cost. That's nothing compared to the *silicon* cost.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #133
In article <br**********@news.tudelft.nl>, si****@jigsaw.nl says...
Paul Hsieh wrote:
I run into this every now and then. For example, I was recently trying to
solve the following problem: Create a directed graph on n points, each with an
out degree of d (I am concerned with very small d, like 2), such that the path
length between any pair of points is minimized.

Turns out that this is a problem whose complexity grows super-exponentially
even for very small values of n. Despite my best efforts, I don't know the answer
for n=11, d=2, for example (I know it's either 3 or 4, but I can't prove that it's
not 3). More startling is the possibility that n=12, d=2 might have a smaller
latency (but I don't know that either)!
I used to do programming contests in university (both as a participant
and as a judge later on), so I quite like this sort of problem. Care to
provide some background (via regular e-mail), I think I'd like to think
about this for a bit.


This isn't a rigged contest problem. This is a real world problem. Imagine a
*SERVERLESS* online game topology. How do you communicate "game-moves" in
realtime between players in a way that is most efficient and with the least
perceived lag, given a limited amount of bandwidth? Solve the above problem,
and you have a solution for this problem. (Actually for games you can crank d
a little higher than 2 -- the exact problem that I am looking at is different,
and I am loath to increase d above 2.)

You can think about this problem all you like. If you want to know whether I am
full of it or not, see if you can get above n=10, d=2.
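(To make the problem concrete: checking one candidate graph is cheap -- it's the
number of candidate graphs that explodes. A sketch of the feasibility test using
bitmask BFS; the function name and the n <= 16 limit are just illustrative:)

/* Returns the worst-case hop count between any ordered pair of nodes in a
   d=2 digraph, or -1 if some node cannot reach all the others. Frontiers
   are kept as bitmasks, so this assumes n <= 16. */
int worst_case_hops(int n, const int adj[][2])
{
    int worst = 0;
    for (int s = 0; s < n; s++) {
        unsigned long reached = 1ul << s;
        unsigned long all = (1ul << n) - 1ul;
        int hops = 0;
        while (reached != all) {
            unsigned long next = reached;
            for (int v = 0; v < n; v++)
                if (reached & (1ul << v))
                    next |= (1ul << adj[v][0]) | (1ul << adj[v][1]);
            if (next == reached)
                return -1;            /* not strongly connected */
            reached = next;
            hops++;
        }
        if (hops > worst)
            worst = hops;
    }
    return worst;
}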
Anyhow, the point is that the only way to have a chance to squeeze enough
computational juice out of my PC to solve this, I had to hard code huge amounts
of the code, and use a lot of bit twiddling tricks just for each special case.
I used the preprocessor macros as much as possible to make my code manageable,
but in order to change n, I actually have to *modify* the code by adding and
changing *n* lines of code. There's not much I can do about this; I would have
no chance to solve this otherwise.


Ok, that's a case. But wouldn't it have been better then just to make a
program generator (written in C or something else)? Or perhaps use the
m4 macro processor?


Tell me how you do this in a way that's compatible with testing it against 3
different compilers (one of the *OTHER* techniques I use to go after
performance ...), two of which are very unfriendly to being command-line driven,
while still making sense from a debugging point of view?
I'm saying that trying to fix C's intrinsic problems shouldn't start or end
with some kind of resolution of call stack issues. Anyone who understands
machine architecture will not be surprised about call stack depth limitations.

It's the task of a standard to spell these out, I think.

There are far more pressing problems in the language that one would like to
fix.

Yes, but most things that relate to the "encouraging of writing
extremely unsound and poor code", as you describe C, would be better
fixed by using another language. A lot of the inherently unsafe things
in C are sometimes needed, when doing low-level stuff.


Why is UB in isgraph(-1) needed? Why is UB in gets() needed? Why is the fact
that fgets() skips over '\0' characters needed? Why is a non-portable right
shift on signed integers needed (especially considering the one on unsigned
*is* portable)?


These are side cases that are at least mentioned in the standard. No
big deal, IMHO.


It's a big deal to me. I don't own any material on the C language that
specifies all of this. I actually go and reference the source code of a C library I
happen to have access to. But, of course, those sources and a real-world
programmer's sense lead me to the very wrong conclusion that isgraph(-1) just
returns false.
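(For the record, the portable idiom that sidesteps the trap is a one-cast fix --
the ctype functions are only defined for EOF and values representable as
unsigned char:)

#include <ctype.h>

/* Plain char may be signed, so a character with the high bit set would be
   passed to isgraph() as a negative value -- undefined behaviour. Casting
   through unsigned char first is the portable fix. */
int is_graphic(char c)
{
    return isgraph((unsigned char)c);
}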
[...] This is much more of a big deal:

int count_nodes(struct Tree *t)
{
    return t ? count_nodes(t->left) + count_nodes(t->right)
             : 0;
}

Things like this are ubiquitous in many kinds of code.
Uh ... dude, that just looks like an expensive way of writing:

int count_nodes (struct Tree *t) {
    return 0;
}

I don't suppose you did very well on these programming contests, did you? Btw,
weren't you railing against trigraphs?
[...] Trees with a big depth are not uncommon.
Assuming you actually wanted to count the *NODES* of the tree, and were a
little concerned about performance, try this:

static int count_nodes_inner (const struct Tree *t) {
    return (t->left  ? (1 + count_nodes_inner (t->left )) : 0) +
           (t->right ? (1 + count_nodes_inner (t->right)) : 0);
}

int count_nodes (const struct Tree *t) {
    return t ? 1 + count_nodes_inner (t) : 0;
}

as it avoids the leaf node tail calls, which will roughly double the
performance.
[...] How can I ever claim such a function is portable if I have no
guarantee whatsoever that it will work on some architectures?
Huh? Are you talking about the stack thing again? Dude: malloc (65536) is
not portable by the same reasoning, and:

int fib(int n) {
    if (n <= 2) return 1;
    return fib(n-1) + fib(n-2);
}

isn't going to work in any relevant way in any language, for a call like
fib(1000000).
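(In fairness to both sides of this: when recursion depth is the worry, the usual
remedy is an explicit, heap-allocated stack, where the failure mode is at least
a detectable malloc failure instead of a silent crash. A sketch, reusing the
same struct Tree:)

#include <stdlib.h>

/* Counts nodes iteratively; returns -1 if memory for the explicit stack
   cannot be obtained, instead of blowing the call stack. */
int count_nodes_iterative(const struct Tree *t)
{
    size_t cap = 64, top = 0;
    const struct Tree **stk = malloc(cap * sizeof *stk);
    int count = 0;

    if (stk == NULL)
        return -1;
    if (t != NULL)
        stk[top++] = t;
    while (top > 0) {
        const struct Tree *n = stk[--top];
        count++;
        if (top + 2 > cap) {          /* make room for both children */
            const struct Tree **p = realloc(stk, 2 * cap * sizeof *stk);
            if (p == NULL) { free(stk); return -1; }
            stk = p;
            cap *= 2;
        }
        if (n->left)  stk[top++] = n->left;
        if (n->right) stk[top++] = n->right;
    }
    free(stk);
    return count;
}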
>[powerful heap manager]
>
>>But a third party library can't do this portably.
>
>I don't see why not?
You simply do not understand. Malloc and multithreading are intertwined in the
implementation. If you have multithreading, then you have to lock and unlock
your heap, or use other strategies to make sure one thread isn't in the middle
of freeing while another is mallocing. If you don't, it introduces something
called a "race condition". Without exact considerations malloc and free would
simply fail to function properly in a multithreaded environment.

The point is that any extensions of the heap's semantics end up having exactly
the same problem. For example, how would one write a "freeall()" on top of
malloc? (Ignoring the fact that writing it *ON TOP* of malloc/free loses all
the potential *MASSIVE* performance advantage) Well presumably you'd have some
hidden part at the head of each allocation creating a linked list of
outstanding mallocs that were removed upon frees, and the freeall() function
would just walk through them calling free one by one. In a multithreaded
environment, once again, you'd be in race condition city. Same problem for
trying to build separate heaps (virtual or otherwise).
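(A single-threaded sketch of exactly that construction -- the names are made up;
the two list splices marked below are the critical sections that turn into race
conditions once threads enter the picture:)

#include <stdlib.h>

struct hblock { struct hblock *next, *prev; };  /* hidden per-allocation header */
struct theap  { struct hblock head; };          /* circular list of live blocks */

void theap_init(struct theap *h)
{
    h->head.next = h->head.prev = &h->head;
}

void *theap_malloc(struct theap *h, size_t n)
{
    struct hblock *b = malloc(sizeof *b + n);   /* header assumed to keep alignment */
    if (b == NULL)
        return NULL;
    b->next = h->head.next;                     /* critical section: link in */
    b->prev = &h->head;
    h->head.next->prev = b;
    h->head.next = b;
    return b + 1;                               /* user memory follows the header */
}

void theap_free(struct theap *h, void *p)
{
    (void)h;
    if (p != NULL) {
        struct hblock *b = (struct hblock *)p - 1;
        b->prev->next = b->next;                /* critical section: unlink */
        b->next->prev = b->prev;
        free(b);
    }
}

void theap_freeall(struct theap *h)             /* one pass frees everything */
{
    struct hblock *b = h->head.next;
    while (b != &h->head) {
        struct hblock *nx = b->next;
        free(b);
        b = nx;
    }
    theap_init(h);
}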

Now the C language standard says nothing about this of course, because C
doesn't say anything about multithreading at all. But that doesn't make the
problem go away -- most of the commonly used C compilers today include
multitasking extensions.

The point is that there is a *HUGE* difference between pushing such
functionality into the C standard and trying to obtain it from 3rd
parties. If it were put into the standard, then each vendor could modify their
heap implementation (a quite doable endeavour, since each vendor has already
solved the heap lock/unlock problem.) But via 3rd parties, the library vendor
has to learn and keep up to date with the OS primitives for each platform they
support.
Explain to me how you implement malloc() in a *multithreaded* environment
portably. You could claim that C doesn't support multithreading, but I highly
doubt you're going to convince any vendor that they should shut off their
multithreading support based on this argument.
Now you're shifting the goalposts.
How so?


Since in your original statement there was no mention of multithreading.


Mentioning the implementation of malloc implicitly brings multithreading into the
discussion. That is what you don't understand.
Allow me to refresh your memory:

"But a third party library can't do this portably.
Because of the multithreading issue ...
[...] It's actual useful
functionality that you just can't get from the C language, and there's
no way to reliably map such functionality to the C language itself.
One is forced to know the details of the underlying platform to
implement such things. Its something that really *should* be in the
language." [Paul Hsieh, 4 posts up]

See?
Yes -- the implicit answer to the "WHY" question that follows each statement,
is "because of the multithreading problem". If you were aware of the issues,
I'm sure this wouldn't be quite so bewildering to you.
The two are related. Writing a heap manager requires that you are
aware of multitasking considerations.


Nonsense. There's plenty of real hardware out there where multitasking is a
non-issue.


Obviously I meant for platforms that support multithreading -- and BTW it has
nothing to do with hardware. Multithreading is a software feature, not a
hardware one. (You're probably thinking of pre-emption, which is a totally
different thing.)
If I want to extend the heap manager, I have to solve them each one by one
> in different ways on different platforms


Nonsense. Unless you need support for something inherently
platform-specific like multithreading.


If the feature of multithreading is in the platform, then extensions to malloc
must take it into consideration. Do you understand the difference between this and
what you said? malloc/calloc/realloc/free, and any of the extensions I propose,
have no API, modes, or anything that exposes anything about multithreading to
the end user. It's all in the internals.
[...] I still don't see a problem writing a super-duper heap manager on top
of malloc().
Well, that much is obvious ...
The point is, if the C standard were simply to add this functionality straight
into the library, then it would be each compiler vendor's responsibility to add
this functionality into the compiler.


Sure but then the compiler would be more expensive to make, and I would
have to pay more $$$ for a feature I don't need.


I have written such a malloc myself (it's much faster than typical heap
managers, and contains rich built-in debugging facilities) that I would be only
too happy to donate to the C standards committee on the condition that they
endorse my extensions. It's been well tested, and is easily portable to any
flat-address implementation whose multitasking mechanisms include one of
locks, mutexes (preferred), or semaphores.
And the functionality would be
inherently portable as a result. The requirement of multitasking support would
then be pushed back into the vendor's lap -- i.e., the people who have
introduced this non-portable feature in the first place.


Adding multithreading support to a language is a can of worms.


Which I am not suggesting. And it's not impossible -- look at the language
Erlang. The VxWorks, QNX, or OSE APIs also suggest very good abstractions
for multithreading that should be reasonably applicable to any C-style
language.
[...] I don't
think you quite know what you're suggesting here. Many difficult
semantic issues to resolve. The fun thing about C is that it's simple.
I *DO* know what I am suggesting. This heap stuff is not just off the top of
my head. I've written applications where I've *needed* these features.
By dictating its existence in
the library, it would put the responsibility of making it work right in the
hands of the vendor without affecting the C standard's stance on not
acknowledging the need for multithreading.
Huh? The C standard describes the library.


Right, but the library doesn't specify anything about multitasking. It's only
implementations that need to deal with it.
Obviously, you cannot write a multithreading heap manager portably if
there is no portable standard on multithreading (doh). If you need this
you can always presume POSIX, and be on your way.
You do not understand. The *IMPLEMENTATION* needs to be multithreading aware.


Perhaps I don't understand. The implementation... of what?


malloc/free/calloc/realloc or any extensions on top of them.
What does the latter remark have to do with C's suitability for doing it?
I am saying that I don't disagree with you, but I think you are missing the
point. By simply adding the features/functions into C, that would make it
de facto portable from the point of view that matters -- programmers of the C
language.


Well, let's see your proposal then. How would you weave threads in the C
language portably? No handwaving please, you will have to come up with
rock-solid semantics.


I'm not proposing anything about threads. I am talking about the heap.
Extending the heap means you have to be aware of multithreading if your
platform supports it.
For most flat-memory architectures, it's actually very straightforward to add in
all the features that I am requesting. I know this because I've written my own
heap manager, which of course uses platform specific behaviour for the one
platform I am interested in. (It only gets complicated for platforms with
unusual memory, like segmented architectures, which have correspondingly
complicated heap managers today.) This is in stark contrast with the
incredibly high bar of platform-specific complications set by trying to do this
outside of the language.


I'm eagerly awaiting your proposal that shows how to do all this in a
platform independent way.


The NULL heap is the system heap.

struct heap * newHeap (struct heap *);
- allocate a subheap from a given heap.
int destroyHeap (struct heap *);
- frees all allocated memory with respect to a given heap (including
sub-heaps), then destroy the heap.
int freeAllH (struct heap *);
- frees all allocated memory with respect to a given heap (including
sub-heaps), but don't destroy the heap.

void * mallocH (struct heap *, size_t);
- Like malloc with respect to a given heap.
void * callocH (struct heap *, size_t, size_t);
- Like calloc with respect to a given heap.
void * reallocH (struct heap *, void *, size_t);
- Like realloc with respect to a given heap.
void * expandH (struct heap *, void *, size_t);
- Like reallocH, but will return NULL rather than move the allocation; the
pointer is never moved.
int freeH (struct heap *, void *);
- Like free with respect to a given heap.

int dbgIsAlloc (struct heap *, void *);
- Returns 1 if, to the best of the library's ability to determine, the
pointer is a properly allocated heap entry. Returns 0 if it can
determine the pointer not to be a properly allocated heap entry.

Notice how there is no mention of multithreading in anything there? Of course,
what is not seen is that they in fact *ALL* require very exact considerations
for multithreading if the platform supports multithreading.

The value of this is that destroyHeap/freeAllH actually costs about the same as
a handful of frees (even if you've made tens of thousands of outstanding
mallocs.) And expandH can be used in cases where a complete copy is not
necessary in order to resize an allocation (this feature
would improve the performance of the "Better String Library" in many cases,
in ways that I cannot capture with C's current heap semantics.) Although
dbgIsAlloc is probabilistic (though it would never return 0 on a valid heap
pointer), in practice its exactness would be so close as to be a reasonably
reliable mechanism to ensure the correctness of an allocated pointer for
debugging purposes.
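To make the intended usage concrete, a hypothetical sketch against the proposed
interface (none of this exists in a shipping library; error checking elided):

#include <stddef.h>

/* Prototypes repeated from the proposal above. */
struct heap;
struct heap * newHeap (struct heap *);
int destroyHeap (struct heap *);
void * mallocH (struct heap *, size_t);

/* A request handler makes any number of scratch allocations and retires
   them all in one cheap call, instead of tracking every pointer. */
void handle_request(const char *req)
{
    struct heap *scratch = newHeap(NULL);   /* sub-heap of the system heap */
    char *buf = mallocH(scratch, 4096);
    /* ... parse 'req' into 'buf' and as many other temporaries as needed ... */
    (void)req; (void)buf;
    destroyHeap(scratch);                   /* frees every outstanding allocation */
}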
>>My proposal allows the programmer to decide what is or is not useful to them.
>
>I'm all for that.

Well, I'm a programmer, and I don't care about binary output -- how does your
proposal help me decide what I think is useful to me?

It already did, it seems -- you just stated your decision with regard to
binary output. Fine by me.


Not fine by me. Because some other programmer whose code I have to look at
will use it, and I won't have any idea what it is.


Don't worry, you can look it up in the man pages. Unlike your
user-defined operators... :-)


You would have to look up "%b" ("%db"? -- whatever) in *UP TO DATE* man pages.
(Is *restrict*, or the new *complex* type from C99, properly documented in
*YOUR* current man pages?) My proposed plug-in mechanism would be in modules
or header files that would, one way or another, have to be in your project
somewhere. Presumably, the source would have to be available somewhere, so
there would be *NO QUESTION* as to what it does.

Same comment applies to &&&, ||| versus user defined operators.
Hey, it's not me -- apparently it's people like you who want more operators.

Just a dozen or so! But _standardized_ ones.


And you don't see the littlest problem with this proposal? If you add a dozen,
you'd better be sure that they are the last dozen that anyone could possibly want
to add to the language. Their value-add would have to be worth the pain of having
everyone learn about 12 new symbols.


Well, let's start with ?<, ?>, and ?: for now; ?: instead of |||, and I
drop the &&& request since it is much less useful. It's already in gcc
so it's implementable. Give everybody 10 years to get used to it, then
we may add a couple more.


My claim is that so few people would use these operators that they would be
de facto unsupported. They would be less useful than bitfields.
My point is that no matter what operators get added to the C language, you'll
never satisfy everyone's appetites. People will just want more and more,
though almost nobody will want all of what could be added.

My solution solves the problem once and for all. You have all the operators
you want, with whatever semantics you want.

That's too much freedom for my taste. If I wanted this kind of
thing, I would yank out lex and yacc and code my own language.
Well how is that different from adding just &&& and |||? If you *REALLY
REALLY* want them, then why don't you yank out lex and yacc and code up a new
language?


Because I want to use them in C programs.


But I want them *not* to be used in C programs -- unless they are fully
documented in the source. Always.
What makes your code readable is adherence to an agreed upon coding
standard that exists outside of what the language defines.

There are several such standards for identifier names. No such standard
exists for operator names, except: use familiar ones; preferably, steal
them from other languages.
Sounds like a reasonable convention to me. How about: All new operators must
be defined in a central module named ----. Or: Only these new operators may be
added as defined by ... yada, yada, yada.


More handwaving... Please humour me, and spell out a reasonable
convention for your operator-introduction.
The coding standards are just different.


I can't tell; I haven't seen a coding standard for operator introduction
yet.


I just gave some. What's hand waving is saying something about stack limits --
you haven't even suggested a way in which that *COULD* be possible to put down
on a piece of paper. It's *OBVIOUS* what I am talking about above. You just
want something specific to have something arbitrary to disagree with. Just
pick a convention -- if something bothers you about certain operators, spell it
out in your convention.
Just like freeform variable names, there is the same encumbrance of the
programmers managing the meaning of the symbols for more generic operators.
You have not made a sufficient case to convince me that there is a real
difference between the two.


...Have you ever laid eyes on a moderately complicated APL program,
without knowing APL?


No, but I have looked at the results of the annual obfuscated C code contest.
I have never suggested that you *COULDN'T* abuse the idea. But you can abuse
just about *ANYTHING* in C. You don't think adding in ||| or &&& would add to
the potential for noise in those contests? I gave a very wordy description of
how to define an operator -- it's kind of hard to obfuscate something completely
while you have so many hints (a keyword like _Operator, a whole function-like
declaration, etc) as to what it is. You would have to hide it behind macros to
truly obfuscate it, but you can use macros for arbitrary obfuscation
regardless.
><<< and @ are nice though.
BTW -- odd how you actually thought that these were "nice". Do you think you
would have a hard time remembering them? Would it rankle your brain because
they were odd and unfamiliar operators that are new to the language? Would
you have a tendency to write abusive code because of the existence of these new
operators?


The meanings you gave for these rang some bells with my programming
experience. It leads me to believe you have encountered similar
situations, which gives you some bonus points for credibility in my book
:-) Seriously, I don't think such operators would be a good idea, but
I'd welcome library functions (the old fashioned ones, with names) for them.


Well I'd never use &&& or ||| as you proposed them for any reason whatsoever.
At least with the <<< and @ operators, both add just enough useful
functionality that I *would* consider using them, knowing I would be non-
portable. That's the point of all this -- it's all just opinion. With
redefinable operators someone's arbitrary opinion doesn't enter into it -- you
can make it conform to whatever opinion you have. If you disagree with someone
else's opinion, you can even *change* it, subject only to your control over the
source.
Do you think perhaps it's possible to use an arbitrarily extendable operator
mechanism in order to *clarify* code or make it actually more maintainable?


A resounding 'no'. It's an interesting thought, but for me it falls
short on technical and psychological grounds, as I previously tried to
explain.


You haven't shown me that it falls on technical grounds. You've just said that
it's hard to implement in the compiler -- but that's obvious.
>Well, I don't know if these dozen-or-so big-number 'powermod' operations
>that are needed to establish an SSL connection are such a big deal as
>you make it.

It's not me -- it's Intel, IBM, Motorola, Sun, and AMD who seem to be obsessed
with these instructions.
Well they "provide" them, if that's what you mean.


They *have* to.
I don't see them working the committees to get these supported in
non-assembly languages.
That's because they don't care about how it goes down once it hits
software. Someone has to pay something for a platform-specific library? -- who
cares, as long as they sell their hardware. These same people didn't get
together to define the C standard, did they? Why would they bother with an
extension just for widening multiplies?


Ok. So your point was, in the first place.... ?


My point is that e-commerce is not a *hardware* problem. At least, since all
hardware supports it, it's not a hardware problem. It's a software problem. The
point is that *C* does not offer a solution for e-commerce. The Java people can
point and laugh and say they are better for e-commerce than C, and ironically,
they would be correct (Java has a BigInteger class).
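(To put a face on the operation in question: the handshake bottleneck is modular
exponentiation, built entirely out of widening multiplies. A toy version on
32-bit words, assuming mod != 0 -- C99 can widen into uint64_t here, but at
RSA's 1024-bit operand sizes there is no wider type left to widen into:)

#include <stdint.h>

/* Square-and-multiply 'powermod': every step below is a 32x32 -> 64
   widening multiply followed by a reduction. */
uint32_t powmod32(uint32_t base, uint32_t exp, uint32_t mod)
{
    uint64_t r = 1 % mod;
    uint64_t b = base % mod;
    while (exp != 0) {
        if (exp & 1)
            r = r * b % mod;    /* widening multiply, then reduce */
        b = b * b % mod;
        exp >>= 1;
    }
    return (uint32_t)r;
}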
Instead the hardware people waste their time on trivial little specifications
like IEEE-754, which the C standards idiots don't bother to look at until 15
years later.


You shouldn't trivialize IEEE-754, it's one hell of a standard, probably
one of the best I have ever seen.


You missed the implied sarcasm ...
And as you are probably well aware,
there's a perfectly good reason why it's only now that the C standard
committee is looking at it (only now hardware supporting it is widely
available). I really don't see your point here.
Only NOW?!??! If by *now*, you mean 1991, then sure. The first IEEE-754
implementation came out around 1980 (the 8087, built to the draft standard). Within a few years,
everyone else came on board -- Alpha (even though DEC was one of the vendors
originally most vigorously opposed to IEEE-754), Sparc, MIPS, PPC, they are all
IEEE-754, of course.
[...] I guess they're pretty satisfied with the bignum
libs that exist, which provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care about this except you.
The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.


What's so bad about using third-party libraries?


From a usability point of view, GMP is crap, RSA's bignum library is crap (and
it's expensive). None of them are portable to platforms which are in
development (you know, the kind of thing that might be a concern for *SMART CARDS*.)
Of course Amazon, Yahoo and Ebay and most banks are
kind of obsessed with them too, even if they don't know it.

I think you would find that bignum operations are a small part of the
load on e-commerce servers.


According to a paper by Intel, widening multiply accounts for something like
30% of the load on typical e-commerce transactions (typing in your credit card
over the net in a way that can't be snooped.)


Reference, please.


It was in a presentation at the Intel Developer Forum in 1999 or 2000. I only
have this presentation in print form.
"load on typical e-commerce transaction" != "load on e-commerce server".
I meant the server. Clients have cycles to burn; obviously, client
performance doesn't matter on anything less than Half-Life 2.
If the transaction is simply setup session/send credit-card
information/close session, this could be right. However, a jiffy ago we
were talking SSL.
One single assembly instruction (one *bundle* on Itanium) holds a whole server
down for 30% of its computation,
Again, you're equating server load with transaction load.


Uhh ... both sides have to do the bignum operation. You have to do it both
ways because of man-in-the-middle attacks. Server load is what we are
concerned with here, obviously (Itanium is *not* a client platform.)
Hang on, are we talking about "overflow" or "carry" here? These are two
different things with signed numbers.

What happens if a is signed and b is unsigned?
My intent was for the operation to follow the semantics of the x86 ADC assembly
instruction.


Fun thing is: it handles both signed and unsigned numbers. Not-so-funny
is the fact that the x86 supports (as do most microprocessors) both an
overflow flag and a carry flag. So, armed with this, please elaborate on
the meaning.


Yes, but I never mentioned/introduced overflow. I only considered carry. For
bignum implementation, OF is not relevant. Only the CF is.
The point is that this instruction is known to be proper for
doing correct bignum additions.


It sure is... There are libraries containing hand-written assembly that
do fine :-)


Go look at the original purpose of C again. It's supposed to be a portable
assembler. This is a job for C.
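(For contrast, here is what one ADC step costs in portable C today -- a sketch
on unsigned int limbs, where the comparisons recover the bit that the hardware
carry flag provides for free:)

/* One limb of a bignum addition; *carry is 0 or 1 on entry and exit.
   Unsigned overflow is well defined in C, so the wraparound tests are
   portable. */
unsigned int add_limb(unsigned int a, unsigned int b, unsigned int *carry)
{
    unsigned int t = a + b;
    unsigned int c1 = (t < a);      /* did a + b wrap around? */
    unsigned int s = t + *carry;
    unsigned int c2 = (s < t);      /* did adding the old carry wrap? */

    *carry = c1 | c2;
    return s;
}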
- var is set to the result of the addition; the remainder if a carry occurs.

What happens if the signednesses of var, a, and b are not equal?
It just behaves like the ADC x86 assembly instruction, the details of which I
will not regurgitate here.


So on adding a signed plus unsigned.... The carry gets a value "as if"
the signed value was actually unsigned?


Yes. Note that x86 does not have a version of ADC that considers the sign of
its operands in any special way that is reflected in the CF.
[...] On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.


It might to you, but the carry is not related to the sign of the operands.
(Your understanding of x86 seems to be shaky.)
What happens if the bit-widths of var, a, and b are not equal?
The bit-widths would be converted as if the (a + b) operation were happening in
isolation, to match C language semantics.


Ok, so if either is signed, they will both be signed,


No ... that is not the rule used in C. unsigned supersedes signed.
Widening multiplies cost transistors on the CPU. The hardware algorithms are
variations of your basic public-school multiply algorithm -- so it takes on the
order of n^2 transistors to perform the complete operation, where n is the widest
word that the machine accepts for the multiplier. If the multiply were not
widened, they could save half of those transistors. So multiply those extra
transistors by the number of CPUs shipped with a widening multiply (PPC,
x86s, Alphas, UltraSparcs, ... etc.) and you easily end up in the billion-
dollar range.
Sure, it's an important instruction.


Important enough to be ubiquitous and available in all modern CPUs.
I could easily see a quarter million just in design
effort, then some quarter million in testing, not to mention the cost of the
extra die area once it shipped -- and this is inside of *ONE* of these
companies, for *ONE* chip generation.


We're still about 999,500,000 USD short of a billion.


They are willing to spend a half million in development, because they know it's
going to cost them a hundred million in extra silicon, and want to make sure
its money well spent (Whoops! I am practically giving away which chip company
I am talking about). You, of course, have missed this and the obvious implied
multiplier over all the other chip architectures, over all generations that
have such a feature. A billion dollars looks like a conservative, but
obviously reasonable order of magnitude guess as to the overall cost of
implementing widening multiplies.
For example, inside of Intel, they decided that they were going to reuse their
floating point multiplier for their widening integer multiply for Itanium. But that
meant that the multiplier had to be able to do 128 bit multiplies (as opposed
to 82 bits, which is all the Itanium would have apparently needed) and
couldn't run a floating point and integer multiply at the same time. This has
non-trivial layout and design impact on the chip.


Sure does. I estimate the cost of this design work would be in the range
of 10^6 dollars, but not 10^9 dollars.


That's *design* cost. That's nothing compared to the *silicon* cost.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #134
[Someone, probably Paul Hsieh, wrote:]
>Widening multiplies [ie N x N => 2N bit result] ...
>transistors by the number of CPUs shipped with a widening multiply (PPC,
>x86s, Alphas, UltraSparcs, ... etc) and you easily end up in the billion
>dollar range.

[Sidney Cadot wrote:] Sure, it's an important instruction.

In article <MP************************@news.sf.sbcglobal.ne t>
Paul Hsieh <qe*@pobox.com> writes:
Important enough to be ubiquitous and available in all modern CPUs.


Actually, UltraSparc lacks a widening multiply.

It also has a widening multiply, but not necessarily the desired one.

In particular, UltraSparc is a 64-bit architecture, with 64-bit
registers. It has a 64-bit MULX instruction, with an unsigned
variant UMULX, but this instruction does 64 x 64 -> 64 multiplies,
discarding the upper 64-bit half of the 128-bit result.

The widening multiply is the old 32-bit-compatibility instruction,
which comes in SMULcc and UMULcc variants. It multiplies the lower
32 bits of two registers with the lower-half 32-bit result going to
a third register. The upper-half 32-bit result goes into the
special "%y" register. This instruction, which is a retrofit from
the V8 SPARC, is in turn backwards-compatible with pre-V8 SPARCs
which had only multiply-step instructions. (Using the multiply-
step instructions required careful fiddling with the %y register;
one of the appendices to the V7 architecture manuals gave multiply
algorithms.)

(Mr Hsieh can, of course, simply define UltraSparc as "not a modern
CPU".)

None of this is particularly relevant to C today, which also lacks
a widening multiply -- you have to simulate it with, e.g.:

unsigned long result = (unsigned long)op1 * (unsigned long)op2;
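(And when there is no wider type to cast to -- the 64 x 64 -> 128 case that
MULX leaves stranded -- the simulation has to be pieced together from
half-width limbs. A sketch of the idea at 32 bits, purely for illustration:)

#include <stdint.h>

/* 32 x 32 -> 64 widening multiply built from 16-bit halves, the way one
   must when the product is wider than the widest available type. */
void umul_wide(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint32_t al = a & 0xFFFF, ah = a >> 16;
    uint32_t bl = b & 0xFFFF, bh = b >> 16;

    uint32_t p0 = al * bl;                  /* the four partial products */
    uint32_t p1 = al * bh;
    uint32_t p2 = ah * bl;
    uint32_t p3 = ah * bh;

    uint32_t mid = p1 + (p0 >> 16);         /* cannot wrap (fits in 32 bits) */
    uint32_t carry;
    mid += p2;                              /* may wrap: that is a carry */
    carry = (mid < p2) ? 0x10000u : 0;

    *lo = (mid << 16) | (p0 & 0xFFFF);
    *hi = p3 + (mid >> 16) + carry;
}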

I do think the UltraSparc design was the wrong choice -- C's lack
of direct access to various instructions does not mean the instructions
are unimportant. At the same time, I do not think C *needs* access
to all instructions. There are other programming languages, and
methods of tying C code into code written in other languages. They
are not portable, and this is not important.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 13 '05 #135
Paul Hsieh wrote:
Sidney Cadot <si****@jigsaw.nl> wrote:
Arthur J. O'Dwyer wrote:
On Sat, 6 Dec 2003, Sidney Cadot wrote:
Good comparison. Support for &&& and ||| is exactly as likely to
break existing tools, and is exactly as likely to make it into a
future version of C. [There may not be a firm cause-and-effect there,
but it is correlated.]
Yes, &&& and ||| would break existing tools, but there is precedent for
that (introduction of // comments).


// was added in because so many compilers had already supported them
anyway. This was in keeping with the C standards committee "codifying
standard practices" nonsense. It was already there, all over the
place.


So...?
[...] The not unimportant difference is
that adding support for &&& and ||| would be rather trivial, while
support for Paul's proposal is a complete nightmare.

As to the likelihood of either feature making it into the C standard: I
disagree. The chances of ||| and &&& being added are many orders of
magnitude greater than that of the "operator introduction" feature
championed by Paul. Still very close to zero, of course. One has to
approach probabilities of this magnitude logarithmically :-)

Probably true, but your suggestion is also infinitely less ambitious,
and infinitely less useful.


On the first I agree, on the second I don't, and you forgot to mention
it's infinitely easier to implement.
Your suggestion is like the complex
numbers or variable-length macros thing in C99. They are interesting
additions which provide some kind of epsilon more functionality, but
which don't compensate for the fact that you are leaving all C89
compilers behind.

My suggestion is more like "restrict" in C99 which adds functionality
that just is not in the language in any way, shape, or form.
Ah, yes, restrict. Another nice example of a feature that was introduced
that breaks existing tools.

I like restrict. It is useful, plugging an obvious semantic hole that
made many optimizations impossible. On top of that, it is easy to
support (though a bit less easy to take advantage of), as per 6.7.3.1#6:

"A translator is free to ignore any or all aliasing implications of uses
of restrict."

Unfortunately, your operator introduction magic lacks both these properties.
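The canonical illustration (a sketch, not from the original posting): the
qualifier licenses the compiler to assume the three arrays never overlap,
which is exactly what allows it to keep values in registers and vectorize
the loop.

#include <stddef.h>

/* Without 'restrict', the compiler must assume dst might alias a or b and
   reload memory on every iteration; with it, the loop can be reordered and
   vectorized freely. */
void add_vectors(size_t n, double * restrict dst,
                 const double * restrict a,
                 const double * restrict b)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}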
Some people would use it no matter what -- because it adds something so
totally powerful in terms of functionality that it's worth losing the
compatibility with C89.


Pardon my ignorance: What does restrict add in terms of functionality?

Best regards,

Sidney

Nov 13 '05 #136
Paul Hsieh wrote:
In article <br**********@news.tudelft.nl>, si****@jigsaw.nl says...
Paul Hsieh wrote:
I run into this every now and then. For example, I was recently trying to
solve the following problem: Create a directed graph on n points, each with an
out degree of d (I am concerned with very small d, like 2), such that the path
length between any pair of points is minimized.

Turns out that this is a problem whose complexity grows super-exponentially
even for very small values of n. Despite my best efforts, I don't know the answer
for n=11, d=2, for example (I know it's either 3 or 4, but I can't prove that it's
not 3). More startling is the possibility that n=12, d=2 might have a smaller
latency (but I don't know that either)!
I used to do programming contests in university (both as a participant
and as a judge later on), so I quite like this sort of problem. Care to
provide some background (via regular e-mail)? I think I'd like to think
about this for a bit.

This isn't a rigged contest problem. This is a real world problem.


I think you forgot the capitals on Real World, there... :-)
Imagine a
*SERVERLESS* online game topology. How do you communicate "game-moves" in
realtime between players in a way that is most efficient and with the least
perceived lag, given a limited amount of bandwidth? Solve the above problem,
and you have a solution for this problem. (Actually for games you can crank d
a little higher than 2 -- the exact problem that I am looking at is different,
and I am loath to increase d above 2.)
Just out of curiosity: why should the graph be directed; don't the nodes
have two-way communication? And you're not considering a lag metric
other than just number of hops?
You can think about this problem all you like. If you want to know whether I am
full of it or not, see if you can get above n=10, d=2.
Perhaps I will; I'll have some time on my hands around Xmas. Sounds like
a nice problem.
Anyhow, the point is that the only way to have a chance to squeeze enough
computational juice out of my PC to solve this, I had to hard code huge amounts
of the code, and use a lot of bit twiddling tricks just for each special case.
I used the preprocessor macros as much as possible to make my code manageable,
but in order to change n, I actually have to *modify* the code by adding and
changing *n* lines of code. There's not much I can do about this; I would have
no chance to solve this otherwise.


Ok, that's a case. But wouldn't it have been better then just to make a
program generator (written in C or something else)? Or perhaps use the
m4 macro processor?

Tell me how you do this in a way that's compatible with testing it against 3
different compilers (one of the *OTHER* techniques I use to go after
performance ...), two of which are very unfriendly to being command-line driven,
while still making sense from a debugging point of view?


How about generating C source that you #include from a resident file in
your project -- would that be an option?
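Something like the following, say -- a tiny generator whose output gets
#included by the fixed main program, so each of the three compilers only ever
sees ordinary source. (File and macro names are made up for the sketch.)

/* gen.c: run once per choice of n; writes unrolled.h for the real program. */
#include <stdio.h>

int main(void)
{
    const int n = 11;                       /* the parameter being varied */
    FILE *f = fopen("unrolled.h", "w");
    if (f == NULL)
        return 1;
    fprintf(f, "#define N %d\n", n);
    for (int i = 0; i < n; i++)
        fprintf(f, "DO_NODE(%d)\n", i);     /* expanded by the including file */
    fclose(f);
    return 0;
}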
>I'm saying that trying to fix C's intrinsic problems shouldn't start or end
>with some kind of resolution of call stack issues. Anyone who understands
>machine architecture will not be surprised about call stack depth limitations.

It's the task of a standard to spell these out, I think.
>There are far more pressing problems in the language that one would like to
>fix.

Yes, but most things that relate to the "encouraging of writing
extremely unsound and poor code", as you describe C, would be better
fixed by using another language. A lot of the inherently unsafe things
in C are sometimes needed, when doing low-level stuff.

Why is UB in isgraph(-1) needed? Why is UB in gets() needed? Why is the fact
that fgets() skips over '\0' characters needed? Why is a non-portable right
shift on signed integers needed (especially considering the one on unsigned
*is* portable)?


These are side cases that are at least mentioned in the standard. No
big deal, IMHO.
It's a big deal to me. I don't own any material on the C language that
specifies all of this. I actually go and reference the source code of a C library I
happen to have access to. But, of course, those sources and a real-world
programmer's sense lead me to the very wrong conclusion that isgraph(-1) just
returns false.
The C standard specifies what is UB and what's not. Grepping an
implementation is a fragile way of deducing semantics, I would say.
[...] This is much more of a big deal:

int count_nodes(struct Tree *t)
{
    return t ? count_nodes(t->left) + count_nodes(t->right)
             : 0;
}

Things like this are ubiquitous in many kinds of code.


Uh ... dude, that just looks like an expensive way of writing:

int count_nodes (struct Tree *t) {
    return 0;
}


Hmmm yes, it is, isn't it? Now I feel all silly!
I don't suppose you did very well on these programming contests, did you? Btw,
weren't you railing against trigraphs?
Held my own, thank you very much. And yes, I was railing against
trigraphs. Why do you ask? There's no trigraph in the code.
[...] Trees with a big depth are not uncommon.

Assuming you actually wanted to count the *NODES* of the tree, and were a
little concerned about performance, try this:

static int count_nodes_inner (const struct Tree *t) {
    return (t->left  ? (1 + count_nodes_inner (t->left )) : 0) +
           (t->right ? (1 + count_nodes_inner (t->right)) : 0);
}

int count_nodes (const struct Tree *t) {
    return t ? 1 + count_nodes_inner (t) : 0;
}

as it avoids the leaf node tail calls, which will roughly double the
performance.


Well, I'm glad that you extracted my intended meaning from the name of
the function I used. Good thing I didn't use an operator! ;-)
[...] How can I ever claim such a function is portable if I have no
guarantee whatsoever that it will work on some architectures?
Huh? Are you talking about the stack thing again?
I'm _still_ talking about 'the stack thing', yes, since you still seem to
think that this is a triviality. I think it's not.

Dude: malloc (65536) is not portable by the same reasoning, and:

If you think that, then you didn't follow my reasoning. I can find out
at compile time if malloc(65536) is allowed, and if it is, I can find
out at runtime if it has succeeded. I can do no such thing with stack
overflows. The program will exhibit all the symptoms of UB without a
proper cause that can be traced down to the standard.
int fib(int n) {
    if (n <= 2) return 1;
    return fib(n-1) + fib(n-2);
}

isn't going to work in any relevant way in any language, for a call like
fib(1000000).
What are you trying to say here? Help me out a bit.
You simply do not understand. Malloc and multithreading are intertwined in the
implementation. If you have multithreading, then you have to lock and unlock
your heap, or use other strategies to make sure one thread isn't in the middle
of freeing while another is mallocing. If you don't, it introduces something
called a "race condition". Without exact considerations malloc and free would
simply fail to function properly in a multithreaded environment.
The point is that any extensions of the heap's semantics end up having exactly
the same problem. For example, how would one write a "freeall()" on top of
malloc? (Ignoring the fact that writing it *ON TOP* of malloc/free loses all
the potential *MASSIVE* performance advantage) Well presumably you'd have some
hidden part at the head of each allocation creating a linked list of
outstanding mallocs that were removed upon frees, and the freeall() function
would just walk through them calling free one by one. In a multithreaded
environment, once again, you'd be in race condition city. Same problem for
trying to build separate heaps (virtual or otherwise).

Now the C language standard says nothing about this of course, because C
doesn't say anything about multithreading at all. But that doesn't make the
problem go away -- most of the commonly used C compilers today include
multitasking extensions.
Are you sure you mean "multitasking" (which deals with processes,
usually each having their own address space) rather than
"multithreading" (which deals with threads of execution within the same
process, sharing the address space)?

I still seem to be missing your point rather blatantly, I'm afraid.
The point is that there is a *HUGE* difference between pushing such
functionality into the C standard, versus trying to obtain them from 3rd
parties. If it were put into the standard, then each vendor could modify their
heap implementation (a quite doable endeavour, since each vendor has already
solved the heap lock/unlock problem.) But via 3rd parties, the library vendor
has to learn and keep up to date with the OS primitives for each platform they
support.
Ok, I'll try to summarize my understanding of the point you try to make.
Let's consider it a priori desirable that we have a super-duper heap
manager, with all kinds of nice things like seperate heaps. We agree
that this is often useful. Now let's also assume a certain platform
where multithreading/multitasking is so important that the
platform-specific library provides support for it. We can assume that it
also sports a thread-safe malloc (yes?). You'd still want the
super-duper heap manager. Your point is that mandating a better heap
manager by the standard (without even mentioning multithreading) would
force each vendor to implement these, and implement them thread-safe if
they think multithreading is important to have. Is that close?

Now, as I see it, even if _just_ the standard malloc()/free()/etc are
thread-safe, you can build a more powerful heap manager on that alone,
using just standard C features. If the thing needs to be able to run in
a multithreaded environment, you could even do that, with some extra
effort. You need some form of mutex locking, but you can resort to
Lamport's bakery algorithm for that. Surely it won't be pretty, but it
is, in principle, possible. That's my point.
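
For concreteness, a minimal sketch of the bakery fallback I have in mind --
assuming a fixed thread count N, thread ids 0..N-1, and sequentially
consistent loads and stores (which plain C does not actually promise, so
take it as an in-principle argument rather than production code):

#define N 8                  /* assumed fixed number of threads */

volatile int choosing[N];    /* nonzero while thread i picks a ticket */
volatile unsigned number[N]; /* ticket numbers; 0 = not contending */

void bakery_lock (int i) {
    int j;
    unsigned max = 0;
    choosing[i] = 1;
    for (j = 0; j < N; j++)  /* take a ticket one above the current max */
        if (number[j] > max) max = number[j];
    number[i] = max + 1;
    choosing[i] = 0;
    for (j = 0; j < N; j++) {
        while (choosing[j])  /* wait until j has settled on its ticket */
            ;
        while (number[j] != 0 &&  /* lower ticket (tie: lower id) first */
               (number[j] < number[i] ||
                (number[j] == number[i] && j < i)))
            ;
    }
}

void bakery_unlock (int i) {
    number[i] = 0;
}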
>Explain to me how you implement malloc() in a *multithreaded* environment
>portably. You could claim that C doesn't support multithreading, but I highly
>doubt your going to convince any vendor that they should shut off their
>multithreading support based on this argument.

Now your shifting the goalposts.

How so?


Since in your original statement there was no mention of multithreading.


Mentioning the implementation of malloc implicitly brings multithreading into the
discussion. That is what you don't understand.


If the malloc returns either NULL or a pointer (i.e., it conforms
minimally to the standard) then, even in a multithreaded environment, it
is quite possible to build a heap manager on top of that. If you need
high performance even in such demanding circumstances, then, yes, you'd
need some support from preferably hardware-assisted mutexes or similar.
Allow me to refresh your memory:

"But a third party library can't do this portably. Because of the multithreading issue ...
[...] Its actual useful
functionality that you just can't get from the C language, and there's
no way to reliably map such functionality to the C language itself.
One is forced to know the details of the underlying platform to
implement such things. Its something that really *should* be in the
language." [Paul Hsieh, 4 posts up]

See?

Yes -- the implicit answer to the "WHY" question that follows each statement,
is "because of the multithreading problem". If you were aware of the issues,
I'm sure this wouldn't be quite so bewildering to you.
Ok. Let's try to get some closure here, this is eating into my free time
too much for my taste :-)

I would say that writing a more powerful heap manager is possible in
principle, even in a multithreaded environment, as long as you have a
well-implemented malloc() (and friends) at your disposal. It would be a
drag to do so and it would be slow, but it is possible.

I think I have a clearer bearing on your statement why it would be good
to put this stuff in the library. You do need mutexes, and if the
implementation supports them, it will work better than a shared-memory
simulated mutex. So for performance reasons, I think your idea of
extending the basic heap manager, by extending its functionality as
mandated by the standard, is a good idea.

That is, as long as I can get my msize() function as well :-)

[ skipped a bunch ]

My proposal allows the programmer to decide what is or is not useful to them.
>>
>>I'm all for that.
>
>Well, I'm a programmer, and I don't care about binary output -- how does your
>proposal help me decide what I think is useful to me?

It already did, it seems - You just stated your decision, with regard to
binary output. Fine by me.

Not fine by me. Because some other programmer who's code I have to look at
will use it, and I won't have any idea what it is.
Don't worry, you can look it up in the man pages. Unlike your
user-defined operators... :-)

You would have to look up "%b" ("%db"? -- whatever) in *UP TO DATE* man pages.
(Is *restrict* or the new *complex* type, from C99 properly documented in
*YOUR* current man pages?)


Yes, as close as gcc is, it does. And in other cases, I have the standard.
My proposed plug-in mechanism would be in modules
or header files that would, one way or another, have to be in your project
somewhere. Presumably, the source would have to be available somewhere, so
there would be *NO QUESTION* as to what it does.

Same comment applies to &&&, ||| versus user defined operators.

>Hey, its not me -- apparently its people like you who wants more operators.

Just a dozen or so! But _standardized_ ones.

And you don't see the littlest problem with this proposal? If you add a dozen,
you'd better be sure that they are the last dozen that anyone could possibly want
to add to the language. Their value add would have to be worth the pain of having
everyone learn about 12 new symbols.
Well, let's start with ?< ?> and ?: for now; ?: instead of |||, and I
drop the &&& requests since it is much less useful. It's already in gcc
so it's implementable. Give everybody 10 years to get used to it, then
we may add a couple more.

My claim is that so few people would use these operators, that the would be
defacto not supported. They would be less useful than bitfields.


Ouch! That hurts.
>My point is that no matter what operators get added to the C language, you'll
>never satisfy everyone's appetites. People will just want more and more,
>though almost nobody will want all of what could be added.
>
>My solution solves the problem once and for all. You have all the operators
>you want, with whatever semantics you want.

That's too much freedom for my taste. If I would want this kind of
thing, I would yank out lex and yacc and code my own language.

Well how is that different from adding just &&& and |||? If you *REALLY
REALLY* want them, then why don't you yank out lex and yacc and code up a new
language?


Because I want to use them in C programs.


But I want them *not* to be used in C programs -- unless they are fully
documented in the source. Always.


So there we differ. This, I think, is a matter of taste. I propose to
drop this subthread as well.
>What makes your code readable is adherence to an agreed upon coding

>standard that exists outside of what the language defines.

There are severa such standards for identifier names. No such standard
exists for operator names, except: use familiar ones; preferably, steal
them from other languages.

Sounds like a reasonable convention to me. How about: All new operators must
be defined in a central module named ----. Or: Only these new operators may be
added as defined by ... yada, yada, yada.


More handwaving... Please humour me, and spell out a reasonable
convention for your operator-introduction.
The coding standards are just different.


I can't tell; I haven't seen a coding standard for operator introduction
yet. I just gave some.
I presume you mean "All new operators must be defined in a central
module named ----. Or: Only these new operators may be added as defined
by ... yada, yada, yada."

My mistake I guess, I overlooked these coding standards completely. I
will start promoting them within my place of work as of tomorrow :-)
What's hand waving is saying something about stack limits --
you haven't even suggested a way in which that *COULD* be possible
to put down on a piece of paper.
Ah, the "tit-for-tat" technique...!

So far, nobody has challenged me to come up with a specific phrasing for
this (which actually surprises me). Here you go. It's a bit rough, but
you get the idea.

---- Proposed addition to the standard to address stack overflows:
---- Draft

a. For an active function invocation, define an upper bound for the
amount of memory in use by this function as follows (in the
following, denote "Upper Limit on Memory Usage" by "ULMU"):

- For every parameter/local variable of basic type there is a
corresponding constant macro of type size_t:

(unsigned) char: ULMU_CHAR
(unsigned) short: ULMU_SHORT
...
double ULMU_DOUBLE
...
pointer types: ULMU_POINTER

- The ULMU for a struct shall be equal to
the sum of upper limits of its members.
- The ULMU for a union shall be equal to
the maximum of upper limits of its members.
- The ULMU for an array shall be the number
of elements, multiplied by the upper-limit
of its base type.

((the last one is certainly wasteful, but this could be refined))

The ULMU of a function invocation shall be calculated as the
sum of ULMU values of the types of its parameters,
local variables, and return type, PLUS the constant
ULMU_FUNC_OVERHEAD.

b. The implementation shall define a macro ULMU_TOTAL,
which will evaluate to a size_t with value at least
4095. This value is guaranteed to remain constant during
execution of the program.

c. A function invocation is guaranteed to succeed only if the
total summed ULMU values of all active function invocations
(including the to-be active function after invocation)
does not exceed ULMU_TOTAL.
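
To make the draft concrete, here is how the macros might be used;
ULMU_INT (implied by the elided basic types above), ULMU_POINTER,
ULMU_FUNC_OVERHEAD and ULMU_TOTAL are the names proposed in this draft,
not anything an existing implementation provides:

/* ULMU for one invocation of:
       int count_nodes (struct Tree *t)  -- one pointer parameter,
       no locals, int return type                                  */
#define ULMU_COUNT_NODES (ULMU_POINTER /* parameter t */ \
                        + ULMU_INT     /* return type */ \
                        + ULMU_FUNC_OVERHEAD)

/* by clause (c), recursion to depth D is guaranteed to succeed
   only while D * ULMU_COUNT_NODES <= ULMU_TOTAL                    */
#define COUNT_NODES_MAX_DEPTH (ULMU_TOTAL / ULMU_COUNT_NODES)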
Its *OBVIOUS* what I am talking about above. You just
want something specific to have something arbitrary to disagree with.
You have to trust me on this one: I'm not trying to coerce you into a
laborious writing-down of a precise rule-set, or anything. It's just
that I found that making something concrete (which you intuitively feel
should be possible) gives a lot of insight into the kind of troubles you
would get into when actually doing it.
Just pick a convention -- if something bothers you about certain
operators, spell it out in your convention.
As noted before, I think it would be not a good development if different
organisations settled on a different set of operators, even if they
meticulously documented their convention. Even things as simple as a
wrong (in the sense of: other than you're used to) bracing style can
hinder your productivity for quite some time before you get used to
that. Having a different set of condoned operators at each programming
shop sounds like a small tower of Babel to me.
Just like freeform variable names, there is the same encumbrance of the
programmers managing the meaning of the symbols for more generic operators.
You have not made a sufficient case to convince me that there is a real
difference between the two.


...Have you ever laid eyes on a moderately complicated APL program,
without knowing APL?

No, but I have looked at the results of the annual obfuscated C code contest.
I have never suggested that you *COULDN'T* abuse the idea. But you can abuse
just about *ANYTHING* in C.
I'm not talking about abuse, I'm talking about the recognition capacity
of unknown operators (which is zilch, and which takes ages to get used
to). Now that's bad enough for languages, but I think it would be a
mistake of quite monumental proportions to allow some sort of C
"dialects" to emerge with different sets of operators, "operator wars",
and so on.
You don't think adding in ||| or &&& would add to
the potential for noise in those contests? I gave a very wordy description of
how to define an operator -- its kind of hard to obfuscate something completely
while you have so many hints (a keyword like _Operator, a whole function-like
declaration, etc) as to what it is. You would have to hide it behind macros to
truly obfuscate it, but you can use macros for arbitrary obfuscation
regardless.
When reading the code that used (rather than defines) the operator, it
helps tremendously if the meaning jumps from the page, rather than
having to look it up in the Big Manual of Operators in use at your local
workplace.
The meanings you gave for these rang some bells with my programming
experience. It leads me to believe you have encountered similar
situations, which gives you some bonus points for credibility in my book
:-) Seriously, I don't think such operators would be a good idea, but
I'd welcome library functions (the old fashioned ones, with names) for them.

Well I'd never use &&& or ||| as you proposed them for any reason whatsoever.
At least with the <<< and @ operators, both add just enough useful
functionality, that I *would* consider using them, knowing I would be non-
portable. That's the point of all this -- its all just opinion. With
redefinable operators someone's arbitrary opinion doesn't enter into it -- you
can make it conform to whatever opinion you have.


Too much freedom is a dangerous thing, in programming at least.
If you disagree with someone else's opinion, you can even *change* it,
subject only to your control over the source.
It would make for an interesting experiment, sociologically :-)

A small hypothetical case study. Suppose engineer A, working on nuclear
reactor core controllers, works in a team where "?<" means "minimum" and
"#<" means "x is implied by y" for some data structure involved in logic
reasoning on the state of the system. Now he switches companies; he goes
to work on electronic toys and gadgets. Here the convention is
different, but the same operators are used.

Not much harm could be done, surely. Not unless someone from the toy
company joins the nuclear reactor programming team, that is :)
Do you think perhaps its possible to use an arbitrarily extendable operator
mechanism in order to *clarify* or make code actually more maintainable?


A resounding 'no'. It's an interesting thought, but for me it falls
short on technical and psychological grounds, as I previously tried to
explain.

You haven't shown me that it falls on technical grounds. You've just said that
its hard to implement in the compiler -- but that's obvious.


You haven't shown me that it can be implemented in a compiler, even in
theory. That should count for something.
[skip] My point is that e-commerce is not a *hardware* problem. At least, since all
hardware supports it, it's not a hardware problem. It's a software problem. The
point is that *C* does not offer a solution for e-commerce. The Java people can
point and laugh and say they are better for e-commerce than C, and ironically,
they would be correct (Java has a BigNum class).
The irony would be complete if the PHB-types would fall for that... :-)
Instead the hardware people waste their time on trivial little specifications
like IEEE-754, which the C standards idiots don't bother to look at until 15
years later.


You shouldn't trivialize IEEE-754, it's one hell of a standard, probably
one of the best I have ever seen.

You missed the implied sarcasm ...
There's a nice coding convention on operators that covers this, they're
called "emoticons" nowadays.
And as you are probably well aware,
there's a perfectly good reason why its only now that the C standard
committee is looking at it (only now hardware supporting it is widely
available). I really don't see your point here.

Only NOW?!??! If by *now*, you mean 1991, then sure. The first IEEE-754
implementation came out in like 1984 or so (the 8087). Within a few years,
everyone else came on board -- Alpha (even though DEC was one of the vendors
originally most vigorously opposed to IEEE-754), Sparc, MIPS, PPC, they are all
IEEE-754, of course.
In '89, support hadn't spread wide enough to merit a statement (much
less an implementation mandate) on it in the standard.
[...] I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.

The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.


What's so bad about using third-party libraries?

From a usability point of view, GMP is crap, RSA's bignum library is crap (and
it's expensive).
How's that? How are you going to improve things by adding a carry operator?
None of them are portable to platforms which are in
development (you know like might be a concern for things like *SMART CARDS*.)
Now there's an application where the economy of scale suggest doing a
hand-written assembly routine, if I ever saw one...
>Of course Amazon, Yahoo and Ebay and most banks are
>kind of obsessed with them too, even if they don't know it.

I think you would find that bignum operations are a small part of the
load on e-commerce servers.

According to a paper by Intel, widening multiply accounts for something like
30% of the load on typical e-commerce transactions (typing in your credit card
over the net in a way that can't be snooped.)


Reference, please.

It was in a presentation at the Intel Developer Forum in 1999 or 2000. I only
have this presentation in print form.
"load on typical e-commerce transaction" != "load on e-commerce server".
I meant the server. Clients have cycles to burn, obviously the client
performance doesn't matter on anything less than Half-life 2.
The question remains: how much time is your typical e-commerce server
involved in an actual transaction? How many transactions per second can
a machine handle now? I would suppose that most of the time spent in a
transaction is actually a database lookup at a creditcard company, not a
powermod. The frontend webserver is just waiting for the bank I think.
But these are just half-educated guesses.
If the transaction is simply setup session/send credit-card
information/close session, this could be right. However, a jiffy ago we
were talking SSL.

One single assembly instruction (one *bundle* on Itanium) holds a whole server
down for 30% of its computation,


Again, you're equating server load with transaction load.

Uhh ... both sides have to do the bignum operation.
I don't mean "client load" when I write "transaction load", I'm still
strictly speaking about the server. I could be convinced that the amount
of time spent in powermod for a 1-shot transaction could be as high as
30%, but I'd be hard-pressed to believe a front-end webserver is busy
for more than 10% doing actual transactions. I think the latency of an
e-commerce transaction is mostly caused by frontend<->bank
communication, and intra-bank database lookup. But I cannot back this up.
You have to do it both
ways because of man-in-the-middle attacks. Server load is what we are
concerned with here, obviously (Itanium is *not* a client platform.)

Hang on, are we talking about "overflow" or "carry" here? These are two
different things with signed numbers.

What happens if a is signed and b is unsigned?
My intent was for the operation to follow the semantics of the x86 ADC assembly
instruction.


Fun thing is: it handles both signed and unsigned numbers. Not-so-funny
is the fact that the x86 supports (as do most microprocessors) both an
overflow flag and a carry flag. So, armed with this, please elaborate on
the meaning.

Yes, but I never mentioned/introduced overflow. I only considered carry. For
bignum implementation, OF is not relevant. Only the CF is.


But for signed numbers, carry is almost meaningless. So what remains of
the semantics in this case?
The point is that this instruction is known to be proper for
doing correct bignum additions.


It sure is... There are libraries containing hand-written assembly that
do fine :-)

Go look at the original purpose of C again. It's supposed to be a portable
assembler. This is a job for C.
I would usually agree on statements like this, but here's an exception.
I still have trouble envisioning carry semantics.
>- var is set to the result of the addition; the remainder if a carry occurs.

What happens if the signedness of var, a, and b are not equal?

It just behaves like the ADC x86 assembly instruction, the details of which I
will not regurgitate here.


So on adding a signed plus unsigned.... The carry gets a value "as if"
the signed value was actually unsigned?

Yes. Note that x86 does not have a version of ADC that considers the sign of
its operands in any special way that is reflected in the CF.
Of course, but you have to know whether your operands are signed or
unsigned to see whether the CF or the OF is the important flag.
[...] On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.

It might to you, but the carry is not related to the sign of the operands.
(Your understanding of x86 seems to be shaky.)
My x86 is not too strong, but I am quite well-versed in 6502, 68k and
PowerPC, which enables me to think about these things. They all share
carry and overflow bits in their designs, with little difference. After
performing an add-with-carry, both CF and OF are set (although the V
flag may not be reset, as in the 6502), but the CF generally does not
provide useful information when adding two numbers that are signed. So I
don't see how

c +< v = a + b

would provide useful information in c, when both a and b are signed, if
it is to copy the carry status.
What happens if the bit-widths of var, a, and b are not equal?

The bit-widths would be converted as if the (a + b) operation were happening in
isolation, to match C language semantics.


Ok, so if either is signed, they will both be signed,

No ... that is not the rule used in C. unsigned supersedes signed.


Ok. That just leaves the case of two signed values, then.

I'm still wondering about the high_order_value() and low_order_value()
semantics I mentioned in my previous post, actually the part which I was
most curious about.

About the "billions of dollars" cost: I've snipped that part if you
don't mind. I like the technical issues much better :-) Let's agree that
there's a lot of money at stake.

Best regards,

Sidney

Nov 13 '05 #137
Chris Torek wrote:
(snip)
In particular, UltraSparc is a 64-bit architecture, with 64-bit
registers. It has a 64-bit MULX instruction, with an unsigned
variant UMULX, but this instruction does 64 x 64 -> 64 multiplies,
discarding the upper 64-bit half of the 128-bit result.

The widening multiply is the old 32-bit-compatibility instruction,
which comes in SMULcc and UMULcc variants. It multiplies the lower
32 bits of two registers with the lower-half 32-bit result going to
a third register.

(snip)

None of this is particularly relevant to C today, which also lacks
a widening multiply -- you have to simulate it with, e.g.:

unsigned long result = (unsigned long)op1 * (unsigned long)op2;

I do think the UltraSparc design was the wrong choice -- C's lack
of direct access to various instructions does not mean the instructions
are unimportant. At the same time, I do not think C *needs* access
to all instructions. There are other programming languages, and
methods of tying C code into code written in other languages. They
are not portable, and this is not important.


At some point, one of the things I look at in a new architecture
is its implementation of (n)*(n)=(2n) multiply, and (2n)/(n)=(n) divide.

I am not so convinced anymore, though. They are primarily used to
write multiple precision arithmetic routines. It isn't that much
harder to write them with the 32 bit versions than with the 64 bit versions.

The amount of work necessary to make the hardware is not necessarily
worthwhile for the small advantage that it supplies.

When IBM first added the extended precision (128 bit) floating point
instructions to S/360 and S/370 they did not include a divide
instruction. They found that it was not worth the cost, but instead
supplied a software simulation of it. They also supplied a simulator
for all the extended precision instructions for those processors that
needed it.

If Sun provided either a simulator or subroutine that would do those
operations, that would seem useful enough to me.

-- glen
Nov 14 '05 #138
In article <br**********@news.tudelft.nl>, si****@jigsaw.nl says...
Paul Hsieh wrote:
In article <br**********@news.tudelft.nl>, si****@jigsaw.nl says...
Paul Hsieh wrote: Imagine a *SERVERLESS* online game topology. How do you communicate
"game-moves" in realtime between players in a way that is most efficient
and with the least perceived lag, given a limited amount of bandwidth?
Solve the above problem, and you have a solution for this problem.
(Actually for games you can crank d a little higher than 2 -- the exact
problem that I am looking at is different, and I am loathe to increase d
above 2.)


Just out of curiosity: why should the graph be directed; don't the nodes
have two-way communication?


Yes, but bandwidth is also an issue. If you connect to more and more peers and
just push data out to all of them, then your upload bandwidth just gets split
more and more. d=1 leads to a trivial solution but with unacceptable lag. So
d=2 is the minimum divisor that gives rich enough possibilities for topologies
to try to solve this problem in interesting ways.

So although actually you will, on average, have 4 connections established (two
out, and two in), you are transmitting on two of them, and receiving on the
other two.
[...] And you're not considering a lag metric other than just number of
hops?
Solving variants of Travelling Salesman is next on my list of things that I'll
spend my most masochistic moments on.
You can think about this problem all you like. If you want to know if I am
full of it or not, see if you can get above n=10,d=2.


Perhaps I will, I'll have some time on my hands round Xmas. Sounds like
a nice problem.


Its nice to think about, not so nice if you actually have to solve it.
Dijkstra's weighted shortest path algorithms were nice for client-server based
problems, but the future is peer 2 peer. I believe that this problem and
problems like it are going to be at the center of a lot of internet technology
going forward.
Ok, that's a case. But wouldn't it have been better then just to make a
program generator (written in C or something else)? Or perhaps use the
m4 macro processor?


Tell me how you do this in a way that's compatible with testing it against 3
different compilers (one of the *OTHER* techniques I use to go after
performance ...) two of which are very unfriendly to being commandline driven,
while still making sense from a debugging point of view?


How about generating C source that you #include from a resident file in
your project, would that be an option?


I don't know that MSVC's project mechanisms can be finagled to compile and
execute one program just to build source code for another program that you
compile up. With makefiles what you are saying is easy, and I've done it in
the past, but you can see how this kind of thing puts pressure on your
development process. A sufficiently useful preprocessor mechanism would allow
one to write such code generators in a completely portable way.
Its a big deal to me. I don't own any material on the C language that
specifies all of this.


The C standard specifies what is UB and what's not. Grepping an
implementation is a fragile way of deducing semantics, I would say.


Yes, and what percentage of C programmers in the world do you think have ever
had the standard in their hands that they actually refer to?
Assuming you actually wanted to count the *NODES* of the tree, and were a
little concerned about performance try this:

static int count_nodes_inner (const struct Tree *t) {
return (t->left ? (1 + count_nodes_inner (t->left )) : 0) +
(t->right ? (1 + count_nodes_inner (t->right)) : 0);
}

int count_nodes (const struct Tree *t) {
return t ? count_nodes_inner (t) : 0;
}

as it avoids the leaf node tail calls, which will roughly double the
performance.


Well, I'm glad that you extracted my intended meaning from the name of
the function I used. Good thing I didn't use an operator! ;-)


Yeah, well I noticed that you didn't catch *my* bug? (I missed one detail.)
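
(If the detail in question is the root node: count_nodes_inner counts
every node except the one it is handed, so the wrapper presumably needs
a "1 +" --

int count_nodes (const struct Tree *t) {
    return t ? 1 + count_nodes_inner (t) : 0;
}

-- at least, that is the reading that makes the counts come out right.)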
I'm _still_ talking about 'the stack thing' yes, since you still seem to
think that this is a triviality. I think it's not.

Dude: malloc (65536) is not portable by the same reasoning, and:

If you think that, then you didn't follow my reasoning. I can find out
at compile time if malloc(65536) is allowed, and if it is, I can find
out at runtime if it has succeeded. I can do no such thing with stack
overflows. The program will exhibit all the symptoms of UB without a
proper cause that can be traced down to the standard.
int fib(int n) {
if (n <= 2) return 1;
return fib(n-1) + fib(n-2);
}

isn't going to work in any relevant way in any language, for a call like
fib(1000000).
What are you trying to say here? Help me out a bit.


I'm trying to say that *ALL* languages have this issue. The resources required
to retain a call stack are unknown and often difficult to determine from any
reasonable code analysis. Something like "counting the depth" is not very
useful since there are no language provisions for actually counting out to
such a depth. Some compilers implement tail recursion, though doing so in all
cases is equivalent to the halting problem -- so you can't even know for sure
when it's happening.

If you want a really good case, look up the Ackermann function. It looks very
simple, with the only concrete constant operation being a "+1". Knowing that
there is some kind of stack bound is totally useless for this function. Most
computations of the Ackermann function blow out any fixed sized stack, and even
a fully addressable 64bit machine would not have the required memory to
implement a stack that could compute any of its non-trivial range. The point
being there is no kind of reasonable metric for measuring the stack
requirements that can be generally applied.
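
For reference, the function in question really is this small -- the only
concrete operation is the +1:

unsigned long ack (unsigned long m, unsigned long n) {
    if (m == 0) return n + 1;
    if (n == 0) return ack (m - 1, 1);
    return ack (m - 1, ack (m, n - 1));
}

ack(4, 2) already has 19,729 decimal digits, and the call depth needed to
reach it is far beyond any physical stack.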

My objection to your proposal can simply be restated as "all machines have
limited resources", deal with it. At best you could insist that upon reaching
some limited stack depth that the problem should immediately halt or something
(with stack checking turned on, most x86 compilers provide for this in the form
of a runtime error.)
Ok, I'll try to summarize my understanding of the point you try to make.
Let's consider it a priori desirable that we have a super-duper heap
manager, with all kinds of nice things like seperate heaps. We agree
that this is often useful. Now let's also assume a certain platform
where multithreading/multitasking is so important that the
platform-specific library provides support for it. We can assume that it
also sports a thread-safe malloc (yes?). You'd still want the
super-duper heap manager. Your point is that mandating a better heap
manager by the standard (without even mentioning multithreading) would
force each vendor to implement these, and implement them thread-safe if
they think multithreading is important to have.
Basically, yes.
Now, as I see it, even if _just_ the standard malloc()/free()/etc are
thread-safe, you can build a more powerful heap manager on that alone,
using just standard C features.
This is completely and utterly untrue. *THAT* is my point. None of the
functions I proposed as extensions can be implemented (even in a prototypical
way) on top of just malloc()/free()/etc and retain their intended semantics.
(Of course, you could implement them with memory leaks out the wazoo -- i.e.,
just never free anything, but that's clearly not acceptable.)
[...] If the thing needs to be able to run in
a multithreaded environment, you could even do that, with some extra
effort. You need some form of mutex locking, but you can resort to
Lamport's bakery algorithm for that.
I had to look this one up -- that's one of the classic useless non-solutions
(especially if you want to call malloc more than 4 billion times, for example),
which explains why I've never committed it to memory. You still need to be
able to identify process instances, which *BY ITSELF* is not portable. Process
instance identifiers would count as something *BEYOND* just malloc/free/etc.
[...] Surely it won't be pretty, but it is, in principle, possible. That's
my point.
No it's not. It's not portable, that's for sure.
>>Explain to me how you implement malloc() in a *multithreaded* environment
>>portably. You could claim that C doesn't support multithreading, but I highly
>>doubt your going to convince any vendor that they should shut off their
>>multithreading support based on this argument.

>Now your shifting the goalposts.

How so?

Since in your original statement there was no mention of multithreading.


Mentioning the implementation of malloc implicitly brings multithreading into the
discussion. That is what you don't understand.


If the malloc returns either NULL or a pointer (i.e., it conforms
minimally to the standard) then, even in a multithreaded environment, it
is quite possible to build a heap manager on top of that.


Why don't you just try to do it. I mean without race conditions, and without
hooking out or using any of the spawn/mutex/etc primitives that you'll find in
the multithreading API. That ought to be a good trick.
[...] If you need
high performance even in such demanding circumstances, then, yes, you'd
need some support from preferably hardware-assisted mutexes or similar.
High performance is just something I realized you could get as a nice side
effect from such extensions. But the functionality enhancements alone are
worth it. *BUT* whether or not this is properly implemented with respect to
the existence of multithreading has no relevance to its performance -- that's
just a question of correctness.
Ok. Let's try to get some closure here, this is eating into my free time
too much for my taste :-)

I would say that writing a more powerful heap manager is possible in
principle, even in a multithreaded environment, as long as you have a
well-implemented malloc() (and friends) at your disposal.
*BZZZT!!!* Wrong.
It would be a drag to do so and it would be slow, but it is possible.
Impossible. Slow, fast or otherwise. Its just impossible.
I think I have a clearer bearing on your statement why it would be good
to put this stuff in the library. You do need mutexes, and if the
implementation supports them, it will work better than a shared-memory
simulated mutex. So for performance reasons, I think your idea of
extending the basic heap manager, by extending its functionality as
mandated by the standard, is a good idea.

That is, as long as I can get my msize() function as well :-)
Well ok, msize() (implemented as _msize() in WATCOM C/C++) does not require
multithreading support. That would be a good addition as well. One has to be
very careful about what kind of implementation impact this has. For example,
if an implementation has an extension which has an heap compatible
automatically growable allocation (for example touching the adjacent memory
page, which would be "paged out" by default, might automatically append it to a
"growable" allocation), then knowing its size might not make a lot of sense.
Don't worry, you can look it up in the man pages. Unlike your
user-defined operators... :-)


You would have to look up "%b" ("%db"? -- whatever) in *UP TO DATE* man pages.
(Is *restrict* or the new *complex* type, from C99 properly documented in
*YOUR* current man pages?)


Yes, as close as gcc is, it does. And in other cases, I have the standard.


But you said you were just going to look in your man pages. With my proposal,
there would be no question -- the source would have to be in the project
somewhere. If someone closed their source, well, presumably you're going to
expect documentation anyway, if not, you've got the same problem that you
always expect from closed source libraries.
What's hand waving is saying something about stack limits --
you haven't even suggested a way in which that *COULD* be possible
to put down on a piece of paper.


Ah, the "tit-for-tat" technique...!

So far, nobody has challenged me to come up with a specific phrasing for
this (which actually surprises me). Here you go. It's a bit rough, but
you get the idea.

---- Proposed addition to the standard to address stack overflows:
---- Draft

a. For an active function invocation, define an upper bound for the
amount of memory in use by this function as follows (in the
following, denote "Upper Limit on Memory Usage" by "ULMU"):


And what if the locals/parameters are not stored, or storable, on the stack?
What if an implementation uses multiple stacks depending on the *type* of the
parameter?
- For every parameter/local variable of basic type there is a
corresponding constant macro of type size_t:
You missed all the implicit temporaries required. Homework exercise: for any
fixed number n, create a function which requires at least n implicit
temporaries (not explicitly declared) to implement.
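
(To spell that homework out for n = 3: nothing is declared below, yet
three partial results must be held live at once before the outer
multiply can proceed -- nest the pattern deeper and the count grows
with the depth of the expression.)

double f (double a, double b, double c, double d,
          double e, double g, double h, double i) {
    return (a*b + c*d) * (e*g + h*i);
}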
(unsigned) char: ULMU_CHAR
(unsigned) short: ULMU_SHORT
...
double ULMU_DOUBLE
...
pointer types: ULMU_POINTER

- The ULMU for a struct shall be equal to
the sum of upper limits of its members.
This doesn't sound right with implementation-defined struct alignment issues.
- The ULMU for a union shall be equal to
the maximum of upper limits of its members.
Doesn't that assume that the implementation implements overlapping union
members? (It makes sense not to overlap floating point with integer if, for
example, the memory units in the CPU internally tag the type of the contents of
the memory in order to allow the FPU and Integer units to execute totally
asynchronously even when both are accessing memory, if say that tag switching
had a high cost.)
- The ULMU for an array shall be the number
of elements, multiplied by the upper-limit
of its base type.

((the last one is certainly wasteful, but this could be refined))

The ULMU of a function invocation shall be calculated as the
sum of ULMU values of the types of its parameters,
local variables, and return type, PLUS the constant
ULMU_FUNC_OVERHEAD.
And what about the number of alloca() calls? alloca() is a common extension
which just eats extra space off the stack dynamically -- the big advantage
being that you don't have to call free or anything like it, for it to be
properly cleaned up. It will be cleaned up upon the function's return (the
simplest implementation of a "freeall".)
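
For readers who have not met it, the extension looks like this
(alloca() is commonly declared in <alloca.h> or <malloc.h>; it is not
ISO C, and the copy must not be used after the function returns):

#include <string.h>
#include <alloca.h>  /* common extension, not standard C */

void demo (const char *s) {
    char *t = alloca (strlen (s) + 1);  /* carved off the stack */
    strcpy (t, s);
    /* ... scribble on the writable copy t here ... */
}   /* t's storage is released right here, with no free() call */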

Would it not make more sense to remove all these details, and just say that
each function creates a macro named _STACK_USAGE_<functionname> which is not
available inside the scope of any function or prior to the definition of the
function? (minus alloca() calls) This would then allow you to track the stack
usage without (incorrectly, I might add) trying to figure it out from details
that are implementation specific?

Regardless, none of this helps if you have multiple stacks, for example. In
such cases, if *any* stack runs out you are in trouble. But its obvious that
your intention is to assume a *unified* stack implementation. The math won't
translate in a portable way.
b. The implementation shall define a macro ULMU_TOTAL,
which will evaluate to a size_t with value at least
4095. This value is guaranteed to remain constant during
execution of the program.
But in implementations that *I* am aware of, the stack size is set at *LINK*
time. So it can't *BE* a macro available at compile time without giving up on
this flexibility. This wouldn't work for me at all, since I *use* this feature
in my own projects. At the very least make this a global variable -- that *IS*
settable at link time.
c. A function invocation is guaranteed to succeed only if the
total summed ULMU values of all active function invocations
(including the to-be active function after invocation)
does not exceed ULMU_TOTAL.
Its *OBVIOUS* what I am talking about above. You just
want something specific to have something arbitrary to disagree with.
You have to trust me on this one: I'm not trying to coerce you into a
laborious writing-down of a precise rule-set, or anything. It's just
that I found that making something concrete (which you intuitively feel
should be possible) gives a lot of insight into the kind of troubles you
would get into when actually doing it.


But *YOU* could have written up a possible coding standard, and explained why you
think such a thing could not work in practice. There are as many coding
standards as programming projects -- picking one is always controversial and
instantly makes you open for attack.
Just pick a convention -- if something bothers you about certain
operators, spell it out in your convention.


As noted before, I think it would be not a good development if different
organisations settled on a different set of operators, even if they
meticulously documented their convention. Even things as simple as a
wrong (in the sense of: other than you're used to) bracing style can
hinder your productivity for quite some time before you get used to
that. Having a different set of condoned operators at each programming
shop sounds like a small tower of Babel to me.


But I don't understand how this is any different from free form function names.
Just like freeform variable names, there is the same encumbrance of the
programmers managing the meaning of the symbols for more generic operators.
You have not made a sufficient case to convince me that there is a real
difference between the two.

...Have you ever laid eyes on a moderately complicated APL program,
without knowing APL?
No, but I have looked at the results of the annual obfuscated C code contest.
I have never suggested that you *COULDN'T* abuse the idea. But you can abuse
just about *ANYTHING* in C.


I'm not talking about abuse, I'm talking about the recognition capacity
of unknown operators (which is zilch, and which takes ages to get used
to). Now that's bad enough for languages, but I think it would be a
mistake of quite monumental proportions to allow some sort of C
"dialects" to emerge with different sets of operators, "operator wars",
and so on.


You mean like function-name wars? Like string library wars (actually that sort
of is happening ...)? The point is that there's no chance that anyone would
mistake an *extensible* operator as being anything other than it is. To
understand it you have to look up its definition, just like a function.
You don't think adding in ||| or &&& would add to
the potential for noise in those contests? I gave a very wordy description of
how to define an operator -- its kind of hard to obfuscate something completely
while you have so many hints (a keyword like _Operator, a whole function-like
declaration, etc) as to what it is. You would have to hide it behind macros to
truly obfuscate it, but you can use macros for arbitrary obfuscation
regardless.


When reading the code that used (rather than defines) the operator, it
helps tremendously if the meaning jumps from the page, rather than
having to look it up in the Big Manual of Operators in use at your local
workplace.


You are completely ignoring that large population of people who happily use
operator overloading today. You don't understand that what you are saying is
nothing more than your very narrow opinion. And you're using your opinion to
try to trump a proposal whose merits lean very heavily on its technical
features.
Too much freedom is a dangerous thing, in programming at least.
Yeah we should all rename our functions as func0000, func0001, etc ...
If you disagree with someone else's opinion, you can even *change* it,
subject only to your control over the source.


It would make for an interesting experiment, sociologically :-)


This "experiment" is already ongoing -- notice how BSD has been forked out the
wazoo, how there are about a million regex implementations, and how Microsoft
has "winsock" as an alternative of UNIX's sockets. The world hasn't ended
because of any of this.
A small hypothetical case study. Suppose engineer A, working on nuclear
reactor core controllers, works in a team where "?<" means "minimum" and
"#<" means "x is implied by y" for some data structure involved in logic
reasoning on the state of the system. Now he switches companies; he goes
to work on electronic toys and gadgets. Here the convention is
different, but the same operators are used.
Or suppose a programmer uses clock() all his life in DOS/Windows as a higher
resolution realtime, yet portable, timer. This programmer then moves to UNIX
and is shocked to find that all mutexes turn in a clock() measurement of 0
microseconds regardless of how long they block! Even Sleep(1000) measures 0
microseconds!
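
The surprise is trivial to reproduce, because POSIX clock() counts CPU
time consumed rather than wall time (sleep() below is POSIX, not ISO C):

#include <stdio.h>
#include <time.h>
#include <unistd.h>  /* sleep(): POSIX, not ISO C */

int main (void) {
    clock_t t0 = clock ();
    sleep (1);       /* blocked for a second, burning no CPU */
    printf ("%f\n", (double)(clock () - t0) / CLOCKS_PER_SEC);
    return 0;        /* prints roughly 0.0 on a UNIX system */
}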

The example you've made up has to do with basic comprehension of the engineer -
- since "?<" would clearly be one of the "extended operators", the programmer
is supposed to look up its definition. Just like they would have to look
up logMsg() or any other function name that might show up in more than one
project. Just like they are supposed to magically know how clock() behaves on
different OS'es.
Do you think perhaps its possible to use an arbitrarily extendable operator
mechanism in order to *clarify* or make code actually more maintainable?

A resounding 'no'. It's an interesting thought, but for me it falls
short on technical and psychological grounds, as I previously tried to
explain.


You haven't shown me that it falls on technical grounds. You've just said that
its hard to implement in the compiler -- but that's obvious.


You haven't shown me that it can be implemented in a compiler, even in
theory. That should count for something.


I don't think there's a credible assertion that my idea is impossible. You
want me to describe a whole compiler implementation just to support my idea?
As long as one can create a well-defined definition, why should you expect it
would be impossible? Have I created a halting problem or some kind of NP-
complete algorithm or something? -- I don't think so.
>[...] I guess they're pretty satisfied with the bignum
>libs that exist, that provide assembly implementations for all important
>platforms (and even a slow fallback for others). The reality is that
>no-one seems to care except you, on this.

The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.

What's so bad about using third-party libraries?
From a usability point of view, GMP is crap, RSA's bignum library is crap (and
it's expensive).


How's that? How are you going to improve things by adding a carry operator?


In any way I choose -- that's the point. Right now the barrier to entry is
very high, because I would have to implement a competitive library in assembly
language. I'm not kidding when I tell you these libraries are pure excrement.
Not in terms of performance -- they both perform reasonably well -- I mean in
terms of how the heck do I use these things?

You see, with language support, an intrepid developer could sacrifice *some*
performance for portability and demonstrate a much nicer and more easily usable
interface. This *competition* might *force* these popular vendors to fix
their interfaces so that they could deliver both the performance and usability
benefits. But because both of these have wads of assembly language for various
architectures, is very difficult and expensive to compete with them. Even if I
made a single x86 implementation, it would be difficult for me to beat their
performance by any measurable margin, and my benefit of having possibly a
better interface would have to be balanced against the fact that I have no
portability, while GMP does (by virtue of having been ported.)
None of them are portable to platforms which are in
development (you know like might be a concern for things like *SMART CARDS*.)


Now there's an application where the economy of scale suggest doing a
hand-written assembly routine, if I ever saw one...


About a year ago, Schlumberger gave an amazing talk about implementing pico-java
bytecode into a smart card. It was incredible, because it was obvious that
they would be able to use all the Java tools/libraries, they could simulate it
almost perfectly in software -- mostly inheriting the pico-java architecture,
and because of Java's bignum class, it meant that exposing a widening multiply
and carry propogating add functionality in their smart card would be enough to
automatically have all the software they need for RSA all ready to go.

Do you understand? Although C compiler porting frameworks exist, the lack of a
standard for widening multiply and carry-propagating add means they would have
to expose a mechanism themselves, then write all the software themselves just
to get to the point of having a bignum library (which Java basically comes with)
before proceeding to their RSA implementation.
"load on typical e-commerce transaction" != "load on e-commerce server".
I meant the server. Clients have cycles to burn, obviously the client
performance doesn't matter on anything less than Half-life 2.


The question remains: how much time is your typical e-commerce server
involved in an actual transaction? How many transactions per second can
a machine handle now? I would suppose that most of the time spent in a
transaction is actually a database lookup at a creditcard company, not a
powermod.


No. It's the powermod. One instruction -- 30% of all processor time. The rest
-- including hitting a disk or talking to some authentication mechanism over a
network -- fits into the 70% (though obviously you would expect some amount
of DMA overlapping.) Get it? Powermod is slower than disk swapping -- that's
what Intel was measuring, and from my own calculations, their claim is quite
credible. As I recall, you only need about a hundred simultaneous
transactions for this to show up. Remember, Intel's marketing point is that
they claim Itanium will save you money by using fewer systems per number of
maximum sustained transactions. In a very real sense, their claim is very
credible, based solely on the fact that they have dual fully pipelined widening
multipliers.
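
For the record, the operation being measured is square-and-multiply
modular exponentiation. In RSA the operands are 1024+ bit bignums, so
every '*' below becomes a full bignum multiply -- which is exactly where
the widening multiplies go. A toy version, assuming the products fit in
an unsigned long:

unsigned long powmod (unsigned long b, unsigned long e, unsigned long m) {
    unsigned long r = 1 % m;
    b %= m;
    while (e) {
        if (e & 1) r = (r * b) % m;  /* needs widening to be safe for large m */
        b = (b * b) % m;
        e >>= 1;
    }
    return r;
}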
[...] The frontend webserver is just waiting for the bank I think.
But these are just half-educated guesses.
No -- servers obviously amortize this over multiple threads, and performing
queued up block transfers. While these things take a long time, they are all
queued, blocked, buffered etc. Serial solutions are not acceptable. Your
guess is way off the mark. This wouldn't take any appreciable webserver CPU
power at all.
If the transaction is simply setup session/send credit-card
information/close session, this could be right. However, a jiffy ago we
were talking SSL.

One single assembly instruction (one *bundle* on Itanium) holds a whole server
down for 30% of its computation,

Again, you're equating server load with transaction load.
Uhh ... both sides have to do the bignum operation.


I don't mean "client load" when I write "transaction load", I'm still
strictly speaking about the server. I could be convinced that the amount
of time spent in powermod for a 1-shot transaction could be as high as
30%, but I'd be hard-pressed to believe a front-end webserver is busy
for more than 10% doing actual transactions.


No. You misunderstood. 30% of *THE WHOLE THING* is in just the powermod. I'm
talking about serving static webpages while some consumer is staring at
advertising or whatever else before they decide to buy whatever. Just their
action of pressing the "checkout" button to buy the stuff in their shopping
cart just pegs the server's CPU.

I'm not at all surprised you have a hard time believing this, as your grasp of
multitasking seems tenuous at best.
[...] I think the latency of an
e-commerce transaction is mostly caused by frontend<->bank
communication, and intra-bank database lookup. But I cannot back this up.
The *LATENCY* is something you measure from the client. The latency is not
relevant to the real bottlenecks in the webserver. The client sees the latency
of waiting for the bank, but the server does *NOT* have code like:

while (HeyBankAreYouDoneYet()) {
runSetiAtHome();
}

in it. At the very least there is a blocking operation like "Sleep (1)" in the
place of "runSetiAtHome()". (Hint: the blocking is when a thread comes in
where powermod takes over and spikes the CPU usage.)
>Hang on, are we talking about "overflow" or "carry" here? These are two
>different things with signed numbers.
>What happens if a is signed and b is unsigned?

My intent was for the operation to follow the semantics of the x86 ADC assembly
instruction.

Fun thing is: it handles both signed and unsigned numbers. Not-so-funny
is the fact that the x86 supports (as do most microprocessors) both an
overflow flag and a carry flag. So, armed with this, please elaborate on
the meaning.


Yes, but I never mentioned/introduced overflow. I only considered carry. For
bignum implementation, OF is not relevant. Only the CF is.


But for signed numbers, carry is almost meaningless. So what remains of
the semantics in this case?


Why is the carry meaningless? Ok, you obviously don't know the details of how
bignum is implemented. Ok, a large 2s complement integer is filled with an
array of unsigned sub-integers, except for the one at the top. *That* one is
signed, and that's how you know if the bignum integer is negative or not. The
problem is that the act of addition can cause a roll around at the top word
which requires an expansion of bignum by an extra integer in size. The logic
you use to determine this is just related to examining the carry flag.

It all just comes down to the carry flag. The other flags are not necessary
for bignum libraries. Maybe there's a way to transform the OF into something
that can be used in a bignum library, but I am not aware of it.

But if you want the OF, then why not have another operator for that?
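
To see what the CF-only recipe looks like in portable C today, here is
the standard trick a "+<" operator would replace -- a wrapped unsigned
sum is smaller than either operand, which recovers the carry:

#include <stddef.h>  /* size_t */

/* add n-word bignums stored little-endian; the returned carry tells
   the caller whether the result needs to grow by one word           */
unsigned bignum_add (unsigned *r, const unsigned *a,
                     const unsigned *b, size_t n) {
    unsigned carry = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned s = a[i] + carry;
        carry = (s < carry);   /* carry out of the first add  */
        r[i] = s + b[i];
        carry += (r[i] < s);   /* carry out of the second add */
    }
    return carry;
}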
>>- var is set to the result of the addition; the remainder if a carry occurs.
>What happens if the signedness of var, a, and b are not equal?

It just behaves like the ADC x86 assembly instruction, the details of which I
will not regurgitate here.

So on adding a signed plus unsigned.... The carry gets a value "as if"
the signed value was actually unsigned?
Yes. Note that x86 does not have a version of ADC that considers the sign of
its operands in any special way that is reflected in the CF.


Of course, but you have to know whether your operands are signed or
unsigned to see whether the CF or the OF is the important flag.


OF is never the important flag. Its not relevant at all. I believe that
taking into account the signedness of the result and/or the operands and one of
these flags is redundant with the other flag (I'm not 100% sure, but the last
time I cared to look at OF, I believe I resolved never to look at it again for
precisely this reason). So while you could map between them, one suddenly
doesn't become more important in different contexts. I *KNOW* the solution for
implementing bignum with a CF alone (and I believe its the standard way
everyone does it), so OF is just not relevant.
[...] On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.

It might to you, but the carry is not related to the sign of the operands.
(Your understanding of x86 seems to be shaky.)


My x86 is not too strong, but I am quite well-versed in 6502, 68k and
PowerPC, which enables me to think about these things. They all share
carry and overflow bits in their designs, with little difference. After
performing an add-with-carry, both CF and OF are set (although the V
flag may not be reset, as in the 6502), but the CF generally does not
provide useful information when adding two numbers that are signed. So I
don't see how

c +< v = a + b

would provide useful information in c, when both a and b are signed, if
it is to copy the carry status.


In a bignum library, the signedness of each sub-word is *dynamic* (the top word
is signed, the rest are unsigned.) So regardless of the declared types,
you want the carry flag of an "as if the operands were unsigned" addition.
Ok. That just leaves the case of two signed values, then.
See above. Carry is not a signedness sensitive semantic.
I'm still wondering about the high_order_value() and low_order_value()
semantics I mentioned in my previous post, actually the part which I was
most curious about.


I wasn't able to suss out a coherent question from your post, so I just deleted
this discussion. In this case, I am just copying the x86's MUL -> EDX:EAX or
AMD64's MUL -> RDX:RAX semantics. But other RISC CPUs usually have MULHI and
MULLO instructions -- they usually use some trickery by which if you issue
these instructions back to back with the same source parameters, then it's
actually computed in one widening ALU operation (so that it's comparable,
hardware-wise, to what the x86 does.)
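(For concreteness, a sketch of what this looks like when a double-width type
is available to lean on -- assuming C99's <stdint.h>; the function name is
invented:)

#include <stdint.h>

/* 32x32 -> 64 widening multiply, split into the EDX:EAX (or MULHI/MULLO)
   halves described above */
void widening_mul32(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint64_t p = (uint64_t)a * b;   /* the one widening ALU operation */
    *lo = (uint32_t)p;              /* EAX / MULLO */
    *hi = (uint32_t)(p >> 32);      /* EDX / MULHI */
}

(The sore point, of course, is the case where the words being multiplied are
already the widest type the implementation offers, so no such cast exists.)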

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #139
Paul Hsieh wrote:
Yes, but bandwidth is also an issue. If you connect to more and more peers and
just push data out to all of them, then your upload bandwidth just gets split
more and more. d=1 leads to a trivial solution but with unacceptable lag. So
d=2 is the minimum divisor that gives rich enough possibilities for topologies
to try to solve this problem in interesting ways.

So although actually you will, on average, have 4 connections established (two
out, and two in), you are transmitting on two of them, and receiving on the
other two.
Ok.
[...] And you're not considering a lag metric other than just number of
hops?

Solving variants of Travelling Salesman is next on my list of things that I'll
spend my most masochistic moments on.
You can think about this problem all you like. If you want to know if I am
full of it or not, see if you can get above n=10,d=2.


Perhaps I will, I'll have some time on my hands round Xmas. Sounds like
a nice problem.

It's nice to think about, not so nice if you actually have to solve it.
Dijkstra's weighted shortest path algorithms were nice for client-server based
problems, but the future is peer 2 peer. I believe that this problem and
problems like it are going to be at the center of a lot of internet technology
going forward.

Ok, that's a case. But wouldn't it have been better then just to make a
program generator (written in C or something else)? Or perhaps use the
m4 macro processor?

Tell me how you do this in a way that's compatible with testing it against 3
different compilers (one of the *OTHER* techniques I use to go after
performance ...) two of which are very unfriendly to being commandline driven,
while still making sense from a debugging point of view?


How about generating C source that you #include from a resident file in
your project, would that be an option?

I don't know that MSVC's project mechanisms can be finagled to compile and
execute one program just to build source code for another program that you
compile up. With makefiles what you are saying is easy, and I've done it in
the past, but you can see how this kind of thing puts pressure on your
development process. A sufficiently useful preprocessor mechanism would allow
one to write such code generators in a completely portable way.
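(A toy sketch of the two-stage build being discussed -- the file and symbol
names are invented: a generator program is compiled and run as a pre-build
step, and the real program simply #includes its output:)

/* gen_tables.c: run before the main build; emits tables.h */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("tables.h", "w");
    int i;
    if (f == NULL) return 1;
    fprintf(f, "static const int squares[16] = {");
    for (i = 0; i < 16; i++)
        fprintf(f, "%d%s", i * i, i < 15 ? "," : "");
    fprintf(f, "};\n");
    fclose(f);
    return 0;
}

With makefiles this is one extra rule; with IDE-driven projects it is exactly
the process pressure described above.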


Agreed.
Its a big deal to me. I don't own any material on the C language that
specifies all of this.


The C standard specifies what is UB and what's not. Grepping an
implementation is a fragile way of deducing semantics, I would say.

Yes, and what percentage of C programmers in the world do you think have ever
had the standard in their hands that they actually refer to?


I think there are relatively few excuses for any professional C
programmer not to have access to the standard (and reading it once or
twice wouldn't hurt either). It's only 18 bucks in electronic form, and
you don't even have to go out to get it.
Assuming you actually wanted to count the *NODES* of the tree, and were a
little concerned about performance, try this:

static int count_nodes_inner (const struct Tree *t) {
return (t->left ? (1 + count_nodes_inner (t->left )) : 0) +
(t->right ? (1 + count_nodes_inner (t->right)) : 0);
}

int count_nodes (const struct Tree *t) {
return t ? count_nodes_inner (t) : 0;
}

as it avoids the leaf node tail calls, which will roughly double the
performance.


Well, I'm glad that you extracted my intended meaning from the name of
the function I used. Good thing I didn't use an operator! ;-)

Yeah, well I noticed that you didn't catch *my* bug? (I missed one detail.)

I'm _still_ talking about 'the stack thing' yes, since you still seem to
think that this is a triviality. I think it's not.

Dude: malloc (65536) is not portable by the same reasoning, and:

If you think that, then you didn't follow my reasoning. I can find out
at compile time if malloc(65536) is allowed, and if it is, I can find
out at runtime if it has succeeded. I can do no such thing with stack
overflows. The program will exhibit all the symptoms of UB without a
proper cause that can be traced down to the standard.

int fib(int n) {
if (n <= 2) return 1;
return fib(n-1) + fib(n-2);
}

isn't going to work in any relevant way in any language, for a call like
fib(1000000).


What are you trying to say here? Help me out a bit.

I'm trying to say that *ALL* languages have this issue. The resources required
to retain a call stack are unknown and often difficult to determine from any
reasonable code analysis. Something like "counting the depth" is not very
useful since there are no language provisions for actually counting out to
such a depth. Some compilers implement tail recursion, though doing so in all
cases is equivalent to the halting problem -- so you can't even know for sure
when it's happening.

If you want a really good case, look up the Ackermann function. It looks very
simple, with the only concrete constant operation being a "+1". Knowing that
there is some kind of stack bound is totally useless for this function. Most
computations of the Ackermann function blow out any fixed sized stack, and even
a fully addressable 64bit machine would not have the required memory to
implement a stack that could compute any of its non-trivial range. The point
being there is no kind of reasonable metric for measuring the stack
requirements that can be generally applied.
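(For the curious, the function is tiny -- the only concrete arithmetic is
the +1, yet the recursion depth it needs explodes past any real stack for
m >= 4:)

unsigned long ack(unsigned long m, unsigned long n)
{
    if (m == 0) return n + 1;
    if (n == 0) return ack(m - 1, 1);
    return ack(m - 1, ack(m, n - 1));
}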

My objection to your proposal can simply be restated as "all machines have
limited resources", deal with it.


Then the standard would have to specify what it means exactly by a
"resource". Standards are no places for vague terminology.
At best you could insist that upon reaching
some limited stack depth the program should immediately halt or something
(with stack checking turned on, most x86 compilers provide for this in the form
of a runtime error.)

Ok, I'll try to summarize my understanding of the point you try to make.
Let's consider it a priori desirable that we have a super-duper heap
manager, with all kinds of nice things like seperate heaps. We agree
that this is often useful. Now let's also assume a certain platform
where multithreading/multitasking is so important that the
platform-specific library provides support for it. We can assume that it
also sports a thread-safe malloc (yes?). You'd still want the
super-duper heap manager. Your point is that mandating a better heap
manager by the standard (without even mentioning multithreading) would
force each vendor to implement these, and implement them thread-safe if
they think multithreading is important to have.

Basically, yes.

Now, as I see it, even if _just_ the standard malloc()/free()/etc are
thread-safe, you can build a more powerful heap manager on that alone,
using just standard C features.

This is completely and utterly untrue. *THAT* is my point. None of the
functions I proposed as extensions can be implemented (even in a prototypical
way) on top of just malloc()/free()/etc and retain their intended semantics.
(Of course, you could implement them with memory leaks out the wazoo -- i.e.,
just never free anything, but that's clearly not acceptable.)

[...] If the thing needs to be able to run in
a multithreaded environment, you could even do that, with some extra
effort. You need some form of mutex locking, but you can resort to
Lamport's bakery algorithm for that.

I had to look this one up -- that's one of the classic useless non-solutions
(especially if you want to call malloc more than 4 billion times, for example),
which explains why I've never committed it to memory.
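(For readers who, like me, had to look it up -- a bare-bones sketch of the
classic algorithm. Note what it assumes up front: a fixed process count N, a
per-process index i, and sequentially consistent memory; the unsigned ticket
counter also wraps, hence the "4 billion times" jab:)

#define N 8                       /* fixed, known number of threads */
volatile int choosing[N];
volatile unsigned number[N];

void bakery_lock(int i)           /* i: this thread's index, 0..N-1 */
{
    unsigned max = 0;
    int j;
    choosing[i] = 1;
    for (j = 0; j < N; j++)
        if (number[j] > max) max = number[j];
    number[i] = max + 1;          /* take a ticket */
    choosing[i] = 0;
    for (j = 0; j < N; j++) {
        while (choosing[j])
            ;                     /* wait until j has its ticket */
        while (number[j] != 0 &&
               (number[j] < number[i] ||
                (number[j] == number[i] && j < i)))
            ;                     /* wait behind lower tickets */
    }
}

void bakery_unlock(int i) { number[i] = 0; }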


We have long longs nowadays. 2^32 is not an important limit.
You still need to be
able to identify process instances, which *BY ITSELF* is not portable. Process
instance identifiers would count as something *BEYOND* just malloc/free/etc.
Why do you need to distinguish process instances? In your API, each call
carries the heap it refers to as a parameter.
[...] Surely it won't be pretty, but it is, in principle, possible. That's
my point.

No it's not. It's not portable, that's for sure.

>>>Explain to me how you implement malloc() in a *multithreaded* environment
>>>portably. You could claim that C doesn't support multithreading, but I highly
>>>doubt your going to convince any vendor that they should shut off their
>>>multithreading support based on this argument.

>>Now your shifting the goalposts.

>How so?

Since in your original statement there was no mention of multithreading.

Mentioning the implementation of malloc implicitly brings multithreading into the
discussion. That is what you don't understand.


If the malloc returns either NULL or a pointer (i.e., it conforms
minimally to the standard) then, even in a multithreaded environment, it
is quite possible to build a heap manager on top of that.

Why don't you just try to do it. I mean without race conditions, and without
hooking out or using any of the spawn/mutex/etc primitives that you'll find in
the multithreading API. That ought to be a good trick.

[...] If you need
high performance even in such demanding circumstances, then, yes, you'd
need some support from preferably hardware-assisted mutexes or similar.

High performance is just something I realized you could get as a nice side
effect from such extensions. But the functionality enhancements alone are
worth it. *BUT* whether or not this is properly implemented with respect to
the existence of multithreading has no relevance to its performance -- that's
just a question of correctness.

Ok. Let's try to get some closure here, this is eating into my free time
too much for my taste :-)

I would say that writing a more powerful heap manager is possible in
principle, even in a multithreaded environment, as long as you have a
well-implemented malloc() (and friends) at your disposal.

*BZZZT!!!* Wrong.

It would be a drag to do so and it would be slow, but it is possible.

Impossible. Slow, fast or otherwise. It's just impossible.


If you can explain to me why you need to explicitly distinguish threads
in the memory manager, I'll concede the point.
I think I have a clearer bearing on your statement why it would be good
to put this stuff in the library. You do need mutexes, and if the
implementation supports them, it will work better than a shared-memory
simulated mutex. So for performance reasons, I think your idea of
extending the basic heap manager, by extending its functionality as
mandated by the standard, is a good idea.

That is, as long as I can get my msize() function as well :-)

Well ok, msize() (implemented as _msize() in WATCOM C/C++) does not require
multithreading support. That would be a good addition as well. One has to be
very careful about what kind of implementation impact this has. For example,
if an implementation has an extension which has a heap-compatible
automatically growable allocation (for example, touching the adjacent memory
page, which would be "paged out" by default, might automatically append it to a
"growable" allocation), then knowing its size might not make a lot of sense.

Don't worry, you can look it up in the man pages. Unlike your
user-defined operators... :-)

You would have to look up "%b" ("%db"? -- whatever) in *UP TO DATE* man pages.
(Is *restrict* or the new *complex* type, from C99 properly documented in
*YOUR* current man pages?)


Yes, as close as gcc is, it does. And in other cases, I have the standard.

But you said you were just going to look in your man pages. With my proposal,
there would be no question -- the source would have to be in the project
somewhere. If someone closed their source, well, presumably you're going to
expect documentation anyway; if not, you've got the same problem that you
always expect from closed-source libraries.


The standard is the canonical place to look for how something should
work. I can't believe I would actually have to argue over that. In fact:
- I won't.
What's hand waving is saying something about stack limits --
you haven't even suggested a way in which that *COULD* be possible
to put down on a piece of paper.


Ah, the "tit-for-tat" technique...!

So far, nobody has challenged me to come up with a specific phrasing for
this (which actually surprises me). Here you go. It's a bit rough, but
you get the idea.

---- Proposed addition to the standard to address stack overflows:
---- Draft

a. For an active function invocation, define an upper bound for the
amount of memory in use by this function as follows (in the
following, denote "Upper Limit on Memory Usage" by "ULMU"):

And what if the locals/parameters are not stored, or storable, on the stack?


How do you mean? They have to be stored. And I'm presuming they're
stored on the stack.
What if an implementation uses multiple stacks depending on the *type* of the
parameter?
The hypothetical implementation that does this will be out of luck. Not
because the proposal wouldn't work, mind you, but because the ULMU is a
very loose bound. But hey, it beats having no bound whatsoever.

- For every parameter/local variable of basic type there is a
corresponding constant macro of type size_t:


You missed all the implicit temporaries required. Homework exercise: for any
fixed number n, create a function which requires at least n implicit
temporaries (not explicitly declared) to implement.


Ah yes! So it seems. An interesting problem. Have to think about that
for a bit.
(unsigned) char: ULMU_CHAR
(unsigned) short: ULMU_SHORT
...
double: ULMU_DOUBLE
...
pointer types: ULMU_POINTER

- The ULMU for a struct shall be equal to
the sum of upper limits of its members.

This doesn't sound right with implementation defined struct alignment issues.


ULMU_xxx can be increased to account for that. We're trying to establish
an upper bound here.
- The ULMU for a union shall be equal to
the maximum of upper limits of its members.

Doesn't that assume that the implementation implements overlapping union
members?
Yes. This is a safe assumption (standard quote):

6.7.2.1 #14 The size of a union is sufficient to contain the largest of
its members. The value of at most one of the members can be stored in a
union object at any time. A pointer to a union object, suitably
converted, points to each of its members (or if a member is a bit-field,
then to the unit in which it resides), and vice versa.
(It makes sense not to overlap floating point with integer if, for
example, the memory units in the CPU internally tag the type of the contents of
the memory in order to allow the FPU and Integer units to execute totally
asynchronously even when both are accessing memory, if, say, tag switching
had a high cost.)

- The ULMU for an array shall be the number
of elements, multiplied by the upper-limit
of its base type.

((the last one is certainly wasteful, but this could be refined))

The ULMU of a function invocation shall be calculated as the
sum of ULMU values of the types of its parameters,
local variables, and return type, PLUS the constant
ULMU_FUNC_OVERHEAD.

And what about the number of alloca() calls? alloca() is a common extension
which just eats extra space off the stack dynamically -- the big advantage
being that you don't have to call free or anything like it, for it to be
properly cleaned up. It will be cleaned up upon the function's return (the
simplest implementation of a "freeall".)


Ok, we can add alloca'ed space to each function invocation, plus an
overhead constant per active alloca.
Would it not make more sense to remove all these details, and just say that
each function creates a macro named _STACK_USAGE_<functionname> which is not
available inside the scope of any function or prior to the definition of the
function? (minus alloca() calls) This would then allow you to track the stack
usage without (incorrectly, I might add) trying to figure it out from details
that are implementation specific?
A sound idea.
Regardless, none of this helps if you have multiple stacks, for example. In
such cases, if *any* stack runs out you are in trouble. But its obvious that
your intention is assume a *unified* stack implementation. The math won't
translate in a portable way.
Unless you allow that I'm trying to define an upper bound of total use,
which may be loose as far as I am concerned. The idiomatic use would be
to compare this to the ULMU_TOTAL to see if it fits. This may yield the
number of bytes on the smallest check for all I care. On those silly
platforms, you will grossly underestimate stack capacity now and then.
Still beats the current situation.
b. The implementation shall define a macro ULMU_TOTAL,
which will evaluate to a size_t with value at least
4095. This value is guaranteed to remain constant during
execution of the program. But in implementations that *I* am aware of, the stack size is set at *LINK*
time. So it can't *BE* a macro available at compile time without giving up on
this flexibility. This wouldn't work for me at all, since I *use* this feature
in my own projects. At the very least make this a global variable -- that *IS*
settable at link time.
My wording was chosen explicitly to allow for this. Where am I saying
ULMU_TOTAL evaluates to a constant at compile time?
c. A function invocation is guaranteed to succeed only if the
total summed ULMU values of all active function invocations
(including the to-be active function after invocation)
does not exceed ULMU_TOTAL.
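(To make the intended idiom concrete, a hypothetical sketch -- none of these
macros exist in any implementation, and ULMU_INT plus the per-call cost
expression are invented for illustration:)

#include <stddef.h>

struct Tree { struct Tree *left, *right; };

static size_t ulmu_used;   /* summed ULMU of all active invocations */

int walk(const struct Tree *t)
{
    /* invented cost: one pointer parameter, two locals, plus overhead */
    const size_t cost = ULMU_POINTER + 2 * ULMU_INT + ULMU_FUNC_OVERHEAD;
    int rc = 0;
    if (ulmu_used + cost > ULMU_TOTAL)
        return -1;             /* refuse to recurse rather than overflow */
    ulmu_used += cost;
    if (t->left)  rc |= walk(t->left);
    if (t->right) rc |= walk(t->right);
    ulmu_used -= cost;
    return rc;
}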

It's *OBVIOUS* what I am talking about above. You just
want something specific to have something arbitrary to disagree with.


You have to trust me on this one: I'm not trying to coerce you into a
laborious writing-down of a precise rule-set, or anything. It's just
that I found that making something concrete (which you intuitively feel
should be possible) gives a lot of insight into the kind of troubles you
would get into when actually doing it.

But *YOU* could have written up a possible coding standard, and explained why you
think such a thing could not work in practice. There are as many coding
standards as programming projects -- picking one is always controversial and
instantly makes you open to attack.

Just pick a convention -- if something bothers you about certain
operators, spell it out in your convention.


As noted before, I think it would not be a good development if different
organisations settled on a different set of operators, even if they
meticulously documented their convention. Even things as simple as a
wrong (in the sense of: other than what you're used to) bracing style can
hinder your productivity for quite some time before you get used to
that. Having a different set of condoned operators at each programming
shop sounds like a small tower of Babel to me.

But I don't understand how this is any different from free form function names.


Well, what more can I say.
>Just like freeform variable names, there is the same encumbrance of the
>programmers managing the meaning of the symbols for more generic operators.
>You have not made a sufficient case to convince me that there is a real
>difference between the two.

...Have you ever laid eyes on a moderately complicated APL program,
without knowing APL?

No, but I have looked at the results of the annual obfuscated C code contest.
I have never suggested that you *COULDN'T* abuse the idea. But you can abuse
just about *ANYTHING* in C.


I'm not talking about abuse, I'm talking about the recognition capacity
of unknown operators (which is zilch, and which takes ages to get used
to). Now that's bad enough for languages, but I think it would be a
mistake of quite monumental proportions to allow some sort of C
"dialects" to emerge with different sets of operators, "operator wars",
and so on.

You mean like function-name wars? Like string library wars (actually that sort
of is happening ...)? The point is that there's no chance that anyone would
mistake an *extensible* operator as being anything other than it is. To
understand it you have to look up its definition, just like a function.
You don't think adding in ||| or &&& would add to
the potential for noise in those contests? I gave a very wordy description of
how to define an operator -- it's kind of hard to obfuscate something completely
while you have so many hints (a keyword like _Operator, a whole function-like
declaration, etc) as to what it is. You would have to hide it behind macros to
truly obfuscate it, but you can use macros for arbitrary obfuscation
regardless.


When reading the code that uses (rather than defines) the operator, it
helps tremendously if the meaning jumps from the page, rather than
having to look it up in the Big Manual of Operators in use at your local
workplace.

You are completely ignoring that large population of people who happily use
operator overloading today. You don't understand that what you are saying is
nothing more than your very narrow opinion. And you're using your opinion to
try to trump a proposal whose merits lean very heavily on its technical
features.


My opinion is determined largely by the psychological objections (to
which you don't subscribe). My technical problem is that I can see no
way that this could be implemented.
Too much freedom is a dangerous thing, in programming at least.

Yeah we should all rename our functions as func0000, func0001, etc ...

If you disagree with someone else's opinion, you can even *change* it,
subject only to your control over the source.


It would make for an interesting experiment, sociologically :-)

This "experiment" is already ongoing -- notice how BSD has been forked out the
wazoo, how there are about a million regex implementations, and how Microsoft
has "winsock" as an alternative of UNIX's sockets. The world hasn't ended
because of any of this.


It's good to keep some perspective...
A small hypothetical case study. Suppose engineer A, working on nuclear
reactor core controllers, works in a team where "?<" means "minimum" and
"#<" means "x is implied by y" for some data structure involved in logic
reasoning on the state of the system. Now he switches companies; he goes
to work on electronic toys and gadgets. Here the convention is
different, but the same operators are used.

Or suppose a programmer uses clock() all his life in DOS/Windows as a higher
resolution realtime, yet portable, timer. This programmer then moves to UNIX
and is shocked to find that all mutexes turn in a clock() measurement of 0
microseconds regardless of how long they block! Even Sleep(1000) measures 0
microseconds!
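(Spelled out -- a minimal sketch. ISO C only promises that clock() returns
"processor time used", so on UNIX a blocked thread accumulates essentially
none of it, while the DOS/Windows libraries in question return something much
closer to wall time:)

#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t t0, t1;
    t0 = clock();
    /* ... a blocking call such as Sleep(1000) would go here ... */
    t1 = clock();
    /* on Windows this approximates elapsed time; on UNIX it is CPU
       time, so the blocking interval barely registers */
    printf("%f seconds\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    return 0;
}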

The example you've made up has to do with basic comprehension of the engineer -
- since "?<" would clearly be one of the "extended operators", the programmer
is supposed to look up its definition. Just like they would have to look
up logMsg() or any other function name that might show up in more than one
project. Just like they are supposed to magically know how clock() behaves on
different OS'es.


There is a possibility of error caused by bad design (which isn't
trimmed to the type of mistakes we fallible humans make) in your
operator introduction scheme. This is already the case in C++ operator
overloading, your proposal makes it a couple of orders of magnitude worse.

But this is turning into a rehashing exercise. Let's agree to disagree.
>Do you think perhaps its possible to use an arbitrarily extendable operator
>mechanism in order to *clarify* or make code actually more maintainable?

A resounding 'no'. It's an interesting thought, but for me it falls
short on technical and psychological grounds, as I previously tried to
explain.

You haven't shown me that it falls on technical grounds. You've just said that
it's hard to implement in the compiler -- but that's obvious.


You haven't shown me that it can be implemented in a compiler, even in
theory. That should count for something.

I don't think there's a credible assertion that my idea is impossible. You
want me to describe a whole compiler implementation just to support my idea?


I listed some specific objections with regards to disambiguation of the
grammar that the parser would have to be able to handle. I don't expect
an implementation, no... But you haven't even addressed these very basic
questions.
As long as one can create a well-defined definition, why should you expect it
would be impossible? Have I created a halting problem or some kind of NP-
complete algorithm or something? -- I don't think so.
Look up "ambiguous grammer" in the Dragon book, or any other book on
compilers. That should make you think once or twice.
>>[...] I guess they're pretty satisfied with the bignum
>>libs that exist, that provide assembly implementations for all important
>>platforms (and even a slow fallback for others). The reality is that
>>no-one seems to care except you, on this.

>The hardware people care that it exists, and not about the form of its
>existence, so long as it gets used. For anyone who wants to *USE* this
>functionality, though, they are stuck with assembly, or third party libraries.

What's so bad about using third-party libraries?

From a usability point of view, GMP is crap, RSA's bignum library is crap (and
it's expensive).


How's that? How are you going to improve things by adding a carry operator?

In any way I choose -- that's the point. Right now the barrier to entry is
very high, because I would have to implement a competitive library in assembly
language. I'm not kidding when I tell you these libraries are pure excrement.
Not in terms of performance -- they both perform reasonably well -- I mean in
terms of how the heck do I use these things?


Can't speak for RSA's library, but I don't see what your problem is with
GMP.
You see, with language support, an intrepid developer could sacrifice *some*
performance for portability and demonstrate a much nicer and more easily usable
interface. This *competition* might *force* these popular vendors for fix
their interfaces so that they could deliver both the performance and usability
benefits. But because both of these have wads of assembly language for various
architectures, is very difficult and expensive to compete with them. Even if I
made a single x86 implementation, it would be difficult for me to beat their
performance by any measurable margin, and my benefit of having possibly a
better interface would have to be balanced against the fact that I have no
portability, while GMP does (by virtue of having been ported.)

None of them are portable to platforms which are in
development (you know like might be a concern for things like *SMART CARDS*.)
Now there's an application where the economy of scale suggests doing a
hand-written assembly routine, if I ever saw one...

About a year ago, Schlumberger gave an amazing talk about implementing pico-java
bytecode into a smart card. It was incredible, because it was obvious that
they would be able to use all the Java tools/libraries, they could simulate it
almost perfectly in software -- mostly inheriting the pico-java architecture,
and because of Java's bignum class, it meant that exposing a widening multiply
and carry-propagating add functionality in their smart card would be enough to
automatically have all the software they need for RSA all ready to go.


....So the smartcard presumes it's inserted into a machine that has these
classes available? They don't reside on the card itself.... Well, they
should be using Java then, I suppose. No need to augment C.
Do you understand? Although C compiler porting frameworks exist, the lack of a
standard for widening multiply and carry-propagating add means they would have
to expose a mechanism themselves, then write all the software themselves just to
get to the point of having a bignum library (which Java basically comes with)
before proceeding to their RSA implementation.

"load on typical e-commerce transaction" != "load on e-commerce server".
I meant the server. Clients have cycles to burn, obviously the client
performance doesn't matter on anything less than Half-life 2.


The question remains: how much time is your typical e-commerce server
involved in an actual transaction? How many transactions per second can
a machine handle now? I would suppose that most of the time spent in a
transaction is actually a database lookup at a creditcard company, not a
powermod.

No. It's the powermod. One instruction -- 30% of all processor time. The rest
-- including hitting a disk or talking to some authentication mechanism over a
network -- fits into the 70% (though obviously you would expect some amount
of DMA overlapping.) Get it? Powermod is slower than disk swapping -- that's
what Intel was measuring, and from my own calculations, their claim is quite
credible. As I recall, you only need about a hundred simultaneous
transactions for this to show up. Remember, Intel's marketing point is that
they claim Itanium will save you money by using fewer systems per number of
maximum sustained transactions. In a very real sense, their claim is very
credible, based solely on the fact that they have dual fully pipelined widening
multipliers.

[...] The frontend webserver is just waiting for the bank I think.
But these are just half-educated guesses.

No -- servers obviously amortize this over multiple threads, and by performing
queued-up block transfers. While these things take a long time, they are all
queued, blocked, buffered etc. Serial solutions are not acceptable. Your
guess is way off the mark. This wouldn't take any appreciable webserver CPU
power at all.

If the transaction is simply setup session/send credit-card
information/close session, this could be right. However, a jiffy ago we
were talking SSL.
>One single assembly instruction (one *bundle* on Itanium) holds a whole server
>down for 30% of its computation,

Again, you're equating server load with transaction load.

Uhh ... both sides have to do the bignum operation.


I don't mean "client load" when I write "transaction load", I'm still
strictly speaking about the server. I could be convinced that the amount
of time spent in powermod for a 1-shot transaction could be as high as
30%, but I'd be hard-pressed to believe a front-end webserver is busy
for more than 10% doing actual transactions.

No. You misunderstood. 30% of *THE WHOLE THING* is in just the powermod. I'm
talking about serving static webpages while some consumer is staring at
advertising or whatever else before they decide to buy whatever. Just their
action of pressing the "checkout" button to buy the stuff in their shopping
cart just pegs the server's CPU.

I'm not at all surprised you have a hard time believing this, as your grasp of
multitasking seems tenuous at best.


Hmmm... sweet :-)
[...] I think the latency of an
e-commerce transaction is mostly caused by frontend<->bank
communication, and intra-bank database lookup. But I cannot back this up.

The *LATENCY* is something you measure from the client. The latency is not
relevant to the real bottlenecks in the webserver. The client sees the latency
of waiting for the bank, but the server does *NOT* have code like:

while (!HeyBankAreYouDoneYet()) {
runSetiAtHome();
}

in it. At the very least there is a blocking operation like "Sleep (1)" in the
place of "runSetiAtHome()". (Hint: the blocking is when a thread comes in
where powermod takes over and spikes the CPU usage.)


Ok.
>>Hang on, are we talking about "overflow" or "carry" here? These are two
>>different things with signed numbers.
>>What happens if a is signed and b is unsigned?

>My intend was for the operation to follow the semantics of the x86 ADC assembly
>instruction.

Fun thing is: it handles both signed and unsigned numbers. Not-so-funny
is the fact that the x86 supports (as do most microprocessors) both an
overflow flag and a carry flag. So, armed with this, please elaborate on
the meaning.

Yes, but I never mentioned/introduced overflow. I only considered carry. For
bignum implementation, OF is not relevant. Only the CF is.


But for signed numbers, carry is almost meaningless. So what remains of
the semantics in this case?

Why is the carry meaningless? Ok, you obviously don't know the details of how
bignum is implemented.


You're really better off not inferring what I know and don't know :-)
Ok, a large 2s complement integer is filled with an
array of unsigned sub-integers, except for the one at the top. *That* one is
signed, and that's how you know if the bignum integer is negative or not. The
problem is that the act of addition can cause a roll around at the top word
which requires an expansion of bignum by an extra integer in size. The logic
you use to determine this is just related to examining the carry flag.
Great. Now what has all this to do with defining a sensible signed +<
semantics for use in C (you know, a general-purpose language, not tied
to your particular pet peeve problem).
It all just comes down to the carry flag. The other flags are not necessary
for bignum libraries. Maybe there's a way to transform the OF into something
that can be used in a bignum library, but I am not aware of it.
Because, we're talking about expanding the C language here (not about
your fairy-tale operator introduction hocus-pocus). You need decent
semantics.
But if you want the OF, then why not have another operator for that?
?
>>>- var is set to the result of the addition; the remainder if a carry occurs.
>>
>>What happens if the signedness of var, a, and b are not equal?

>It just behaves like the ADC x86 assembly instruction, the details of which I
>will not regurgitate here.

So on adding a signed plus unsigned.... The carry gets a value "as if"
the signed value was actually unsigned?

Yes. Note that x86 does not have a version of ADC that considers the sign of
its operands in any special way that is reflected in the CF.


Of course, but you have to know whether your operands are signed or
unsigned to see whether the CF or the OF is the important flag.


OF is never the important flag. It's not relevant at all.


I guess that's why everything since the 4004 supports it.

You must really consider that there's more to life than multiplying big
numbers (and that's coming from one of the few people that sees the fun
in it). We're not going to extend C with an operator to fit this (or
ANY) one purpose, however important it may be.
I believe that
taking into account the signedness of the result and/or the operands and one of
these flags is redundant with the other flag (I'm not 100% sure, but the last
time I cared to look at OF, I believe I resolved never to look at it again for
precisely this reason). So while you could map between them, one suddenly
doesn't become more important in different contexts. I *KNOW* the solution for
implementing bignum with a CF alone (and I believe it's the standard way
everyone does it), so OF is just not relevant.

[...] On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.
It might to you, but the carry is not related to the sign of the operands.
(Your understanding of x86 seems to be shaky.)


My x86 is not too strong, but I am quite well-versed in 6502, 68k and
PowerPC, which enables me to think about these things. They all share
carry and overflow bits in their designs, with little difference. After
performing an add-with-carry, both CF and OF are set (although the V
flag may not be reset, as in the 6502), but the CF generally does not
provide useful information when adding two numbers that are signed. So I
don't see how

c +< v = a + b

would provide useful information in c, when both a and b are signed, if
it is to copy the carry status.

In a bignum library, the signedness of each sub-word is *dynamic* (the top word
is signed, the rest are unsigned.) So regardless of the declaration of the
variables, you want the carry flag computed "as if the operands were unsigned".

Ok. That just leaves the case of two signed values, then.

See above. Carry is not a signedness-sensitive semantic.

I'm still wondering about the high_order_value() and low_order_value()
semantics I mentioned in my previous post, actually the part which I was
most curious about.

I wasn't able to suss out a coherent question from your post, so I just deleted
this discussion.


....And there I was thinking you had trouble answering it. What was I
thinking?
In this case, I am just copying the x86's MUL -> EDX:EAX or
AMD64's MUL -> RDX:RAX semantics.
So again (as with the carry vs. overflow), you're going to perform an
instruction that works on UNsigned integers even on signed integers.
Pardon my French, but that's just plain stupid. Perhaps you should be a
little less confident about your own understanding of bit-twiddling...
But other RISC CPUs usually have MULHI and
MULLO instructions -- they usually use some trickery by which if you issue
these instructions back to back with the same source parameters, then it's
actually computed in one widening ALU operation (so that it's comparable,
hardware-wise, to what the x86 does.)


Best regards,

Sidney

Nov 14 '05 #140
Sidney Cadot wrote:
Paul Hsieh wrote:
Its a big deal to me. I don't own any material on the C language
that specifies all of this.

The C standard specifies what is UB and what's not. Grepping an
implementation is a fragile way of deducing semantics, I would say.
Yes, and what percentage of C programmers in the world do you think have ever
had the standard in their hands that they actually refer to?


I think there are relatively few excuses for any professional C
programmer not to have access to the standard (and reading it once or
twice wouldn't hurt either). It's only 18 bucks in electronic form,
and you don't even have to go out to get it.


Care to take a quick survey of comp.lang.c patrons who own their own copy of
the standard?
My objection to your proposal can simply be restated as "all
machines have limited resources", deal with it.


Then the standard would have to specify what it means exactly by a
"resource". Standards are no places for vague terminology.


You mean vague terminologies like "stack"?
[...] If the thing needs to be able to run in
a multithreaded environment, you could even do that, with some
extra effort. You need some form of mutex locking, but you can
resort to Lamport's bakery algorithm for that.


I had to look this one up -- that's one of the classic useless
non-solutions (especially if you want to call malloc more than 4
billion times, for example), which explains why I've never
committed it to memory.


We have long longs nowadays. 2^32 is not an important limit.


And long long's are portable?!?! Are you even following your own
train of thought?
You still need to be able to identify process instances, which *BY
ITSELF* is not portable. Process instance identifiers would
count as something *BEYOND* just malloc/free/etc.


Why do you need to distinguish process instances?


Reread Lamport's bakery algorithm. One of the crucial steps is
"for each process, check their number, and pick the max, then add 1".
Another is "for each process that has a lower number, wait in line
behind them." This means you need a way of knowing how many processes
there are, and how to index them. This means you have to hook out or
wrap whatever "spawn" mechanism you have in your language (or use some
other platform specific mechanism to iterate through all running
threads.)
[...] In your API, each call carries the heap it refers to as a
parameter.
Yes, but if two threads simultaneously call heap functions on the same
heap (heaps are not restricted to being distinct across threads) then
there is a race condition.
I would say that writing a more powerful heap manager is possible in
principle, even in a multithreaded environment, as long as you have a
well-implemented malloc() (and friends) at your disposal.


*BZZZT!!!* Wrong.
It would be a drag to do so and it would be slow, but it is possible.


Impossible. Slow, fast or otherwise. Its just impossible.


If you can explain to me why you need to explicitly distinguish threads
in the memory manager, I'll concede the point.


Go look up the source code for *ANY* multithreading-supporting
malloc/free. I absolutely guarantee that every single one of them
uses some kind of platform specific exclusion object to make sure that
two simultaneous calls to the heap do not collide.

One way or another the heap is some kind of quasi-linked list of memory
entries. You should have learned from your OS classes (assuming you passed it
and/or comprehended the material) that all ADTs, including linked lists, have to
be protected by some sort of critical-section-like mechanism if more than one
thread has access to it. If you do not understand this, then you don't
understand multithreading.
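(The shape being described, reduced to a skeleton. pthread_mutex_t stands in
for whatever platform primitive a given implementation really uses
(CRITICAL_SECTION, spinlocks, ...); unlink_suitable_block and
block_to_payload are invented placeholders for the free-list bookkeeping:)

#include <stddef.h>
#include <pthread.h>

struct block;                                     /* opaque heap entry */
extern struct block *unlink_suitable_block(struct block **list, size_t n);
extern void *block_to_payload(struct block *b);   /* both hypothetical */

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;
static struct block *free_list;   /* the quasi-linked list of entries */

void *my_alloc(size_t n)
{
    struct block *b;
    pthread_mutex_lock(&heap_lock);    /* the platform exclusion object */
    b = unlink_suitable_block(&free_list, n);
    pthread_mutex_unlock(&heap_lock);
    return b ? block_to_payload(b) : NULL;
}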
So far, nobody has challenged me to come up with a specific phrasing for
this (which actually surprises me). Here you go. It's a bit rough, but
you get the idea.

---- Proposed addition to the standard to add stack overflows:
---- Draft

a. For an active function invocation, define an upper bound for the
amount of memory in use by this function as follows (in the
following, denote "Upper Limit on Memory Usage" by "ULMU"):


And what if the local/parameters are not stored, or storable on the stack?


How do you mean? They have to be stored. And I'm presuming they're
stored on the stack.


Most microprocessors have these temporary storage elements called
"registers". They are often not part of any stack structure. Many modern
compilers will map local variables and temporaries to these "registers", thus
not requiring stack storage for them.
What if an implementation uses multiple stacks depending on the
*type* of the parameter?


The hypothetical implementation [...]


That the ANSI C standard currently does not have any problem with ...
[...] that does this will be out of luck. Not because the proposal
wouldn't work, mind you, but because the ULMU is a very loose bound.
But hey, it beats having no bound whatsoever.
No it doesn't. Dealing with the stack is an area that clearly should be
platform specific. For example, it could be that an embedded system has
fast and slow RAM. Certain data types might be best suited to being mapped
into fast RAM, whether they are local or not, but you might have a lot more
slow RAM, so you'd put all the rest of the stuff in there.
- For every parameter/local variable of basic type there is a
corresponding constant macro of type size_t:


You missed all the implicit temporaries required. Homework
exercise: for any fixed number n, create a function which requires
at least n implicit temporaries (not explicitly declared) to
implement.


Ah yes! So it seems. An interesting problem. Have to think about that
for a bit.


Why don't you think about the implication of the results of this
exercise with respect to your proposal while you are at it.
And what about the number of alloca() calls? alloca() is a common
extension which just eats extra space off the stack dynamically --
the big advantage being that you don't have to call free or
anything like it, for it to be properly cleaned up. It will be
cleaned up upon the function's return (the simplest implementation
of a "freeall".)


Ok, we can add alloca'ed space is each function invocation, plus an
overhead constant per active alloca.


The alloca() calls can be in a loop. Knowing how much memory a set of
alloca's will consume in any function is equivalent to solving the
halting problem.
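(Concretely -- and remember alloca() is a common extension, not ISO C -- the
total consumed here depends entirely on runtime data, so no static bound can
exist; use() is a hypothetical consumer:)

#include <stddef.h>
#include <alloca.h>   /* glibc; Windows spells this <malloc.h> / _alloca */

extern void use(char *buf, size_t len);   /* hypothetical */

void build(int n, const size_t *sizes)
{
    int i;
    for (i = 0; i < n; i++) {
        char *buf = alloca(sizes[i]);   /* reclaimed only at function return */
        use(buf, sizes[i]);
    }
}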
Unless you allow that I'm trying to define an upper bound of total use,
which may be loose as far as I am concerned. The idiomatic use would be
to compare this to the ULMU_TOTAL to see if it fits. This may yield the
number of bytes on the smallest check for all I care. On those silly
platforms, you will grossly underestimate stack capacity now and then.
[...] Still beats the current situation.
This versus just living with runtime stack checking? I'll take the
runtime stack checking.
When reading the code that uses (rather than defines) the operator, it
helps tremendously if the meaning jumps from the page, rather than
having to look it up in the Big Manual of Operators in use at your local
workplace.


You are completely ignoring that large population of people who
happily use operator overloading today. You don't understand that
what you are saying is nothing more than your very narrow opinion.
And you're using your opinion to try to trump a proposal whose
merits lean very heavily on its technical features.


My opinion is determined largely by the psychological objections (to
which you don't subscribe.)


Says who? You see, unlike some people, my idea isn't driven by an
opinion. It's driven by a consideration for design. Since you *CAN'T*
add just any old operator to satisfy everyone, then how do you satisfy
the demand for more operators? Some variation on redefinable
operators or operator overloading is the logical conclusion -- just
fix its problems (since type-indexed functions don't really exist in C,
so you just add in a bunch of them so that you don't lose any of the
ones you have). For all I know, I might not ever use such a feature,
even if it *WERE* to be endorsed by the standard.
My technical problem is that I can see no way that this could be
implemented.
Well I don't know what to make of this -- you can't see a race
condition when it's staring you in the face, and you can't imagine
problems with knowing the size of a function's stack usage.
A small hypothetical case study. Suppose engineer A, working on
nuclear reactor core controllers, works in a team where "?<"
means "minimum" and "#<" means "x is implied by y" for some data
structure involved in logic reasoning on the state of the
system. Now he switches companies; he goes to work on electronic
toys and gadgets. Here the convention is different, but the same
operators are used.


The example you've made up has to do with basic comprehension of
the engineer -- since "?<" would clearly be one of the "extended
operators", the programmer is supposed to look up the its
definition. Just like they would have to look up logMsg() or any
other function name that might show up in more than one project.
Just like they are supposed to magically know how clock() behaves
on different OS'es.


There is a possibility of error caused by bad design (which isn't
trimmed to the type of mistakes we fallible humans make) in your
operator introduction scheme.


Same is true with free form function names, of course.
You haven't shown me that it can be implemented in a compiler,
even in theory. That should count for something.


I don't think there's a credible assertion that my idea is
impossible. You want me to describe a whole compiler
implementation just to support my idea?


I listed some specific objections with regards to disambiguation of
the grammar that the parser would have to be able to handle.


Even without a full grammar being specified? That's a pretty good trick.
About a year ago, Schlumberger gave an amazing talk about implementing pico-java
bytecode into a smart card. It was incredible, because it was obvious that
they would be able to use all the Java tools/libraries, they could simulate it
almost perfectly in software -- mostly inheriting the pico-java architecture,
and because of Java's bignum class, it meant that exposing a widening multiply
and carry-propagating add functionality in their smart card would be enough to
automatically have all the software they need for RSA all ready to go.


...So the smartcard presumes it's inserted into a machine that has these
classes available?


No ...
[...] They don't reside on the card itself....
Yes they do. That's the point of creating a pico-java assembly
language. It's so that you just have to upload a minimal Java kernel to
it, and all of a sudden you have the whole world's Java libraries at your
disposal.
[...] Well, they
should be using Java then, I suppose. No need to augment C.
You misunderstand the point. Most of this kind of development
actually *does* happen in C + assembly language. Schlumberger just
demonstrated that using Java is far superior, for the more general
reason that they get to inherit the entire Java infrastructure to help
them design their smart card software. But hidden in all of this is
the fact that C is further penalized quite specifically on the issue
of bignums which become a huge pain in the ass to deal with. There is
*NO* infrastructure for porting bignum functionality in C, using
embedded frameworks, gcc, or otherwise. Its all just "roll your own"
territory.
Ok, a large 2s complement integer is filled with an
array of unsigned sub-integers, except for the one at the top. *That* one is
signed, and that's how you know if the bignum integer is negative or not. The
problem is that the act of addition can cause a roll around at the top word
which requires an expansion of bignum by an extra integer in size. The logic
you use to determine this is just related to examining the carry flag.


Great. Now what has all this to do with defining a sensible signed +<
semantics for use in C (you know, a general-purpose language, not tied
to your particular pet peeve problem).


Oh yeah, my particular pet peeve problem, which every serious CPU
company on the planet is only too willing to solve for me.
It all just comes down to the carry flag. The other flags are not
necessary for bignum libraries. Maybe there's a way to transform
the OF into something that can be used in a bignum library, but I
am not aware of it.


Because, we're talking about expanding the C language here (not about
your fairy-tale operator introduction hocus-pocus). You need decent
semantics.


CPU companies have laid out the decent semantics. I and others have
written bignum routines -- the carry flag is where it's at as far as
this is concerned.
>>>>- var is set to the result of the addition; the remainder if
>>>>a carry occurs.
>>>
>>>What happens if the signedness of var, a, and b are not equal?
>
>>It just behaves like the ADC x86 assembly instruction, the
>>details of which I will not regurgitate here.
>
>So on adding a signed plus unsigned.... The carry gets a value
>"as if" the signed value was actually unsigned?

Yes. Note that x86 does not have a version of ADC that considers
the sign of its operands in any special way that is reflected in
the CF.

Of course, but you have to know whether your operands are signed or
unsigned to see whether the CF or the OF is the important flag.


OF is never the important flag. It's not relevant at all.


I guess that's why everything since the 4004 supports it.


That's because OF is used for <,>,<=,>= comparisons. Outside of
this, I have never seen OF being used for any purpose. I.e., I would
surmise that the only relevant usage of OF is already encoded in the
language semantics of C (and just about any other language with
integers.) But I could be wrong -- can you name at least one other
purpose which isn't really just an encoding of <,>,<=,>= ?
You must really consider that there's more to life than multiplying big
numbers (and that's coming from one of the few people that sees the fun
in it). We're not going to extend C with an operator to fit this (or
ANY) one purpose, however important it may be.
Carry flag comes up time and again. Just look it up on google:

http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&safe=off&c2coff=1&q=jc+ret+mov

Off the top of my head, there's chain shifting, bit counting, and
predication/masking.
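(One of those, chain shifting, rendered in portable C for flavor -- this
loop is what RCL-style carry chaining does in one instruction per word:)

#include <limits.h>   /* CHAR_BIT */

/* shift an n-word number left by one bit; w[0] is the least significant
   word, and the top bit shifted out of each word is "carried" into the next */
void shl1(unsigned int *w, int n)
{
    unsigned int carry = 0;
    int i;
    for (i = 0; i < n; i++) {
        unsigned int out = w[i] >> (CHAR_BIT * sizeof *w - 1);
        w[i] = (w[i] << 1) | carry;
        carry = out;
    }
}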
I'm still wondering about the high_order_value() and low_order_value()
semantics I mentioned in my previous post, actually the part which I was
most curious about.


I wasn't able to suss out a coherent question from your post, so I just deleted
this discussion.


...And there I was thinking you had trouble answering it. What was I
thinking?
In this case, I am just copying the x86's MUL -> EDX:EAX or
AMD64's MUL -> RDX:RAX semantics.


So again (as with the carry vs. overflow), you're going to perform an
instruction that works on UNsigned integers even on signed integers.


Yes.
Pardon my French, but that's just plain stupid. Perhaps you should be a
little less confident about your own understanding of bit-twiddling...


Perhaps you should consider that there's a reason every assembly
language in existence associates signedness with particular
*OPERATIONS*, not the *OPERANDS*. Having signed and unsigned versions of
every operator may be something that fit C when it was first designed, but that
doesn't make it necessarily correct for all relevant computations.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #141
Paul Hsieh wrote:
Sidney Cadot wrote:
Paul Hsieh wrote:
> .... snip ...>
> Of course, but you have to know whether your operands are
> signed or unsigned to see whether the CF or the OF is the
> important flag.

OF is never the important flag. It's not relevant at all.


I guess that's why everything since the 4004 supports it.


That's because OF is used for <,>,<=,>= comparisons. Outside of
this, I have never seen OF being used for any purpose. I.e., I
would surmise that the only relevant usage of OF is already
encoded in the language semantics of C (and just about any other
language with integers.) But I could be wrong -- can you name
at least one other purpose which isn't really just an encoding
of <,>,<=,>= ?


How about its named purpose, detection of overflow? I have
written x86 code generators that emitted INTO after every signed
integer operation.

Instead of these monstrous articles dealing with every aspect of
everything under the sun, I suggest you cut out single themes,
adjust the subject to suit, and deal with (or argue about) them
one at a time. I see something like seven or eight articles here
between 500 and 1000 lines long, which is ridiculous.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #142
In article <MP************************@news.sf.sbcglobal.net> ,
qe*@pobox.com says...
Care to take a quick survey of comp.lang.c patrons who own their own copy of
the standard?


I bought the 18 buck pdf version quite some time ago, and more recently
the hardback version including the rationale and TC1 published by Wiley
& Sons. Most of the people I work with have at least seen it, own one of
the preliminary drafts, or have sprung for the $18. If you are
claiming to be a professional C programmer that can't afford $18,
you're lying to yourself.
We have long longs nowadays. 2^32 is not an important limit.


And long long's are portable?!?! Are you even following your own
train of thought?


You are right about that, given the lack of gcc everywhere, or a
C99 compiler proper. However, it is indeed trivial to use a couple of
typedefs in a header file, say "bigtypes.h" or something, that
define U64 and I64 in a platform-dependent way *IF* it's not a
C99 implementation supporting long long directly. A
simple #ifdef chain with an error directive at the end if the
platform isn't yet supported will make the porting effort for a
new platform (which has a local way to support 64-bit types) a
matter of a few minutes, if not seconds.
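(A sketch of what such a header might look like -- the #ifdef tests and
type names here are illustrative guesses, not any shipping product:)

/* bigtypes.h -- illustrative only */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
  typedef unsigned long long U64;   /* C99 guarantees long long */
  typedef long long I64;
#elif defined(__GNUC__)
  typedef unsigned long long U64;   /* gcc extension, pre-C99 */
  typedef long long I64;
#elif defined(_MSC_VER)
  typedef unsigned __int64 U64;     /* MSVC's 64-bit type */
  typedef __int64 I64;
#else
  #error port me: no 64-bit integer type known for this platform
#endif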

--
Randy Howard _o
2reply remove FOOBAR \<,
______________________()/ ()______________________________________________
SCO Spam-magnet: po********@sco.com
Nov 14 '05 #143
Paul Hsieh wrote:
(I'm snipping superfluous bits, I propose we concentrate on the
important things and leave some other things aside, to keep this
manageable...)

Sidney Cadot wrote:
Paul Hsieh wrote:
[snip]

I think there are relatively few excuses for any professional C
programmer not to have access to the standard (and reading it once or
twice wouldn't hurt either). It's only 18 bucks in electronic form,
and you don't even have to go out to get it.
Care to take a quick survey of comp.lang.c patrons who own their own copy of
the standard?


Sure, I think you'd be surprised.

What is your excuse for not having the standard? A free draft (N869) is
also available.
My objection to your proposal can simply be restated as "all
machines have limited resources", deal with it.


Then the standard would have to specify what it means exactly by a
"resource". Standards are no places for vague terminology.

You mean vague terminologies like "stack"?


No, I meant "resource".

There doesn't have to be any mention of "stack" in what I try to
propose. Obviously, active function invocations consume memory - that's
what I would like to see formalized.
I had to look this one up -- that's one of the classic useless
non-solutions (especially if you want to call malloc more than 4
billion times, for example), which explains why I've never
committed it to memory.


We have long longs nowadays. 2^32 is not an important limit.

And long long's are portable?!?! Are you even following your own
train of thought?


You really seem a bit out of touch with current events... Since you
don't seem to have access to the C99 standard, here's the relevant quote:

6.2.5 clause #4:

"There are five standard signed integer types, designated as signed
char, short int, int, long int, and long long int. (These and other
types may be designated in several additional ways, as described in
6.7.2.) There may also be implementation-defined extended signed integer
types. The standard and extended signed integer types are collectively
called signed integer types."

(somewhat further down, the unsigned long long int is defined).

Both must be able to represent at least 64 bit values (-2^63..2^63-1,
0..2^64-1, respectively).
You still need to be able to identify process instances, which *BY
ITSELF* is not portable. Process instances identifiers would
count as something *BEYOND* just malloc/free/etc.


Why do you need to distinguish process instances?


Reread Lamport's bakery algorithm. One of the crucial steps is
"for each process, check their number, and pick the max, then add 1".
Another is "for each process that has a lower number, wait in line
behind them." This means you have a way of knowing how many processes
there are, and how to index them. This means you have to hook out or
wrap whatever "spawn" mechanism you have in your language (or use some
other platform specific mechanism to iterate through all running
threads.)


Could you provide a reference to the description of the algorithm you're
using? I'm not saying it's incorrect or anything, it's just that I'd
like to respond in the same terminology.
[...] In your API, each call carries the heap it refers to as a
parameter.

Yes, but if two threads simultaneously call heap functions on the same
heap (heaps are not restricted to being distinct across threads) then
there is a race condition.


So we need to put a mutex around the critical section, no big deal. All
hinges now on whether the Lamport algorithm needs process id's.
Fortunately, there's no subjective element in that, we should be able to
sort that out.
I would say that writing a more powerful heap manager is possible in
principle, even in a multithreaded environment, as long as you have a
well-implemented malloc() (and friends) at your disposal.

*BZZZT!!!* Wrong.
It would be a drag to do so and it would be slow, but it is possible.

Impossible. Slow, fast or otherwise. Its just impossible.


If you can explain to me why you need to explicitly distinguish threads
in the memory manager, I'll concede the point.

Go look up the source code for *ANY* multithreading-supporting
malloc/free. I absolutely guarantee that every single one of them
uses some kind of platform specific exclusion object to make sure that
two simultaneous calls to the heap do not collide.
Sure. Mutexes would suffice and I contend the bakery algorithm can
provide them. I feel rather silly getting dragged in this line of
discussion (multithreading has little to do with the C execution model)
but we'll see where this leads us.
One way or another the heap is some kind of quasi-linked list of memory
entries. You should have learned from your OS classes (assuming you passed it
and/or comprehended the material)
Could you refrain from the silly ad hominems? I am trying to take you
seriously, even though you seem to deny the importance of standards, and
even though you are advocating an (in my opinion) confused "operator
introduction" feature. Maintaining a basic level of respect helps in any
discussion.
---- Proposed addition to the standard to add stack overflows:
---- Draft

a. For an active function invocation, define an upper bound for the
amount of memory in use by this function as follows (in the
following, denote "Upper Limit on Memory Usage" by "ULMU"):

And what if the locals/parameters are not stored, or storable, on the stack?


How do you mean? They have to be stored. And I'm presuming they're
stored on the stack.

Most microprocessors have these temporary storage elements called
"registers". They are often not part of any stack structure. Many modern
compilers will map local variables and temporaries to these "registers", thus
not requiring stack storage for them.


Not an issue, as I try to establish an upper bound. The fact that the
implementation can do better by using registers is of no concern.
What if an implementation uses multiple stacks depending on the
*type* of the parameter?


The hypothetical implementation [...]

That the ANSI C standard currently does not have any problem with ...

[...] that does this will be out of luck. Not because the proposal
wouldn't work, mind you, but because the ULMU is a very loose bound.
But hey, it beats having no bound whatsoever.

No it doesn't. Dealing with the stack is an area that clearly should be
platform specific. For example, it could be that an embedded system has
fast and slow ram. Certain data types might be best suited to being mapped
into fast RAM, whether they are local or not, but you might have a lot more
slow RAM, so you'd put all the rest of the stuff in there.


To what does "no it doesn't" refer?

How does your example invalidate the "upper bound" idea?
- For every parameter/local variable of basic type there is a
corresponding constant macro of type size_t:

You missed all the implicit temporaries required. Homework
exercise: for any fixed number n, create a function which requires
at least n implicit temporaries (not explicitly declared) to
implement.


Ah yes! So it seems. An interesting problem. Have to think about that
for a bit.


Why don't you think about the implication of the results of this
exercise with respect to your proposal while you are at it.


If no way can be found to address this, we can no longer give an upper
bound, and the proposal is dead in the water. However, I'm not convinced
it can't be handled. For one thing, your proposal to make macro's
available outside the function scope that give the functions stack usage
upper bound may come in handy.
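To make the intent concrete, a purely hypothetical fragment --
ULMU_TOTAL, ULMU_CONSTANT, ULMU_FUNC_walk and E_TOO_DEEP are all
invented names for the proposed facility:

    /* refuse a recursion that cannot be shown to fit */
    if (ULMU_CONSTANT + (depth + 1) * ULMU_FUNC_walk > ULMU_TOTAL)
        return E_TOO_DEEP;  /* fail cleanly instead of overflowing */
    return walk(child, depth + 1);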
And what about the number of alloca() calls? alloca() is a common
extension which just eats extra space off the stack dynamically --
the big advantage being that you don't have to call free or
anything like it, for it to be properly cleaned up. It will be
cleaned up upon the function's return (the simplest implementation
of a "freeall".)


Ok, we can add alloca'ed space to each function invocation, plus an
overhead constant per active alloca.

The alloca() calls can be in a loop. Knowing how much memory a set of
alloca's will consume in any function is equivalent to solving the
halting problem.


So what? Why is this relevant? Most of the time you can still give an
upper bound. In cases where you can't (few and far between) you don't
have the guarantee, which is a pity; the code can no longer be written
in a way that's fully portable. It would still be much better than what
happens now (standard-compliant programs segfaulting without so much as
a hint in the standard).
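To illustrate the unboundable case -- more_input(),
next_record_size() and process() are stand-ins here, and alloca
itself is a common extension rather than ISO C:

    #include <stddef.h>
    #include <alloca.h>   /* header location varies by platform */

    extern int    more_input(void);
    extern size_t next_record_size(void);
    extern void   process(char *);

    void slurp(void)
    {
        while (more_input()) {        /* iteration count unknowable */
            char *buf = alloca(next_record_size());   /* size too   */
            process(buf);
        }
    }   /* all alloca'ed blocks reclaimed only here, at return */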
Unless you allow that I'm trying to define an upper bound of total use,
which may be loose as far as I am concerned. The idiomatic use would be
to compare this to the ULMU_TOTAL to see if it fits. This may yield the
number of bytes on the smallest stack for all I care. On those silly
platforms, you will grossly underestimate stack capacity now and then.
[...] Still beats the current situation.

This versus just living with runtime stack checking? I'll take the
runtime stack checking.


Then define a proposal to formalize "runtime stack checking" in
standard-like language.
[...]
My opinion is determined largely by the psychological objections (to
which you don't subscribe.)

Says who? You see, unlike some people, my idea isn't driven by an
opinion. Its driven by a consideration for design. Since you *CAN'T*
add just any old operator to satisfy everyone, then how do you satisfy
the demand for more operators? Some variation on redefinable
operators or operator overloading is the logical conclusion -- just
fix its problems (since type-index functions don't really exist in C,
so you just add in a bunch of them so that you don't lose any of the
ones you have). For all I know, I might not ever use such a feature,
even if it *WERE* to be endorsed by the standard.
I fail to see the point you want to make here.
My technical problem is that I can see no way that this could be
implemented.

Well I don't know what to make of this -- you can't see a race
condition when its staring you in the face, and you can't imagine
problems with knowing the size of a function's stack usage.


Be careful not to presume things that are not there. Other than that,
make of it what you like.
[...]
There is a possibility of error caused by bad design (which isn't
trimmed to the type of mistakes we fallible humans make) in your
operator introduction scheme.

Same is true with free form function names, of course.


The risks of errors are minute, in comparison.
You haven't shown me that it can be implemented in a compiler,
even in theory. That should count for something.

I don't think there's a credible assertion that my idea is
impossible. You want me to describe a whole compiler
implementation just to support my idea?


I listed some specific objections with regards to disambiguation of
the grammar that the parser would have to be able to handle.

Even without a full grammar being specified? That's a pretty good trick.
Yes, it's called "abstract thinking". It's a cousin of "thought experiment".
[smartcards]
Yes they do. That's the point of creating a pico-java assembly
language. Its so that you just have to upload a minimal java kernel to
it, and all of a sudden you have the whole world's Java libraries at your
disposal.
How much would these combined libraries consume, in terms of memory on
the smartcard? I mean, the bignum-multiply code has to reside somewhere.
[...]
Great. Now what has all this to do with defining a sensible signed <+
semantics for use in C (you know, a general purpose language, not tied
to your particular pet peeve problem).

Oh yeah my particular pet peeve problem, which only every serious CPU
company on the planet is only too willing to solve for me.


You haven't answered the question.
It all just comes down to the carry flag. The other flags are not
necessary for bignum libraries. Maybe there's a way to transform
the OF into something that can be used in a bignum library, but I
am not aware of it.


Because, we're talking about expanding the C language here (not about
your fairy-tale operator introduction hocus-pocus). You need decent
semantics.

CPU companies have laid out the decent semantics. I and others have
written bignum routines -- the carry flag is where its at as far as
this is concerned.
That's fine. Now here's a big idea: hardware design and general-purpose
language design are two different things.
[...]
OF is never the important flag. Its not relevant at all.


I guess that's why everything since the 4004 supports it.

That's because OF is used for <,>,<=,>= comparisons. Outside of
this, I have never seen OF being used for any purpose. I.e., I would
surmise that the only relevant usage of OF is already encoded in the
language semantics of C (and just about any other language with
integers.) But I could be wrong -- can you name at least one other
purpose which isn't really just an encoding of <,>,<=,>= ?
Adding, subtracting, multiplying, dividing. Basically anything
arithmetic you can do that can map the result of an operation on signed
ints to a value that doesn't correspond to the mathematical
interpretation of what the result should be.
You must really consider that there's more to life than multiplying big
numbers (and that's coming from one of the few people that sees the fun
in it). We're not going to extend C with an operator to fit this (or
ANY) one purpose, however important it may be.

Carry flag comes up time and again. Just look it up on google:

http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&safe=off&c2coff=1&q=jc+ret+mov
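For what it's worth, the carry-out of an unsigned add *can* be
recovered in portable C, at the price of an extra compare -- which
is exactly the cost a language-level carry would remove:

    #include <stdint.h>

    /* add with carry-out */
    uint32_t addc(uint32_t a, uint32_t b, unsigned *carry)
    {
        uint32_t c = a + b;   /* wraps mod 2^32                 */
        *carry = (c < a);     /* 1 exactly when the add wrapped */
        return c;
    }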


That seems to yield some results concerning intel assembly. Interesting,
but how does that address my point?
Off the top of my head, there's chain shifting, bit counting, and
predication/masking.
You don't have to convince me that carry flags are important. At times I
have found myself in want of a way to access them myself, from C.
However, the point is that you would have to find a good semantics, that
fits in with the rest of C, not some ad-hoc operator to make some bignum
implementors happy.
[...]
In this case, I am just copying the x86's MUL -> EDX:EAX or
AMD64's MUL -> RDX:RAX semantics.


So again (as with the carry vs. overflow), you're going to perform an
instruction that works on UNsigned integers even on signed integers.


Yes.


That doesn't rhyme well with C. There are many subtle issues with
signed/unsigned representation and operations, and the standard is
careful to find a compromise; not assuming too much on the architecture
side, while at the same time maintaining a sound semantics for both
signed and unsigned operations. Your semantics just doesn't fit in.
Pardon my French, but that's just plain stupid. Perhaps you should be a
little less confident about your own understanding of bit-twiddling...

Perhaps you should consider that there's a reason every assembly
language in existence associated signedness with particular
*OPERATIONS* not the *OPERANDS*.
Are you aware that the C standard allows for other representations than
2-complement? Assume 1-complement representation for a moment (the
standard is specifically worded to allow this).

00000001 - 00000010 yields 11111110 (i.e. -1) if '-' is a signed operator
00000001 - 00000010 yields 11111111 (i.e. 255) if '-' is an unsigned operator

Now I don't know assembly for any 1-complement architecture, but from
this it seems to me that there will have to be different signed and
unsigned versions of 'SUB'.
Having signed and non-signed versions of
every operator may be something that fit C when it was first designed, but that
doesn't make it necessarily correct for all relevant computations.


Well, if (and that's a big if) something resembling your wide-mul would
ever make it in the standard, I'd be disappointed if it didn't for
instance allow me to multiply two big signed integers. It would be messy
to only support unsigned ones.

Best regards,

Sidney

Nov 14 '05 #144
>In article <MP************************@news.sf.sbcglobal.net> ,
qe*@pobox.com says...
Care to take a quick survey of comp.lang.c patrons who own their own copy of
the standard?

In article <news:MP************************@news.megapathdsl.net>
Randy Howard <ra**********@FOOmegapathdslBAR.net> writes:
I bought the 18 buck pdf version quite some time ago, and more recently
the hardback version including the rationale and TC1 published by Wiley
& Sons. ...


I still have not paid for the C99 standard, but do not need it yet
as we only claim to support C89 (not even "C95", with the various
addenda). I did buy the dead-tree edition of the 1989 C standard
some time in the early 1990s, back when it cost $75 and came with
the Rationale, and before the various "TC"s and "NA"s.

(I use an out-of-date C99 draft for online "quick reference" as
needed. I really should just get the $18 PDF version, but I find
on-line editions of technical documentation far less useful, as
yet, than printed editions. PDF is not as portable as it should
be; displays are just not high enough resolution to make reading
easy; marking pages does not work as well; and they are not
comfortable to take to the restrooms. :-) )
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #145
Chris Torek <no****@torek.net> writes:
(I use an out-of-date C99 draft for online "quick reference" as
needed. I really should just get the $18 PDF version, but I find
on-line editions of technical documentation far less useful, as
yet, than printed editions. PDF is not as portable as it should
be; displays are just not high enough resolution to make reading
easy; marking pages does not work as well; and they are not
comfortable to take to the restrooms. :-) )


I use a C99 PDF converted to text format most of the time. The
formatting isn't perfect, but it is good enough for most purposes
most of the time, and it is easily searchable.
--
"C has its problems, but a language designed from scratch would have some too,
and we know C's problems."
--Bjarne Stroustrup
Nov 14 '05 #146
Sidney Cadot wrote:
Paul Hsieh wrote:
Sidney Cadot wrote:
Paul Hsieh wrote:
[snip]
I think there are relatively few excuses for any professional C
programmer not to have access to the standard (and reading it once
or twice wouldn't hurt either). It's only 18 bucks in electronic
form, and you don't even have to go out to get it.
Care to take a quick survey of comp.lang.c patrons who own their
own copy of the standard?


Sure, I think you'd be surprised.


Well the few people in here who have the standard have chimed in. Now
how do you suppose we convince everyone else in the group to chime in
about the fact that they don't have it? (Look up the term "push poll"
if you don't understand what I mean.)
What is your excuse for not having the standard? A free draft (N869)
is also available.
The standard does not tell me how to do the job of coding. Although
it may cause the occasional portability problem, having the raw source
of the C library is surprisingly more useful. For example, please
locate for me where in the standard it says that fgets() ignores '\0'
that are consumed -- then locate where in the standard it says its the
only string function which does this. Then tell me where in the
standard it says that strtok is not re-enterable.
I had to look this one up -- that's one of the classic useless
non-solutions (especially if you want to call malloc more than 4
billion times, for example), which explains why I've never
committed it to memory.
We have long longs nowadays. 2^32 is not an important limit.


And long long's are portable?!?! Are you even following your own
train of thought?


You really seem a bit out of touch with current events... Since you
don't seem to have access to the C99 standard, here's the relevant
quote:


Excuse me?!?! *C99* is not portable.
You still need to be able to identify process instances, which *BY
ITSELF* is not portable. Process instances identifiers would
count as something *BEYOND* just malloc/free/etc.

Why do you need to distinguish process instances?


Reread Lamport's bakery algorithm. One of the crucial steps is
"for each process, check their number, and pick the max, then add
1". Another is "for each process that has a lower number, wait in
line behind them." This means you have a way of knowing how many
processes there are, and how to index them. This means you have
to hook out or wrap whatever "spawn" mechanism you have in your
language (or use some other platform specific mechanism to iterate
through all running threads.)


Could you provide a reference to the description of the algorithm
you're using?


Typed it into google and felt lucky. Its the first hit. Notice
language like "Process i", and statements like for (j=0; j < n; j++),
where each j indexes a process id, and n is the number of processes.
I tried to think up a way of implementing that algorithm without
specifically requiring an enumeration of processes, but it seems that
that is central to the way the algorithm works. Which is only *one*
of the reasons why that algorithm is worthless.
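To make the dependence concrete, here is the bakery lock in bare C;
both loops range over all N process ids, which is precisely the
enumeration at issue (and plain C makes no guarantee that other
threads see these loads and stores in order):

    #define N 8                      /* fixed number of processes */
    static volatile int entering[N];
    static volatile unsigned number[N];

    static void lock(int i)          /* i = this process's id */
    {
        unsigned max = 0;
        int j;
        entering[i] = 1;
        for (j = 0; j < N; j++)      /* take the max of all tickets */
            if (number[j] > max) max = number[j];
        number[i] = max + 1;
        entering[i] = 0;
        for (j = 0; j < N; j++) {    /* wait behind lower tickets */
            while (entering[j])
                ;
            while (number[j] != 0 &&
                   (number[j] < number[i] ||
                    (number[j] == number[i] && j < i)))
                ;
        }
    }

    static void unlock(int i) { number[i] = 0; }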
[...] I'm not saying it's incorrect or anything, it's just that I'd
like to respond in the same terminology.
[...] In your API, each call carries the heap it refers to as a
parameter.
Yes, but if two threads simultaneously call heap functions on the same
heap (heaps are not restricted to being distinct across threads) then
there is a race condition.


So we need to put a mutex around the critical section, no big deal.


It *is* a big deal. Mutexes, and other similar multithreading objects
are not expressible in a portable way in a language like C or C++.
[...] All hinges now on whether the Lamport algorithm needs process
id's. Fortunately, there's no subjective element in that, we should
be able to sort that out.
What I don't understand is how is it that *YOU* brought up the
algorithm, and yet you are unaware of its fundamental functionality.
It took me only 30 seconds to understand it.
>I would say that writing a more powerful heap manager is possible in
>principle, even in a multithreaded environment, as long as you have a
>well-implemented malloc() (and friends) at your disposal. [...]
>It would be a drag to do so and it would be slow, but it is possible.

Impossible. Slow, fast or otherwise. Its just impossible.

If you can explain to me why you need to explicitly distinguish threads
in the memory manager, I'll concede the point.
Go look up the source code for *ANY* multithreading supporting
malloc/free. I absolutely guarantee that every single one of them
uses some kind of platform specific exclusion object to make sure that
two simultaneous calls to the heap do not collide.


Sure. Mutexes would suffice and I contend the bakery algorithm can
provide them.


Well, but that's not my contention, and I would appreciate that you not put
words in my mouth, or follow the discussion (whichever one is causing
you to write things like this) -- my contention is that Lamport's
Bakery algorithm is not portable (I mean relative to C, which doesn't
expose and multithreading functionality, even if implementations do.)
Then again, once you open it up to non-portable code, of course,
nobody will touch Lamport's Bakery algorithm with a 10 ft pole -- its
worthless when you have far superior mechanisms for creating Mutexes
at your disposal.
[...] I feel rather silly getting dragged in this line of
discussion (multithreading has little to do with the C execution
model) but we'll see where this leads us.
Then let me review the facts for you. Multithreading is inextricably
intertwined with the implementation of malloc. Thus if a compiler
vendor wants to provide a multithreading extension to C (which they
pretty much all do) discussions of implementation of malloc, or
malloc-like functions have to at least acknowledge this as a
consideration. There is no such thing as a portable mutex in pure C
that actually encodes all the expected notions of a mutex unless tied
specifically to the extension mechanisms that are used to provide the
multitasking in the first place. The inevitable conclusion is that
its impossible to make a portable library for extending the heap
functions that are multithreading safe.
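As a concrete (and deliberately non-portable) illustration of the
shape this always takes, in POSIX terms, with internal_alloc()
standing in for the actual heap walk:

    #include <pthread.h>
    #include <stddef.h>

    struct heap {
        pthread_mutex_t lock;     /* the platform-specific part */
        /* ... free lists, arenas, etc. ... */
    };

    void *internal_alloc(struct heap *, size_t);  /* hypothetical */

    void *heap_alloc(struct heap *h, size_t n)
    {
        void *p;
        pthread_mutex_lock(&h->lock);   /* serialize heap access */
        p = internal_alloc(h, n);
        pthread_mutex_unlock(&h->lock);
        return p;
    }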
One way or another the heap is some kind of quasi-linked list of
memory entries. You should have learned from your OS classes
(assuming you passed it and/or comprehended the material)


Could you refrain from the silly ad hominems? I am trying to take you
seriously, even though you seem to deny the importance of standards,
and even though you are advocating an (in my opinion) confused
"operator introduction" feature. Maintaining a basic level of
respect helps in any discussion.


Well, if you are sensitive about being admonished in public, then can
I ask that you have a little bit of intellectual honesty? I am trying
to put forward hard core analytical ideas, and I have made efforts to
distill your own proposals in a very serious way. The response is not
supposed to be "oh but you can't prove its impossible to implement".
This is like Ari Fleischer telling the anti-war sympathizers that its
up to them to prove that there are no WMDs in Iraq. If you have a
problem with what I am saying why can't you limit it to an analytical
response?

See, the problem with your suggestion of using Lamport's bakery
algorithm is not just that it obviously doesn't satisfy the
conditions of the original problem (portably extending malloc/etc),
but that the next thing you are going to do is throw some other
algorithm at it that also won't work in some vain attempt to throw me
off yet again. You did it full well expecting that I wouldn't chase
it down and find the flaw. Are you going to do it again, or are you
going at least take my arguments more seriously?

You see its not *UP* to me to analyze bogus algorithms to prove that
you can't implement a mutex portably. That's retarded -- even if I do,
its not actual proof of anything. If you have a serious contention
against my very technical claim, then don't you think that its at
least somewhat incumbent upon you to wade through the details
yourself?
What if an implementation uses multiple stacks depending on the
*type* of the parameter?

The hypothetical implementation [...]


That the ANSI C standard currently does not have any problem with ...
[...] that does this will be out of luck. Not because the proposal
wouldn't work, mind you, but because the ULMU is a very loose bound.
But hey, it beats having no bound whatsoever.


No it doesn't. Dealing with the stack is an area that clearly
should be platform specific. For example, it could be that an
embedded system has fast and slow ram. Certain data types might
be best suited to being mapped into fast RAM, whether they are
local or not, but you might have a lot more slow RAM, so you'd put
all the rest of the stuff in there.


To what does "no it doesn't" refer?


The last statement "... it beats having no bound ..."
How does your example invalidate the "upper bound" idea?
Look, if you have two stacks, then the right way to bound it is to
have two bounds. If one of your stacks has a very limited amount of
memory (fast RAM is likely to be more expensive) then in order to
properly implement a single bound (your proposal) you need to assume
the worst case and say that you only have as much stack space as the
smaller stack (otherwise it wouldn't be a correct bound.) This is
completely worthless to systems that want to expose really really deep
call stacks but fairly shallow floating point stacks, for example
(look up the Ackermann function again, and try to imagine attempts at
mathematical simplifications, to be able to output at least *some* of
the reasonable computable values in reasonable time.)
> - For every parameter/local variable of basic type there is a
> corresponding constant macro of type size_t:

You missed all the implicit temporaries required. Homework
exercise: for any fixed number n, create a function which requires
at least n implicit temporaries (not explicitly declared) to
implement.

Ah yes! So it seems. An interesting problem. Have to think about that
for a bit.


Why don't you think about the implication of the results of this
exercise with respect to your proposal while you are at it.


If no way can be found to address this, we can no longer give an upper
bound, and the proposal is dead in the water. However, I'm not convinced
it can't be handled.


I can see once again I'll be trying to prove that the are no WMDs in
Iraq ...
[...] For one thing, your proposal to make macro's available outside
the function scope that give the functions stack usage upper bound
may come in handy.
See how that worked out? You had a proposal, I found something wrong
with it, so I offered something *CONSTRUCTIVE* to try to save it,
rather than pretending it would fail because of an issue that I can
work around even if you might have missed it. Its called intellectual
honesty.
And what about the number of alloca() calls? alloca() is a common
extension which just eats extra space off the stack dynamically --
the big advantage being that you don't have to call free or
anything like it, for it to be properly cleaned up. It will be
cleaned up upon the function's return (the simplest implementation
of a "freeall".)

Ok, we can add alloca'ed space to each function invocation, plus an
overhead constant per active alloca.


The alloca() calls can be in a loop. Knowing how much memory a set of
alloca's will consume in any function is equivalent to solving the
halting problem.


So what? Why is this relevant? Most of the time you can still give an
upper bound.


Perhaps you do not know the true nature of the halting problem. You
can't know if a piece of code ends, which is equivalent to knowing if
it takes one branch or another, or how many times it executes a given
code fragment. There's no shortage of open mathematical problems for
which the output of a given function simply cannot be bounded,
calculated, or estimated with any current knowledge (the famous
Collatz problem -- 3n+1 if odd, halve if even -- comes to mind).
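To see the difficulty in three lines -- the recursion depth below
is the Collatz trajectory length of n, which nobody can currently
bound for arbitrary n:

    unsigned long steps(unsigned long n)
    {
        if (n <= 1) return 0;
        return 1 + steps(n % 2 ? 3 * n + 1 : n / 2); /* unknowable */
    }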
[...] In cases where you can't (few and far between) you don't
have the guarantee, which is a pity; the code can no longer be
written in a way that's fully portable. It would still be much
better than what happens now (standard-compliant programs
segfaulting without so much as a hint in the standard).
It means you are expecting compilers to come as close to solving the
halting problem as they can. And you think variable length operators
based on a grammar are an impossible, intractable problem in
practice?!?!
Unless you allow that I'm trying to define an upper bound of total
use, which may be loose as far as I am concerned. The idiomatic
use would be to compare this to the ULMU_TOTAL to see if it fits.
This may yield the number of bytes on the smallest stack for all I
care. On those silly platforms, you will grossly underestimate
stack capacity now and then. [...] Still beats the current
situation.


This versus just living with runtime stack checking? I'll take the
runtime stack checking.


Then define a proposal to formalize "runtime stack checking" in
standard-like language.


The implementation may exit or take any other defined action which halts
the execution on any function call. But it may not create undefined
behaviour from the action of performing the call itself alone.

There's a proposal that's compatible with many implementations today
(if we presume that turning stack checking *off* would be an
extension, and that leaving it on is the default conforming behavior)
and should be easy to understand and implement.
[...]
My opinion is determined largely by the psychological objections (to
which you don't subscribe.)
Says who? You see, unlike some people, my idea isn't driven by an
opinion. Its driven by a consideration for design. Since you *CAN'T*
add just any old operator to satisfy everyone, then how do you satisfy
the demand for more operators? Some variation on redefinable
operators or operator overloading is the logical conclusion -- just
fix its problems (since type-index functions don't really exist in C,
so you just add in a bunch of them so that you don't lose any of the
ones you have). For all I know, I might not ever use such a feature,
even if it *WERE* to be endorsed by the standard.


I fail to see the point you want to make here.


Just that I am trying to analyze problems, and you are just spouting
opinion.
My technical problem is that I can see no way that this could be
implemented.


Well I don't know what to make of this -- you can't see a race
condition when its staring you in the face, and you can't imagine
problems with knowing the size of a function's stack usage.


Be careful not to presume things that are not there. [...]


Yeah, thanks for the warning. I'll be sure to purchase some duct tape
to deal with my neighbors who might be terrorists too.

The race is there. I reduced it to the standard notions that you are
supposed to learn about in an OS class. I've explained everything you
need to know to understand its there. Any research you do into the
subject will confirm what I'm saying. But I'm supposed to worry that
I might be wrong based on your flimsy non-arguments.
There is a possibility of error caused by bad design (which isn't
trimmed to the type of mistakes we fallible humans make) in your
operator introduction scheme.


Same is true with free form function names, of course.


The risks of errors are minute, in comparison.


An opinion you will back up how ... ?
>You haven't shown me that it can be implemented in a compiler,
>even in theory. That should count for something.

I don't think there's a credible assertion that my idea is
impossible. You want me to describe a whole compiler
implementation just to support my idea?

I listed some specific objections with regards to disambiguation of
the grammar that the parser would have to be able to handle.
Even without a full grammar being specified? That's a pretty good trick.


Yes, it's called "abstract thinking". It's a cousin of "thought
experiment".


Yeah, well believing that there are parsable grammars doesn't take very much
"abstract thinking", there millions of examples.
[smartcards]
Yes they do. That's the point of creating a pico-java assembly
language. Its so that you just have to upload a minimal java kernel to
it, and all of a sudden you have the whole world's Java libraries at your
disposal.


How much would these combined libraries consume, in terms of memory on
the smartcard? I mean, the bignum-multiply code has to reside somewhere.


If you have a widening multiply, a bignum multiply is tiny. As a WAG,
maybe 30 instructions or so? Obviously they don't *include* every
library -- I can't imagine they need the Java GUI on a smart card. Look,
I'm not going to defend them, *they* did it, not me. Go look this stuff up
yourself:

http://www.google.com/search?hl=en&ie=UTF-8&c2coff=1&safe=off&q=Michael+Montgomery+Java+smart+card&spell=1
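The core really is tiny once a widening multiply is available; in
portable C99 the digit-by-digit step looks like this (multiplying
an n-digit number a by a single digit b, into r[0..n]):

    #include <stdint.h>

    void mul1(uint32_t *r, const uint32_t *a, uint32_t b, int n)
    {
        uint32_t carry = 0;
        int i;
        for (i = 0; i < n; i++) {
            uint64_t t = (uint64_t)a[i] * b + carry; /* widening mul */
            r[i] = (uint32_t)t;          /* low 32 bits: the digit   */
            carry = (uint32_t)(t >> 32); /* high 32 bits: carry out  */
        }
        r[n] = carry;
    }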
> It all just comes down to the carry flag. The other flags are
> not g necessary for bignum libraries. Maybe there's a way to
> transform the OF into something that can be used in a bignum
> library, but I am not aware of it.

Because, we're talking about expanding the C language here (not
about your fairy-tale operator introduction hocus-pocus). You
need decent semantics.
CPU companies have laid out the decent semantics. I and others have
written bignum routines -- the carry flag is where its at as far as
this is concerned.


That's fine. Now here's a big idea: hardware design and general-purpose
language design are two different things.


And so which category do you think "bignum libraries" falls into? I
mean given that every platform supports it, and that in fact an
implementation (the GMP) purports to be a software library with a
portable interface up to software, supporting just about every popular
desktop/server/workstation platform under the sun.
[...]
OF is never the important flag. Its not relevant at all.
I guess that's why everything since the 4004 supports it.
That's because OF is used for <,>,<=,>= comparisons. Outside of
this, I have never seen OF being used for any purpose. I.e., I would
surmise that the only relevant usage of OF is already encoded in the
language semantics of C (and just about any other language with
integers.) But I could be wrong -- can you name at least one other
purpose which isn't really just an encoding of <,>,<=,>= ?


Adding, subtracting, multiplying, dividing. Basically anything
arithmetic you can do that can map the result of an operation on signed
ints to a value that doesn't correspond to the mathematical
interpretation of what the result should be.


No, we all understand what produces an OF, I mean, name an actual
algorithm that uses it, that can't extract the semantics from one of
<,>,<=,>=.
[...]
In this case, I am just copying the x86's MUL -> EDX:EAX or
AMD64's MUL -> RDX:RAX semantics.

So again (as with the carry vs. overflow), you're going to perform an
instruction that works on UNsigned integers even on signed integers.


Yes.


That doesn't rhyme well with C.


Well that's right -- *bignums* don't rhyme well with C. *STRINGS*
don't rhyme well with C. *BITS* don't rhyme well with C. That
doesn't mean its not worth having them.
[...] There are many subtle issues with
signed/unsigned representation and operations, and the standard is
careful to find a compromise; not assuming too much on the
architecture side, while at the same time maintaining a sound
semantics for both signed and unsigned operations. Your semantics
just doesn't fit in.


There's your deep analysis again "... just doesn't fit in ...". I
just can't argue with that. Wow, I guess I am just wrong.
Pardon my French, but that's just plain stupid. Perhaps you should
be a little less confident about your own understanding of
bit-twiddling...
Perhaps you should consider that there's a reason every assembly
language in existence associated signedness with particular
*OPERATIONS* not the *OPERANDS*.


Are you aware that the C standard allows for other representations
than 2-complement? Assume 1-complement representation for a moment
(the standard is specifically worded to allow this).


Yes, but the semantics of what a carry is, is not crucially tied to
this. Even if you use ones complement, or something else, you can
still *compute* the carry, even if you don't automatically have it from
the built-in representation.
Having signed and non-signed versions of every operator may be
something that fit C when it was first designed, but that doesn't
make it necessarily correct for all relevant computations.


Well, if (and that's a big if) something resembling your wide-mul
would ever make it in the standard, I'd be disappointed if it didn't
for instance allow me to multiply two big signed integers. It would
be messy to only support unsigned ones.


But I didn't precisely say that. In fact losing the (un)signedness
semantics is really something you need for the carry flag idea. The
"almost like widening multiplies", that exist in C retain the sign as
you would expect, and obviously you probably wouldn't want to change
that in an extension to C.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #147
Paul Hsieh wrote:

Sidney Cadot wrote:
Paul Hsieh wrote:
Sidney Cadot wrote:
>Paul Hsieh wrote:
> [snip]
>I think there are relatively few excuses for any professional C
>programmer not to have access to the standard (and reading it once
>or twice wouldn't hurt either). It's only 18 bucks in electronic
>form, and you don't even have to go out to get it.

Care to take a quick survey of comp.lang.c patrons who own their
own copy of the standard?
Sure, I think you'd be surprised.


Well the few people in here who have the standard have chimed in. Now
how do you suppose we convince everyone else in the group to chime in
about the fact that they don't have it? (Look up the term "push poll"
if you don't understand what I mean.)
What is your excuse for not having the standard? A free draft (N869)
is also available.


That document gets referenced here from time to time.

Searched Groups for N869 group:comp.lang.c.*
Results 1 - 10 of about 2,200. Search took 1.30 seconds.

The standard does not tell me how to do the job of coding.
Although
it may cause the occasional portability problem, having the raw source
of the C library is surprisingly more useful. For example, please
locate for me where in the standard it says that fgets() ignores '\0'
that are consumed -- then locate where in the standard it says its the
only string function which does this. Then tell me where in the
standard it says that strtok is not re-enterable.


N869
7.1.4 Use of library functions
[#4] The functions in the standard library are not
guaranteed to be reentrant and may modify objects with
static storage duration.

--
pete
Nov 14 '05 #148
Paul Hsieh wrote:
Sidney Cadot wrote:
Paul Hsieh wrote:
>
> Care to take a quick survey of comp.lang.c patrons who own their
> own copy of the standard?


Sure, I think you'd be surprised.


Well the few people in here who have the standard have chimed in.


Don't be ridiculous.
--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton
Nov 14 '05 #149
Paul Hsieh wrote:
Sidney Cadot wrote:
> [snip]

What is your excuse for not having the standard? A free draft
(N869) is also available.


The standard does not tell me how to do the job of coding.
Although it may cause the occasional portability problem, having
the raw source of the C library is surprisingly more useful. For
example, please locate for me where in the standard it says that
fgets() ignores '\0' that are consumed -- then locate where in
the standard it says its the only string function which does
this. Then tell me where in the standard it says that strtok is
not re-enterable.


Once more, please cut these monstrous postings down to size and
deal with one thing at a time, using suitable subjects.

All those characteristics are not mandated. However it is hard to
see what use any data would be after the end-marking '\0', nor is
there any prohibition against a re-enterable strtok if you can
devise a way to do so. You are told that it saves a pointer, and
that the way to avoid reinitializing that pointer is to pass in
NULL. Unless you have a DWIM machine I fail to see any other basis
for restoring the saved pointer.
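The call pattern makes the hidden state plain -- the continuation
calls hand strtok nothing but NULL, so the position must live in
static storage (interleave two such loops on different strings and
they trample each other):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char buf[] = "a,b,c";
        char *t = strtok(buf, ",");  /* seeds the hidden pointer   */
        while (t != NULL) {
            puts(t);
            t = strtok(NULL, ",");   /* advances that same pointer */
        }
        return 0;
    }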

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #150
