
Is C99 the final C?

I was just thinking about this, specifically wondering if there are any
features that the C specification currently lacks, and which may be
included in some future standardization.

Of course, I speak only of features in the spirit of C; something like
object-orientation, though a nice feature, does not belong in C.
Something like being able to #define a #define would be very handy,
though, e.g:

#define DECLARE_FOO(bar) #define FOO_bar_SOMETHING \
    #define FOO_bar_SOMETHING_ELSE

I'm not sure whether the features of cpp are even included in the C
standard though (and GCC has definitely taken quite a nonstandard approach
with regards to certain token expansions and whatnot), but that's one area
of improvement I see.

I would also like to see something along the lines of C++ templating,
except without the really kludgy implementation that the C++ folks decided
to go with (and without the OOP).

.... Mike pauses for the sound of a thousand *plonks*

Templates save a lot of time when it comes to commonly-used data
structures, and as they are entirely implemented at compile-time and don't
include, by their definition, OOP (although they can be well suited to
it), I think they would be a nice addition and in the spirit of C.
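
Templates aside, token pasting in standard C macros already gets part of the way to stamping out type-specific code, even though a macro cannot emit further #define directives. A minimal sketch along those lines (the DECLARE_PAIR name and its contents are purely illustrative, not from any standard or library):

#include <stdio.h>

/* Token pasting can stamp out a family of types and functions,
   which is about as close as standard C gets to a template. */
#define DECLARE_PAIR(T)                          \
    struct pair_##T { T first; T second; };      \
    static T pair_##T##_sum (struct pair_##T p)  \
    { return p.first + p.second; }

DECLARE_PAIR(int)     /* defines struct pair_int and pair_int_sum */
DECLARE_PAIR(double)  /* defines struct pair_double and pair_double_sum */

int main(void)
{
    struct pair_int    a = { 2, 3 };
    struct pair_double b = { 1.5, 2.25 };
    printf("%d %g\n", pair_int_sum(a), pair_double_sum(b));
    return 0;
}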

Your thoughts? I'm sure there's some vitriol coming my way but I'm
prepared 8)

--
Mike's Patented Blocklist; compile with gcc:

i=0;o(a){printf("%u",i>>8*a&255);if(a){printf(".");o(--a);}}
main(){do{o(3);puts("");}while(++i);}

Nov 13 '05
Paul Hsieh wrote:
Look, nobody uses K&R-style function declarations anymore. The reason is
because the ANSI standard obsoleted them, and everyone picked up the ANSI
standard. That only happened because *EVERYONE* moved forward and picked up
the ANSI standard. One vendor is irrelevant.
Ok. Can't speak for MW, but I think that by the end of 2007 we'll have
near-perfect C99 compilers from GNU, Sun, Microsoft, and Intel. Odds now
down to 5:1; you're in?


With the *and* in there? Sure. Since Microsoft alone will not do it
(otherwise why not back port MFC to C?), the GNU people may decide that the
last 10% just isn't worth it, Sun has two other languages to worry about that
take precedence in terms of development resources, and once C++0x emerges, C99
development by any of these vendors will almost certainly be halted. I'd have
to be wrong on all of these guys to lose this.


That's why the odds are down to 1:5 :-) Four years is quite a long time...
Ok. I agree with you on extending the capabilities of the preprocessor
in general, although I can't come up with actual things I miss in
everyday programming.

I run into this every now and then. For example, I was recently trying to
solve the following problem: Create a directed graph on n points, each with an
out degree of d (I am concerned with very small d, like 2), such that the path
length between any pair of points is minimized.

Turns out that this is a problem whose complexity grows super-exponentially
with very small values of n. Despite my best efforts, I don't know the answer
for n=11, d=2 for example (I know its either 3 or 4, but I can't prove that its
not 3). More startling is the possibility that n=12,d=2 might have a smaller
latency (but I don't know that either)!
I used to do programming contests in university (both as a participant
and as a judge later on), so I quite like this sort of problem. Care to
provide some background (via regular e-mail), I think I'd like to think
about this for a bit.
Anyhow, the point is that the only way to have a chance to squeeze enough
computational juice out of my PC to solve this, I had to hard code huge amounts
of the code, and use a lot of bit twiddling tricks just for each special case.
I used the preprocessor macros as much as possible to make my code manageable,
but in order to change n, I actually have to *modify* the code by adding and
changing *n* lines of code. There's not much I can do about this, I would have
no chance to solve this otherwise.
Ok, that's a case. But wouldn't it have been better then just to make a
program generator (written in C or something else)? Or perhaps use the
m4 macro processor?
I'm saying that trying to fix C's intrinsic problems shouldn't start or end
with some kind of resolution of call stack issues. Anyone who understands
machine architecture will not be surprised about call stack depth limitations.


It's the task of a standard to spell these out, I think.
There are far more pressing problems in the language that one would like to
fix.


Yes, but most things that relate to the "encouraging of writing
extremely unsound and poor code", as you describe C, would be better
fixed by using another language. A lot of the inherently unsafe things
in C are sometimes needed, when doing low-level stuff.

Why is UB in isgraph(-1) needed? Why is UB in gets() needed? Why is the fact
that fgets() skips over '\0' characters needed? Why is a non-portable right
shift on signed integers needed (especially considering the one on unsigned
*is* portable)?


These are side cases, that are at least mentioned in the standard. No
big deal, IMHO. This is much more of a big deal:

int count_nodes(struct Tree *t)
{
    return t ? count_nodes(t->left)+count_nodes(t->right)
             : 0;
}

Things like this are ubiquitous in many kinds of code. Trees with a big
depth are not uncommon. How can I ever claim such a function is portable
if I have no guarantee whatsoever that it will work on some architectures?
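
One portable way around the depth problem is to carry the traversal state on the heap instead of the call stack. A minimal sketch, assuming a struct Tree with left/right members (the iterative function and its name are made up here, not taken from the thread):

#include <stdlib.h>

struct Tree { struct Tree *left, *right; /* payload fields omitted */ };

/* Iterative node count: the pending work lives in a malloc'd array,
   so depth is limited by heap space, not by the call stack. */
long count_nodes_iter(const struct Tree *root)
{
    long count = 0;
    size_t cap = 64, top = 0;
    const struct Tree **stack = malloc(cap * sizeof *stack);
    if (!stack) return -1;                    /* report allocation failure */

    if (root) stack[top++] = root;
    while (top > 0) {
        const struct Tree *t = stack[--top];
        count++;
        if (top + 2 > cap) {                  /* grow the explicit stack */
            const struct Tree **p = realloc(stack, 2 * cap * sizeof *stack);
            if (!p) { free(stack); return -1; }
            stack = p;
            cap *= 2;
        }
        if (t->left)  stack[top++] = t->left;
        if (t->right) stack[top++] = t->right;
    }
    free(stack);
    return count;
}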
[powerful heap manager]

>But a third party library can't do this portably.

I don't see why not?

Explain to me how you implement malloc() in a *multithreaded* environment
portably. You could claim that C doesn't support multithreading, but I highly
doubt you're going to convince any vendor that they should shut off their
multithreading support based on this argument.
Now you're shifting the goalposts.

How so?
Since in your original statement there was no mention of multithreading.
Allow me to refresh your memory:

"But a third party library can't do this portably. Its actual useful
functionality that you just can't get from the C language, and there's
no way to reliably map such functionality to the C language itself.
One is forced to know the details of the underlying platform to
implement such things. Its something that really *should* be in the
language." [Paul Hsieh, 4 posts up]

See?
The two are related. Writing a heap manager requires that you are
aware of multitasking considerations.
Nonsense. There's plenty of real hardware out there where multitasking is a
non-issue.
If I want to extend the heap manager, I have to solve them each one by one
in different ways on different platforms
Nonsense. Unless you need support for something inherently
platform-specific like multithreading.
And, of course, what sort of expectation of portability would an end user have
knowing that the library itself had to be carefully hand ported?
"None whatsoever", I suspect the answer to this rhetorical question
would be? I still don't see a problem writing a super-duper heap manager
on top of malloc().
Compare this to the string library that I wrote (http://bstring.sf.net/). It's
totally portable. Although I don't have access to MAC OS X, and don't use gnu
C on Linux, in order to test to make sure, I know for a fact that end users
from both platforms have downloaded and have/are actively using it. I know it
works in 16 bit and 32 bit DOS/Windows environments, etc., etc. It's totally
portable, in the real-world sense of portable (semantically, as well as
syntactically.) The point is that the end users know that there is absolutely
0 risk of losing portability or having porting problems because of the use of
this library.

Third party tools for advanced heaps make little sense. It would only be
worth consideration if it were actively ported to many platforms -- which
increases its cost. I.e., for whatever set of platforms I am considering using
such a library, I am paying for the development cost of every platform that it
would be ported to in the price of using it. It's also highly useless for new
hardware platforms which are in development. Even having access to the source
is of lesser value if there are platform specifics in each instance that
require munging just to port it.

The point is, if the C standard were simply to add this functionality straight
into the library, then it would be each compiler vendor's responsibility to add
this functionality into the compiler.
Sure but then the compiler would be more expensive to make, and I would
have to pay more $$$ for a feature I don't need.
And the functionality would be
inherently portable as a result. The requirement of multitasking support would
then be pushed back into the vendor's lap -- i.e., the people who have
introduced this non-portable feature in the first place.
Adding multithreading support to a language is a can of worms. I don't
think you quite know what you're suggesting here. Many difficult
semantic issues to resolve. The fun thing about C is that it's simple.
By dictating its existence in
the library, it would put the responsibility of making it work right in the
hands of the vendor without affecting the C standards stance on not
acknowledging the need for multithreading.
Huh? The C standard describes the library.
Obviously, you cannot write a multithreading heap manager portably if
there is no portable standard on multithreading (doh). If you need this
you can always presume POSIX, and be on your way.

You do not understand. The *IMPLEMENTATION* needs to be multithreading aware.

Perhaps I don't understand. The implementation... of what?

From a programmer's point of view, the multithreading support is completely
transparent.
If you know a way of doing that, you'll make big headlines in the
computer science world. I have yet to see a multithreading model that is
not either severely limited in what it can do or hard to use.
This has no end user impact from a POSIX specification point of view.
Sorry, I don't understand this remark.
See my multithreading comment above. Also, efficient heaps are usually
written with a flat view of memory in mind. This is kind of impossible in
non-flat memory architectures (like segmented architectures.)


What does the latter remark have to do with C's suitability for doing it?

I am saying that I don't disagree with you, but I think you are missing the
point. By simply adding the features/function into C, that would make it
de facto portable from the point of view that matters -- programmers of the C
language.
Well, let's see your proposal then. How would you weave threads in the C
language portably? No handwaving please, you will have to come up with
rock-solid semantics.
For most flat-memory architectures, its actually very straightforward to add in
all the features that I am requesting. I know this because I've written my own
heap manager, which of course uses platform specific behaviour for the one
platform I am interested in. (It only gets complicated for platforms with
unusual memory, like segmented architectures, which have correspondingly
complicated heap managers today.) This is in stark contrast with the
incredibly high bar of platform-specific complications set by trying to do this
outside of the language.
I'm eagerly awaiting your proposal that shows how to do all this in a
platform independent way.
>My proposal allows the programmer to decide what is or is not useful to them.

I'm all for that.

Well, I'm a programmer, and I don't care about binary output -- how does your
proposal help me decide what I think is useful to me?


It already did, it seems - You just stated your decision, with regard to
binary output. Fine by me.

Not fine by me. Because some other programmer whose code I have to look at
will use it, and I won't have any idea what it is.


Don't worry, you can look it up in the man pages. Unlike your
user-defined operators... :-)
Hey, it's not me -- apparently it's people like you who want more operators.


Just a dozen or so! But _standardized_ ones.

And you don't see the littlest problem with this proposal? If you add a dozen,
you'd better be sure that they are last dozen that anyone could possibly want
to add to the language. Their value add would have to be worth the pain of having
everyone learn about 12 new symbols.


Well, let's start with ?< ?> and ?: for now; ?: instead of |||, and I
drop the &&& requests since it is much less useful. It's already in gcc
so it's implementable. Give everybody 10 years to get used to it, then
we may add a couple more.
My point is that no matter what operators get added to the C language, you'll
never satisfy everyone's appetites. People will just want more and more,
though almost nobody will want all of what could be added.

My solution solves the problem once and for all. You have all the operators
you want, with whatever semantics you want.


That's too much freedom for my taste. If I would want this kind of
thing, I would yank out lex and yacc and code my own language.

Well how is that different from adding just &&& and |||? If you *REALLY
REALLY* want them, then why don't you yank out lex and yacc and code up a new
language?
Because I want to use them in C programs.
I don't know if you've ever experienced the displeasure of having to
maintain code that's not written by yourself, but it's difficult enough
as it is.


Do it all the time.
[...] Adding new operators might be interesting from a theoretical
perspective, but it surely is a bad idea from a broader software
engineering perspective.


Its no worse than trying to add in &&& or ||| today.


I disagree. Let's leave it at that, shall we?
If you see one of these new operators, you can just view it like you view an
unfamiliar function -- you'll look up its definition obviously.


There is an important difference: functions have a "name" that has a
mnemonic function.


But the name may be misleading -- as is the case, more often than not, just
reflecting the thinking of the original programmer, who may have cut and pasted
it from somewhere else.


Yes, well, you can practice bad coding style in any language. Names at
least have a fighting chance of being self-explanatory, which is not
true for operators.
[...] Operators are just a bunch of pixels with no link to
anything else. It's only by a lot of repetition that you get used to
weird things like '<' and '>'. I don't know about you, but I used to do
Pascal before I switched to C. It took me quite some time before I got
used to "!=".

And how long will it take the rest of us to get used to your weird &&& or |||
operators?
A couple of weeks?
What makes your code readable is adherence to an agreed upon coding
standard that exists outside of what the language defines.


There are several such standards for identifier names. No such standard
exists for operator names, except: use familiar ones; preferably, steal
them from other languages.

Sounds like a reasonable convention to me. How about: All new operators must
be defined in a central module named ----. Or: Only these new operators may be
added as defined by ... yada, yada, yada.
More handwaving... Please humour me, and spell out a reasonable
convention for your operator-introduction.
The coding standards are just different.
I can't tell; I haven't seen a coding standard for operator introduction
yet.
[...] The common denominator of all the identifier
standards is: "use meaningful names". I maintain that there is no
parallel for operators; there's no such thing as a "meaningful"
operator, except when you have been drilled to know their meaning. Your
proposal is in direct collision with this rather important fact of how
the human mind seems to work.

Just like freeform variable names, there is the same encumbrance of the
programmers managing the meaning of the symbols for more generic operators.
You have not made a sufficient case to convince me that there is a real
difference between the two.
....Have you ever laid eyes on a moderately complicated APL program,
without knowing APL? Or perhaps a Forth program, without knowing Forth?
That would give you some clue as to what a program looks like when it
doesn't use a lot of names, but does use a lot of unknown symbols. The
effect is similar to what written Arabic looks like to people used to
the Roman alphabet, and vice versa.
<<< and @ are nice though.

BTW -- odd how you actually thought that these were "nice". Do you think you
would have a hard time remembering them? Would it rankle your brain because
they were odd and unfamiliar operators that are new to the language? Would
you have a tendency to write abusive code because of the existence of these new
operators?
The meanings you gave for these rang some bells with my programming
experience. It leads me to believe you have encountered similar
situations, which gives you some bonus points for credibility in my book
:-) Seriously, I don't think such operators would be a good idea, but
I'd welcome library functions (the old fashioned ones, with names) for them.
Do you think perhaps its possible to use an arbitrarily extendable operator
mechanism in order to *clarify* or make code actually more maintainable?
A resounding 'no'. It's an interesting thought, but for me it falls
short on technical and psychological grounds, as I previously tried to
explain.
In two values of the widest type -- just like how just about every
microprocessor which has a multiply does it:

high *% low = a * b;


Hmmm. I hate to do this again, but could you provide semantics? Just to
keep things manageable, I'd be happy to see what happens if high, low,
a, and b are any possible combinations of bit-widths and signedness.
Could you clearly define the meaning of this?

a, b, high and low must be integers.
I'm with you so far ... ;-)
The signedness of the result of a * b (as if it were not widened) dictates the result signedness.
Ok.... This would fit, I'll give you that.
Coercion will happen as necessary when storing to high and low.
Ok. Let's call the "as if it were not widened" result X; X=a*b (in a
mathematical sense).

Now before coercion can begin, we'll have to split X; that's the crucial
step of the semantics; pseudo-code:

high = coerce( high_order_value(X) )
low = coerce( low_order_value(X) )

Now all we need is a good definition of high_order_value() and
low_order_value(), based on the signedness and bit-width of X, and
possibly the signedness and bit-width of high/low. How would they look?

The whole expression will have a side effect of returning high.

I'd formulate this as "have a value", otherwise it looks ok. I'm curious
how you are going to define the high_order_value and low_order_value
semantics.
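
For concreteness, when a and b are narrower than the widest available type the intended high/low split can already be written in standard C; a minimal sketch of the semantics for unsigned 32-bit operands using C99's exact-width types (not the proposed *% syntax, and of course it stops working once a and b are themselves the widest type, which is exactly the gap under discussion):

#include <stdint.h>
#include <stdio.h>

/* Unsigned 32x32 -> 64 widening multiply, split into high and low words
   the way "high *% low = a * b;" is meant to behave. */
static void widemul_u32(uint32_t a, uint32_t b, uint32_t *high, uint32_t *low)
{
    uint64_t x = (uint64_t)a * b;   /* full 64-bit product */
    *high = (uint32_t)(x >> 32);
    *low  = (uint32_t)x;
}

int main(void)
{
    uint32_t hi, lo;
    widemul_u32(0xFFFFFFFFu, 0xFFFFFFFFu, &hi, &lo);
    printf("%08lx %08lx\n", (unsigned long)hi, (unsigned long)lo); /* fffffffe 00000001 */
    return 0;
}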
Well, I don't know if these dozen-or-so big-number 'powermod' operations
that are needed to establish an SSL connection are such a big deal as
you make it.

Its not me -- its Intel, IBM, Motorola, Sun and AMD who seem to be obsessed
with these instructions.
Well they "provide" them, if that's what you mean.
I don't see them working the committees to get these supported in
non-assembly languages.

That's because they don't care about how it goes down once it hits
software. Someone has to pay something for a platform specific library? -- who
cares, as long as they sell their hardware. These same people didn't get
together to define the C standard did they? Why would they bother with an
extension just for widening multiplies?
Ok. So your point was, in the first place.... ?
Instead the hardware people waste their time on trivial little specifications
like IEEE-754, which the C standards idiots don't bother to look at until 15
years later.
You shouldn't trivialize IEEE-754, it's one hell of a standard, probably
one of the best I have ever seen. And as you are probably well aware,
there's a perfectly good reason why its only now that the C standard
committee is looking at it (only now hardware supporting it is widely
available). I really don't see your point here.
[...] I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.

The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.
What's so bad about using third-party libraries?
Of course Amazon, Yahoo and Ebay and most banks are
kind of obsessed with them too, even if they don't know it.


I think you would find that bignum operations are a small part of the
load on e-commerce servers.


According to a paper by Intel, widening multiply accounts for something like
30% of the load on typical e-commerce transactions (typing in your credit card
over the net in a way that can't be snooped.)


Reference, please.

"load on typical e-commerce transaction" != "load on e-commerce server".

If the transaction is simply setup session/send credit-card
information/close session, this could be right. However, a jiffy ago we
were talking SSL.
One single assembly instruction (one *bundle* on Itanium) holds a whole server
down for 30% of its computation,
Again, you're equating server load with transaction load.
versus the total 100K line e-commerce software required to do the rest.
That's why HW manufacturers are keen on the number of transistors they spend
making this one operation reasonably fast.
I'm happy they do, really. I love the speed of third-party bignum libraries
nowadays.
[...] All RSA-based protocols just do a small
amount of bignum work at session-establishment time to agree to a key
for a shared-secret algorithm.


This is only useful for much larger secure transactions like ssh, or an
encrypted phone call or something. E-commerce is a much smaller, one shot
transaction, where the RSA computation dominates.


I'm sure there are situations where this arises, although it's not
likely to happen when I order something with my credit card. More
inter-bank kind of stuff.

Now why would you not simply use a library for this?
>>Many languages exist where this is possible; they are called
>>"assembly". There is no way that you could come up with a well-defined
>>semantics for this.

>carry +< var = a + b;

It looks cute, I'll give you that. Could you please provide semantics?
It may be a lot less self evident than you think.

How about:

- carry is set to either 1 or 0, depending on whether or not a + b overflows
(just follow the 2s complement rules if one of a or b is negative.)


Hang on, are we talking about "overflow" or "carry" here? These are two
different things with signed numbers.

What happens if a is signed and b is unsigned?

My intent was for the operation to follow the semantics of the x86 ADC assembly
instruction.
Fun thing is: it handles both signed and unsigned numbers. Not-so-funny
is the fact that the x86 supports (as do most microprocessors ) both an
overflow flag and a carry flag. So, armed with this, please elaborate on
the meaning.
The point is that this instruction is known to be proper for
doing correct bignum additions.
It sure is... There are libraries containing hand-written assembly that
do fine :-)
- var is set to the result of the addition; the remainder if a carry occurs.


What happens if the signedness of var, a, and b are not equal?

It just behaves like the ADC x86 assembly instruction, the details of which I
will not regurgitate here.
So on adding a signed plus unsigned.... The carry gets a value "as if"
the signed value was actually unsigned? On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.
What happens if the bit-widths of var, a, and b are not equal?

The bit-widths would be converted as if the (a + b) operation were happening in
isolation, to match C language semantics.
Ok, so if either is signed, they will both be signed, and you get back a
useless carry bit. Doesn't sound very good to me.
- The whole expression (if you put the whole thing in parenthesese) returns
the result of carry.


.... So this would presume the actual expression is: "+< var = a + b" .
There's no need to introduce a mandatory "carry" variable, then.

True. Is this a problem? Perhaps you would like it to return var as a side
effect instead to avoid this redundancy? I don't have that strong of a feeling
on it.


No, carry/overflow will do fine, it's what you're most interested in,
most of the time.
In fact, if I were only interested in the carry, I'd be out of luck:
You could just omit it as a degenerate form:

+< = a + b
Two operators sitting next to each other. I think it's ugly, but there
you go.
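
For unsigned operands, at least, the carry can be recovered in portable C today because unsigned arithmetic wraps; a minimal sketch (function names made up), including the chained carry-in form a bignum add actually needs -- the signed cases are exactly where the semantic questions above start to bite:

#include <stdint.h>

/* Add with carry-out: roughly what "carry +< var = a + b;" would do
   for unsigned operands. */
static unsigned add_carry_u32(uint32_t a, uint32_t b, uint32_t *var)
{
    *var = a + b;        /* wraps modulo 2^32 by definition */
    return *var < a;     /* carry-out is 1 iff wraparound occurred */
}

/* Chained form with carry-in, the building block of a multi-word add. */
static unsigned adc_u32(uint32_t a, uint32_t b, unsigned cin, uint32_t *var)
{
    uint32_t t = a + b;
    unsigned c = t < a;
    *var = t + cin;
    return c | (*var < t);
}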
Basically, this is a C-esque syntax for a tuple assignment which
unfortunately is lacking in C:

(carry, value) = a+b

Yeah see, but the problem is that this encompasses existing C-language forms.
For all I know this might be legal C syntax already (I wouldn't know, I just
don't use the "," operator in this way) in which case we are kind of already
dead with backwards compatibility.
Unfortunately, there is no tuple assignment in C. It was just a piece of
pseudo-code to elucidate the meaning.
There's also nothing in that syntax to
indicate some new thing was happening that is capturing the carry.
Suppose we have tuples, we could write (c,v) = a <+ v, where the <+
operator is defined to return a tuple expression. Now tuples, that would
be _really_ nice. Finally I could swap with (a,b)=(b,a).
Ah, I see you've never implemented a non-table-driven CRC or a binary
greatest common divisor algorithm.

You can find a binary gcd algorithm that I wrote here:

http://www.pobox.com/~qed/32bprim.c


That's not the "binary GCD algorithm", that's just Knuth's version that
avoids modulos. Below is a binary GCD.


Sorry, a previous version that I never put out on the web used the binary
algorithm. I tested Knuth's as much faster and thus updated it, and forgot that
I had done this.


In assembly they are quite evenly matched :-)
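
For reference, the binary GCD under discussion looks roughly like this in C; a minimal sketch for 32-bit unsigned values, not the code that is or was in 32bprim.c:

#include <stdint.h>

/* Binary GCD: replaces division/modulo with shifts and subtraction. */
static uint32_t binary_gcd(uint32_t u, uint32_t v)
{
    unsigned shift = 0;

    if (u == 0) return v;
    if (v == 0) return u;

    while (((u | v) & 1u) == 0) { u >>= 1; v >>= 1; shift++; }  /* common 2s */

    while ((u & 1u) == 0) u >>= 1;          /* u is now odd */
    do {
        while ((v & 1u) == 0) v >>= 1;      /* v is now odd */
        if (u > v) { uint32_t t = u; u = v; v = t; }
        v -= u;                             /* even result, gcd unchanged */
    } while (v != 0);

    return u << shift;                      /* restore the common factor of 2 */
}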
>The specific operations I am citing make a *HUGE* difference and have billion
>dollar price tags associated with them.

These numbers you made up from thin air, no? Otherwise, I'd welcome a
reference.

Widening multiplies cost transistors on the CPU. The hardware algorithms are
variations of your basic public school multiply algorithm -- so it takes n^2
transistors to perform the complete operation, where n is the largest bit
word that the machine accepts for the multiplier. If the multiply were not
widened they could save half of those transistors. So multiply those extra
transistors by the number of CPUs shipped with a widening multiply (PPC,
x86s, Alphas, UltraSparcs, ... etc) and you easily end up in the billion
dollar range.


This is probably the most elaborate version of "yes, I made these
numbers up from thin air" I've ever come across :-)

But I didn't.
Oh, but you did. Billion is 1,000,000,000 which is quite a huge number.
The only way we humans can handle this is by doing some calculations...
It doesn't help to just throw numbers of this magnitude around without a
reference.
I used to work with one of these companies. People spend time
and consideration on this one instruction. You could just feel the impact that
this one instruction was going to have, and the considerations for the cost of
its implementation.
Sure, it's an important instruction.
I could easily see a quarter million just in design
effort, then some quarter million in testing, not to mention the cost of the
extra die area once it shipped -- and this is inside of *ONE* of these
companies, for *ONE* chip generation.
We're still about 999,500,000 USD short of a billion.
For example, inside of Intel, they decided that they were going to reuse their
floating point multiplier for their widening integer multiply for Itanium. But that
meant that the multiplier had to be able to do 128 bit multiplies (as opposed
to 82 bits, which is all the Itanium would have apparently needed) and
couldn't run a floating point and integer multiply at the same time. This has
non-trivial layout and design impact on the chip.
Sure does. I estimate the cost of this design work would be in the range
of 10^6 dollars, but not 10^9 dollars.
>I understand the need for the C language standard to be applicable to as
>many platforms as possible. But unlike some right shift detail that you
>are talking about, the widening multiply hardware actually *IS* deployed
>everywhere.


Yup. And it is used too. From machine language.

And *only* machine language. That's the point.


....Which is fine, as far as I am concerned.
Sure is. Several good big-number libraries are available that have
processor-dependent machine code to do just this.

And that's the problem. They have to be hand written in assembly. Consider
just the SWOX Gnu multiprecision library. When the Itanium was introduced,
Intel promised that it would be great for e-commerce.


Correction: the Intel marketing department promised that it would be
great for e-commerce.

I'll be sure to go track down the guy who gave a 2-hour long presentation
showing the guts of a clever 56/64 bit carry avoiding bignum multiply algorithm
that Intel was pushing for Itanium and SSE-2, that he's really just a marketing
guy. Intel's claims were real -- they had working code in-house.
Sounds like a technical guy to me, who was given the nice task of
thinking about all this. Probably even had a sheet about "the importance
for e-commerce" as a rationale for all this work. That's all perfectly
fine, except that I still fail to see why it is so important to code
this kind of high-performance code in C instead of assembly.
So they projected performance results for
the Itanium without having code available to do what they claim. So people
who wanted to consider using an Itanium system based on its performance for
e-commerce were stuck -- they had no code, and had to believe Intel's claims,
or SWOX's as to what the performance would be.
It sounds like Intel should've contributed some hand-written assembly to
SWOX, then.
The only thing your example shows is that a marketing angle sometimes
doesn't rhyme well with technical realities.

No, it shows that without having a proper path, technology can be bogged down
by inane weaknesses in standards. Intel had its *C* compilers ready to go for
the Itanium *LONG* before any of this happened. Even to this day, we are
*STILL* waiting for the code to make it into GMP:

http://www.swox.com/gmp/gmp-speed.html
The grey bars indicate where they *think* the performance will be (because they
can't get their hands on the platform, or because they are still hacking on the
code) and the pink bars are actual delivered performance. So is Itanium really
fast or really slow at this? From that chart its impossible to tell for sure.
It sounds like Intel should contribute some hand-written assembly to
SWOX, then.
OTOH, if instead, the C language had exposed a carry propagating add, and a
widening multiply in the language, then it would just be up to the Intel
*compiler* people to figure out how to make sure the widening multiply was
used optimally, and the SWOX/GMP people would just do a recompile for baseline
results at least.


I would guess that Intel, being both a compiler maker and the IA64
manufacturer, could have introduced a macro widemul(hr,lr,a,b) to do
this, and help the SWOX guys out a bit?

They could have. But what kind of relationship do you think a proprietary
company like Intel has with a bunch of GPL geeks?


Not a very good one, it seems. Perhaps some PHB should reconsider. These
"GPL geeks" have delivered the gcc compiler, and all. But perhaps they
are also trying to sell their own compiler. Oh, the complexities of
managing... As a technical guy, I say they should just contribute to GMP
and don't make a lot of fuss about it. Perhaps even spin it that they
are pro open-source.
I don't know SWOX; what do they use for bignum multiplication?
Karatsuba's algorithm?

I think they have an option for that. But from my recollection of having
looked into this, by the time Karatsuba is useful, more advanced methods like
Toom Cook or straight to FFTs become applicable as well.


I think Karatsuba is best for the ranges dealt with by RSA.

Best regards,

Sidney

Nov 13 '05 #131
Sidney Cadot <si****@jigsaw.nl> wrote:
Arthur J. O'Dwyer wrote:
On Sat, 6 Dec 2003, Sidney Cadot wrote:
Paul Hsieh wrote:
aj*@nospam.andrew.cmu.edu says...
[...]
So would adding &&& and |||. Remember that my premise is that if one is
motivated to add one of those, why not add in something more general?

Bad comparison. Support for &&& and ||| would be infinitely easier to
add than support for what you propose.
Good comparison. Support for &&& and ||| is exactly as likely to
break existing tools, and is exactly as likely to make it into a
future version of C. [There may not be a firm cause-and-effect there,
but it is correlated.]


Yes, &&& and ||| would break existing tools, but there is precedent for
that (introduction of // comments).


// was added in because so many compilers had already supported them
anyway. This was in keeping with the C standards committee "codifying
standard practices" nonsense. It was already there, all over the
place.
[...] The not unimportant difference is
that adding support for &&& and ||| would be rather trivial, while
support for Paul's proposal is a complete nightmare.

As to the likelihood of either feature making it into the C standard: I
disagree. The chances of ||| and &&& being added are many orders of
magnitude greater than that of the "operator introduction" feature
championed by Paul. Still very close to zero, of course. One has to
approach probabilities of this magnitude logarithmically :-)


Probably true, but your suggestion is also infinitely less ambitious,
and infinitely less useful. Your suggestion is like the complex
number or variable length macros things in C99. They are interesting
additions which provide some kind of epsilon more functionality, but
which don't compensate for the fact that you are leaving all C89
compilers behind.

My suggestion is more like "restrict" in C99 which adds functionality
that just is not in the language in any way shape or form. Some
people would use it no matter what -- because it adds something so
totally powerful in terms of functionality that its worth losing the
compatibility with C89.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #132
In article <br**********@news.tudelft.nl>, si****@jigsaw.nl says...
Paul Hsieh wrote:
I run into this every now and then. For example, I was recently trying to
solve the following problem: Create a directed graph on n points, each with an
out degree of d (I am concerned with very small d, like 2), such that the path
length between any pair of points is minimized.

Turns out that this is a problem whose complexity grows super-exponentially
with very small values of n. Despite my best efforts, I don't know the answer
for n=11, d=2 for example (I know its either 3 or 4, but I can't prove that its
not 3). More startling is the possibility that n=12,d=2 might have a smaller
latency (but I don't know that either)!
I used to do programming contests in university (both as a participant
and as a judge later on), so I quite like this sort of problem. Care to
provide some background (via regular e-mail), I think I'd like to think
about this for a bit.


This isn't a rigged contest problem. This is a real world problem. Imagine a
*SERVERLESS* online game topology. How do you communicate "game-moves" in
realtime between players in a way that is most efficient and with the least
perceived lag, given a limited amount of bandwidth? Solve the above problem,
and you have a solution for this problem. (Actually for games you can crank d
a little higher than 2 -- the exact problem that I am looking at is different,
and I am loathe to increase d above 2.)

You can think about this problem all you like. If you want to know if I am
full of it or not, see if you can get above n=10,d=2.
Anyhow, the point is that the only way to have a chance to squeeze enough
computational juice out of my PC to solve this, I had to hard code huge amounts
of the code, and use a lot of bit twiddling tricks just for each special case.
I used the preprocessor macros as much as possible to make my code manageable,
but in order to change n, I actually have to *modify* the code by adding and
changing *n* lines of code. There's not much I can do about this, I would have
no chance to solve this otherwise.


Ok, that's a case. But wouldn't it have been better then just to make a
program generator (written in C or something else)? Or perhaps use the
m4 macro processor?


Tell me how you do this in a way that's compatible with testing it against 3
different compilers (one of the *OTHER* techniques I use to go after
performance...) two of which are very unfriendly to being commandline driven,
while still making sense from a debugging point of view?
I'm saying that trying to fix C's intrinsic problems shouldn't start or end
with some kind of resolution of call stack issues. Anyone who understands
machine architecture will not be surprised about call stack depth limitations.

It's the task of a standard to spell these out, I think.

There are far more pressing problems in the language that one would like to
fix.

Yes, but most things that relate to the "encouraging of writing
extremely unsound and poor code", as you describe C, would be better
fixed by using another language. A lot of the inherently unsafe things
in C are sometimes needed, when doing low-level stuff.


Why is UB in isgraph(-1) needed? Why is UB in gets() needed? Why is the fact
that fgets() skips over '\0' characters needed? Why is a non-portable right
shift on signed integers needed (especially considering the one on unsigned
*is* portable)?


These are side cases, that are at least mentioned in the standard. No
big deal, IMHO.


Its a big deal to me. I don't own any material on the C language that
specifies all of this. I actually go reference source code for a C library I
happen to have access to. But, of course, these sources and a real world
programmer's sense leads me to the very wrong conclusion that isgraph(-1) just
returns false.
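
For the record, the ctype functions are only defined for arguments representable as unsigned char (or equal to EOF), so the portable idiom when starting from a plain char is a cast; a small sketch:

#include <ctype.h>

/* The ctype functions are only defined for values representable as
   unsigned char, or for EOF; an arbitrary negative char value (possible
   when char is signed) must be converted first. */
static int is_printable_char(char c)
{
    return isgraph((unsigned char)c);
}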
[...] This is much more of a big deal:

int count_nodes(struct Tree *t)
{
    return t ? count_nodes(t->left)+count_nodes(t->right)
             : 0;
}

Things like this are ubiquitous in many kinds of code.
Uh ... dude, that just looks like an expensive way of writing:

int count_nodes (struct Tree *t) {
return 0;
}

I don't suppose you did very well on these programming contests did you? Btw,
weren't you railing against trigraphs?
[...] Trees with a big depth are not uncommon.
Assuming you actually wanted to count the *NODES* of the tree, and were a
little concerned about performance try this:

static int count_nodes_inner (const struct Tree *t) {
    return (t->left  ? (1 + count_nodes_inner (t->left))  : 0) +
           (t->right ? (1 + count_nodes_inner (t->right)) : 0);
}

int count_nodes (const struct Tree *t) {
    return t ? 1 + count_nodes_inner (t) : 0;  /* count the root as well */
}

as it avoids the leaf node tail calls, which will roughly double the
performance.
[...] How can I ever claim such a function is portable if I have no
guarantee whatsoever that it will work on some architectures?
Huh? Are you talking about the stack thing again? Dude: malloc (65536) is
not portable by the same reasoning, and:

int fib(int n) {
if (n <= 2) return 1;
return fib(n-1) + fib(n-2);
}

isn't going to work in any relevant way in any language, for a call like
fib(1000000).
>[powerful heap manager]
>
>>But a third party library can't do this portably.
>
>I don't see why not?
You simply do not understand. Malloc and multithreading are intertwined in the
implementation. If you have multithreading, they you have to lock and unlock
your heap, or use other strategies to make sure one thread isn't in the middle
of freeing while another is mallocing. If you don't, it introduces something
called a "race condition". Without exact considerations malloc and free would
simply fail to function properly in a multithreaded environment.

The point is that any extensions of the heap's semantics end up having exactly
the same problem. For example, how would one write a "freeall()" on top of
malloc? (Ignoring the fact that writing it *ON TOP* of malloc/free loses all
the potential *MASSIVE* performance advantage) Well presumably you'd have some
hidden part at the head of each allocation creating a linked list of
outstanding mallocs that were removed upon frees, and the freeall() function
would just walk through them calling free one by one. In a multithreaded
environment, once again, you'd be in race condition city. Same problem for
trying to build separate heaps (virtual or otherwise.)

Now the C language standard says nothing about this of course, because C
doesn't say anything about multithreading at all. But that doesn't make the
problem go away -- most of the commonly used C compilers today include
multitasking extensions.

The point is that there is a *HUGE* difference between pushing such
functionality into the C standard, versus trying to obtain them from 3rd
parties. If it were put into the standard, then each vendor could modify their
heap implementation (a quite doable endeavour, since each vendor has already
solved the heap lock/unlock problem.) But via 3rd parties, the library vendor
has to learn and keep up to date with the OS primitives for each platform they
support.
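
To make the race concrete, a freeall() layered on malloc/free ends up looking roughly like this once a lock is added; a minimal sketch assuming POSIX threads, with made-up names, ignoring alignment pedantry and the performance objection raised above:

#include <stdlib.h>
#include <pthread.h>

/* Each allocation gets a hidden header linking it into a list of
   outstanding blocks.  The mutex is exactly the platform-specific part a
   third-party library has to supply, and a vendor's malloc already has. */
struct blk { struct blk *next, *prev; };

static struct blk head = { &head, &head };
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

void *tracked_malloc(size_t n)
{
    struct blk *b = malloc(sizeof *b + n);
    if (!b) return NULL;
    pthread_mutex_lock(&heap_lock);
    b->next = head.next; b->prev = &head;
    head.next->prev = b;  head.next = b;
    pthread_mutex_unlock(&heap_lock);
    return b + 1;                       /* user memory follows the header */
}

void tracked_free(void *p)
{
    struct blk *b;
    if (!p) return;
    b = (struct blk *)p - 1;
    pthread_mutex_lock(&heap_lock);
    b->prev->next = b->next;  b->next->prev = b->prev;
    pthread_mutex_unlock(&heap_lock);
    free(b);
}

void tracked_freeall(void)
{
    pthread_mutex_lock(&heap_lock);
    while (head.next != &head) {
        struct blk *b = head.next;
        head.next = b->next;
        free(b);
    }
    head.prev = &head;
    pthread_mutex_unlock(&heap_lock);
}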
Explain to me how you implement malloc() in a *multithreaded* environment
portably. You could claim that C doesn't support multithreading, but I highly
doubt you're going to convince any vendor that they should shut off their
multithreading support based on this argument.

Now you're shifting the goalposts.

How so?


Since in your original statement there was no mention of multithreading.


Mentioning the implementation of malloc implicitly brings multithreading into the
discussion. That is what you don't understand.
Allow me to refresh your memory:

"But a third party library can't do this portably.
Because of the multithreading issue ...
[...] Its actual useful
functionality that you just can't get from the C language, and there's
no way to reliably map such functionality to the C language itself.
One is forced to know the details of the underlying platform to
implement such things. Its something that really *should* be in the
language." [Paul Hsieh, 4 posts up]

See?
Yes -- the implicit answer to the "WHY" question that follows each statement,
is "because of the multithreading problem". If you were aware of the issues,
I'm sure this wouldn't be quite so bewildering to you.
The two are related. Writing a heap manager requires that you are
aware of multitasking considerations.


Nonsense. There's plenty of real hardware out there where multitasking is a
non-issue.


Obviously I meant for platforms that support multithreading -- and BTW it has
nothing to do with hardware. Multithreading is a software feature, not a
hardware one. (You're probably thinking of pre-emption, which is a totally
different thing.)
If I want to extend the heap manager, I have to solve them each one by one
in different ways on different platforms


Nonsense. Unless you need support for something inherently
platform-specific like multithreading.


If the feature of multithreading is in the platform, then extensions to malloc
must take these into consideration. Do you understand the difference between this and
what you said? malloc/calloc/realloc/free, or any of the extensions I propose
have no API, modes, or anything that exposes anything about multithreading to
the end user. Its all in the internals.
[...] I still don't see a problem writing a super-duper heap manager on top
of malloc().
Well, that much is obvious ...
The point is, if the C standard were simply to add this functionality straight
into the library, then it would be each compiler vendor's responsibility to add
this functionality into the compiler.


Sure but then the compiler would be more expensive to make, and I would
have to pay more $$$ for a feature I don't need.


I have written such a malloc myself (it's almost always much faster than typical heap
managers, and contains rich built-in debugging facilities) that I would be only
too happy to donate to the C standards committee on the condition that they
endorse my extensions. It's been well tested, and is easily portable to any
flat address implementation whose multitasking mechanisms include one of
locks, mutexes (preferred) or semaphores.
And the functionality would be
inherently portable as a result. The requirement of multitasking support would
then be pushed back into the vendor's lap -- i.e., the people who have
introduced this non-portable feature in the first place.


Adding multithreading support to a language is a can of worms.


Which I am not suggesting. And it's not impossible -- look at the language
Erlang. The VxWorks or QNX or OSE APIs also suggest very good abstractions
for multithreading that should be reasonably applicable to any C-style
language.
[...] I don't
think you quite know what you're suggesting here. Many difficult
semantic issues to resolve. The fun thing about C is that it's simple.
I *DO* know what I am suggesting. This heap stuff is not just off the top of
my head. I've written applications where I've *needed* these features.
By dictating its existence in
the library, it would put the responsibility of making it work right in the
hands of the vendor without affecting the C standards stance on not
acknowledging the need for multithreading.
Huh? The C standard describes the library.


Right, but the library doesn't specify anything about multitasking. It's only
implementations that need to deal with it.
Obviously, you cannot write a multithreading heap manager portably if
there is no portable standard on multithreading (doh). If you need this
you can always presume POSIX, and be on your way.
You do not understand. The *IMPLEMENTATION* needs to be multithreading aware.


Perhaps I don't understand. The implementation... of what?


malloc/free/calloc/realloc or any extensions on top of them.
What does the latter remark have to do with C's suitability for doing it?
I am saying that I don't disagree with you, but I think you are missing the
point. By simply adding the features/function into C, that would make it
de facto portable from the point of view that matters -- programmers of the C
language.


Well, let's see your proposal then. How would you weave threads in the C
language portably? No handwaving please, you will have to come up with
rock-solid semantics.


I'm not proposing anything about threads. I am talking about the heap.
Extending the heap means you have to be aware of multithreading if your
platform supports it.
For most flat-memory architectures, its actually very straightforward to add in
all the features that I am requesting. I know this because I've written my own
heap manager, which of course uses platform specific behaviour for the one
platform I am interested in. (It only gets complicated for platforms with
unusual memory, like segmented architectures, which have correspondingly
complicated heap managers today.) This is in stark contrast with the
incredibly high bar of platform-specific complications set by trying to do this
outside of the language.


I'm eagerly awaiting your proposal that shows how to do all this in a
platform independent way.


The NULL heap is the system heap.

struct heap * newHeap (struct heap *);
- allocate a subheap from a given heap.
int destroyHeap (struct heap *);
- frees all allocated memory with respect to a given heap (including
sub-heaps), then destroy the heap.
int freeAllH (struct heap *);
- frees all allocated memory with respect to a given heap (including
sub-heaps), but don't destroy the heap.

void * mallocH (struct heap *, size_t);
- Like malloc with respect to a given heap.
void * callocH (struct heap *, size_t, size_t);
- Like calloc with respect to a given heap.
void * reallocH (struct heap *, void *, size_t);
- Like realloc with respect to a given heap.
void * expandH (struct heap *, void *, size_t);
- Like reallocH, but will return NULL instead of moving, and will never
move the pointer.
int freeH (struct heap *, void *);
- Like free with respect to a given heap.

int dbgIsAlloc (struct heap *, void *);
- Returns 1 if, to the best of the library's ability to determine, the
pointer is a properly allocated heap entry. Returns 0, if it can
determine the pointer to not be a properly allocated heap entry.

Notice how there is no mention of multithreading in anything there? Of course,
what is not seen, is that they in fact *ALL* require very exact considerations
for multithreading if the platform supports multithreading.

The value of this is that destroyHeap/freeAllH actually costs about the same as
a handful of frees (even if you've made tens of thousands of outstanding
mallocs.) And expandH can be used in cases where a complete copy is not
necessary in order to move an allocation to a resized allocation (this feature
would improve the performance of the "Better String Library" in many cases,
in a way that I cannot capture with C's current heap semantics.) Although
dbgIsAlloc is probabilistic (though it would never return 0 on a valid heap
pointer), in practice its exactness would be so close as to be a reasonably
reliable mechanism to ensure the correctness of an allocated pointer for
debugging purposes.
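
A sketch of how these proposed calls might be used, assuming the declarations above (and that callocH, like the others, takes the heap as its first argument); a temporary sub-heap lets an entire phase of allocations be released in one call:

/* Hypothetical usage of the proposed per-heap API. */
void parse_file(const char *name)
{
    struct heap *scratch = newHeap(NULL);    /* sub-heap of the system heap */
    char *buf;
    int *table;

    if (!scratch) return;
    buf   = mallocH(scratch, 4096);
    table = callocH(scratch, 256, sizeof *table);

    /* ... build temporary structures in the scratch heap ... */
    (void)name; (void)buf; (void)table;

    destroyHeap(scratch);    /* releases everything allocated above at once */
}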
>>My proposal allows the programmer to decide what is or is not useful to them.
>
>I'm all for that.

Well, I'm a programmer, and I don't care about binary output -- how does your
proposal help me decide what I think is useful to me?

It already did, it seems - You just stated your decision, with regard to
binary output. Fine by me.


Not fine by me. Because some other programmer whose code I have to look at
will use it, and I won't have any idea what it is.


Don't worry, you can look it up in the man pages. Unlike your
user-defined operators... :-)


You would have to look up "%b" ("%db"? -- whatever) in *UP TO DATE* man pages.
(Is *restrict* or the new *complex* type, from C99 properly documented in
*YOUR* current man pages?) My proposed plug-in mechanism would be in modules
or header files that would, one way or another, have to be in your project
somewhere. Presumably, the source would have to be available somewhere, so
there would be *NO QUESTION* as to what it does.

Same comment applies to &&&, ||| versus user defined operators.
Hey, it's not me -- apparently it's people like you who want more operators.

Just a dozen or so! But _standardized_ ones.


And you don't see the littlest problem with this proposal? If you add a dozen,
you'd better be sure that they are last dozen that anyone could possibly want
to add to the language. Their value add would have to be worth the pain of having
everyone learn about 12 new symbols.


Well, let's start with ?< ?> and ?: for now; ?: instead of |||, and I
drop the &&& requests since it is much less useful. It's already in gcc
so it's implementable. Give everybody 10 years to get used to it, then
we may add a couple more.


My claim is that so few people would use these operators, that they would be
de facto not supported. They would be less useful than bitfields.
My point is that no matter what operators get added to the C language, you'll
never satisfy everyone's appetites. People will just want more and more,
though almost nobody will want all of what could be added.

My solution solves the problem once and for all. You have all the operators
you want, with whatever semantics you want.

That's too much freedom for my taste. If I would want this kind of
thing, I would yank out lex and yacc and code my own language.
Well how is that different from adding just &&& and |||? If you *REALLY
REALLY* want them, then why don't you yank out lex and yacc and code up a new
language?


Because I want to use them in C programs.


But I want them *not* to be used in C programs -- unless they are fully
documented in the source. Always.
What makes your code readable is adherence to an agreed upon coding
standard that exists outside of what the language defines.

There are several such standards for identifier names. No such standard
exists for operator names, except: use familiar ones; preferably, steal
them from other languages.
Sounds like a reasonable convention to me. How about: All new operators must
be defined in a central module named ----. Or: Only these new operators may be
added as defined by ... yada, yada, yada.


More handwaving... Please humour me, and spell out a reasonable
convention for your operator-introduction.
The coding standards are just different.


I can't tell; I haven't seen a coding standard for operator introduction
yet.


I just gave some. What's hand waving is saying something about stack limits --
you haven't even suggested a way in which that *COULD* be possible to put down
on a piece of paper. It's *OBVIOUS* what I am talking about above. You just
want something specific to have something arbitrary to disagree with. Just
pick a convention -- if something bothers you about certain operators, spell it
out in your convention.
Just like freeform variable names, there is the same encumbrance of the
programmers managing the meaning of the symbols for more generic operators.
You have not made a sufficient case to convince me that there is a real
difference between the two.


...Have you ever laid eyes on a moderately complicated APL program,
without knowing APL?


No, but I have looked at the results of the annual obfuscated C code contest.
I have never suggested that you *COULDN'T* abuse the idea. But you can abuse
just about *ANYTHING* in C. You don't think adding in ||| or &&& would add to
the potential for noise in those contests? I gave a very wordy description of
how to define an operator -- its kind of hard to obfuscate something completely
while you have so many hints (a keyword like _Operator, a whole function-like
declaration, etc) as to what it is. You would have to hide it behind macros to
truly obfuscate it, but you can use macros for arbitrary obfuscation
regardless.
><<< and @ are nice though.
BTW -- odd how you actually thought that these were "nice". Do you think you
would have a hard time remembering them? Would it rankle your brain because
they were odd and unfamiliar operators that are new to the language? Would
you have a tendency to write abusive code because of the existence of these new
operators?


The meanings you gave for these rang some bells with my programming
experience. It leads me to believe you have encountered similar
situations, which gives you some bonus points for credibility in my book
:-) Seriously, I don't think such operators would be a good idea, but
I'd welcome library functions (the old fashioned ones, with names) for them.


Well I'd never use &&& or ||| as you proposed them for any reason whatsoever.
At least with the <<< and @ operators, both add just enough useful
functionality, that I *would* consider using them, knowing I would be non-
portable. That's the point of all this -- its all just opinion. With
redefinable operators someone's arbitrary opinion doesn't enter into it -- you
can make it conform to whatever opinion you have. If you disagree with someone
else's opinion, you can even *change* it, subject only to your control over the
source.
Do you think perhaps its possible to use an arbitrarily extendable operator
mechanism in order to *clarify* or make code actually more maintainable?


A resounding 'no'. It's an interesting thought, but for me it falls
short on technical and psychological grounds, as I previously tried to
explain.


You haven't shown me that it falls on technical grounds. You've just said that
its hard to implement in the compiler -- but that's obvious.
>Well, I don't know if these dozen-or-so big-number 'powermod' operations
>that are needed to establish an SSL connection are such a big deal as
>you make it.

Its not me -- its Intel, IBM, Motorola, Sun and AMD who seem to be obsessed
with these instructions.
Well they "provide" them, if that's what you mean.


They *have* to.
I don't see them working the committees to get these supported in
non-assembly languages.
That's because they don't care about how it goes down once it hits
software. Someone has to pay something for a platform specific library? -- who
cares, as long as they sell their hardware. These same people didn't get
together to define the C standard did they? Why would they bother with an
extension just for widening multiplies?


Ok. So your point was, in the first place.... ?


My point is that e-commerce is not a *hardware* problem. At least, since all
hardware supports it, its not a hardware problem. Its a software problem. The
point is the *C* does not offer a solution for e-commerce. The Java people can
point and laugh and say they are better for e-commerce than C, and ironically,
they would be correct (Java has a BigNum class).
Instead the hardware people waste their time on trivial little specifications
like IEEE-754, which the C standards idiots don't bother to look at until 15
years later.


You shouldn't trivialize IEEE-754, it's one hell of a standard, probably
one of the best I have ever seen.


You missed the implied sarcasm ...
And as you are probably well aware,
there's a perfectly good reason why its only now that the C standard
committee is looking at it (only now hardware supporting it is widely
available). I really don't see your point here.
Only NOW?!??! If by *now*, you mean 1991, then sure. The first IEEE-754
implementation came out in like 1984 or so (the 8087). Within a few years,
everyone else came on board -- Alpha (even though DEC was one of the vendors
originally most vigorously opposed to IEEE-754), Sparc, MIPS, PPC, they are all
IEEE-754, of course.
[...] I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.
The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.


What's so bad about using third-party libraries?


From a usability point of view, GMP is crap, RSA's bignum library is crap (and
its expensive). None of them are portable to platforms which are in
development (you know like might be a concern for things like *SMART CARDS*.)
Of course Amazon, Yahoo and Ebay and most banks are
kind of obsessed with them too, even if they don't know it.

I think you would find that bignum operations are a small part of the
load on e-commerce servers.


According to a paper by Intel, widening multiply accounts for something like
30% of the load on typical e-commerce transactions (typing in your credit card
over the net in a way that can't be snooped.)


Reference, please.


It was in a presentation at the Intel Developer Forum in 1999 or 2000. I only
have this presentation in print form.
"load on typical e-commerce transaction" != "load on e-commerce server".
I meant the server. Clients have cycles to burn, obviously the client
performance doesn't matter on anything less than Half-life 2.
If the transaction is simply setup session/send credit-card
information/close session, this could be right. However, a jiffy ago we
were talking SSL.
One single assembly instruction (one *bundle* on Itanium) holds a whole server
> down for 30% of its computation,
Again, you're equating server load with transaction load.


Uhh ... both sides have to do the bignum operation. You have to do it both
ways because of man-in-the-middle attacks. Server load is what we are
concerned with here, obviously (Itanium is *not* a client platform.)
Hang on, are we talking about "overflow" or "carry" here? These are two
different things with signed numbers.

What happens if a is signed and b is unsigned?
My intend was for the operation to follow the semantics of the x86 ADC assembly
instruction.


Fun thing is: it handles both signed and unsigned numbers. Not-so-funny
is the fact that the x86 supports (as do most microprocessors ) both an
overflow flag and a carry flag. So, armed with this, please elaborate on
the meaning.


Yes, but I never mentioned/introduced overflow. I only considered carry. For
bignum implementation, OF is not relevant. Only the CF is.
The point is that this instruction is known to be proper for
doing correct bignum additions.


It sure is... There are libraries containing hand-written assembly that
do fine :-)


Go look at the original purpose of C again. Its supposed to be a portable
assembler. This is a job for C.
- var is set to the result of the addition; the remainder if a carry occurs.

What happens if the signedness of var, a, and b are not equal?
It just behaves like the ADC x86 assembly instruction, the details of which I
will not regurgitate here.


So on adding a signed plus unsigned.... The carry gets a value "as if"
the signed value was actually unsigned?


Yes. Note that x86 does not have a version of ADC that considers the sign of
its operands in any special way that is reflected in the CF.
[...] On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.


It might to you, but the carry is not related to the sign of the operands.
(Your understanding of x86 seems to be shakey.)
What happens if the bit-widths of var, a, and b are not equal?
The bit-widths would be converted as if the (a + b) operation were happening in
isolation, to match C language semantics.


Ok, so if either is signed, they will both be signed,


No ... that is not the rule used in C. unsigned supersedes signed.
Widening multpilies cost transistor on the CPU. The hardware algorithms are
variations of your basic public school multiply algorithm -- so it takes n^2
transistor s to perform the complete operation, where n is the largest bit
word that the machine accepts for the multiplier. If the multiply were not
widened they could save half of those transistors. So multiply those extra
transistor s by the number of CPUs shipped with a widening multipliy (PPC,
x86s, Alphas, UltraSparcs, ... etc) and you easily end up in the billion
dollar range.
Sure, it's an important instruction.


Important enough to be ubiquitous and available in all modern CPUs.
I could easily see a quarter million just in design
effort, then some quarter million in testing, not to mention the cost of the
extra die area once it shipped -- and this is inside of *ONE* of these
companies, for *ONE* chip generation.


We're still about 999,500,000 USD short of a billion.


They are willing to spend a half million in development, because they know its
going to cost them a hundred million in extra silicon, and want to make sure
its money well spent (Whoops! I am practically giving away which chip company
I am talking about). You, of course, have missed this and the obvious implied
multiplier over all the other chip architectures, over all generations that
have such a feature. A billion dollars looks like a conservative, but
obviously reasonable order of magnitude guess as to the overall cost of
implementing widening multiplies.
For example, inside of Intel, they decided that they were going to reuse their
floating point multiplier for their widening integer add for Itanium. But that
meant that the multiplier had to be able to do 128 bit multiplies (as opposed
to 82 bits, which is all the Itanium would have apparently needed) and
couldn't run a floating point and integer multiply at the same time. This has
non-trivial layout and design impact on the chip.


Sure does. I estimate the cost of this design work would be in the range
of 10^6 dollars, but not 10^9 dollars.


That's *design* cost. That's nothing compared to the *silicon* cost.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #133
In article <br**********@news.tudelft.nl>, si****@jigsaw.nl says...
Paul Hsieh wrote:
I run into this every now and then. For example, I was recently trying to
solve the following problem: Create a directed graph on n points, each with an
out degree of d (I am concerned with very small d, like 2), such that the path
length between any pair of points is minimized.

Turns out that this is a problem whose complexity grows super-exponentially
with very small values of n. Despite my best efforts, I don't know the answer
for n=11, d=2 for example (I know its either 3 or 4, but I can't prove that its
not 3). More startling is the possibility that n=12,d=2 might have a smaller
latency (but I don't know that either)!
I used to do programming contests in university (both as a participant
and as a judge later on), so I quite like this sort of problem. Care to
provide some background (via regular e-mail), I think I'd like to think
about this for a bit.


This isn't a rigged contest problem. This is a real world problem. Imagine a
*SERVERLESS* online game topology. How do you communicate "game-moves" in
realtime between players in a way that is most efficient and with the least
perceived lag, given a limited amount of bandwidth? Solve the above problem,
and you have a solution for this problem. (Actually for games you can crank d
a little higher than 2 -- the exact problem that I am looking at is different,
and I am loathe to increase d above 2.)

You can think about this problem all you like. If you want to know if I am
full of it or not, see if you can get above n=10,d=2.
Anyhow, the point is that the only way to have a chance to squeeze enough
computational juice out of my PC to solve this, I had to hard code huge amounts
of the code, and use a lot of bit twiddling tricks just for each special case.
I used the preprocessor macros as much as possible to make my code manageable,
but in order to change n, I actually have to *modify* the code by adding and
changing *n* lines of code. There's not much I can do about this, I would have
no chance to solve this otherwise.
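For concreteness, here is a rough sketch (not the actual program -- N, D and
the succ[][] layout are invented for illustration) of the bitmask-per-node
evaluation being described: score one candidate topology by finding the
worst-case hop count between any ordered pair of nodes.

  /* Rough illustration only: one unsigned long bit per node. */
  #include <stdio.h>

  #define N 10                /* number of nodes (must fit in an unsigned long) */
  #define D 2                 /* out-degree of every node                       */

  static int worst_hops(int succ[N][D])
  {
      int s, i, j, worst = 0;

      for (s = 0; s < N; s++) {
          unsigned long reached  = 1UL << s;   /* nodes reached so far   */
          unsigned long frontier = reached;    /* nodes added last round */
          int hops = 0;

          while (reached != (1UL << N) - 1) {
              unsigned long next = 0;
              for (i = 0; i < N; i++)
                  if (frontier & (1UL << i))
                      for (j = 0; j < D; j++)
                          next |= 1UL << succ[i][j];
              frontier = next & ~reached;
              if (frontier == 0)
                  return -1;                   /* some node is unreachable */
              reached |= frontier;
              hops++;
          }
          if (hops > worst)
              worst = hops;
      }
      return worst;
  }

  int main(void)
  {
      /* a simple ring with a "skip" edge, just to have something to score */
      int succ[N][D], i;
      for (i = 0; i < N; i++) {
          succ[i][0] = (i + 1) % N;
          succ[i][1] = (i + 3) % N;
      }
      printf("worst-case hops: %d\n", worst_hops(succ));
      return 0;
  }

The real work is enumerating candidate succ[][] tables to minimize that
score, which is where the super-exponential blowup comes from.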


Ok, that's a case. But wouldn't it have been better then just to make a
program generator (written in C or something else)? Or perhaps use the
m4 macro processor?


Tell me how you do this in a way that's compatible with testing it against 3
different compilers (one of the *OTHER* techniques I use to go after
performance ...) two of which are very unfriendly to being commandline driven,
while still making sense from a debugging point of view?
I'm saying that trying to fix C's intrinsic problems shouldn't start or end
with some kind of resolution of call stack issues. Anyone who understands
machine architecture will not be surprised about call stack depth limitations.

It's the task of a standard to spell these out, I think.

There are far more pressing problems in the language that one would like to
fix.

Yes, but most things that relate to the "encouraging of writing
extremely unsound and poor code", as you describe C, would be better
fixed by using another language. A lot of the inherently unsafe things
in C are sometimes needed, when doing low-level stuff.


Why is UB in isgraph(-1) needed? Why is UB in gets() needed? Why is the fact
that fgets() skips over '\0' characters needed? Why is a non-portable right
shift on signed integers needed (especially considering the one on unsigned
*is* portable)?


These are side cases, that are at least mentioned in the standard. No
big deal, IMHO.


Its a big deal to me. I don't own any material on the C language that
specifies all of this. I actually go reference source code for a C library I
happen to have access to. But, of course, these sources and a real world
programmer's sense leads me to the very wrong conclusion that isgraph(-1) just
returns false.
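For what it's worth, the portable idiom that sidesteps the isgraph(-1) trap
is a cast through unsigned char; a minimal sketch:

  /* The ctype functions accept an int whose value must be representable as
     an unsigned char, or be EOF; a negative char (possible wherever plain
     char is signed) is undefined behaviour.  Casting first stays safe. */
  #include <ctype.h>

  int is_visible(char c)
  {
      return isgraph((unsigned char)c);
  }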
[...] This is much more of a big deal:

int count_nodes(struct Tree *t)
{
return t ? count_nodes(t->left)+count_nodes(t->right)
: 0;
}

Things like this are ubiquitous in many kinds of code.
Uh ... dude, that just looks like an expensive way of writing:

int count_nodes (struct Tree *t) {
return 0;
}

I don't suppose you did very well on these programming contests did you? Btw,
weren't you railing against trigraphs?
[...] Trees with a big depth are not uncommon.
Assuming you actually wanted to count the *NODES* of the tree, and were a
little concerned about performance try this:

static int count_nodes_inner (const struct Tree *t) {
    return (t->left  ? (1 + count_nodes_inner (t->left )) : 0) +
           (t->right ? (1 + count_nodes_inner (t->right)) : 0);
}

int count_nodes (const struct Tree *t) {
    return t ? 1 + count_nodes_inner (t) : 0;
}

as it avoids the leaf node tail calls, which will roughly double the
performance.
[...] How can I ever claim such a function is portable if I have no
guarantee whatsoever that it will work on some architectures?
Huh? Are you talking about the stack thing again? Dude: malloc (65536) is
not portable by the same reasoning, and:

int fib(int n) {
if (n <= 2) return 1;
return fib(n-1) + fib(n-2);
}

isn't going to work in any relevant way in any language, for a call like
fib(1000000).
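For completeness: the recursion-depth worry can be taken out of the
compiler's hands with an explicit stack. A rough sketch, with an invented
fixed limit standing in for whatever growth policy one prefers:

  /* Sketch: the same node count with an explicit stack, so the depth of
     the C call stack no longer tracks the depth of the tree.  STACK_MAX
     and the -1 error return are inventions of this sketch. */
  #include <stddef.h>

  struct Tree { struct Tree *left, *right; /* payload omitted */ };

  #define STACK_MAX 4096

  int count_nodes_iterative(const struct Tree *t)
  {
      const struct Tree *stack[STACK_MAX];
      size_t top = 0;
      int count = 0;

      if (t != NULL)
          stack[top++] = t;
      while (top > 0) {
          const struct Tree *node = stack[--top];
          count++;
          if (node->left) {
              if (top == STACK_MAX) return -1;   /* pending-node limit hit */
              stack[top++] = node->left;
          }
          if (node->right) {
              if (top == STACK_MAX) return -1;
              stack[top++] = node->right;
          }
      }
      return count;
  }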
>[powerful heap manager]
>
>>But a third party library can't do this portably.
>
>I don't see why not?
You simply do not understand. Malloc and multithreading are intertwined in the
implementation. If you have multithreading, then you have to lock and unlock
your heap, or use other strategies to make sure one thread isn't in the middle
of freeing while another is mallocing. If you don't, it introduces something
called a "race condition". Without exact considerations malloc and free would
simply fail to function properly in a multithreaded environment.

The point is that any extensions of the heap's semantics end up having exactly
the same problem. For example, how would one write a "freeall()" on top of
malloc? (Ignoring the fact that writing it *ON TOP* of malloc/free loses all
the potential *MASSIVE* performance advantage) Well presumably you'd have some
hidden part at the head of each allocation creating a linked list of
outstanding mallocs that were removed upon frees, and the freeall() function
would just walk through them calling free one by one. In a multithreaded
environment, once again, you'd be in race condition city. Same problem for
trying to build separated heaps (virtual or otherwise.)

Now the C language standard says nothing about this of course, because C
doesn't say anything about multithreading at all. But that doesn't make the
problem go away -- most of the commonly used C compilers today include
multitasking extensions.

The point is that there is a *HUGE* difference between pushing such
functionality into the C standard, versus trying to obtain them from 3rd
parties. If it were put into the standard, then each vendor could modify their
heap implementation (a quite doable endeavour, since each vendor has already
solved the heap lock/unlock problem.) But via 3rd parties, the library vendor
has to learn and keep up to date with the OS primitives for each platform they
support.
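To make the race-condition point concrete, here is a bare-bones sketch of
the "freeall() on top of malloc" idea. The POSIX mutex is exactly the
platform-specific piece a portable third-party library cannot assume, and a
matching tracked_free() (plus alignment of the returned pointer) is glossed
over:

  /* Sketch only: a tracking allocator whose free_all() releases every
     outstanding allocation.  Without the lock, concurrent callers corrupt
     the list. */
  #include <stdlib.h>
  #include <pthread.h>

  struct alloc_hdr { struct alloc_hdr *next, *prev; };

  static struct alloc_hdr head = { &head, &head };
  static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

  void *tracked_malloc(size_t n)
  {
      struct alloc_hdr *h = malloc(sizeof *h + n);
      if (h == NULL)
          return NULL;
      pthread_mutex_lock(&heap_lock);
      h->next = head.next;            /* link at the front of the list */
      h->prev = &head;
      head.next->prev = h;
      head.next = h;
      pthread_mutex_unlock(&heap_lock);
      return h + 1;                   /* memory just past the header   */
  }

  void tracked_free_all(void)
  {
      pthread_mutex_lock(&heap_lock);
      while (head.next != &head) {
          struct alloc_hdr *h = head.next;
          head.next = h->next;
          free(h);
      }
      head.prev = &head;
      pthread_mutex_unlock(&heap_lock);
  }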
Explain to me how you implement malloc() in a *multithreaded* environment
portably. You could claim that C doesn't support multithreading, but I highly
doubt you're going to convince any vendor that they should shut off their
multithreading support based on this argument.
Now you're shifting the goalposts.
How so?


Since in your original statement there was no mention of multithreading.


Mentioning the implementation of malloc implicitly brings multithreading into the
discussion. That is what you don't understand.
Allow me to refresh your memory:

"But a third party library can't do this portably.
Because of the multithreading issue ...
[...] Its actual useful
functionality that you just can't get from the C language, and there's
no way to reliably map such functionality to the C language itself.
One is forced to know the details of the underlying platform to
implement such things. Its something that really *should* be in the
language." [Paul Hsieh, 4 posts up]

See?
Yes -- the implicit answer to the "WHY" question that follows each statement,
is "because of the multithreading problem". If you were aware of the issues,
I'm sure this wouldn't be quite so bewildering to you.
The two are related. Writing a heap manager requires that you are
aware of multitasking considerations.


Nonsense. There's many real hardware out there where multitasking is a
non-issue.


Obviously I meant for platforms that support multithreading -- and BTW it has
nothing to do with hardware. Multithreading is a software feature, not a
hardware one. (You're probably thinking of pre-emption, which is a totally
different thing.)
If I want to extend the heap manager, I have to solve them each one by one
> in different ways on different platforms


Nonsense. Unless you need support for something inherently
platform-specific like multithreading.


If the feature of multithreading is in the platform, then extensions to malloc
must take these into consideration. Do you understand the difference between this and
what you said? malloc/calloc/realloc/free, or any of the extensions I propose
have no API, modes, or anything that exposes anything about multithreading to
the end user. Its all in the internals.
[...] I still don't see a problem writing a super-duper heap manager on top
of malloc().
Well, that much is obvious ...
The point is, if the C standard were simply to add this functionality straight
into the library, then it would be each compiler vendor's responsibility to add
this functionality into the compiler.


Sure but then the compiler would be more expensive to make, and I would
have to pay more $$$ for a feature I don't need.


I have written such a malloc myself (it's much faster than typical heap
managers, and contains rich built-in debugging facilities) that I would be only
too happy to donate to the C standards committee on the condition that they
endorse my extensions. It's been well tested, and is easily portable to any
flat-address implementation whose multitasking mechanisms include one of
locks, mutexes (preferred) or semaphores.
And the functionality would be
inherently portable as a result. The requirement of multitasking support would
then be pushed back into the vendors lap -- i.e., the people who have
introduced this non-portable feature in the first place.


Adding multithreading support to a language is a can of worms.


Which I am not suggesting. And it's not impossible -- look at the language
Erlang. The VxWorks or QNX or OSE APIs also suggest very good abstractions
for multithreading that should be reasonably applicable to any C-style
language.
[...] I don't
think you quite know what you're suggesting here. Many difficult
semantic issues to resolve. The fun thing about C is that it's simple.
I *DO* know what I am suggesting. This heap stuff is not just off the top of
my head. I've written applications where I've *needed* these features.
By dictating its existence in
the library, it would put the responsibility of making it work right in the
hands of the vendor without affecting the C standards stance on not
acknowledgin g the need for multithreading.
Huh? The C standard describes the library.


Right, but the library doesn't specify anything about multitasking. Its only
implementations that need to deal with it.
Obviously, you cannot write a multithreading heap manager portably if
there is no portable standard on multithreading (doh). If you need this
you can always presume POSIX, and be on your way.
You do not understand. The *IMPLEMENTATION * needs to be multithreading aware.


Perhaps I don't understand. The implementation... of what?


malloc/free/calloc/realloc or any extensions on top of them.
What does the latter remark have to do with C's suitability for doing it?
I am saying that I don't disagree with you, but I think you are missing the
point. By simply adding the features/function into C, that would make it
defacto portable from the point of view that matters -- programmers of the C
language.


Well, let's see your proposal then. How would you weave threads in the C
language portably? No handwaving please, you will have to come up with
rock-solid semantics.


I'm not proposing anything about threads. I am talking about the heap.
Extending the heap means you have to be aware of multithreading if your
platform supports it.
For most flat-memory architectures, its actually very straightforward to add in
all the features that I am requesting. I know this because I've written my own
heap manager, which of course uses platform specific behaviour for the one
platform I am interested in. (It only gets complicated for platforms with
unusual memory, like segmented architectures, which have correspondingly
complicated heap managers today.) This is in stark contrast with the
incredibly high bar of platform-specific complications set by trying to do this
outside of the language.


I'm eagerly awaiting your proposal that shows how to do all this in a
platform independent way.


The NULL heap is the system heap.

struct heap * newHeap (struct heap *);
- allocate a subheap from a given heap.
int destroyHeap (struct heap *);
- frees all allocated memory with respect to a given heap (including
sub-heaps), then destroy the heap.
int freeAllH (struct heap *);
- frees all allocated memory with respect to a given heap (including
sub-heaps), but don't destroy the heap.

void * mallocH (struct heap *, size_t);
- Like malloc with respect to a given heap.
void * callocH (struct heap *, size_t, size_t);
- Like calloc with respect to a given heap.
void * reallocH (struct heap *, void *, size_t);
- Like realloc with respect to a given heap.
void * expandH (struct heap *, void *, size_t);
- Like reallocH, but will return NULL instead of moving, and will never
move the pointer.
int freeH (struct heap *, void *);
- Like free with respect to a given heap.

int dgbIsAlloc (struct heap *, void *);
- Returns 1 if, to the best of the library's ability to determine, the
pointer is a properly allocated heap entry. Returns 0, if it can
determine the pointer to not be a properly allocated heap entry.

Notice how there is no mention of multithreading in anything there? Of course,
what is not seen, is that they in fact *ALL* require very exact considerations
for multithreading if the platform supports multithreading.

The value of this is that destroyHeap/freeAllH actually costs about the same as
a handful of frees (even if you've made tens of thousands of outstanding
mallocs.) And expandH can be used in cases where a complete copy is not
necessary in order to move an allocation to a resized allocation (this feature
would improve the performance of the "Better String Library" in many cases,
that I cannot in any way capture with C's current heap semantics.) Although
dgbIsAlloc is probabilistic (though it would never return 0 on a valid heap
pointer), in practice its exactness would be so close as to make it a
reasonably reliable mechanism for ensuring the correctness of an allocated
pointer for debugging purposes.
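To make the intended usage concrete (these functions are a proposal; they
exist in no shipping C library, so the fragment below cannot compile against
anything today):

  /* Hypothetical usage of the proposed sub-heap interface: allocate freely
     while building a temporary structure, then discard the whole heap at
     once instead of walking it with individual frees. */
  void build_and_discard(void)
  {
      struct heap *scratch = newHeap(NULL);     /* sub-heap of the system heap */
      if (scratch == NULL)
          return;

      char   *buf     = mallocH(scratch, 4096);
      double *samples = callocH(scratch, 1024, sizeof *samples);

      /* ... arbitrarily many more allocations, no matching frees needed ... */
      (void)buf; (void)samples;

      destroyHeap(scratch);                     /* everything above is released */
  }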
>>My proposal allows the programmer to decide what is or is not useful them.
>
>I'm all for that.

Well, I'm a programmer, and I don't care about binary output -- how does your
proposal help me decide what I think is useful to me?

It already did, it seems - You just stated your decision, with regard to
binary output. Fine by me.


Not fine by me. Because some other programmer who's code I have to look at
will use it, and I won't have any idea what it is.


Don't worry, you can look it up in the man pages. Unlike your
user-defined operators... :-)


You would have to look up "%b" ("%db"? -- whatever) in *UP TO DATE* man pages.
(Is *restrict* or the new *complex* type, from C99 properly documented in
*YOUR* current man pages?) My proposed plug-in mechanism would be in modules
or header files that would, one way or another, have to be in your project
somewhere. Presumably, the source would have to be available somewhere, so
there would be *NO QUESTION* as to what it does.

Same comment applies to &&&, ||| versus user defined operators.
Hey, its not me -- apparently its people like you who wants more operators.

Just a dozen or so! But _standardized_ ones.


And you don't see the littlest problem with this proposal? If you add a dozen,
you'd better be sure that they are the last dozen that anyone could possibly want
to add to the language. Their value add would have to be worth the pain of having
everyone learn about 12 new symbols.


Well, let's start with ?< ?> and ?: for now; ?: instead of |||, and I
drop the &&& requests since it is much less useful. It's already in gcc
so it's implementable. Give everybody 10 years to get used to it, then
we may add a couple more.
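For readers who have not met these extensions: as I read the proposal, ?< and
?> stand for minimum and maximum (GCC's C++-only spellings are <? and >?),
and a ?: b is GCC's two-operand conditional. Rough standard-C stand-ins,
valid only for operands without side effects:

  /* Approximate equivalents (these evaluate their arguments more than once,
     which the GCC operators do not). */
  #define MIN(a, b)     ((a) < (b) ? (a) : (b))   /* a ?< b */
  #define MAX(a, b)     ((a) > (b) ? (a) : (b))   /* a ?> b */
  #define DEFAULT(a, b) ((a) ? (a) : (b))         /* a ?: b */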


My claim is that so few people would use these operators, that they would be
de facto unsupported. They would be less useful than bitfields.
My point is that no matter what operators get added to the C language, you'll
never satisfy everyone's appetites. People will just want more and more,
though almost nobody will want all of what could be added.

My solution solves the problem once and for all. You have all the operators
you want, with whatever semantics you want.

That's too much freedom for my taste. If I would want this kind of
thing, I would yank out lex and yacc and code my own language.
Well how is that different from adding just &&& and |||? If you *REALLY
REALLY* want them, then why don't you yank out lex and yacc and code up a new
language?


Because I want to use them in C programs.


But I want them *not* to be used in C programs -- unless they are fully
documented in the source. Always.
What makes your code readable is adherence to an agreed upon coding

> standard that exists outside of what the language defines.

There are several such standards for identifier names. No such standard
exists for operator names, except: use familiar ones; preferably, steal
them from other languages.
Sounds like a reasonable convention to me. How about: All new operators must
be defined in a central module named ----. Or: Only these new operators may be
added as defined by ... yada, yada, yada.


More handwaving... Please humour me, and spell out a reasonable
convention for your operator-introduction.
The coding standards are just different.


I can't tell; I haven't seen a coding standard for operator introduction
yet.


I just gave some. What's hand waving is saying something about stack limits --
you haven't even suggested a way in which that *COULD* be possible to put down
on a piece of paper. Its *OBVIOUS* what I am talking about above. You just
want something specific to have something arbitrary to disagree with. Just
pick a convention -- if something bothers you about certain operators, spell it
out in your convention.
Just like freeform variable names, there is the same encumbrance on the
programmers of managing the meaning of the symbols for more generic operators.
You have not made a sufficient case to convince me that there is a real
difference between the two.


...Have you ever laid eyes on a moderately complicated APL program,
without knowing APL?


No, but I have looked at the results of the annual obfuscated C code contest.
I have never suggested that you *COULDN'T* abuse the idea. But you can abuse
just about *ANYTHING* in C. You don't think adding in ||| or &&& would add to
the potential for noise in those contests? I gave a very wordy description of
how to define an operator -- its kind of hard to obfuscate something completely
while you have so many hints (a keyword like _Operator, a whole function-like
declaration, etc) as to what it is. You would have to hide it behind macros to
truly obfuscate it, but you can use macros for arbitrary obfuscation
regardless.
><<< and @ are nice though.
BTW -- odd how you actually thought that these were "nice". Do you think you
would have a hard time remembering them? Would it rankle your brain because
they were odd and unfamiliar operators that are new to the language? Would
you have a tendency to write abusive code because of the existence of these new
operators?


The meanings you gave for these rang some bells with my programming
experience. It leads me to believe you have encountered similar
situations, which gives you some bonus points for credibility in my book
:-) Seriously, I don't think such operators would be a good idea, but
I'd welcome library functions (the old fashioned ones, with names) for them.


Well I'd never use &&& or ||| as you proposed them for any reason whatsoever.
At least with the <<< and @ operators, both add just enough useful
functionality, that I *would* consider using them, knowing I would be non-
portable. That's the point of all this -- its all just opinion. With
redefinable operators someone's arbitrary opinion doesn't enter into it -- you
can make it conform to whatever opinion you have. If you disagree with someone
else's opinion, you can even *change* it, subject only to your control over the
source.
Do you think perhaps its possible to use an arbitrarily extendable operator
mechanism in order to *clarify* or make code actually more maintainable?


A resounding 'no'. It's an interesting thought, but for me it falls
short on technical and psychological grounds, as I previously tried to
explain.


You haven't shown me that it falls on technical grounds. You've just said that
its hard to implement in the compiler -- but that's obvious.
>Well, I don't know if these dozen-or-so big-number 'powermod' operations
>that are needed to establish an SSL connection are such a big deal as
>you make it.

Its not me -- its Intel, IBM, Motorola, Sun and AMD who seem to be obsessed
with these instructions.
Well they "provide" them, if that's what you mean.


They *have* to.
I don't see them working the committees to get these supported in
non-assembly languages.
That's because they don't care about how it goes down once it hits
software. Someone has to pay something for a platform specific library? -- who
cares, as long as they sell their hardware. These same people didn't get
together to define the C standard did they? Why would they bother with an
extension just for widening multiplies?


Ok. So your point was, in the first place.... ?


My point is that e-commerce is not a *hardware* problem. At least, since all
hardware supports it, it's not a hardware problem. It's a software problem. The
point is that *C* does not offer a solution for e-commerce. The Java people can
point and laugh and say they are better for e-commerce than C, and ironically,
they would be correct (Java has a BigNum class).
Instead the hardware people waste their time on trivial little specifications
like IEEE-754, which the C standards idiots don't bother to look at until 15
years later.


You shouldn't trivialize IEEE-754, it's one hell of a standard, probably
one of the best I have ever seen.


You missed the implied sarcasm ...
And as you are probably well aware,
there's a perfectly good reason why its only now that the C standard
committee is looking at it (only now hardware supporting it is widely
available). I really don't see your point here.
Only NOW?!??! If by *now*, you mean 1991, then sure. The first IEEE-754
implementation came out in like 1984 or so (the 8087). Within a few years,
everyone else came on board -- Alpha (even though DEC was one of the vendors
originally most vigorously opposed to IEEE-754), Sparc, MIPS, PPC, they are all
IEEE-754, of course.
[...] I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.
The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.


What's so bad about using third-party libraries?


From a usability point of view, GMP is crap, RSA's bignum library is crap (and
its expensive). None of them are portable to platforms which are in
development (you know like might be a concern for things like *SMART CARDS*.)
Of course Amazon, Yahoo and Ebay and most banks are
kind of obsessed with them too, even if they don't know it.

I think you would find that bignum operations are a small part of the
load on e-commerce servers.


According to a paper by Intel, widening multiply accounts for something like
30% of the load on typical e-commerce transactions (typing in your credit card
over the net in a way that can't be snooped.)


Reference, please.


It was in a presentation at the Intel Developer Forum in 1999 or 2000. I only
have this presentation in print form.
"load on typical e-commerce transaction" != "load on e-commerce server".
I meant the server. Clients have cycles to burn, obviously the client
performance doesn't matter on anything less than Half-life 2.
If the transaction is simply setup session/send credit-card
information/close session, this could be right. However, a jiffy ago we
were talking SSL.
One single assembly instruction (one *bundle* on Itanium) holds a whole server
> down for 30% of its computation,
Again, you're equating server load with transaction load.


Uhh ... both sides have to do the bignum operation. You have to do it both
ways because of man-in-the-middle attacks. Server load is what we are
concerned with here, obviously (Itanium is *not* a client platform.)
Hang on, are we talking about "overflow" or "carry" here? These are two
different things with signed numbers.

What happens if a is signed and b is unsigned?
My intent was for the operation to follow the semantics of the x86 ADC assembly
instruction.


Fun thing is: it handles both signed and unsigned numbers. Not-so-funny
is the fact that the x86 supports (as do most microprocessors) both an
overflow flag and a carry flag. So, armed with this, please elaborate on
the meaning.


Yes, but I never mentioned/introduced overflow. I only considered carry. For
bignum implementation, OF is not relevant. Only the CF is.
The point is that this instruction is known to be proper for
doing correct bignum additions.


It sure is... There are libraries containing hand-written assembly that
do fine :-)


Go look at the original purpose of C again. Its supposed to be a portable
assembler. This is a job for C.
- var is set to the result of the addition; the remainder if a carry occurs.

What happens if the signedness of var, a, and b are not equal?
It just behaves like the ADC x86 assembly instruction, the details of which I
will not regurgitate here.


So on adding a signed plus unsigned.... The carry gets a value "as if"
the signed value was actually unsigned?


Yes. Note that x86 does not have a version of ADC that considers the sign of
its operands in any special way that is reflected in the CF.
[...] On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.


It might to you, but the carry is not related to the sign of the operands.
(Your understanding of x86 seems to be shaky.)
What happens if the bit-widths of var, a, and b are not equal?
The bit-widths would be converted as if the (a + b) operation were happening in
isolation, to match C language semantics.


Ok, so if either is signed, they will both be signed,


No ... that is not the rule used in C. unsigned supersedes signed.
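What the ADC discussion boils down to, in portable C, is recovering the carry
without access to a flags register; a minimal sketch for one unsigned "limb"
of a bignum:

  /* Sketch: add two bignum limbs plus an incoming carry, producing the sum
     and the outgoing carry, using only the fact that unsigned arithmetic
     wraps.  A compiler has to be fairly clever to turn this back into a
     single ADC instruction. */
  typedef unsigned long limb;

  limb add_with_carry(limb a, limb b, unsigned *carry)
  {
      limb sum = a + b;
      unsigned c = (sum < a);      /* wrapped iff the result shrank       */
      sum += *carry;
      c += (sum < *carry);         /* adding the old carry may wrap again */
      *carry = c;                  /* always 0 or 1                       */
      return sum;
  }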
Widening multiplies cost transistors on the CPU. The hardware algorithms are
variations of your basic public school multiply algorithm -- so it takes n^2
transistors to perform the complete operation, where n is the largest bit
width that the machine accepts for the multiplier. If the multiply were not
widened they could save half of those transistors. So multiply those extra
transistors by the number of CPUs shipped with a widening multiply (PPC,
x86s, Alphas, UltraSparcs, ... etc) and you easily end up in the billion
dollar range.
Sure, it's an important instruction.


Important enough to be ubiquitous and available in all modern CPUs.
I could easily see a quarter million just in design
effort, then some quarter million in testing, not to mention the cost of the
extra die area once it shipped -- and this is inside of *ONE* of these
companies, for *ONE* chip generation.


We're still about 999,500,000 USD short of a billion.


They are willing to spend a half million in development, because they know its
going to cost them a hundred million in extra silicon, and want to make sure
its money well spent (Whoops! I am practically giving away which chip company
I am talking about). You, of course, have missed this and the obvious implied
multiplier over all the other chip architectures, over all generations that
have such a feature. A billion dollars looks like a conservative, but
obviously reasonable order of magnitude guess as to the overall cost of
implementing widening multiplies.
For example, inside of Intel, they decided that they were going to reuse their
floating point multiplier for their widening integer multiply for Itanium. But that
meant that the multiplier had to be able to do 128 bit multiplies (as opposed
to 82 bits, which is all the Itanium would have apparently needed) and
couldn't run a floating point and integer multiply at the same time. This has
non-trivial layout and design impact on the chip.


Sure does. I estimate the cost of this design work would be in the range
of 10^6 dollars, but not 10^9 dollars.


That's *design* cost. That's nothing compared to the *silicon* cost.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 13 '05 #134
[Someone, probably Paul Hsieh, wrote:]
>Widening multpilies [ie N x N => 2N bit result] ...
>transistor s by the number of CPUs shipped with a widening multipliy (PPC,
>x86s, Alphas, UltraSparcs, ... etc) and you easily end up in the billion
>dollar range.

[Sidney Cadot wrote:] Sure, it's an important instruction.

In article <MP************************@news.sf.sbcglobal.net>,
Paul Hsieh <qe*@pobox.com> writes:
Important enough to be ubiquitous and available in all modern CPUs.


Actually, UltraSparc lacks a widening multiply for its native 64-bit word.

It does have a widening multiply, but not necessarily the desired one.

In particular, UltraSparc is a 64-bit architecture, with 64-bit
registers. It has a 64-bit MULX instruction, with an unsigned
variant UMULX, but this instruction does 64 x 64 -> 64 multiplies,
discarding the upper 64-bit half of the 128-bit result.

The widening multiply is the old 32-bit-compatibility instruction,
which comes in SMULcc and UMULcc variants. It multiplies the lower
32 bits of two registers with the lower-half 32-bit result going to
a third register. The upper-half 32-bit result goes into the
special "%y" register. This instruction, which is a retrofit from
the V8 SPARC, is in turn backwards-compatible with pre-V8 SPARCs
which had only multiply-step instructions. (Using the multiply-
step instructions required careful fiddling with the %y register;
one of the appendices to the V7 architecture manuals gave multiply
algorithms.)

(Mr Hsieh can, of course, simply define UltraSparc as "not a modern
CPU".)

None of this is particularly relevant to C today, which also lacks
a widening multiply -- you have to simulate it with, e.g.:

unsigned long result = (unsigned long)op1 * (unsigned long)op2;
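And when the operands are already the widest unsigned type available, the
simulation has to be done by hand with half-width pieces. A sketch of the
upper half of a 64 x 64 product, assuming a 64-bit unsigned long long:

  /* Sketch: the upper 64 bits of a 64 x 64 -> 128 multiply, built from
     32-bit halves, for the case where no 128-bit integer type exists. */
  typedef unsigned long long u64;

  u64 mulhi64(u64 a, u64 b)
  {
      u64 a_lo = a & 0xffffffffULL, a_hi = a >> 32;
      u64 b_lo = b & 0xffffffffULL, b_hi = b >> 32;

      u64 lo_lo = a_lo * b_lo;
      u64 hi_lo = a_hi * b_lo;
      u64 lo_hi = a_lo * b_hi;
      u64 hi_hi = a_hi * b_hi;

      /* everything that carries out of bit 63 of the full product */
      u64 cross = (lo_lo >> 32) + (hi_lo & 0xffffffffULL) + lo_hi;

      return hi_hi + (hi_lo >> 32) + (cross >> 32);
  }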

I do think the UltraSparc design was the wrong choice -- C's lack
of direct access to various instructions does not mean the instructions
are unimportant. At the same time, I do not think C *needs* access
to all instructions. There are other programming languages, and
methods of tying C code into code written in other languages. They
are not portable, and this is not important.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 13 '05 #135
Paul Hsieh wrote:
Sidney Cadot <si****@jigsaw.nl> wrote:
Arthur J. O'Dwyer wrote:
On Sat, 6 Dec 2003, Sidney Cadot wrote:
Good comparison. Support for &&& and ||| is exactly as likely to
break existing tools, and is exactly as likely to make it into a
future version of C. [There may not be a firm cause-and-effect there,
but it is correlated.]
Yes, &&& and ||| would break existing tools, but there is precedent for
that (introduction of // comments).


// was added in because so many compilers had already supported them
anyway. This was in keeping with the C standards committee "codifying
standard practices" nonsense. It was already there, all over the
place.


So...?
[...] The not unimportant difference is
that adding support for &&& and ||| would be rather trivial, while
support for Paul's proposal is a complete nightmare.

As to the likelihood of either feature making it into the C standard: I
disagree. The chances of ||| and &&& being added is many orders of
magnitudes greater than that of the the "operator introduction" feature
championed by Paul. Still very close to zero, of course. One has to
approach probabilities of this magnitude logarithmically :-)

Probably true, but your suggestion is also infinitely less ambitious,
and infinitely less useful.


On the first I agree, on the second I don't, and you forgot to mention
it's infinitely easier to implement.
Your suggestion is like the complex
number or variable-length macro thing in C99. They are interesting
additions which provide some kind of epsilon more functionality, but
which don't compensate with the fact that you are leaving all C89
compilers behind.

My suggestion is more like "restrict" in C99 which adds functionality
that just is not in the language in any way shape or form.
Ah, yes, restrict. Another nice example of a feature that was introduced
that breaks existing tools.

I like restrict. It is useful, plugging an obvious semantic hole that
made many optimizations impossible. On top of that, it is easy to
support (though a bit less easy to take advantage of), as per 6.7.3.1#6:

"A translator is free to ignore any or all aliasing implications of uses
of restrict."

Unfortunately, your operator introduction magic lacks both these properties.
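For the record, the guarantee it expresses looks like this (C99); with
restrict the translator may assume dst and src never overlap, so it can keep
values in registers and vectorize the loop:

  /* Without restrict, the compiler must assume a store through dst could
     change *src, forcing a reload on every iteration. */
  void scale(double * restrict dst, const double * restrict src,
             double k, int n)
  {
      int i;
      for (i = 0; i < n; i++)
          dst[i] = k * src[i];
  }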
Some people would use it no matter what -- because it adds something so
totally powerful in terms of functionality that its worth losing the
compatibility with C89.


Pardon my ignorance: What does restrict add in terms of functionality?

Best regards,

Sidney

Nov 13 '05 #136
Paul Hsieh wrote:
In article <br**********@news.tudelft.nl>, si****@jigsaw.nl says...
Paul Hsieh wrote:
I run into this every now and then. For example, I was recently trying to
solve the following problem: Create a directed graph on n points, each with an
out degree of d (I am concerned with very small d, like 2), such that the path
length between any pair of points is minimized.

Turns out that this is a problem whose complexity grows super-exponentially
with very small values of n. Despite my best efforts, I don't know the answer
for n=11, d=2 for example (I know its either 3 or 4, but I can't prove that its
not 3). More startling is the possibility that n=12,d=2 might have a smaller
latency (but I don't know that either)!
I used to do programming contests in university (both as a participant
and as a judge later on), so I quite like this sort of problem. Care to
provide some background (via regular e-mail), I think I'd like to think
about this for a bit.

This isn't a rigged contest problem. This is a real world problem.


I think you forgot the capitals on Real World, there... :-)
Imagine a
*SERVERLESS* online game topology. How do you communicate "game-moves" in
realtime between players in a way that is most efficient and with the least
perceived lag, given a limited amount of bandwidth? Solve the above problem,
and you have a solution for this problem. (Actually for games you can crank d
a little higher than 2 -- the exact problem that I am looking at is different,
and I am loathe to increase d above 2.)
Just out of curiosity: why should the graph be directed; don't the nodes
have two-way communication? And you're not considering a lag metric
other than just number of hops?
You can think about this problem all you like. If you want to know if I am
full of it or not, see if you can get above n=10,d=2.
Perhaps I will, I'll have some time on my hands round Xmas. Sounds like
a nice problem.
Anyhow, the point is that the only way to have a chance to squeeze enough
computational juice out of my PC to solve this, I had to hard code huge amounts
of the code, and use a lot of bit twiddling tricks just for each special case.
I used the preprocessor macros as much as possible to make my code manageable,
but in order to change n, I actually have to *modify* the code by adding and
changing *n* lines of code. There's not much I can do about this, I would have
no chance to solve this otherwise.


Ok, that's a case. But wouldn't it have been better then just to make a
program generator (written in C or something else)? Or perhaps use the
m4 macro processor?

Tell me how you do this in a way that's compatible with testing it against 3
different compilers (one of the *OTHER* techniques I use to go after
performance ...) two of which are very unfriendly to being commandline driven,
while still making sense from a debugging point of view?


How about generating C source that you #include from a resident file in
your project, would that be an option?
>I'm saying that trying to fix C's intrinsic problems shouldn't start or end
>with some kind of resolution of call stack issues. Anyone who understands
>machine architecture will not be surprised about call stack depth limitations.

It's the task of a standard to spell these out, I think.
>There are far more pressing problems in the language that one would like to
>fix.

Yes, but most things that relate to the "encouragin g of writing
extremely unsound and poor code", as you describe C, would be better
fixed by using another language. A lot of the inherently unsafe things
in C are sometimes needed, when doing low-level stuff.

Why is UB in isgraph(-1) needed? Why is UB in gets() needed? Why is the fact
that fgets() skips over '\0' characters needed? Why is a non-portable right
shift on signed integers needed (especially considering the one on unsigned
*is* portable)?


These are side cases, that are at least mentioned in the standard. No
big deal, IMHO. Its a big deal to me. I don't own any material on the C language that
specifies all of this. I actually go reference source code for a C library I
happen to have access to. But, of course, these sources and a real world
programmer's sense leads me to the very wrong conclusion that isgraph(-1) just
returns false.
The C standard specifies what is UB and what's not. Grepping an
implementation is a fragile way of deducing semantics, I would say.
[...] This is much more of a big deal:

int count_nodes(struct Tree *t)
{
return t ? count_nodes(t->left)+count_nodes(t->right)
: 0;
}

Things like this are ubiquitous in many kinds of code.


Uh ... dude, that just looks like an expensive way of writing:

int count_nodes (struct Tree *t) {
return 0;
}


Hmmm yes, it is, isn't it? Now I feel all silly!
I don't suppose you did very well on these programming contests did you? Btw,
weren't you railing against trigraphs?
Held my own, thank you very much. And yes, I was railing against
trigraphs. Why do you ask? There's no trigraph in the code.
[...] Trees with a big depth are not uncommon.

Assuming you actually wanted to count the *NODES* of the tree, and were a
little concerned about performance try this:

static int count_nodes_inner (const struct Tree *t) {
return (t->left ? (1 + count_nodes_inner (t->left )) : 0) +
(t->right ? (1 + count_nodes_inner (t->right)) : 0);
}

int count_nodes (const struct Tree *t) {
return t ? 1 + count_nodes_inner (t) : 0;
}

as it avoids the leaf node tail calls, which will roughly double the
performance.


Well, I'm glad that you extracted my intended meaning from the name of
the function I used. Good thing I didn't use an operator! ;-)
[...] How can I ever claim such a function is portable if I have no
guarantee whatsoever that it will work on some architectures? Huh? Are you talking about the stack thing again?
I'm _still_ talking about 'the stack thing' yes, since you still seem to
think that this is a triviality. I think it's not.

Dude: malloc (65536) is not portable by the same reasoning, and:

If you think that, then you didn't follow my reasoning. I can find out
at compile time if malloc(65536) is allowed, and if it is, I can find
out at runtime if it has succeeded. I can do no such thing with stack
overflows. The program will exhibit all the symptoms of UB without a
proper cause that can be traced down to the standard.
int fib(int n) {
if (n <= 2) return 1;
return fib(n-1) + fib(n-2);
}

isn't going to work in any relevant way in any language, for a call like
f(1000000).
What are you trying to say here? Help me out a bit.
You simply do not understand. Malloc and multithreading are intertwined in the
implementation. If you have multithreading, they you have to lock and unlock
your heap, or use other strategies to make sure one thread isn't in the middle
of freeing while another is mallocing. If you don't, it introduces something
called a "race condition". Without exact considerations malloc and free would
simply fail to function properly in a multithreaded environment.
The point is that any extensions of the heap's semantics end up having exactly
the same problem. For example, how would one write a "freeall()" on top of
malloc? (Ignoring the fact that writing it *ON TOP* of malloc/free loses all
the potential *MASSIVE* performance advantage) Well presumably you'd have some
hidden part at the head of each allocation creating a linked list of
outstanding mallocs that were removed upon frees, and the freeall() function
would just walk through them calling free one by one. In a multithreaded
environment, once again, you'd be in race condition city. Same problem for
trying to build seperated heaps (virtual or otherwise.)

Now the C language standard says nothing about this of course, because C
doesn't say anything about multithreading at all. But that doesn't make the
problem go away -- most of the commonly used C compilers today include
multitasking extensions.
Are you sure you mean "multitasking" (which deals with processes,
usually each having their own address space) rather than
"multithreading" (which deals with threads of execution within the same
process, sharing the address space)?

I still seem to be missing your point rather blatantly, I'm afraid.
The point is that there is a *HUGE* difference between pushing such
functionality into the C standard, versus trying to obtain them from 3rd
parties. If it were put into the standard, then each vendor could modify their
heap implementation (a quite doable endeavour, since each vendor has already
solved the heap lock/unlock problem.) But via 3rd parties, the library vendor
has to learn and keep up to date with the OS primitives for each platform they
support.
Ok, I'll try to summarize my understanding of the point you try to make.
Let's consider it a priori desirable that we have a super-duper heap
manager, with all kinds of nice things like seperate heaps. We agree
that this is often useful. Now let's also assume a certain platform
where multithreading/multitasking is so important that the
platform-specific library provides support for it. We can assume that it
also sports a thread-safe malloc (yes?). You'd still want the
super-duper heap manager. Your point is that mandating a better heap
manager by the standard (without even mentioning multithreading) would
force each vendor to implement these, and implement them thread-safe if
they think multithreading is important to have. Is that close?

Now, as I see it, even if _just_ the standard malloc()/free()/etc are
thread-safe, you can build a more powerful heap manager on that alone,
using just standard C features. If the thing needs to be able to run in
a multithreaded environment, you could even do that, with some extra
effort. You need some form of mutex locking, but you can resort to
Lamport's bakery algorithm for that. Surely it won't be pretty, but it
is, in principle, possible. That's my point.
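For reference, a naive transcription of the bakery lock; note it assumes
loads and stores of these variables are atomic and observed in program
order, which ISO C nowhere promises -- which is rather the weak point under
discussion:

  /* Lamport's bakery algorithm for NTHREADS threads.  The volatile
     qualifiers keep the compiler from caching the flags, but nothing here
     orders the memory operations on a real SMP machine. */
  #define NTHREADS 4

  static volatile int      choosing[NTHREADS];
  static volatile unsigned number[NTHREADS];

  static void bakery_lock(int i)
  {
      unsigned max = 0;
      int j;

      choosing[i] = 1;
      for (j = 0; j < NTHREADS; j++)
          if (number[j] > max)
              max = number[j];
      number[i] = max + 1;
      choosing[i] = 0;

      for (j = 0; j < NTHREADS; j++) {
          while (choosing[j])
              ;                          /* wait for j to pick a ticket     */
          while (number[j] != 0 &&
                 (number[j] < number[i] ||
                  (number[j] == number[i] && j < i)))
              ;                          /* lower ticket (or id) goes first */
      }
  }

  static void bakery_unlock(int i)
  {
      number[i] = 0;
  }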
>Explain to me how you implement malloc() in a *multithreaded* environment
>portably. You could claim that C doesn't support multithreading, but I highly
>doubt you're going to convince any vendor that they should shut off their
>multithreading support based on this argument.

Now you're shifting the goalposts.

How so?


Since in your original statement there was no mention of multithreading.


Mentioning the implementation of malloc implicitly brings multithreading into the
discussion. That is what you don't understand.


If the malloc returns either NULL or a pointer (i.e., it conforms
minimally to the standard) then, even in a multithreaded environment, it
is quite possible to build a heap manager on top of that. If you need
high performance even in such demanding circumstances, then, yes, you'd
need some support from preferably hardware-assisted mutexes or similar.
Allow me to refresh your memory:

"But a third party library can't do this portably. Because of the multithreading issue ...
[...] Its actual useful
functionali ty that you just can't get from the C language, and there's
no way to reliably map such functionality to the C language itself.
One is forced to know the details of the underlying platform to
implement such things. Its something that really *should* be in the
language." [Paul Hsieh, 4 posts up]

See? Yes -- the implicit answer to the "WHY" question that follows each statement,
is "because of the multithreading problem". If you were aware of the issues,
I'm sure this wouldn't be quite so bewildering to you.
Ok. Let's try to get some closure here, this is eating into my free time
too much for my taste :-)

I would say that writing a more powerful heap manager is possible in
principle, even in a multithreaded environment, as long as you have a
well-implemented malloc() (and friends) at your disposal. It would be a
drag to do so and it would be slow, but it is possible.

I think I have a clearer bearing on your statement why it would be good
to put this stuff in the library. You do need mutexes, and if the
implementation supports them, it will work better than a shared-memory
simulated mutex. So for performance reasons, I think your idea of
extending the basic heap manager, by extending its functionality as
mandated by the standard, is a good idea.

That is, as long as I can get my msize() function as well :-)

[ skipped a bunch ]
My proposal allows the programmer to decide what is or is not useful to them.
>>
>>I'm all for that.
>
>Well, I'm a programmer, and I don't care about binary output -- how does your
>proposal help me decide what I think is useful to me?

It already did, it seems - You just stated your decision, with regard to
binary output. Fine by me.

Not fine by me. Because some other programmer who's code I have to look at
will use it, and I won't have any idea what it is.
Don't worry, you can look it up in the man pages. Unlike your
user-defined operators... :-)

You would have to look up "%b" ("%db"? -- whatever) in *UP TO DATE* man pages.
(Is *restrict* or the new *complex* type, from C99 properly documented in
*YOUR* current man pages?)


Yes, as close as gcc is, it does. And in other cases, I have the standard.
My proposed plug-in mechanism would be in modules
or header files that would, one way or another, have to be in your project
somewhere. Presumably, the source would have to be available somewhere, so
there would be *NO QUESTION* as to what it does.

Same comment applies to &&&, ||| versus user defined operators.

>Hey, its not me -- apparently its people like you who wants more operators.

Just a dozen or so! But _standardized_ ones.

And you don't see the littlest problem with this proposal? If you add a dozen,
you'd better be sure that they are last dozen that anyone could possibly want
to add to the language. Their value add would have be worth the pain of having
everyone learn about 12 new symbols.
Well, let's start with ?< ?> and ?: for now; ?: instead of |||, and I
drop the &&& requests since it is much less useful. It's already in gcc
so it's implementable. Give everybody 10 years to get used to it, then
we may add a couple more.

My claim is that so few people would use these operators, that they would be
de facto unsupported. They would be less useful than bitfields.


Ouch! That hurts.
>My point is that no matter what operators get added to the C language, you'll
>never satisfy everyone's appetites. People will just want more and more,
>though almost nobody will want all of what could be added.
>
>My solution solves the problem once and for all. You have all the operators
>you want, with whatever semantics you want.

That's too much freedom for my taste. If I would want this kind of
thing, I would yank out lex and yacc and code my own language.

Well how is that different from adding just &&& and |||? If you *REALLY
REALLY* want them, then why don't you yank out lex and yacc and code up a new
language?


Because I want to use them in C programs.


But I want them *not* to be used in C programs -- unless they are fully
documented in the source. Always.


So there we differ. This, I think, is a matter of taste. I propose to
drop this subthread as well.
>What makes your code readable is adherence to an agreed upon coding
>standard that exists outside of what the language defines.

There are several such standards for identifier names. No such standard
exists for operator names, except: use familiar ones; preferably, steal
them from other languages.

Sounds like a reasonable convention to me. How about: All new operators must
be defined in a central module named ----. Or: Only these new operators may be
added as defined by ... yada, yada, yada.


More handwaving... Please humour me, and spell out a reasonable
convention for your operator-introduction.
The coding standards are just different.


I can't tell; I haven't seen a coding standard for operator introduction
yet.

I just gave some.
I presume you mean "All new operators must be defined in a central
module named ----. Or: Only these new operators may be added as defined
by ... yada, yada, yada."

My mistake I guess, I overlooked these coding standards completely. I
will start promoting them within my place of work as of tomorrow :-)
What's hand waving is saying something about stack limits --
you haven't even suggested a way in which that *COULD* be possible
to put down on a piece of paper.
Ah, the "tit-for-tat" technique...!

So far, nobody has challenged me to come up with a specific phrasing for
this (which actually surprises me). Here you go. It's a bit rough, but
you get the idea.

---- Proposed addition to the standard to add stack overflows:
---- Draft

a. For an active function invocation, define an upper bound for the
amount of memory in use by this function as follows (in the
following, denote "Upper Limit on Memory Usage" by "ULMU"):

- For every parameter/local variable of basic type there is a
corresponding constant macro of type size_t:

(unsigned) char: ULMU_CHAR
(unsigned) short: ULMU_SHORT
...
double ULMU_DOUBLE
...
pointer types: ULMU_POINTER

- The ULMU for a struct shall be equal to
the sum of upper limits of its members.
- The ULMU for a union shall be equal to
the maximum of upper limits of its members.
- The ULMU for an array shall be the number
of elements, multiplied by the upper-limit
of its base type.

((the last one is certainly wasteful, but this could be refined))

The ULMU of a function invocation shall be calculated as the
sum of ULMU values of the types of its parameters,
local variables, and return type, PLUS the constant
ULMU_FUNC_OVERHEAD.

b. The implementation shall define a macro ULMU_TOTAL,
which will evaluate to a size_t with value at least
4095. This value is guaranteed to remain constant during
execution of the program.

c. A function invocation is guaranteed to succeed only if the
total summed ULMU values of all active function invocations
(including the to-be active function after invocation)
does not exceed ULMU_TOTAL.
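To make the intent concrete, here is a sketch of how a recursive routine might use
these macros to refuse to recurse when headroom runs out. It will not compile today,
since the ULMU_* macros exist only in the draft above (ULMU_INT is implied by the
"..." in the list of basic types):

    #include <stddef.h>

    /* Hypothetical: ULMU_POINTER, ULMU_INT, ULMU_FUNC_OVERHEAD and ULMU_TOTAL
       are the macros from the draft, not from any real implementation. */
    #define WALK_COST (ULMU_POINTER + 2 * ULMU_INT + ULMU_FUNC_OVERHEAD)

    static size_t ulmu_used;   /* running total kept by the program itself */

    struct Tree { struct Tree *left, *right; };

    int walk(const struct Tree *t)
    {
        if (t == NULL)
            return 0;
        if (ulmu_used + WALK_COST > ULMU_TOTAL)
            return -1;                       /* refuse: the next call is not guaranteed */
        ulmu_used += WALK_COST;
        int n = 1 + walk(t->left) + walk(t->right);
        ulmu_used -= WALK_COST;
        return n;
    }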
Its *OBVIOUS* what I am talking about above. You just
want something specific to have something arbitrary to disagree with.
You have to trust me on this one: I'm not trying to coerce you into a
laborious writing-down of a precise rule-set, or anything. It's just
that I found that making something concrete (which you intuitively feel
should be possible) gives a lot of insight into the kind of troubles you
would get into when actually doing it.
Just pick a convention -- if something bothers you about certain
operators, spell it out in your convention.
As noted before, I think it would be not a good development if different
organisations settled on a different set of operators, even if they
meticulously documented their convention. Even things as simple as a
wrong (in the sense of: other than you're used to) bracing style can
hinder your productivity for quite some time before you get used to
that. Having a different set of condoned operators at each programming
shop sounds like a small tower of Babel to me.
Just like freeform variable names, there is the same encumbrance of the
programmers managing the meaning of the symbols for more generic operators.
You have not made a sufficient case to convince me that there is a real
difference between the two.


...Have you ever laid eyes on a moderately complicated APL program,
without knowing APL?

No, but I have looked at the results of the annual obfuscated C code contest.
I have never suggested that you *COULDN'T* abuse the idea. But you can abuse
just about *ANYTHING* in C.
I'm not talking about abuse, I'm talking about the recognition capacity
of unknown operators (which is zilch, and which takes ages to get used
to). Now that's bad enough for languages, but I think it would be a
mistake of quite monumental proportions to allow some sort of C
"dialects" to emerge with different sets of operators, "operator wars",
and so on.
You don't think adding in ||| or &&& would add to
the potential for noise in those contests? I gave a very wordy description of
how to define an operator -- its kind of hard to obfuscate something completely
while you have so many hints (a keyword like _Operator, a whole function-like
declaration, etc) as to what it is. You would have to hide it behind macros to
truly obfuscate it, but you can use macros for arbitrary obfuscation
regardless.
When reading the code that used (rather than defines) the operator, it
helps tremendously if the meaning jumps from the page, rather than
having to look it up in the Big Manual of Operators in use at your local
workplace.
The meanings you gave for these rang some bells with my programming
experience. It leads me to believe you have encountered similar
situations, which gives you some bonus points for credibility in my book
:-) Seriously, I don't think such operators would be a good idea, but
I'd welcome library functions (the old fashioned ones, with names) for them.

Well I'd never use &&& or ||| as you proposed them for any reason whatsoever.
At least with the <<< and @ operators, both add just enough useful
functionality, that I *would* consider using them, knowing I would be non-
portable. That's the point of all this -- its all just opinion. With
redefinable operators someone's arbitrary opinion doesn't enter into it -- you
can make it conform to whatever opinion you have.


Too much freedom is a dangerous thing, in programming at least.
If you disagree with someone else's opinion, you can even *change* it,
subject only to your control over the source.
It would make for an interesting experiment, sociologically :-)

A small hypothetical case study. Suppose engineer A, working on nuclear
reactor core controllers, works in a team where "?<" means "minimum" and
"#<" means "x is implied by y" for some data structure involved in logic
reasoning on the state of the system. Now he switches companies; he goes
to work on electronic toys and gadgets. Here the convention is
different, but the same operators are used.

Not much harm could be done, surely. Not unless someone from the toy
company joins the nuclear reactor programming team, that is :)
Do you think perhaps its possible to use an arbitrarily extendable operator
mechanism in order to *clarify* or make code actually more maintainable?


A resounding 'no'. It's an interesting thought, but for me it falls
short on technical and psychological grounds, as I previously tried to
explain.

You haven't shown me that it falls on technical grounds. You've just said that
its hard to implement in the compiler -- but that's obvious.


You haven't shown me that it can be implemented in a compiler, even in
theory. That should count for something.
[skip] My point is that e-commerce is not a *hardware* problem. At least, since all
hardware supports it, its not a hardware problem. Its a software problem. The
point is the *C* does not offer a solution for e-commerce. The Java people can
point and laugh and say they are better for e-commerce than C, and ironically,
they would be correct (Java has a BigNum class).
The irony would be complete if the PHB-types would fall for that... :-)
Instead the hardware people waste their time on trivial little specifications
like IEEE-754, which the C standards idiots don't bother to look at until 15
years later.


You shouldn't trivialize IEEE-754, it's one hell of a standard, probably
one of the best I have ever seen.

You missed the implied sarcasm ...
There's a nice coding convention on operators that covers this, they're
called "emoticons" nowadays.
And as you are probably well aware,
there's a perfectly good reason why it's only now that the C standard
committee is looking at it (only now hardware supporting it is widely
available). I really don't see your point here.

Only NOW?!??! If by *now*, you mean 1991, then sure. The first IEEE-754
implementation came out in like 1984 or so (the 8087). Within a few years,
everyone else came on board -- Alpha (even though DEC was one of the vendors
originally most vigorously opposed to IEEE-754), Sparc, MIPS, PPC, they are all
IEEE-754, of course.
In '89, support hadn't spread wide enough to merit a statement (much
less an implementation mandate) on it in the standard.
[...] I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.

The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.


What's so bad about using third-party libraries?

From a usability point of view, GMP is crap, RSA's bignum library is crap (and
it's expensive).
How's that? How are you going to improve things by adding a carry operator?
None of them are portable to platforms which are in
development (you know like might be a concern for things like *SMART CARDS*.)
Now there's an application where the economy of scale suggests doing a
hand-written assembly routine, if I ever saw one...
>Of course Amazon, Yahoo and Ebay and most banks are
>kind of obsessed with them too, even if they don't know it.

I think you would find that bignum operations are a small part of the
load on e-commerce servers.

According to a paper by Intel, widening multiply accounts for something like
30% of the load on typical e-commerce transactions (typing in your credit card
over the net in a way that can't be snooped.)


Reference, please.

It was in a presentation at the Intel Developer Forum in 1999 or 2000. I only
have this presentation in print form.
"load on typical e-commerce transaction" != "load on e-commerce server".
I meant the server. Clients have cycles to burn, obviously the client
performance doesn't matter on anything less than Half-life 2.
The question remains: how much time is your typical e-commerce server
involved in an actual transaction? How many transactions per second can
a machine handle now? I would suppose that most of the time spent in a
transaction is actually a database lookup at a creditcard company, not a
powermod. The frontend webserver is just waiting for the bank I think.
But these are just half-educated guesses.
If the transaction is simply setup session/send credit-card
information/close session, this could be right. However, a jiffy ago we
were talking SSL.

One single assembly instruction (one *bundle* on Itanium) holds a whole server
down for 30% of its computation,


Again, you're equating server load with transaction load.

Uhh ... both sides have to do the bignum operation.

I don't mean "client load" when I write "transaction load", I'm still
strictly speaking about the server. I could be convinced that the amount
of time spent in powermod for a 1-shot transaction could be as high as
30%, but I'd be hard-pressed to believe a front-end webserver is busy
for more than 10% doing actual transactions. I think the latency of an
e-commerce transaction is mostly caused by frontend<->bank
communication, and intra-bank database lookup. But I cannot back this up.
You have to do it both
ways because of man-in-the-middle attacks. Server load is what we are
concerned with here, obviously (Itanium is *not* a client platform.)

Hang on, are we talking about "overflow" or "carry" here? These are two
different things with signed numbers.

What happens if a is signed and b is unsigned?
My intent was for the operation to follow the semantics of the x86 ADC assembly
instruction.


Fun thing is: it handles both signed and unsigned numbers. Not-so-funny
is the fact that the x86 supports (as do most microprocessors) both an
overflow flag and a carry flag. So, armed with this, please elaborate on
the meaning.

Yes, but I never mentioned/introduced overflow. I only considered carry. For
bignum implementation, OF is not relevant. Only the CF is.


But for signed numbers, carry is almost meaningless. So what remains of
the semantics in this case?
The point is that this instruction is known to be proper for
doing correct bignum additions.


It sure is... There are libraries containing hand-written assembly that
do fine :-)

Go look at the original purpose of C again. It's supposed to be a portable
assembler. This is a job for C.
I would usually agree on statements like this, but here's an exception.
I still have trouble envisioning carry semantics.
>- var is set to the result of the addition; the remainder if a carry occurs.

What happens if the signedness of var, a, and b are not equal?

It just behaves like the ADC x86 assembly instruction, the details of which I
will not regurgitate here.


So on adding a signed plus unsigned.... The carry gets a value "as if"
the signed value was actually unsigned?

Yes. Note that x86 does not have a version of ADC that considers the sign of
its operands in any special way that is reflected in the CF.
Of course, but you have to know whether your operands are signed or
unsigned to see whether the CF or the OF is the important flag.
[...] On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.

It might to you, but the carry is not related to the sign of the operands.
(Your understanding of x86 seems to be shaky.)
My x86 is not too strong, but I am quite well-versed in 6502, 68k and
PowerPC, which enables me to think about these things. They all share
carry and overflow bits in their designs, with little difference. After
performing an add-with-carry, both CF and OF are set (although the V
flag may not be reset, as in the 6502), but the CF generally does not
provide useful information when adding two numbers that are signed. So I
don't see how

c +< v = a + b

would provide useful information in c, when both a and b are signed, if
it is to copy the carry status.
What happens if the bit-widths of var, a, and b are not equal?

The bit-widths would be converted as if the (a + b) operation were happening in
isolation, to match C language semantics.


Ok, so if either is signed, they will both be signed,

No ... that is not the rule used in C. unsigned supersedes signed.


Ok. That just leaves the case of two signed values, then.

I'm still wondering about the high_order_value() and low_order_value()
semantics I mentioned in my previous post, actually the part which I was
most curious about.

About the "billions of dollars" cost: I've snipped that part if you
don't mind. I like the technical issues much better :-) Let's agree that
there's a lot of money at stake.

Best regards,

Sidney

Nov 13 '05 #137
Chris Torek wrote:
(snip)
In particular, UltraSparc is a 64-bit architecture, with 64-bit
registers. It has a 64-bit MULX instruction, with an unsigned
variant UMULX, but this instruction does 64 x 64 -> 64 multiplies,
discarding the upper 64-bit half of the 128-bit result.

The widening multiply is the old 32-bit-compatibility instruction,
which comes in SMULcc and UMULcc variants. It multiplies the lower
32 bits of two registers with the lower-half 32-bit result going to
a third register. (snip)

None of this is particularly relevant to C today, which also lacks
a widening multiply -- you have to simulate it with, e.g.:

    unsigned long result = (unsigned long)op1 * (unsigned long)op2;

I do think the UltraSparc design was the wrong choice -- C's lack
of direct access to various instructions does not mean the instructions
are unimportant. At the same time, I do not think C *needs* access
to all instructions. There are other programming languages, and
methods of tying C code into code written in other languages. They
are not portable, and this is not important.


At some point, one of the things I look at in a new architecture
is its implementation of (n)*(n)=(2n) multiply, and (2n)/(n)=(n) divide.

I am not so convinced anymore, though. They are primarily used to
write multiple precision arithmetic routines. It isn't that much
harder to write them with the 32 bit versions than with the 64 bit versions.

The amount of work necessary to make the hardware is not necessarily
worthwhile for the small advantage that it supplies.

When IBM first added the extended precision (128 bit) floating point
instructions to S/360 and S/370 they did not include a divide
instruction. They found that it was not worth the cost, but instead
supplied a software simulation of it. They also supplied a simulator
for all the extended precision instructions for those processors that
needed it.

If Sun provided either a simulator or subroutine that would do those
operations, that would seem useful enough to me.

-- glen
Nov 14 '05 #138
In article <br**********@news.tudelft.nl>, si****@jigsaw.nl says...
Paul Hsieh wrote:
In article <br**********@news.tudelft.nl>, si****@jigsaw.nl says...
Paul Hsieh wrote:

Imagine a *SERVERLESS* online game topology. How do you communicate
"game-moves" in realtime between players in a way that is most efficient
and with the least perceived lag, given a limited amount of bandwidth?
Solve the above problem, and you have a solution for this problem.
(Actually for games you can crank d a little higher than 2 -- the exact
problem that I am looking at is different, and I am loathe to increase d
above 2.)


Just out of curiosity: why should the graph be directed; don't the nodes
have two-way communication?


Yes, but bandwidth is also an issue. If you connect to more and more peers and
just push data out to all of them, then your upload bandwidth just gets split
more and more. d=1 leads to a trivial solution but with unacceptable lag. So
d=2 is the minimum divisor that gives rich enough possibilities for topologies
to try to solve this problem in interesting ways.

So although actually you will, on average, have 4 connections established (two
out, and two in), you are transmitting on two of them, and receiving on the
other two.
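To make the problem concrete: the metric being minimized is the worst-case hop count
over all ordered pairs of nodes, given exactly two out-edges per node. A small checker
like the following (a sketch; it assumes n <= MAXN and that every succ[v][k] is a
valid node index) computes that number for a candidate topology by breadth-first
search -- finding a topology where it stays small is the hard part:

    #include <string.h>

    #define MAXN 16

    /* succ[v][0..1]: the two out-neighbours of node v (out-degree d = 2).
       Returns the largest hop count over all ordered pairs, or -1 if some
       node cannot reach some other node at all. */
    int diameter(int n, int succ[][2])
    {
        int worst = 0;
        for (int s = 0; s < n; s++) {
            int dist[MAXN], queue[MAXN], head = 0, tail = 0;
            memset(dist, -1, sizeof dist);
            dist[s] = 0;
            queue[tail++] = s;
            while (head < tail) {
                int v = queue[head++];
                for (int k = 0; k < 2; k++) {
                    int w = succ[v][k];
                    if (dist[w] < 0) {
                        dist[w] = dist[v] + 1;
                        queue[tail++] = w;
                    }
                }
            }
            for (int t = 0; t < n; t++) {
                if (dist[t] < 0) return -1;
                if (dist[t] > worst) worst = dist[t];
            }
        }
        return worst;
    }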
[...] And you're not considering a lag metric other than just number of
hops?
Solving variants of Travelling Salesman is next on my list of things that I'll
spend my most masochistic moments on.
You can think about this problem all you like. If you want to know if I am
full of it or not, see if you can get above n=10,d=2.


Perhaps I will, I'll have some time on my hands round Xmas. Sounds like
a nice problem.


Its nice to think about, not so nice if you actually have to solve it.
Dijkstra's weighted shortest path algorithms were nice for client-server based
problems, but the future is peer 2 peer. I believe that this problem and
problems like it are going to be at the center of a lot of internet technology
going forward.
Ok, that's a case. But wouldn't it have been better then just to make a
program generator (written in C or something else)? Or perhaps use the
m4 macro processor?


Tell me how you do this in a way that's compatible with testing it against 3
different compilers (one of the *OTHER* techniques I use to go after
performance ...) two of which are very unfriendly to being commandline driven,
while still making sense from a debugging point of view?


How about generating C source that you #include from a resident file in
your project, would that be an option?


I don't know that MSVC's project mechanisms can be finagled to compile and
execute one program just to build source code for another program that you
compile up. With makefiles what you are saying is easy, and I've done it in
the past, but you can see how this kind of thing puts pressure on your
development process. A sufficiently useful preprocessor mechanism would allow
one to write such code generators in a completely portable way.
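For what it's worth, the makefile-era pattern being discussed looks roughly like this:
a tiny generator is compiled and run at build time, it writes a header, and the real
program #includes that header. File and symbol names here are made up for illustration:

    /* gen_table.c -- run once at build time; writes squares.h, which the
       real program then #includes. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("squares.h", "w");
        if (!f)
            return 1;
        fputs("/* generated file -- do not edit */\n", f);
        fputs("static const unsigned squares[] = {\n", f);
        for (unsigned i = 0; i < 256; i++)
            fprintf(f, "    %uu,\n", i * i);
        fputs("};\n", f);
        return fclose(f) != 0;
    }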
Its a big deal to me. I don't own any material on the C language that
specifies all of this.


The C standard specifies what is UB and what's not. Grepping an
implementation is a fragile way of deducing semantics, I would say.


Yes, and what percentage of C programmers in the world do you think have ever
had the standard in their hands that they actually refer to?
Assuming you actually wanted to count the *NODES* of the tree, and were a
little concerned about performance try this:

static int count_nodes_inner (const struct Tree *t) {
    return (t->left  ? (1 + count_nodes_inner (t->left )) : 0) +
           (t->right ? (1 + count_nodes_inner (t->right)) : 0);
}

int count_nodes (const struct Tree *t) {
    return t ? count_nodes_inner (t) : 0;
}

as it avoids the leaf node tail calls, which will roughly double the
performance.


Well, I'm glad that you extracted my intended meaning from the name of
the function I used. Good thing I didn't use an operator! ;-)


Yeah, well I noticed that you didn't catch *my* bug? (I missed one detail.)
I'm _still_ talking about 'the stack thing' yes, since you still seem to
think that this is a triviality. I think it's not.

Dude: malloc (65536) is not portable by the same reasoning, and:

If you think that, then you didn't follow my reasoning. I can find out
at compile time if malloc(65536) is allowed, and if it is, I can find
out at runtime if it has succeeded. I can do no such thing with stack
overflows. The program will exhibit all the symptoms of UB without a
proper cause that can be traced down to the standard.
int fib(int n) {
if (n <= 2) return 1;
return fib(n-1) + fib(n-2);
}

isn't going to work in any relevant way in any language, for a call like
fib(1000000).
What are you trying to say here? Help me out a bit.


I'm trying to say that *ALL* languages have this issue. The resources required
to retain a call stack are unknown and often difficult to determine from any
reasonable code analysis. Something like "counting the depth" is not very
useful since there are no language provisions for actually counting out to
such a depth. Some compilers implement tail recursion, though doing so in all
cases is equivalent to the halting problem -- so you can't even know for sure
when it's happening.

If you want a really good case, look up the Ackermann function. It looks very
simple, with the only concrete constant operation being a "+1". Knowing that
there is some kind of stack bound is totally useless for this function. Most
computations of the Ackermann function blow out any fixed sized stack, and even
a fully addressable 64bit machine would not have the required memory to
implement a stack that could compute any of its non-trivial range. The point
being there is no kind of reasonable metric for measuring the stack
requirements that can be generally applied.

My objection to your proposal can simply be restated as "all machines have
limited resources", deal with it. At best you could insist that upon reaching
some limited stack depth that the program should immediately halt or something
(with stack checking turned on, most x86 compilers provide for this in the form
of a runtime error.)
Ok, I'll try to summarize my understanding of the point you try to make.
Let's consider it a priori desirable that we have a super-duper heap
manager, with all kinds of nice things like seperate heaps. We agree
that this is often useful. Now let's also assume a certain platform
where multithreading/multitasking is so important that the
platform-specific library provides support for it. We can assume that it
also sports a thread-safe malloc (yes?). You'd still want the
super-duper heap manager. Your point is that mandating a better heap
manager by the standard (without even mentioning multithreading) would
force each vendor to implement these, and implement them thread-safe if
they think multithreading is important to have.
Basically, yes.
Now, as I see it, even if _just_ the standard malloc()/free()/etc are
thread-safe, you can build a more powerful heap manager on that alone,
using just standard C features.
This is completely and utterly untrue. *THAT* is my point. None of the
functions I proposed as extensions can be implemented (even in prototypical
way) on top of just malloc()/free()/etc and retain their intended semantics.
(Of course, you could implement them with memory leaks out the wazoo -- i.e.,
just never free anything, but that's clearly not acceptable.)
[...] If the thing needs to be able to run in
a multithreaded environment, you could even do that, with some extra
effort. You need some form of mutex locking, but you can resort to
Lamport's bakery algorithm for that.
I had to look this one up -- that's one of the classic useless non-solutions
(especially if you want to call malloc more than 4 billion times, for example),
which explains why I've never committed it to memory. You still need to be
able to identify process instances, which *BY ITSELF* is not portable. Process
instance identifiers would count as something *BEYOND* just malloc/free/etc.
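For reference, the algorithm being dismissed looks like this in outline. Note that the
caller must pass its own slot index i -- which is exactly the process-identification
problem mentioned above -- and the busy-waits assume memory ordering that plain C
(volatile included) does not actually promise on real multiprocessors:

    /* Outline of Lamport's bakery lock; a sketch, not a working mutex. */
    enum { NTHREADS = 8 };

    static volatile int      choosing[NTHREADS];
    static volatile unsigned ticket[NTHREADS];

    static void bakery_lock(int i)
    {
        unsigned max = 0;
        choosing[i] = 1;
        for (int j = 0; j < NTHREADS; j++)
            if (ticket[j] > max)
                max = ticket[j];
        ticket[i] = max + 1;            /* take a ticket one higher than any seen */
        choosing[i] = 0;
        for (int j = 0; j < NTHREADS; j++) {
            while (choosing[j])
                ;                       /* wait until j has finished taking a ticket */
            while (ticket[j] != 0 &&
                   (ticket[j] < ticket[i] ||
                    (ticket[j] == ticket[i] && j < i)))
                ;                       /* lower ticket (ties broken by index) goes first */
        }
    }

    static void bakery_unlock(int i)
    {
        ticket[i] = 0;
    }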
[...] Surely it won't be pretty, but it is, in principle, possible. That's
my point.
No it's not. It's not portable, that's for sure.
>>Explain to me how you implement malloc() in a *multithreaded* environment
>>portably . You could claim that C doesn't support multithreading, but I highly
>>doubt you're going to convince any vendor that they should shut off their
>>multithrea ding support based on this argument.

>Now you're shifting the goalposts.

How so?

Since in your original statement there was no mention of multithreading.


Mentioning the implementation of malloc implicitly brings multithreading into the
discussion. That is what you don't understand.


If the malloc returns either NULL or a pointer (i.e., it conforms
minimally to the standard) then, even in a multithreaded environment, it
is quite possible to build a heap manager on top of that.


Why don't you just try to do it. I mean without race conditions, and without
hooking out or using any of the spawn/mutex/etc primitives that you'll find in
the multithreading API. That ought to be a good trick.
[...] If you need
high performance even in such demanding circumstances, then, yes, you'd
need some support from preferably hardware-assisted mutexes or similar.
High performance is just something I realized you could get as a nice side
effect from such extensions. But the functionality enhancements alone are
worth it. *BUT* whether or not this is properly implemented with respect to
the existence of multithreading has no relevance to its performance -- that's
just a question of correctness.
Ok. Let's try to get some closure here, this is eating into my free time
too much for my taste :-)

I would say that writing a more powerful heap manager is possible in
principle, even in a multithreaded environment, as long as you have a
well-implemented malloc() (and friends) at your disposal.
*BZZZT!!!* Wrong.
It would be a drag to do so and it would be slow, but it is possible.
Impossible. Slow, fast or otherwise. Its just impossible.
I think I have a clearer bearing on your statement why it would be good
to put this stuff in the library. You do need mutexes, and if the
implementation supports them, it will work better than a shared-memory
simulated mutex. So for performance reasons, I think your idea of
extending the basic heap manager, by extending its functionality as
mandated by the standard, is a good idea.

That is, as long as I can get my msize() function as well :-)
Well ok, msize() (implemented as _msize() in WATCOM C/C++) does not require
multithreading support. That would be a good addition as well. One has to be
very careful about what kind of implementation impact this has. For example,
if an implementation has an extension which has an heap compatible
automatically growable allocation (for example touching the adjacent memory
page, which would be "paged out" by default, might automatically append it to a
"growable" allocation), then knowing its size might not make a lot of sense.
Don't worry, you can look it up in the man pages. Unlike your
user-defined operators... :-)


You would have to look up "%b" ("%db"? -- whatever) in *UP TO DATE* man pages.
(Is *restrict* or the new *complex* type, from C99 properly documented in
*YOUR* current man pages?)


Yes, as close as gcc is, it does. And in other cases, I have the standard.


But you said you were just going to look in your man pages. With my proposal,
there would be no question -- the source would have to be in the project
somewhere. If someone closed their source, well, presumably you're going to
expect documentation anyway, if not, you've got the same problem that you
always expect from closed source libraries.
What's hand waving is saying something about stack limits --
you haven't even suggested a way in which that *COULD* be possible
to put down on a piece of paper.


Ah, the "tit-for-tat" technique...!

So far, nobody has challenged me to come up with a specific phrasing for
this (which actually surprises me). Here you go. It's a bit rough, but
you get the idea.

---- Proposed addition to the standard to add stack overflows:
---- Draft

a. For an active function invocation, define an upper bound for the
amount of memory in use by this function as follows (in the
following, denote "Upper Limit on Memory Usage" by "ULMU"):


And what if the local/parameters are not stored, or storable on the stack?
What if an implementation uses multiple stacks depending on the *type* of the
parameter?
- For every parameter/local variable of basic type there is a
corresponding constant macro of type size_t:
You missed all the implicit temporaries required. Homework exercise: for any
fixed number n, create a function which requires at least n implicit
temporaries (not explicitly declared) to implement.
(unsigned) char: ULMU_CHAR
(unsigned) short: ULMU_SHORT
...
double ULMU_DOUBLE
...
pointer types: ULMU_POINTER

- The ULMU for a struct shall be equal to
the sum of upper limits of its members.
This doesn't sound right with implementation defined struct alignment issues.
- The ULMU for a union shall be equal to
the maximum of upper limits of its members.
Doesn't that assume that the implementation implements overlapping union
members? (It makes sense not to overlap floating point with integer if, for
example, the memory units in the CPU internally tag the type of the contents of
the memory in order to allow the FPU and Integer units to execute totally
asynchronously even when both are accessing memory, if say that tag switching
had a high cost.)
- The ULMU for an array shall be the number
of elements, multiplied by the upper-limit
of its base type.

((the last one is certainly wasteful, but this could be refined))

The ULMU of a function invocation shall be calculated as the
sum of ULMU values of the types of its parameters,
local variables, and return type, PLUS the constant
ULMU_FUNC_OVERHEAD.
And what about the number of alloca() calls? alloca() is a common extension
which just eats extra space off the stack dynamically -- the big advantage
being that you don't have to call free or anything like it, for it to be
properly cleaned up. It will be cleaned up upon the function's return (the
simplest implementation of a "freeall".)

Would it not make more sense to remove all these details, and just say that
each function creates a macro named _STACK_USAGE_<functionname> which is not
available inside the scope of any function or prior to the definition of the
function? (minus alloca() calls) This would then allow you to track the stack
usage without (incorrectly, I might add) trying to figure it out from details
that are implementation specific?

Regardless, none of this helps if you have multiple stacks, for example. In
such cases, if *any* stack runs out you are in trouble. But its obvious that
your intention is to assume a *unified* stack implementation. The math won't
translate in a portable way.
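Under that alternative, usage might look something like the following, where
_STACK_USAGE_fib is the hypothetical per-function macro from the suggestion above
(nothing like it exists today, so this is purely illustrative):

    #include <stddef.h>

    int fib(int n)
    {
        if (n <= 2) return 1;
        return fib(n - 1) + fib(n - 2);
    }

    int fib_checked(int n, size_t stack_left)
    {
        /* refuse to start a recursion the remaining stack cannot cover:
           fib recurses to a depth of roughly n */
        if ((size_t)n * _STACK_USAGE_fib > stack_left)
            return -1;
        return fib(n);
    }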
b. The implementation shall define a macro ULMU_TOTAL,
which will evaluate to a size_t with value at least
4095. This value is guaranteed to remain constant during
execution of the program.
But in implementations that *I* am aware of, the stack size is set at *LINK*
time. So it can't *BE* a macro available at compile time without giving up on
this flexibility. This wouldn't work for me at all, since I *use* this feature
in my own projects. At the very least make this a global variable -- that *IS*
settable at link time.
c. A function invocation is guaranteed to succeed only if the
total summed ULMU values of all active function invocations
(including the to-be active function after invocation)
does not exceed ULMU_TOTAL.
Its *OBVIOUS* what I am talking about above. You just
want something specific to have something arbitrary to disagree with.
You have to trust me on this one: I'm not trying to coerce you into a
laborious writing-down of a precise rule-set, or anything. It's just
that I found that making something concrete (which you intuitively feel
should be possible) gives a lot of insight into the kind of troubles you
would get into when actually doing it.


But *YOU* could have written up a possible coding standard, and explain why you
think such a thing could not work in practice. There are as many coding
standards as programming projects -- picking one is always controversial and
instantly makes you open to attack.
Just pick a convention -- if something bothers you about certain
operators, spell it out in your convention.


As noted before, I think it would be not a good development if different
organisations settled on a different set of operators, even if they
meticulously documented their convention. Even things as simple as a
wrong (in the sense of: other than you're used to) bracing style can
hinder your productivity for quite some time before you get used to
that. Having a different set of condoned operators at each programming
shop sounds like a small tower of Babel to me.


But I don't understand how this is any different from free form function names.
Just like freeform variable names, there is the same encumbrance of the
programmers managing the meaning of the symbols for more generic operators.
You have not made a sufficient case to convince me that there is a real
difference between the two.

...Have you ever laid eyes on a moderately complicated APL program,
without knowing APL?
No, but I have looked at the results of the annual obfuscated C code contest.
I have never suggested that you *COULDN'T* abuse the idea. But you can abuse
just about *ANYTHING* in C.


I'm not talking about abuse, I'm talking about the recognition capacity
of unknown operators (which is zilch, and which takes ages to get used
to). Now that's bad enough for languages, but I think it would be a
mistake of quite monumental proportions to allow some sort of C
"dialects" to emerge with different sets of operators, "operator wars",
and so on.


You mean like function-name wars? Like string library wars (actually that sort
of is happening ...)? The point is that there's no chance that anyone would
mistake an *extensible* operator as being anything other than it is. To
understand it you have to look up its definition, just like a function.
You don't think adding in ||| or &&& would add to
the potential for noise in those contests? I gave a very wordy description of
how to define an operator -- its kind of hard to obfuscate something completely
while you have so many hints (a keyword like _Operator, a whole function-like
declaration, etc) as to what it is. You would have to hide it behind macros to
truly obfuscate it, but you can use macros for arbitrary obfuscation
regardless.


When reading the code that used (rather than defines) the operator, it
helps tremendously if the meaning jumps from the page, rather than
having to look it up in the Big Manual of Operators in use at your local
workplace.


You are completely ignoring that large population of people who happily use
operator overloading today. You don't understand that what you are saying is
nothing more than your very narrow opinion. And you're using your opinion to
try to trump a proposal whose merits lean very heavily on its technical
features.
Too much freedom is a dangerous thing, in programming at least.
Yeah we should all rename our functions as func0000, func0001, etc ...
If you disagree with someone else's opinion, you can even *change* it,
subject only to your control over the source.


It would make for an interesting experiment, sociologically :-)


This "experiment " is already ongoing -- notice how BSD has been forked out the
wazoo, how there are about a million regex implementations, and how Microsoft
has "winsock" as an alternative of UNIX's sockets. The world hasn't ended
because of any of this.
A small hypothetical case study. Suppose engineer A, working on nuclear
reactor core controllers, works in a team where "?<" means "minimum" and
"#<" means "x is implied by y" for some data structure involved in logic
reasoning on the state of the system. Now he switches companies; he goes
to work on electronic toys and gadgets. Here the convention is
different, but the same operators are used.
Or suppose a programmer uses clock() all his life in DOS/Windows as a higher
resolution realtime, yet portable, timer. This programmer then moves to UNIX
and is shocked to find that all mutexes turn in a clock() measurement of 0
microseconds regardless of how long they block! Even Sleep(1000) measures 0
microseconds!

The example you've made up has to do with basic comprehension of the engineer -
- since "?<" would clearly be one of the "extended operators", the programmer
is supposed to look up its definition. Just like they would have to look
up logMsg() or any other function name that might show up in more than one
project. Just like they are supposed to magically know how clock() behaves on
different OS'es.
Do you think perhaps its possible to use an arbitrarily extendable operator
mechanism in order to *clarify* or make code actually more maintainable?

A resounding 'no'. It's an interesting thought, but for me it falls
short on technical and psychological grounds, as I previously tried to
explain.


You haven't shown me that it falls on technical grounds. You've just said that
its hard to implement in the compiler -- but that's obvious.


You haven't shown me that it can be implemented in a compiler, even in
theory. That should count for something.


I don't think there's a credible assertion that my idea is impossible. You
want me to describe a whole compiler implementation just to support my idea?
As long as one can create a well-defined definition, why should you expect it
would be impossible? Have I created a halting problem or some kind of NP-
complete algorithm or something? -- I don't think so.
>[...] I guess they're pretty satisfied with the bignum
>libs that exist, that provide assembly implementations for all important
>platforms (and even a slow fallback for others). The reality is that
>no-one seems to care except you, on this.

The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.

What's so bad about using third-party libraries?
From a usability point of view, GMP is crap, RSA's bignum library is crap (and
it's expensive).


How's that? How are you going to improve things by adding a carry operator?


In any way I choose -- that's the point. Right now the barrier to entry is
very high, because I would have to implement a competitive library in assembly
language. I'm not kidding when I tell you these libraries are pure excrement.
Not in terms of performance -- they both perform reasonably well -- I mean in
terms of how the heck do I use these things?

You see, with language support, an intrepid developer could sacrifice *some*
performance for portability and demonstrate a much nicer and more easily usable
interface. This *competition* might *force* these popular vendors to fix
their interfaces so that they could deliver both the performance and usability
benefits. But because both of these have wads of assembly language for various
architectures, it is very difficult and expensive to compete with them. Even if I
made a single x86 implementation, it would be difficult for me to beat their
performance by any measurable margin, and my benefit of having possibly a
better interface would have to be balanced against the fact that I have no
portability, while GMP does (by virtue of having been ported.)
None of them are portable to platforms which are in
development (you know like might be a concern for things like *SMART CARDS*.)


Now there's an application where the economy of scale suggests doing a
hand-written assembly routine, if I ever saw one...


About a year ago, Schlumberger gave an amazing talk about implementing pico-java
bytecode into a smart card. It was incredible, because it was obvious that
they would be able to use all the Java tools/libraries, they could simulate it
almost perfectly in software -- mostly inheriting the pico-java architecture,
and because of Java's bignum class, it meant that exposing a widening multiply
and carry propagating add functionality in their smart card would be enough to
automatically have all the software they need for RSA all ready to go.

Do you understand? Although C compiler porting frameworks exist, the lack of a
standard for widening multiply and carry propagating add means they would have
to expose a mechanism themselves, then write all the software themselves just
get to the point of having a bignum library (which Java basically comes with)
before proceeding to their RSA implementation.
"load on typical e-commerce transaction" != "load on e-commerce server".
I meant the server. Clients have cycles to burn, obviously the client
performance doesn't matter on anything less than Half-life 2.


The question remains: how much time is your typical e-commerce server
involved in an actual transaction? How many transactions per second can
a machine handle now? I would suppose that most of the time spent in a
transaction is actually a database lookup at a creditcard company, not a
powermod.


No. It's the powermod. One instruction -- 30% of all processor time. The rest
-- including hitting a disk or talking to some authentication mechanism over a
network fits into the 70% (though obviously you would expect some amount
of DMA overlapping.) Get it? Powermod is slower than disk swapping -- that's
what Intel was measuring, and from my own calculations, their claim is quite
credible. As I recall, you only need about a hundred simultaneous
transactions for this to show up. Remember, Intel's marketing point is that
they claim Itanium will save you money by using fewer systems per number of
maximum sustained transactions. In a very real sense, their claim is very
credible, based solely on the fact that they have dual fully pipelined widening
multipliers.
[...] The frontend webserver is just waiting for the bank I think.
But these are just half-educated guesses.
No -- servers obviously amortize this over multiple threads, and performing
queued up block transfers. While these things take a long time, they are all
queued, blocked, buffered etc. Serial solutions are not acceptable. Your
guess is way off the mark. This wouldn't take any appreciable webserver CPU
power at all.
If the transaction is simply setup session/send credit-card
information/close session, this could be right. However, a jiffy ago we
were talking SSL.

One single assembly instruction (one *bundle* on Itanium) holds a whole server
down for 30% of its computation,

Again, you're equating server load with transaction load.
Uhh ... both sides have to do the bignum operation.


I don't mean "client load" when I write "transactio n load", I'm still
strictly speaking about the server. I could be convinced that the amount
of time spent in powermod for a 1-shot transaction could be as high as
30%, but I'd be hard-pressed to believe a front-end webserver is busy
for more than 10% doing actual transactions.


No. You misunderstood. 30% of *THE WHOLE THING* is in just the powermod. I'm
talking about serving static webpages while some consumer is staring at
advertising or whatever else before they decide to buy whatever. Just their
action of pressing the "checkout" button to buy the stuff in their shopping
cart just pegs the server's CPU.

I'm not at all surprised you have a hard time believing this, as your grasp of
multitasking seems tenuous at best.
[...] I think the latency of an
e-commerce transaction is mostly caused by frontend<->bank
communication, and intra-bank database lookup. But I cannot back this up.
The *LATENCY* is something you measure from the client. The latency is not
relevant to the real bottlenecks in the webserver. The client sees the latency
of waiting for the bank, but the server does *NOT* have code like:

while (HeyBankAreYouDoneYet()) {
    runSetiAtHome();
}

in it. At the very least there is a blocking operation like "Sleep (1)" in the
place of "runSetiAtHome( )". (Hint: the blocking is when a thread comes in
where powermod takes over and spikes the CPU usage.)
>Hang on, are we talking about "overflow" or "carry" here? These are two
>different things with signed numbers.
>What happens if a is signed and b is unsigned?

My intent was for the operation to follow the semantics of the x86 ADC assembly
instruction.

Fun thing is: it handles both signed and unsigned numbers. Not-so-funny
is the fact that the x86 supports (as do most microprocessors) both an
overflow flag and a carry flag. So, armed with this, please elaborate on
the meaning.


Yes, but I never mentioned/introduced overflow. I only considered carry. For
bignum implementation, OF is not relevant. Only the CF is.


But for signed numbers, carry is almost meaningless. So what remains of
the semantics in this case?


Why is the carry meaningless? Ok, you obviously don't know the details of how
bignum is implemented. Ok, a large 2s complement integer is filled with an
array of unsigned sub-integers, except for the one at the top. *That* one is
signed, and that's how you know if the bignum integer is negative or not. The
problem is that the act of addition can cause a roll around at the top word
which requires an expansion of bignum by an extra integer in size. The logic
you use to determine this is just related to examining the carry flag.

It all just comes down to the carry flag. The other flags are not necessary
for bignum libraries. Maybe there's a way to transform the OF into something
that can be used in a bignum library, but I am not aware of it.

But if you want the OF, then why not have another operator for that?
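The operation the proposed operator would expose can be written in standard C today
by deriving the carry from an unsigned comparison -- this is what a portable
(non-assembly) bignum addition loop does per word, and it is the semantics ADC gives
you in one instruction:

    #include <stddef.h>

    /* Add two n-word magnitudes (least significant word first), propagating
       the carry the way ADC would; returns the final carry out. */
    unsigned bignum_add(unsigned *r, const unsigned *a, const unsigned *b, size_t n)
    {
        unsigned carry = 0;
        for (size_t i = 0; i < n; i++) {
            unsigned s = a[i] + carry;
            carry = (s < carry);          /* carry out of a[i] + carry_in */
            s += b[i];
            carry += (s < b[i]);          /* carry out of the second add */
            r[i] = s;
        }
        return carry;                     /* 1 means the result needs one more word */
    }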
>>- var is set to the result of the addition; the remainder if a carry occurs.
>What happens if the signedness of var, a, and b are not equal?

It just behaves like the ADC x86 assembly instruction, the details of which I
will not regurgitate here.

So on adding a signed plus unsigned.... The carry gets a value "as if"
the signed value was actually unsigned?
Yes. Note that x86 does not have a version of ADC that considers the sign of
its operands in any special way that is reflected in the CF.


Of course, but you have to know whether your operands are signed or
unsigned to see whether the CF or the OF is the important flag.


OF is never the important flag. It's not relevant at all. I believe that
taking into account the signedness of the result and/or the operands and one of
these flags is redundant with the other flag (I'm not 100% sure, but the last
time I cared to look at OF, I believe I resolved never to look at it again for
precisely this reason). So while you could map between them, one suddenly
doesn't become more important in different contexts. I *KNOW* the solution for
implementing bignum with a CF alone (and I believe it's the standard way
everyone does it), so OF is just not relevant.
[...] On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.

It might to you, but the carry is not related to the sign of the operands.
(Your understanding of x86 seems to be shaky.)


My x86 is not too strong, but I am quite well-versed in 6502, 68k and
PowerPC, which enables me to think about these things. They all share
carry and overflow bits in their designs, with little difference. After
performing an add-with-carry, both CF and OF are set (although the V
flag may not be reset, as in the 6502), but the CF generally does not
provide useful information when adding two numbers that are signed. So I
don't see how

c +< v = a + b

would provide useful information in c, when both a and b are signed, if
it is to copy the carry status.


In a bignum library, the signedness of each sub-word is *dynamic* (the top word
is signed, the rest are unsigned.) So regardless of the declaration of the
operands, you want the carry flag of an "as if the operands were unsigned" addition.
Ok. That just leaves the case of two signed values, then.
See above. Carry is not a signedness sensitive semantic.
I'm still wondering about the high_order_value() and low_order_value()
semantics I mentioned in my previous post, actually the part which I was
most curious about.


I wasn't able to suss out a coherent question from your post, so I just deleted
this discussion. In this case, I am just copying the x86's MUL -> EDX:EAX or
AMD64's MUL -> RDX:RAX semantics. But other RISC CPUs usually have MULHI and
MULLO instructions -- they usually use some trickery by which if you issue
these instructions back to back with the same source parameters, then its
actually computed in one widening ALU operation (so that its comparable,
hardware-wise, to what the x86 does.)
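For the 32x32->64 case, both halves can already be had in portable C99 -- essentially
what Chris Torek's snippet earlier in the thread does; it is the 64x64->128 case that
has no portable spelling. A sketch:

    #include <stdint.h>

    /* 32 x 32 -> 64 widening multiply; a decent compiler maps this to the
       machine's MUL or MULHI/MULLO pair where one exists. */
    void widening_mul32(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
    {
        uint64_t p = (uint64_t)a * b;
        *hi = (uint32_t)(p >> 32);   /* the EDX / MULHI half */
        *lo = (uint32_t)p;           /* the EAX / MULLO half */
    }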

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #139
Paul Hsieh wrote:
Yes, but bandwidth is also an issue. If you connect to more and more peers and
just push data out to all of them, then your upload bandwidth just gets split
more and more. d=1 leads to a trivial solution but with unacceptable lag. So
d=2 is the minimum divisor that gives rich enough possibilities for topologies
to try to solve this problem in interesting ways.

So although actually you will, on average, have 4 connections established (two
out, and two in), you are transmitting on two of them, and receiving on the
other two.
Ok.
[...] And you're not considering a lag metric other than just number of
hops? Solving variants of Travelling Salesman is next on my list of things that I'll
spend my most masochistic moments on.
You can think about this problem all you like. If you want to know if I am
full of it or not, see if you can get above n=10,d=2.


Perhaps I will, I'll have some time on my hands round Xmas. Sounds like
a nice problem.

Its nice to think about, not so nice if you actually have to solve it.
Dijkstra's weighted shortest path algorithms were nice for client-server based
problems, but the future is peer 2 peer. I believe that this problem and
problems like it are going to be at the center of a lot of internet technology
going forward.

Ok, that's a case. But wouldn't it have been better then just to make a
program generator (written in C or something else)? Or perhaps use the
m4 macro processor?

Tell me how you do this in a way that's compatible with testing it against 3
different compilers (one of the *OTHER* techniques I use to go after
performance ...) two of which are very unfriendly to being commandline driven,
while still making sense from a debugging point of view?


How about generating C source that you #include from a resident file in
your project, would that be an option?

I don't know that MSVC's project mechanisms can be finagled to compile and
execute one program just to build source code for another program that you
compile up. With makefiles what you are saying is easy, and I've done it in
the past, but you can see how this kind of thing puts pressure on your
development process. A sufficiently useful preprocessor mechanism would allow
one to write such code generators in a completely portable way.


Agreed.
Its a big deal to me. I don't own any material on the C language that
specifies all of this.


The C standard specifies what is UB and what's not. Grepping an
implementation is a fragile way of deducing semantics, I would say.

Yes, and what percentage of C programmers in the world do you think have ever
had the standard in their hands that they actually refer to?


I think there are relatively few excuses for any professional C
programmer not to have access to the standard (and reading it once or
twice wouldn't hurt either). It's only 18 bucks in electronic form, and
you don't even have to go out to get it.
Assuming you actually wanted to count the *NODES* of the tree, and were a
little concerned about performance try this:

static int count_nodes_inner (const struct Tree *t) {
    return (t->left  ? (1 + count_nodes_inner (t->left )) : 0) +
           (t->right ? (1 + count_nodes_inner (t->right)) : 0);
}

int count_nodes (const struct Tree *t) {
    return t ? count_nodes_inner (t) : 0;
}

as it avoids the leaf node tail calls, which will roughly double the
performance.


Well, I'm glad that you extracted my intended meaning from the name of
the function I used. Good thing I didn't use an operator! ;-)

Yeah, well I noticed that you didn't catch *my* bug? (I missed one detail.)

I'm _still_ talking about 'the stack thing' yes, since you still seem to
think that this is a triviality. I think it's not.

Dude: malloc (65536) is not portable by the same reasoning, and:

If you think that, then you didn't follow my reasoning. I can find out
at compile time if malloc(65536) is allowed, and if it is, I can find
out at runtime if it has succeeded. I can do no such thing with stack
overflows. The program will exhibit all the symptoms of UB without a
proper cause that can be traced down to the standard.

int fib(int n) {
if (n <= 2) return 1;
return fib(n-1) + fib(n-2);
}

isn't going to work in any relevant way in any language, for a call like
fib(1000000).


What are you trying to say here? Help me out a bit.

I'm trying to say that *ALL* languages have this issue. The resources required
to retain a call stack are unknown and often difficult to determine from any
reasonable code analysis. Something like "counting the depth" is not very
useful since there are no language provisions for actually counting out to
such a depth. Some compilers implement tail recursion, though doing so in all
cases is equivalent to the halting problem -- so you can't even know for sure
when it's happening.

If you want a really good case, look up the Ackermann function. It looks very
simple, with the only concrete constant operation being a "+1". Knowing that
there is some kind of stack bound is totally useless for this function. Most
computations of the Ackermann function blow out any fixed sized stack, and even
a fully addressable 64bit machine would not have the required memory to
implement a stack that could compute any of its non-trivial range. The point
being there is no kind of reasonable metric for measuring the stack
requirements that can be generally applied.

My objection to your proposal can simply be restated as "all machines have
limited resources", deal with it.


Then the standard would have to specify what it means exactly by a
"resource". Standards are no places for vague terminology.
At best you could insist that upon reaching
some limited stack depth that the program should immediately halt or something
(with stack checking turned on, most x86 compilers provide for this in the form
of a runtime error.)

Ok, I'll try to summarize my understanding of the point you try to make.
Let's consider it a priori desirable that we have a super-duper heap
manager, with all kinds of nice things like separate heaps. We agree
that this is often useful. Now let's also assume a certain platform
where multithreading/multitasking is so important that the
platform-specific library provides support for it. We can assume that it
also sports a thread-safe malloc (yes?). You'd still want the
super-duper heap manager. Your point is that mandating a better heap
manager by the standard (without even mentioning multithreading) would
force each vendor to implement these, and implement them thread-safe if
they think multithreading is important to have.

Basically, yes.

Now, as I see it, even if _just_ the standard malloc()/free()/etc are
thread-safe, you can build a more powerful heap manager on that alone,
using just standard C features.

This is completely and utterly untrue. *THAT* is my point. None of the
functions I proposed as extensions can be implemented (even in a prototypical
way) on top of just malloc()/free()/etc and retain their intended semantics.
(Of course, you could implement them with memory leaks out the wazoo -- i.e.,
just never free anything, but that's clearly not acceptable.)

[...] If the thing needs to be able to run in
a multithreaded environment, you could even do that, with some extra
effort. You need some form of mutex locking, but you can resort to
Lamport's bakery algorithm for that.

I had to look this one up -- that's one of the classic useless non-solutions
(especially if you want to call malloc more than 4 billion times, for example),
which explains why I've never committed it to memory.
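
For reference, a minimal sketch of Lamport's bakery lock in C (the algorithm just mentioned). This is only the textbook version -- the volatile qualifiers merely stand in for the memory-ordering guarantees real hardware would need, and the ever-growing ticket numbers are presumably where the "more than 4 billion times" objection comes from:

#define NTHREADS 4

static volatile int      entering[NTHREADS];
static volatile unsigned number[NTHREADS];

void bakery_lock (int i)                  /* i = this thread's index */
{
    unsigned max = 0;
    entering[i] = 1;
    for (int j = 0; j < NTHREADS; j++)    /* take a ticket one higher */
        if (number[j] > max) max = number[j];  /* than any seen so far */
    number[i] = max + 1;
    entering[i] = 0;
    for (int j = 0; j < NTHREADS; j++) {
        while (entering[j])               /* wait while j is choosing */
            ;
        while (number[j] != 0 &&          /* wait for lower tickets   */
               (number[j] < number[i] ||
               (number[j] == number[i] && j < i)))
            ;
    }
}

void bakery_unlock (int i)
{
    number[i] = 0;
}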


We have long longs nowadays. 2^32 is not an important limit.
You still need to be
able to identify process instances, which *BY ITSELF* is not portable. Process
instance identifiers would count as something *BEYOND* just malloc/free/etc.

Why do you need to distinguish process instances? In your API, each call
carries the heap it refers to as a parameter.
[...] Surely it won't be pretty, but it is, in principle, possible. That's
my point.

No, it's not. It's not portable, that's for sure.

>>>Explain to me how you implement malloc() in a *multithreaded* environment
>>>portably. You could claim that C doesn't support multithreading, but I highly
>>>doubt you're going to convince any vendor that they should shut off their
>>>multithreading support based on this argument.

>>Now you're shifting the goalposts.

>How so?

Since in your original statement there was no mention of multithreading.

Mentioning the implementation of malloc implicitly brings multithreading into the
discussion. That is what you don't understand.


If the malloc returns either NULL or a pointer (i.e., it conforms
minimally to the standard) then, even in a multithreaded environment, it
is quite possible to build a heap manager on top of that.

Why don't you just try to do it. I mean without race conditions, and without
hooking out or using any of the spawn/mutex/etc primitives that you'll find in
the multithreading API. That ought to be a good trick.

[...] If you need
high performance even in such demanding circumstances, then, yes, you'd
need some support from preferably hardware-assisted mutexes or similar.

High performance is just something I realized you could get as a nice side
effect from such extensions. But the functionality enhancements alone are
worth it. *BUT* whether or not this is properly implemented with respect to
the existence of multithreading has no relevance to its performance -- that's
just a question of correctness.

Ok. Let's try to get some closure here, this is eating into my free time
too much for my taste :-)

I would say that writing a more powerful heap manager is possible in
principle, even in a multithreaded environment, as long as you have a
well-implemented malloc() (and friends) at your disposal.

*BZZZT!!!* Wrong.

It would be a drag to do so and it would be slow, but it is possible.

Impossible. Slow, fast or otherwise. It's just impossible.


If you can explain to me why you need to explicitly distinguish threads
in the memory manager, I'll concede the point.
I think I have a clearer bearing on your statement why it would be good
to put this stuff in the library. You do need mutexes, and if the
implementation supports them, it will work better than a shared-memory
simulated mutex. So for performance reasons, I think your idea of
extending the basic heap manager, by extending its functionality as
mandated by the standard, is a good idea.

That is, as long as I can get my msize() function as well :-)

Well ok, msize() (implemented as _msize() in WATCOM C/C++) does not require
multithreading support. That would be a good addition as well. One has to be
very careful about what kind of implementation impact this has. For example,
if an implementation has an extension which has a heap-compatible
automatically growable allocation (for example touching the adjacent memory
page, which would be "paged out" by default, might automatically append it to a
"growable" allocation), then knowing its size might not make a lot of sense.

Don't worry, you can look it up in the man pages. Unlike your
user-defined operators... :-)

You would have to look up "%b" ("%db"? -- whatever) in *UP TO DATE* man pages.
(Is *restrict* or the new *complex* type, from C99 properly documented in
*YOUR* current man pages?)


Yes, as close as gcc is, it does. And in other cases, I have the standard.

But you said you were just going to look in your man pages. With my proposal,
there would be no question -- the source would have to be in the project
somewhere. If someone closed their source, well, presumably you're going to
expect documentation anyway, if not, you've got the same problem that you
always expect from closed source libraries.


The standard is the canonical place to look for how something should
work. I can't believe I would actually have to argue over that. In fact:
- I won't.
What's hand waving is saying something about stack limits --
you haven't even suggested a way in which that *COULD* be possible
to put down on a piece of paper.


Ah, the "tit-for-tat" technique...!

So far, nobody has challenged me to come up with a specific phrasing for
this (which actually surprises me). Here you go. It's a bit rough, but
you get the idea.

---- Proposed addition to the standard to address stack overflows:
---- Draft

a. For an active function invocation, define an upper bound for the
amount of memory in use by this function as follows (in the
following, denote "Upper Limit on Memory Usage" by "ULMU"):

And what if the locals/parameters are not stored, or storable, on the stack?


How do you mean? They have to be stored. And I'm presuming they're
stored on the stack.
What if an implementation uses multiple stacks depending on the *type* of the
parameter?
The hypothetical implementation that does this will be out of luck. Not
because the proposal wouldn't work, mind you, but because the ULMU is a
very loose bound. But hey, it beats having no bound whatsoever.

- For every parameter/local variable of basic type there is a
corresponding constant macro of type size_t:


You missed all the implicit temporaries required. Homework exercise: for any
fixed number n, create a function which requires at least n implicit
temporaries (not explicitly declared) to implement.


Ah yes! So it seems. An interesting problem. Have to think about that
for a bit.
(unsigned) char: ULMU_CHAR
(unsigned) short: ULMU_SHORT
...
double: ULMU_DOUBLE
...
pointer types: ULMU_POINTER

- The ULMU for a struct shall be equal to
the sum of upper limits of its members.

This doesn't sound right with implementation defined struct alignment issues.


ULMU_xxx can be increased to account for that. We're trying to establish
an upper bound here.
- The ULMU for a union shall be equal to
the maximum of upper limits of its members.

Doesn't that assume that the implementation implements overlapping union
members?

Yes. This is a safe assumption (standard quote):

6.7.2.1 #14 The size of a union is sufficient to contain the largest of
its members. The value of at most one of the members can be stored in a
union object at any time. A pointer to a union object, suitably
converted, points to each of its members (or if a member is a bit-field,
then to the unit in which it resides), and vice versa.
(It makes sense not to overlap floating point with integer if, for
example, the memory units in the CPU internally tag the type of the contents of
the memory in order to allow the FPU and Integer units to execute totally
asynchronously even when both are accessing memory, if say that tag switching
had a high cost.)

- The ULMU for an array shall be the number
of elements, multiplied by the upper-limit
of its base type.

((the last one is certainly wasteful, but this could be refined))

The ULMU of a function invocation shall be calculated as the
sum of ULMU values of the types of its parameters,
local variables, and return type, PLUS the constant
ULMU_FUNC_OVERHEAD.

And what about the number of alloca() calls? alloca() is a common extension
which just eats extra space off the stack dynamically -- the big advantage
being that you don't have to call free or anything like it, for it to be
properly cleaned up. It will be cleaned up upon the function's return (the
simplest implementation of a "freeall".)


Ok, we can add alloca'ed space to each function invocation, plus an
overhead constant per active alloca.
Would it not make more sense to remove all these details, and just say that
each function creates a macro named _STACK_USAGE_<functionname> which is not
available inside the scope of any function or prior to the definition of the
function? (minus alloca() calls) This would then allow you to track the stack
usage without (incorrectly, I might add) trying to figure it out from details
that are implementation specific?
A sound idea.
Regardless, none of this helps if you have multiple stacks, for example. In
such cases, if *any* stack runs out you are in trouble. But it's obvious that
your intention is to assume a *unified* stack implementation. The math won't
translate in a portable way.
Unless you allow that I'm trying to define an upper bound of total use,
which may be loose as far as I am concerned. The idiomatic use would be
to compare this to the ULMU_TOTAL to see if it fits. This may yield the
number of bytes on the smallest stack for all I care. On those silly
platforms, you will grossly underestimate stack capacity now and then.
Still beats the current situation.
b. The implementation shall define a macro ULMU_TOTAL,
which will evaluate to a size_t with value at least
4095. This value is guaranteed to remain constant during
execution of the program.

But in implementations that *I* am aware of, the stack size is set at *LINK*
time. So it can't *BE* a macro available at compile time without giving up on
this flexibility. This wouldn't work for me at all, since I *use* this feature
in my own projects. At the very least make this a global variable -- that *IS*
settable at link time.
My wording was chosen explicitly to allow for this. Where am I saying
ULMU_TOTAL evaluates to a constant at compile time?
c. A function invocation is guaranteed to succeed only if the
total summed ULMU values of all active function invocations
(including the to-be active function after invocation)
does not exceed ULMU_TOTAL.
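
A hedged sketch of how the proposed machinery might be used, with invented placeholder values for the hypothetical ULMU_* macros (no current implementation defines any of these):

/* Invented placeholder values; an implementation would supply its own. */
#define ULMU_INT             8
#define ULMU_LONG            8
#define ULMU_FUNC_OVERHEAD  64
#define ULMU_TOTAL       65536

/* Worst-case cost of one activation of fib_checked(): its two int
   quantities (parameter n and the return value), the unsigned long
   parameter, plus the fixed per-call overhead. */
#define ULMU_FIB (2 * ULMU_INT + ULMU_LONG + ULMU_FUNC_OVERHEAD)

/* Refuses to start a call that rule (c) does not guarantee to succeed.
   (Error propagation kept deliberately naive for brevity.) */
int fib_checked (int n, unsigned long used) {
    if (used + ULMU_FIB > ULMU_TOTAL)
        return -1;                        /* would exceed ULMU_TOTAL */
    if (n <= 2)
        return 1;
    return fib_checked (n - 1, used + ULMU_FIB)
         + fib_checked (n - 2, used + ULMU_FIB);
}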

It's *OBVIOUS* what I am talking about above. You just
want something specific to have something arbitrary to disagree with.


You have to trust me on this one: I'm not trying to coerce you into a
laborious writing-down of a precise rule-set, or anything. It's just
that I found that making something concrete (which you intuitively feel
should be possible) gives a lot of insight into the kind of troubles you
would get into when actually doing it.

But *YOU* could have written up a possible coding standard, and explain why you
think such a thing could not work in practice. There are as many coding
standards as programming projects -- picking one is always controversial and
instantly makes you open for attack.

Just pick a convention -- if something bothers you about certain
operators, spell it out in your convention.


As noted before, I think it would not be a good development if different
organisations settled on a different set of operators, even if they
meticulously documented their convention. Even things as simple as a
wrong (in the sense of: other than what you're used to) bracing style can
hinder your productivity for quite some time before you get used to
that. Having a different set of condoned operators at each programming
shop sounds like a small tower of Babel to me.

But I don't understand how this is any different from free form function names.


Well, what more can I say.
>Just like freeform variable names, there is the same encumbrance of the
>programmers managing the meaning of the symbols for more generic operators.
>You have not made a sufficient case to convince me that there is a real
>difference between the two.

...Have you ever laid eyes on a moderately complicated APL program,
without knowing APL?

No, but I have looked at the results of the annual obfuscated C code contest.
I have never suggested that you *COULDN'T* abuse the idea. But you can abuse
just about *ANYTHING* in C.


I'm not talking about abuse, I'm talking about the recognition capacity
of unknown operators (which is zilch, and which takes ages to get used
to). Now that's bad enough for languages, but I think it would be a
mistake of quite monumental proportions to allow some sort of C
"dialects" to emerge with different sets of operators, "operator wars",
and so on.

You mean like function-name wars? Like string library wars (actually that sort
of is happening ...)? The point is that there's no chance that anyone would
mistake an *extensible* operator as being anything other than it is. To
understand it you have to look up its definition, just like a function.
You don't think adding in ||| or &&& would add to
the potential for noise in those contests? I gave a very wordy description of
how to define an operator -- it's kind of hard to obfuscate something completely
while you have so many hints (a keyword like _Operator, a whole function-like
declaration, etc) as to what it is. You would have to hide it behind macros to
truly obfuscate it, but you can use macros for arbitrary obfuscation
regardless .


When reading the code that used (rather than defines) the operator, it
helps tremendously if the meaning jumps from the page, rather than
having to look it up in the Big Manual of Operators in use at your local
workplace.

You are completely ignoring that large population of people who happily use
operator overloading today. You don't understand that what you are saying is
nothing more than your very narrow opinion. And you're using your opinion to
try to trump a proposal whose merits lean very heavily on its technical
features.


My opinion is determined largely by the psychological objections (to
which you don't subscribe). My technical problem is that I can see no
way that this could be implemented.
Too much freedom is a dangerous thing, in programming at least.

Yeah we should all rename our functions as func0000, func0001, etc ...

If you disagree with someone else's opinion, you can even *change* it,
subject only to your control over the source.


It would make for an interesting experiment, sociologically :-)

This "experiment " is already ongoing -- notice how BSD has been forked out the
wazoo, how there are about a million regex implementations , and how Microsoft
has "winsock" as an alternative of UNIX's sockets. The world hasn't ended
because of any of this.


It's good to keep some perspective...
A small hypothetical case study. Suppose engineer A, working on nuclear
reactor core controllers, works in a team where "?<" means "minimum" and
"#<" means "x is implied by y" for some data structure involved in logic
reasoning on the state of the system. Now he switches companies; he goes
to work on electronic toys and gadgets. Here the convention is
different, but the same operators are used.

Or suppose a programmer uses clock() all his life in DOS/Windows as a higher
resolution realtime, yet portable, timer. This programmer then moves to UNIX
and is shocked to find that all mutexes turn in a clock() measurement of 0
microseconds regardless of how long they block! Even Sleep(1000) measures 0
microseconds!

The example you've made up has to do with basic comprehension of the engineer --
since "?<" would clearly be one of the "extended operators", the programmer
is supposed to look up its definition. Just like they would have to look
up logMsg() or any other function name that might show up in more than one
project. Just like they are supposed to magically know how clock() behaves on
different OS'es.


There is a possibility of error caused by bad design (which isn't
tailored to the type of mistakes we fallible humans make) in your
operator introduction scheme. This is already the case in C++ operator
overloading, your proposal makes it a couple of orders of magnitude worse.

But this is turning into a rehashing exercise. Let's agree to disagree.
>Do you think perhaps its possible to use an arbitrarily extendable operator
>mechanis m in order to *clarify* or make code actually more maintainable?

A resounding 'no'. It's an interesting thought, but for me it falls
short on technical and psychological grounds, as I previously tried to
explain.

You haven't shown me that it falls on technical grounds. You've just said that
it's hard to implement in the compiler -- but that's obvious.


You haven't shown me that it can be implemented in a compiler, even in
theory. That should count for something.

I don't think there's a credible assertion that my idea is impossible. You
want me to describe a whole compiler implementation just to support my idea?


I listed some specific objections with regards to disambiguation of the
grammar that the parser would have to be able to handle. I don't expect
an implementation, no... But you haven't even addressed these very basic
questions.
As long as one can create a well-defined definition, why should you expect it
would be impossible? Have I created a halting problem or some kind of NP-
complete algorithm or something? -- I don't think so.
Look up "ambiguous grammer" in the Dragon book, or any other book on
compilers. That should make you think once or twice.
>>[...] I guess they're pretty satisfied with the bignum
>>libs that exist, that provide assembly implementations for all important
>>platforms (and even a slow fallback for others). The reality is that
>>no-one seems to care except you, on this.

>The hardware people care that it exists, and not about the form of its
>existence, so long as it gets used. For anyone who wants to *USE* this
>functionality, though, they are stuck with assembly, or third party libraries.

What's so bad about using third-party libraries?

From a usability point of view, GMP is crap, RSA's bignum library is crap (and
it's expensive).


How's that? How are you going to improve things by adding a carry operator?

In any way I choose -- that's the point. Right now the barrier to entry is
very high, because I would have to implement a competitive library in assembly
language. I'm not kidding when I tell you these libraries are pure excrement.
Not in terms of performance -- they both perform reasonably well -- I mean in
terms of how the heck do I use these things?


Can't speak for RSA's library, but I don't see what your problem is with
GMP.
You see, with language support, an intrepid developer could sacrifice *some*
performance for portability and demonstrate a much nicer and more easily usable
interface. This *competition* might *force* these popular vendors to fix
their interfaces so that they could deliver both the performance and usability
benefits. But because both of these have wads of assembly language for various
architectures, it is very difficult and expensive to compete with them. Even if I
made a single x86 implementation, it would be difficult for me to beat their
performance by any measurable margin, and my benefit of having possibly a
better interface would have to be balanced against the fact that I have no
portability, while GMP does (by virtue of having been ported.)

None of them are portable to platforms which are in
development (you know, like might be a concern for things like *SMART CARDS*.)
Now there's an application where the economy of scale suggests doing a
hand-written assembly routine, if I ever saw one...

About a year ago, Schlumberger gave an amazing talk about implementing pico-java
bytecode in a smart card. It was incredible, because it was obvious that
they would be able to use all the Java tools/libraries, they could simulate it
almost perfectly in software -- mostly inheriting the pico-java architecture,
and because of Java's bignum class, it meant that exposing a widening multiply
and carry-propagating add functionality in their smart card would be enough to
automatically have all the software they need for RSA all ready to go.


....So the smartcard presumes it's inserted into a machine that has these
classes available? They don't reside on the card itself.... Well, they
should be using Java then, I suppose. No need to augment C.
Do you understand? Although C compiler porting frameworks exist, the lack of a
standard for widening multiply and carry-propagating add means they would have
to expose a mechanism themselves, then write all the software themselves just
to get to the point of having a bignum library (which Java basically comes with)
before proceeding to their RSA implementation.
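
For what it's worth, the widening multiply in question can be written in portable C99 -- which is exactly the slow-but-portable fallback a developer is stuck with today. The function name here is invented for the sketch:

#include <stdint.h>

/* 32x32 -> 64 widening multiply split into high and low halves, i.e.
   what MULHI/MULLO or x86's MUL -> EDX:EAX provide in one operation. */
void mul_wide (uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo) {
    uint64_t p = (uint64_t) a * b;
    *hi = (uint32_t) (p >> 32);
    *lo = (uint32_t) p;
}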

"load on typical e-commerce transaction" != "load on e-commerce server".
I meant the server. Clients have cycles to burn, obviously the client
performanc e doesn't matter on anything less than Half-life 2.


The question remains: how much time is your typical e-commerce server
involved in an actual transaction? How many transactions per second can
a machine handle now? I would suppose that most of the time spent in a
transaction is actually a database lookup at a creditcard company, not a
powermod.

No. It's the powermod. One instruction -- 30% of all processor time. The rest
-- including hitting a disk or talking to some authentication mechanism over a
network fits into that the 70% (though obviously you would expect some amount
of DMA overlapping.) Get it? Powermod is slower than disk swapping -- that's
what Intel was measuring, and from my own calculations, their claim is quite
credible. As I recall, you only need about a hundred simultaneous
transactions for this to show up. Remember, Intel's marketing point is that
they claim Itanium will save you money by using fewer systems per number of
maximum sustained transactions. In a very real sense, their claim is very
credible, based solely on the fact that they have dual fully pipelined widening
multipliers.

[...] The frontend webserver is just waiting for the bank I think.
But these are just half-educated guesses.

No -- servers obviously amortize this over multiple threads, and performing
queued up block transfers. While these things take a long time, they are all
queued, blocked, buffered etc. Serial solutions are not acceptable. Your
guess is way off the mark. This wouldn't take any appreciable webserver CPU
power at all.

If the transaction is simply setup session/send credit-card
information/close session, this could be right. However, a jiffy ago we
were talking SSL.
>One single assembly instruction (one *bundle* on Itanium) holds a whole server
>down for 30% of its computation,

Again, you're equating server load with transaction load.

Uhh ... both sides have to do the bignum operation.


I don't mean "client load" when I write "transactio n load", I'm still
strictly speaking about the server. I could be convinced that the amount
of time spent in powermod for a 1-shot transaction could be as high as
30%, but I'd be hard-pressed to believe a front-end webserver is busy
for more than 10% doing actual transactions.

No. You misunderstood. 30% of *THE WHOLE THING* is in just the powermod. I'm
talking about serving static webpages while some consumer is staring at
advertising or whatever else before they decide to buy whatever. Just their
action of pressing the "checkout" button to buy the stuff in their shopping
card just pegs the server's CPU.

I'm not at all surprised you have a hard time believing this, as your grasp of
multitasking seems tenuous at best.


Hmmm... sweet :-)
[...] I think the latency of an
e-commerce transaction is mostly caused by frontend<->bank
communication , and intra-bank database lookup. But I cannot back this up.

The *LATENCY* is something you measure from the client. The latency is not
relevant to the real bottlenecks in the webserver. The client sees the latency
of waiting for the bank, but the server does *NOT* have code like:

while (HeyBankAreYouDoneYet()) {
    runSetiAtHome();
}

in it. At the very least there is a blocking operation like "Sleep (1)" in the
place of "runSetiAtHome()". (Hint: the blocking is when a thread comes in
where powermod takes over and spikes the CPU usage.)


Ok.
>>Hang on, are we talking about "overflow" or "carry" here? These are two
>>different things with signed numbers.
>>What happens if a is signed and b is unsigned?

>My intent was for the operation to follow the semantics of the x86 ADC assembly
>instruction.

Fun thing is: it handles both signed and unsigned numbers. Not-so-funny
is the fact that the x86 supports (as do most microprocessors) both an
overflow flag and a carry flag. So, armed with this, please elaborate on
the meaning.

Yes, but I never mentioned/introduced overflow. I only considered carry. For
bignum implementation, OF is not relevant. Only the CF is.


But for signed numbers, carry is almost meaningless. So what remains of
the semantics in this case?

Why is the carry meaningless? Ok, you obviously don't know the details of how
bignum is implemented.


You're really better off not inferring what I know and don't know :-)
Ok, a large 2s complement integer is filled with an
array of unsigned sub-integers, except for the one at the top. *That* one is
signed, and that's how you know if the bignum integer is negative or not. The
problem is that the act of addition can cause a roll around at the top word
which requires an expansion of bignum by an extra integer in size. The logic
you use to determine this is just related to examining the carry flag.
Great. Now what has all this to do with defining a sensible signed +<
semantics for use in C (you know, a general purpose language, not tied
to your particular pet peeve problem).
It all just comes down to the carry flag. The other flags are not necessary
for bignum libraries. Maybe there's a way to transform the OF into something
that can be used in a bignum library, but I am not aware of it.
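
A hedged sketch of the loop being described, in portable C99 with invented names. Note that only a carry propagates from limb to limb, which is why the carry flag is the one that matters and OF never enters the picture:

#include <stddef.h>
#include <stdint.h>

/* dst = a + b over n limbs, least significant limb first. The returned
   carry tells the caller whether the result needs one extra limb. */
uint32_t bignum_add (uint32_t *dst, const uint32_t *a,
                     const uint32_t *b, size_t n) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = (uint64_t) a[i] + b[i] + carry;
        dst[i] = (uint32_t) s;
        carry  = (uint32_t) (s >> 32);    /* the "CF" for this limb */
    }
    return carry;
}
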
Because, we're talking about expanding the C language here (not about
your fairy-tale operator introduction hocus-pocus). You need decent
semantics.
But if you want the OF, then why not have another operator for that?
?
>>>- var is set to the result of the addition; the remainder if a carry occurs.
>>
>>What happens if the signedness of var, a, and b are not equal?

>It just behaves like the ADC x86 assembly instruction, the details of which I
>will not regurgitate here.

So on adding a signed plus unsigned.... The carry gets a value "as if"
the signed value was actually unsigned?

Yes. Note that x86 does not have a version of ADC that considers the sign of
its operands in any special way that is reflected in the CF.


Of course, but you have to know whether your operands are signed or
unsigned to see whether the CF or the OF is the important flag.


OF is never the important flag. It's not relevant at all.


I guess that's why everything since the 4004 supports it.

You must really consider that there's more to life than multiplying big
numbers (and that's coming from one of the few people that sees the fun
in it). We're not going to extend C with an operator to fit this (or
ANY) one purpose, however important it may be.
I believe that
taking into account the signedness of the result and/or the operands and one of
these flags is redundant with the other flag (I'm not 100% sure, but the last
time I cared to look at OF, I believe I resolved never to look at it again for
precisely this reason). So while you could map between them, one suddenly
doesn't become more important in different contexts. I *KNOW* the solution for
implementing bignum with a CF alone (and I believe its the standard way
everyone does it), so OF is just not relevant.

[...] On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.
It might to you, but the carry is not related to the sign of the operands.
(Your understanding of x86 seems to be shaky.)


My x86 is not too strong, but I am quite well-versed in 6502, 68k and
PowerPC, which enables me to think about these things. They all share
carry and overflow bits in their designs, with little difference. After
performing an add-with-carry, both CF and OF are set (although the V
flag may not be reset, as in the 6502), but the CF generally does not
provide useful information when adding two numbers that are signed. So I
don't see how

c +< v = a + b

would provide useful information in c, when both a and b are signed, if
it is to copy the carry status.

In a bignum library, the signedness of each sub-word is *dynamic* (the top word
is signed, the rest are unsigned.) So regardless of the declared types of the
operands, you want the carry flag computed "as if the operands were unsigned".

Ok. That just leaves the case of two signed values, then.

See above. Carry is not a signedness sensitive semantic.

I'm still wondering about the high_order_value() and low_order_value()
semantics I mentioned in my previous post, actually the part which I was
most curious about.

I wasn't able to suss out a coherent question from your post, so I just deleted
this discussion.


....And there I was thinking you had trouble answering it. What was I
thinking?
In this case, I am just copying the x86's MUL -> EDX:EAX or
AMD64's MUL -> RDX:RAX semantics.
So again (as with the carry vs. overflow), you're going to perform an
instruction that works on UNsigned integers even on signed integers.
Pardon my French, but that's just plain stupid. Perhaps you should be a
little less confident about your own understanding of bit-twiddling...
But other RISC CPUs usually have MULHI and
MULLO instructions -- they usually use some trickery by which if you issue
these instructions back to back with the same source parameters, then it's
actually computed in one widening ALU operation (so that it's comparable,
hardware-wise, to what the x86 does.)


Best regards,

Sidney

Nov 14 '05 #140
