Hi Lew and Grumble!

int main(void)
{
    int a = 0x1234;
    int b = 0xABCD;

    long int result_1 = (long int)(a * b);

Yes, so? You perform a 16-bit multiplication, then explicitly cast the
result to long int. Your (long int) cast doesn't affect the precision of
the multiplication; it affects the precision of the expression of the
result of the integer multiplication. In other words, your (long int)
cast simply converts the result of an int multiplication to long.

This is exactly what I observed and what I explained in my question.

If you wanted to increase the precision of the results, you should

have cast the two values to long /before/ you multiplied. In other

words, you should have

long int result_3 = (long int)a * (long int)b;

or even

long int result_4 = (long int)a * b;

or as Grumble said: http://www.eskimo.com/~scs/C-faq/q3.14.html

So long as one of the two operands is of long int precision, the

multiplication will be performed with long int precision, and the

results will be of long int precision. The cast(s) on the operand(s)

simply coerce one or both of the operands into long int precision,

thus leading to long int multiplication, and a suitable long int

result.

Is this the only way to compute a (16 bits) * (16 bits) = (32 bits)
multiplication? I can't imagine that this *really silly* workaround is
the only way.

Let me explain again:

If one of the multiplicands is of type "long int", the other one will be
converted to "long int" too (if it isn't one already). This is a basic
rule of C's arithmetic conversions.

But now, with both operands being "long int", a full (32 bits) * (32
bits) = (64 bits) operation is truly computed, and the upper 32 bits of
the result are thrown away.

Again, for clarity: I can see that the hardware multiplier computes the
multiplication *twice* (two 16-bit multiplications, which together make
up the 32-bit multiplication). But the correct result of the 16-bit
multiplication can be computed with only one 16-bit multiplication.

As I am a hardware engineer, I will translate the problem to software:
if I don't have a hardware multiplier, the multiplication is done in
software (some conditional accumulations). This means

long int result_3 = (long int)a * (long int)b; // and derivatives

truly results in a complete 32-bit multiplication. All necessary steps
are performed, and the correct result of a complete 32-bit
multiplication (64 bits wide) is computed. (I can see the correct
result in the registers.) But then, after all this unnecessary
computing, the upper half of the result is moved to the trash.

Both realisations, with a hardware multiplier or in software, result in
*double* the computing load. I proved it by studying the assembler code.

Back to the 16-bit multiplication: studying the assembler code, I can
see that the correct result of the "int" * "int" multiplication is
visible in the registers (both when a hardware multiplier is used and
when everything is done in software), but after this, the upper half of
the result is destroyed.

Conclusion: on an n-bit machine, where type "int" is n bits wide, AND
there exists an integer data type with 2n-bit width (let's call it
"int2"), a doubled and unnecessary computing load is incurred if one
wants the result of a multiplication to be as wide as "int2".

I am not sure, but this problem should also occur on a standard 32-bit
x86 machine if one wants to compute an "int" * "int" = "qword"
multiplication. (A qword is 64 bits wide, isn't it?)

O.k., today almost nobody cares whether 1 or 2 multiplications are
computed when only one is necessary, but I do!

Especially for low-power and even for high-speed applications on small
(embedded) systems, the (low) CPU power has a major influence, and I do
not want to waste it.

So is coding the multiplication in assembler the only solution?

Thanks for your comments.

Ralf