Hi.

(Xposted to both comp.lang.c++ and comp.programming since I've got

questions related to both C++ language and general programming)

I've got the following C++ code. The first routine runs in like 65% of

the time of the second routine. Yet both do the same thing. However,

the second one seems better in terms of the way the code is written

since it helps encapsulate the transformation in the inner loop better

making it easier to read, at least in my opinion. Maybe it's not,

maybe the former is "better" that way and I should go with it, but if

the latter is "better" in that respect should I just ditch it anyway

and tradeoff for performance since I want this thing to be fast???

What each routine does is multiply two arbitrary-precision integers

together. The second one though uses an additional "slice" type that

provides a window enabling the "multiply and add" operation to be

performed on a limited range of digits, which can then be advanced

across the number, making clear that part of the algorithn.

I'm using the simple "grade school" multiply algorithm. Note how the

second routine more easily outlines this algorithm, while in the first

it is a little more difficult to see. Which would you prefer, exactly?

For brevity, class definitions and other members have been omitted.

However it should not be too difficult to figure out the necessary

information.

Also, could I lose the typecasts in this code or do I need them?

The reason why I'm asking is because I remembered getting told earlier

by someone here (Alf P. Steinbach, group: comp.lang.c++) about how my

last implementation of my program (this is for a fractal generator)

had some sort of bad, bug-inducing "abstraction gap" yet I was unable

to get enough elaboration responses about it so I've been more or less

shooting around trying to figure out how to make it better (although

he did give me some *examples* of where there were problems, which I

have since fixed.). But it seems I'm losing performance and that's not

a good thing for my application. Not to mention also that my original

bignum implementation was called "silly" as well("...although I didn't

look at the silly bignum class, whatever it's fault is..." ref:

http://groups.google.com/group/comp....9?dmode=source)

without any explanation as to what exactly was so silly about it. So I

had to start dark-shooting there too trying to figure out what I had

done wrong.

First version (fastest):

---

void RawDigit::Mul(const RawDigit &a, const RawDigit &b)

{

/* Check to see if we're doing an in-place multiply */

if(&a == this)

{

MulInPlace(b);

return;

} else if(&b == this) {

MulInPlace(a);

return;

}

/* Get lengths. */

std::size_t rLength(GetLength());

std::size_t aLength(a.GetLength());

std::size_t bLength(b.GetLength());

/* Zeroize this. */

Zeroize();

/* Do the multiplication. */

TwoDigit64 carry(0);

for(std::size_t i(0);i<aLength && i<rLength;i++)

{

carry = 0;

for(std::size_t j(0);j<bLength && (i+j)<rLength;j++)

{

TwoDigit64

tmp((static_cast<TwoDigit64>(a.digits[i])*b.digits[j]) +

digits[i+j] + carry);

carry = tmp >DIGIT_BITS;

tmp -= carry << DIGIT_BITS;

digits[i+j] = tmp;

}

if(i+bLength < rLength)

digits[i+bLength] = carry;

}

}

---

Second version (slower):

---

*** Slice manipulation.

inline Digit32 MulAddOp(Digit32 a, Digit32 b, Digit32 c, Digit32

&carry)

{

TwoDigit64 sum(a + (static_cast<TwoDigit64>(b)*c) + carry);

carry = sum >DIGIT_BITS;

return(sum & MAX_DIGIT);

}

inline Digit32 MulAddCarryPropOpL(Digit32 a, Digit32 &carry)

{

Digit32 sum(a + carry);

carry = sum < carry;

return(sum);

}

inline Digit32 MulAddCarryPropOpR(Digit32 b, Digit32 c, Digit32

&carry)

{

TwoDigit64 sum((static_cast<TwoDigit64>(b)*c) + carry);

carry = sum >DIGIT_BITS;

return(sum & MAX_DIGIT);

}

Digit32 RawDigitSlice::MulAdd(const ConstRawDigitSlice &a,

const Digit32 b,

const ConstRawDigitSlice &c)

{

/* Set up iterators */

DigitIterator32 ri(GetStartIterator());

ConstDigitIterator32 ai(a.GetConstStartIterator());

ConstDigitIterator32 ci(c.GetConstStartIterator());

/* Get minimum length of a and c. */

std::size_t minLength(std::min(std::min(a.length, c.length),

length));

/* Do the combined multiply + add */

Digit32 carry(0);

std::size_t i(0);

for(i;i<minLength;++i,++ri,++ai,++ci)

*ri = MulAddOp(*ai, b, *ci, carry);

/* Handle the remaining part(s) of a or b. */

int largerLength(std::min(std::max(a.length, c.length), length));

if(a.length >= c.length)

{

for(i;i<largerLength;++i,++ri,++ai)

*ri = MulAddCarryPropOpL(*ai, carry);

} else {

for(i;i<largerLength;++i,++ri,++ci)

*ri = MulAddCarryPropOpR(b, *ci, carry);

}

/* Finish carry propagation */

if(largerLength < length)

{

for(i;i<length;++i,++ri)

*ri = MulAddCarryPropOpL(0, carry);

}

/* Done! */

return(carry);

}

*** This next routine is in a different source file, the one

implementing the full "RawDigit" class.

*** The former would be in another file that implements the

"RawDigitSlice" class.

void RawDigit::Mul(const RawDigit &a, const RawDigit &b)

{

/* Check to see if we're doing an in-place multiply */

if(&a == this)

{

MulInPlace(b);

return;

} else if(&b == this) {

MulInPlace(a);

return;

}

/* Get lengths. */

std::size_t rLength(GetLength());

std::size_t aLength(a.GetLength());

std::size_t bLength(b.GetLength());

/* Zeroize this. */

Zeroize();

/* Set up slices. */

RawDigitSlice runningSum(*this, 0, aLength + 1); /* explanation:

(<RawDigit object>, <origin digit idx>, <nbr of digits in slice>) */

ConstRawDigitSlice as(a);

/* Do the multiplication. */

for(std::size_t i(0);i<bLength && i<rLength;++i,++runningSum)

acc.MulAdd(runningSum, b[i], as);

}

---