Anupam wrote:

Eric Sosman <Er*********@su n.com> wrote

There was a thread about three years ago regarding

implementations of integer square root functions without

resorting to a floating-point conversion. [...]

I would be very much interested in the article. Could you

provide a link to that thread?

Anupam: There's a wonderful service on the Internet,

known as Google. Among other things, Google maintains

searchable archives of Usenet newsgroups, including our

own comp.lang.c. Now, what do you suppose you find if

you go to http://groups.google.com and search for "integer

square root" in the comp.lang.c archives?

And if you can, then the

speeded-up code too.

It's just a few simple modifications to the code in

the first article of the cited thread. I also threw in

a whole lot of comments to describe what was going on;

it was essentially a "demonstrat ion piece." Here it is,

except that I've blotted some E-mail addresses out of

the source citations so as not to increase anyone's

exposure to spam:

/* HISTORY --

* 04-Jul-2002: Renamed the header file.

* 29-Jun-2000: Added a lot of comments.

* 19-May-2000: Original (see DESCRIPTION).

*/

/* DESCRIPTION --

* This function is a slight improvement on code posted to comp.lang.c by

* Lawrence Kirby (xx**@xxxxxxx.x xx), which seems to have been based on code

* by Dann Corbit (xx*****@xxxxxx xxxxx.xxx), which he in turn traces back to

* an 80386 assembly routine by one Arne Steinarson. Arne got the method from

* a fragment of temple wall retrieved from sunken Atlantis by Coq Jasteau;

* the stony text refers to the method as a gift from the "Ancient Astronauts"

* and hints of dire consequences to impious Atlanteans who affronted the

* Heavens by converting the method to binary from its original base seven.

*

* My principal improvement was to use a table of rounded rather than

* truncated first approximations; this reduced the number of cases requiring

* two Newton-Raphson iterations instead of just one. I also eliminated some

* needless additions of unity here and there, and got rid of a pessimization

* which tested for x >= 65535*65535. Finally, I rearranged the range tests

* along the lines suggested by Hermann Kremer (xx***********@ xxxxxx.xx),

* although whether this is a good idea or not depends on what sort of input

* distribution is anticipated.

*/

#include <limits.h>

#include "imath.h"

#if UINT_MAX != 0xFFFFFFFF

#error "This code needs a 32-bit unsigned integer"

#endif

unsigned int

isqrt(unsigned int x)

{

/* This table holds the rounded square roots of the integers 0..255, each

* expressed as a fixed-point binary number with four bits to the left and

* four bits to the right of the binary point. For example, sqrt(2) is

* 1.414... decimal or 1.01101... binary, which is rounded to 1.0111 and

* represented in the table as 0x17 == 23.

*/

static unsigned char est[0x100] = {

0, 16, 23, 28, 32, 36, 39, 42, 45, 48, 51, 53, 55, 58, 60, 62, 64, 66,

68, 70, 72, 73, 75, 77, 78, 80, 82, 83, 85, 86, 88, 89, 91, 92, 93, 95,

96, 97, 99, 100, 101, 102, 104, 105, 106, 107, 109, 110, 111, 112, 113,

114, 115, 116, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128,

129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 139, 140, 141,

142, 143, 144, 145, 146, 147, 148, 148, 149, 150, 151, 152, 153, 153,

154, 155, 156, 157, 158, 158, 159, 160, 161, 162, 162, 163, 164, 165,

166, 166, 167, 168, 169, 169, 170, 171, 172, 172, 173, 174, 175, 175,

176, 177, 177, 178, 179, 180, 180, 181, 182, 182, 183, 184, 185, 185,

186, 187, 187, 188, 189, 189, 190, 191, 191, 192, 193, 193, 194, 195,

195, 196, 197, 197, 198, 199, 199, 200, 200, 201, 202, 202, 203, 204,

204, 205, 206, 206, 207, 207, 208, 209, 209, 210, 210, 211, 212, 212,

213, 213, 214, 215, 215, 216, 216, 217, 218, 218, 219, 219, 220, 221,

221, 222, 222, 223, 223, 224, 225, 225, 226, 226, 227, 227, 228, 229,

229, 230, 230, 231, 231, 232, 232, 233, 234, 234, 235, 235, 236, 236,

237, 237, 238, 238, 239, 239, 240, 241, 241, 242, 242, 243, 243, 244,

244, 245, 245, 246, 246, 247, 247, 248, 248, 249, 249, 250, 250, 251,

251, 252, 252, 253, 253, 254, 254, 255, 255 };

unsigned int r;

if (x >= 1U << 14) {

/* Some values in this range require at least one Newton-Raphson step

* to refine the initial estimate.

*/

if (x >= 1U << 30) {

r = est[x >> 24] << 8;

/* Some values in this range require two Newton-Raphson iterations

* instead of just one, so perform the first one here.

*/

r = (r + x / r) / 2;

}

else if (x >= 1U << 28)

r = est[x >> 22] << 7;

else if (x >= 1U << 26)

r = est[x >> 20] << 6;

else if (x >= 1U << 24)

r = est[x >> 18] << 5;

else if (x >= 1U << 22)

r = est[x >> 16] << 4;

else if (x >= 1U << 20)

r = est[x >> 14] << 3;

else if (x >= 1U << 18)

r = est[x >> 12] << 2;

else if (x >= 1U << 16)

r = est[x >> 10] << 1;

else

r = est[x >> 8] << 0;

/* Refine the estimate with one (or one more) Newton-Raphson step.

*/

r = (r + x / r) / 2;

}

else {

/* In this range, the initial estimate (after scaling and rounding)

* is very close and needs at most a small final tweak.

*/

if (x >= 1U << 12)

r = (est[x >> 6] + 1) >> 1;

else if (x >= 1U << 10)

r = (est[x >> 4] + 2) >> 2;

else if (x >= 1U << 8)

r = (est[x >> 2] + 4) >> 3;

else {

/* In this range, the initial estimate is exact and requires no

* tweaking at all.

*/

return est[x >> 0] >> 4;

}

}

/* Final tweak: The estimate (original or once- or twice-refined) is either

* correct as it stands or is one too large. (Don't worry about overflow

* in the multiplication; `r' is strictly less than 65536 here.)

*/

if (r * r > x)

--r;

return r;

}

--

Er*********@sun .com