435,106 Members | 2,335 Online + Ask a Question
Need help? Post your question and get tips & solutions from a community of 435,106 IT Pros & Developers. It's quick & easy.

Long Num speed

 P: n/a For a while now i have been "playing" with a little C program to compute the factorial of large numbers. Currently it takes aboy 1 second per 1000 multiplications, that is 25000P1000 will take about a second. It will be longer for 50000P1000 as expected, since more digits will be in the answer. Now, on the Num Analyses forum/Group there is a post reaporting that that person wrot java code that computed 1000000! in about a second. That is about 10000 times faste than I would expect my code to do it. So the two possiblilities are: 1) I am doing something terribly wrong 2) The othr person is lying At the moment i am inclined to believe that its number 1. I am posing my code below, I would like to hear your opinions about why it is slow and how i can improove its speed. I know that there are public BIGNUM libraries which are already optimized for such calculations, but I dont want to use them, bcause i want to approach this problem on a lower level. I am mostly interested to find out how to get this code perform faster or what alternative algorythms i should consider. The factorial calculation is just a test program. ===================start paste========================== #include #include #include #define al 1024*20 #define base 1000 typedef long int IntegerArrayType; struct AEI{ IntegerArrayType data[al]; long int digits; }; void pack(IntegerArrayType i, struct AEI *N1); void Amult(struct AEI * A, struct AEI * B, struct AEI * C); void AEIprintf(struct AEI * N1); void AEIfprintf(FILE * fp, struct AEI * N1); int main(void) { struct AEI *N1, *MO, *Ans; long i = 0, j = 0, ii, NUM, iii; FILE *ff; N1=malloc(sizeof(struct AEI)); MO=malloc(sizeof(struct AEI)); Ans=malloc(sizeof(struct AEI)); while (i < al){ N1->data[i] = 0; MO->data[i] = 0; Ans->data[i]=0; i++; } printf("Enter integer to Factorialize: "); scanf("%ld", &NUM); pack(1, N1); pack(1, Ans); ff = fopen("Results.txt", "w"); printf("you entered: %ld", NUM); i=1; while(i < NUM ){ iii=0; while(iiidata[ii] = 0; ii++; } pack((i+iii), MO); Amult(N1, MO, N1); iii++; } i+=iii; Amult(Ans, N1, Ans); printf("\nProgress is: %d",i); pack(1, N1); } if(ff!=NULL){ fprintf(ff,"\n%d\n",i-1); AEIfprintf(ff, Ans); } fclose(ff); printf("\nProgress: 100\%"); return 0; } void AEIprintf(struct AEI *N1){ float fieldLength; double temp; char format1; long j, FL0; j = N1->digits-1; FL0=(long)log10((float)base); fieldLength = (float)log10((float)base); temp = modf(fieldLength, &fieldLength); format1 = '%'; format1 = '0'; format1 = fieldLength + 48; format1 = 'd'; format1 = 0x00; printf("%*d", FL0, N1->data[j]); j--; while (j >= 0) { printf(format1, N1->data[j]); j--; } return; } void AEIfprintf(FILE * fp, struct AEI *N1){ long j = N1->digits-1; double fieldLength, temp; char format0, format1; fieldLength = (int)log10(base); temp = modf(fieldLength, &fieldLength); format0 = '%'; format0 = fieldLength + 48; format0 = 'd'; format0 = 0x00; format1 = '%'; format1 = '0'; format1 = fieldLength + 48; format1 = 'd'; format1 = 0x00; fprintf(fp,format0, N1->data[j]); j--; while (j >= 0){ fprintf(fp, format1, N1->data[j]); j--; } return; } void pack(IntegerArrayType i, struct AEI * N1) { long t = 1, i1, j = 0; while (t == 1){ i1 = i % base; N1->data[j] = i1; i = (i - i1) / base; j++; if (i == 0) t = 0; } N1->digits=j; return; } void Amult(struct AEI * A, struct AEI * B, struct AEI * C){ /*C = A * B; */ long i, ii,d, result, carry=0, digits=0; struct AEI *Ans; Ans=malloc(sizeof(struct AEI)); i=0; d= (A->digits+B->digits-1); while(idata[i]=carry; carry=0; ii=0; while(ii<=i){ if(B->data[ii]!=0){ Ans->data[i]+=A->data[i-ii]*B->data[ii]; carry+=Ans->data[i]/base; Ans->data[i]=Ans->data[i]%base; } ii++; } carry+=Ans->data[i]/base; Ans->data[i]=Ans->data[i]%base; i++; } if(carry!=0){ d++; Ans->data[i]=carry; } C->digits=d; i=0; while(idata[i]=Ans->data[i]; i++; } return; } ====================end paste============================ I tried to indent the code with spaces instead of tabs, but if some parts end up not properly indented, I hope no one will hold it against me. Thanks ahead Nov 1 '06 #1
35 Replies

 P: n/a fermineutron wrote: For a while now i have been "playing" with a little C program to compute the factorial of large numbers. Currently it takes aboy 1 second per 1000 multiplications, that is 25000P1000 will take about a second. It will be longer for 50000P1000 as expected, since more digits will be in the answer. Now, on the Num Analyses forum/Group there is a post reaporting that that person wrot java code that computed 1000000! in about a second. That is about 10000 times faste than I would expect my code to do it. So the two possiblilities are: 1) I am doing something terribly wrong 2) The othr person is lying At the moment i am inclined to believe that its number 1. I am posing my code below, I would like to hear your opinions about why it is slow and how i can improove its speed. Before anyone reviews the code, have you profiled it? If not, why not? If you have, where were the bottlenecks? -- Ian Collins. Nov 1 '06 #2

 P: n/a fermineutron: Now, on the Num Analyses forum/Group there is a post reaporting that that person wrot java code that computed 1000000! in about a second. That is about 10000 times faste than I would expect my code to do it. So the two possiblilities are: 1) I am doing something terribly wrong 2) The othr person is lying The burden of proof is on the Java dude. Probability suggests that he's lying, because only a very small proportion of proficient programmers waste their time on mickey-mouse hold-my-hand languages such as Java. Also, Java is slow. Either that or the algorithm's something stupid like: char const *Func(void) { return "2349507239573258471257975093275092375093275923875 09"; } Ask the Java dude if you can see his code. If he refuses, assume that he's a liar, then egg his house. -- Frederick Gotham Nov 1 '06 #3

 P: n/a Ian Collins wrote: Before anyone reviews the code, have you profiled it? If not, why not? If you have, where were the bottlenecks? I profilled it, but there were no obvious bottlenecks which i would not anticipate to be there by design. here is the profiler output http://igorpetrusky.awardspace.com/Temp/RunStats.html I was thinking that maybe there is some other algorythm that is better than mine for the long int arithemetic? Factorial calculation is just a driver for the multiplication function. Nov 2 '06 #4

 P: n/a Barry Schwarz said: On 1 Nov 2006 14:28:20 -0800, "fermineutron" wrote: >>float fieldLength; double temp;char format1; long j, FL0; j = N1->digits-1;FL0=(long)log10((float)base);fieldLength = (float)log10((float)base);temp = modf(fieldLength, &fieldLength); This is a constraint violation. The second argument should be a double*, not a float*. During execution it will invoke undefined behavior. Didn't your compiler complain about a type mismatch? In my experience, fermineutron pays no heed to correctness issues, and is concerned only with speed. He has a track record of ignoring corrections. I think you're wasting your time, Barry. -- Richard Heathfield "Usenet is a strange place" - dmr 29/7/1999 http://www.cpax.org.uk email: rjh at above domain (but drop the www, obviously) Nov 2 '06 #6

 P: n/a Hi The math software in my ti89 calculator with a 12 MHz 68k CPU can calculate all 613 digits of 299! in 1 second. It then takes 5 seconds to convert that to a string for display. But for giggles I once wrote my own display routine that does the integer to string conversion in 1/3 of a second. Frederick Gotham wrote: fermineutron: Now, on the Num Analyses forum/Group there is a post reaporting that that person wrot java code that computed 1000000! in about a second. That is about 10000 times faste than I would expect my code to do it. So the two possiblilities are: 1) I am doing something terribly wrong 2) The othr person is lying The burden of proof is on the Java dude. Probability suggests that he's lying, because only a very small proportion of proficient programmers waste their time on mickey-mouse hold-my-hand languages such as Java. Also, Java is slow. Either that or the algorithm's something stupid like: char const *Func(void) { return "2349507239573258471257975093275092375093275923875 09"; } Ask the Java dude if you can see his code. If he refuses, assume that he's a liar, then egg his house. -- Frederick Gotham Nov 2 '06 #7

 P: n/a fermineutron wrote: Ian Collins wrote: Before anyone reviews the code, have you profiled it? If not, why not? If you have, where were the bottlenecks? Its a factorial calculation. The bottleneck is in the big integer multiply, which itself should have a bottleneck in the platform's multiply. I profilled it, but there were no obvious bottlenecks which i would not anticipate to be there by design. here is the profiler output http://igorpetrusky.awardspace.com/Temp/RunStats.html I was thinking that maybe there is some other algorythm that is better than mine for the long int arithemetic? Probably so. Consider that you are doing nothing more than computing the straight product of the numbers using no arithmetic short cuts at all. So here's what comes off the top of my head: Ask yourself the following problem. How many factors of 2 are there in 1000000! ? Certainly every other number is even. But every 4th number has 2 factors of 2, and every 8th number has 3 factors of two in it. So the answer is: f(2) = floor(1000000/2) + floor(1000000/4) + floor(1000000/8) + ... Similarly we can figure out the number of factors of 3s, 5s, 7s, 11s, and all the primes less than 1000000, as f(3), f(5), etc. Then the result you are looking for is: 1000000! = pow(2,f(2))*pow(3,f(3))*pow(5,f(5))*... Now, the question is -- what makes us think this will be any faster? Well, the pow() function can be computed with successive squaring tricks. Squaring faster than straight multiplying because (a*q^r + b) ^ 2 = a^2*q^(2*r) + b^2 + 2*a*b*(q^r). And the resulting big number multiplies that you have to perform here can be accelerated using any number of big number multiply acceleration tricks (Karatsuba, Toom-Cook, or DFTs.). I don't know how much faster, if any, doing things this way would be. If you find a faster way in the literature, I would be interested in know it. -- Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/ Nov 2 '06 #8

 P: n/a fermineutron wrote: Ian Collins wrote: Before anyone reviews the code, have you profiled it? If not, why not? If you have, where were the bottlenecks? Its a factorial calculation. The bottleneck is in the big integer multiply, which itself should have a bottleneck in the platform's multiply. I profilled it, but there were no obvious bottlenecks which i would not anticipate to be there by design. here is the profiler output http://igorpetrusky.awardspace.com/Temp/RunStats.html I was thinking that maybe there is some other algorythm that is better than mine for the long int arithemetic? Probably so. Consider that you are doing nothing more than computing the straight product of the numbers using no arithmetic short cuts at all. So here's what comes off the top of my head: Ask yourself the following problem. How many factors of 2 are there in 1000000! ? Certainly every other number is even. But every 4th number has 2 factors of 2, and every 8th number has 3 factors of two in it. So the answer is: f(2) = floor(1000000/2) + floor(1000000/4) + floor(1000000/8) + ... Similarly we can figure out the number of factors of 3s, 5s, 7s, 11s, and all the primes less than 1000000, as f(3), f(5), etc. Then the result you are looking for is: 1000000! = pow(2,f(2))*pow(3,f(3))*pow(5,f(5))*... Now, the question is -- what makes us think this will be any faster? Well, the pow() function can be computed with successive squaring tricks. Squaring faster than straight multiplying because (a*q^r + b) ^ 2 = a^2*q^(2*r) + b^2 + 2*a*b*(q^r). And the resulting big number multiplies that you have to perform here can be accelerated using any number of big number multiply acceleration tricks (Karatsuba, Toom-Cook, or DFTs.). I don't know how much faster, if any, doing things this way would be. If you find a faster way in the literature, I would be interested in knowing it. -- Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/ Nov 2 '06 #9

 P: n/a Frederick Gotham wrote: Probability suggests that he's lying, because only a very small proportion of proficient programmers waste their time on mickey-mouse hold-my-hand languages such as Java. Also, Java is slow. Can we drop the insults, please? -- Chris "unhashedup hashed up hashing" Dollin "I'm still here and I'm holding the answers" - Karnataka, /Love and Affection/ Nov 2 '06 #10

 P: n/a we******@gmail.com writes: [...] So here's what comes off the top of my head: Ask yourself the following problem. How many factors of 2 are there in 1000000! ? Certainly every other number is even. But every 4th number has 2 factors of 2, and every 8th number has 3 factors of two in it. So the answer is: f(2) = floor(1000000/2) + floor(1000000/4) + floor(1000000/8) + ... Similarly we can figure out the number of factors of 3s, 5s, 7s, 11s, and all the primes less than 1000000, as f(3), f(5), etc. Then the result you are looking for is: 1000000! = pow(2,f(2))*pow(3,f(3))*pow(5,f(5))*... Now, the question is -- what makes us think this will be any faster? Well, the pow() function can be computed with successive squaring tricks. Squaring faster than straight multiplying because (a*q^r + b) ^ 2 = a^2*q^(2*r) + b^2 + 2*a*b*(q^r). And the resulting big number multiplies that you have to perform here can be accelerated using any number of big number multiply acceleration tricks (Karatsuba, Toom-Cook, or DFTs.). Hmm, interesting approach. But does it really speed things up? If you want to compute, say, 3**1024 with extended-precision integers, successive squaring reduces the number of multiplications, but is it faster than doing the 1023 multiplications? I'm thinking that multiplying, say, a 499-digit number by a 1-digit number might be quicker than multiplying two 250-digit numbers. I haven't really thought this through (feel free to say that that's obvious). I don't know how much faster, if any, doing things this way would be. If you find a faster way in the literature, I would be interested in know it. -- Keith Thompson (The_Other_Keith) ks***@mib.org San Diego Supercomputer Center <* We must do something. This is something. Therefore, we must do this. Nov 2 '06 #11

 P: n/a Chris Dollin said: Frederick Gotham wrote: >Probability suggests that he's lying, because only a very smallproportion of proficient programmers waste their time on mickey-mousehold-my-hand languages such as Java. Also, Java is slow. Can we drop the insults, please? That would at least be a start, but he still needs to make amends for past insults. -- Richard Heathfield "Usenet is a strange place" - dmr 29/7/1999 http://www.cpax.org.uk email: rjh at above domain (but drop the www, obviously) Nov 2 '06 #12

 P: n/a Keith Thompson said: If you want to compute, say, 3**1024 with extended-precision integers, successive squaring reduces the number of multiplications, but is it faster than doing the 1023 multiplications? I'm thinking that multiplying, say, a 499-digit number by a 1-digit number might be quicker than multiplying two 250-digit numbers. Sure, but obtaining the 499-digit number in the first place is likely to be slower. Measure, measure, measure! -- Richard Heathfield "Usenet is a strange place" - dmr 29/7/1999 http://www.cpax.org.uk email: rjh at above domain (but drop the www, obviously) Nov 2 '06 #13

 P: n/a Keith Thompson wrote: we******@gmail.com writes: [...] So here's what comes off the top of my head: Ask yourself the following problem. How many factors of 2 are there in 1000000! ? Certainly every other number is even. But every 4th number has 2 factors of 2, and every 8th number has 3 factors of two in it. So the answer is: f(2) = floor(1000000/2) + floor(1000000/4) + floor(1000000/8) + ... Similarly we can figure out the number of factors of 3s, 5s, 7s, 11s, and all the primes less than 1000000, as f(3), f(5), etc. Then the result you are looking for is: 1000000! = pow(2,f(2))*pow(3,f(3))*pow(5,f(5))*... Now, the question is -- what makes us think this will be any faster? Well, the pow() function can be computed with successive squaring tricks. Squaring faster than straight multiplying because (a*q^r + b) ^ 2 = a^2*q^(2*r) + b^2 + 2*a*b*(q^r). And the resulting big number multiplies that you have to perform here can be accelerated using any number of big number multiply acceleration tricks (Karatsuba, Toom-Cook, or DFTs.). Hmm, interesting approach. But does it really speed things up? If you want to compute, say, 3**1024 with extended-precision integers, successive squaring reduces the number of multiplications, but is it faster than doing the 1023 multiplications? I'm thinking that multiplying, say, a 499-digit number by a 1-digit number might be quicker than multiplying two 250-digit numbers. I haven't really thought this through (feel free to say that that's obvious). I'll resist the temptation. Especially, since I only went half way. :) Anyhow, the issue is not whether or not the squaring trick is saving you overall performance (it does -- this is not controvertial.) The real controversy with my approach is that its breaking numbers down into its constituent factors first, then recombining them. I.e., its faster to multiply 72 by 91 directly then to first break it down into 8 * 9 * 7 * 13. The hope, however, is that the number of primes are sufficiently less than n (in this case there are about 87K primes under 1 million) to compensate for the cost of computing the power computation. Notice that the vast majority of the primes will have a power less than, say, 6, and that in fact the most common power is 1. -- Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/ Nov 2 '06 #14

 P: n/a Frederick Gotham wrote: fermineutron: Now, on the Num Analyses forum/Group there is a post reaporting that that person wrot java code that computed 1000000! in about a second. That is about 10000 times faste than I would expect my code to do it. So the two possiblilities are: 1) I am doing something terribly wrong 2) The othr person is lying The burden of proof is on the Java dude. Probability suggests that he's lying, because only a very small proportion of proficient programmers waste their time on mickey-mouse hold-my-hand languages such as Java. Also, Java is slow. Either that or the algorithm's something stupid like: char const *Func(void) { return "2349507239573258471257975093275092375093275923875 09"; } 1 million factorial is a very very very very very very big number. I assure you that they did not solve the problem in the way you suggest unless they are rigging against a very specific benchmark. Can you imagine a library that's 100s of megs larger than necesary just to hold a table of factorials? Ask the Java dude if you can see his code. If he refuses, assume that he's a liar, then egg his house. Right, because nobody ever writes proprietary code. In the case of bignum libraries, while I don't know this first hand, the specification and architecture of the Java language itself suggests very strongly to me that their bignum performance should be equal to the best bignum libraries in existence, and faster than any portable pure C bignum library. Again, I don't know this from direct first-hand knowledge, but there is very strong possibility that you are completely wrong on this point. Java may be very slow at many things, but if its fast at anything, bignum is probably one of the prime candidates. -- Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/ Nov 2 '06 #15

 P: n/a * fermineutron

 P: n/a we******@gmail.com wrote: fermineutron wrote: >Ian Collins wrote: >>Before anyone reviews the code, have you profiled it?If not, why not? If you have, where were the bottlenecks? Its a factorial calculation. The bottleneck is in the big integer multiply, which itself should have a bottleneck in the platform's multiply. >I profilled it, but there were no obvious bottlenecks which iwould not anticipate to be there by design.here is the profiler outputhttp://igorpetrusky.awardspace.com/Temp/RunStats.htmlI was thinking that maybe there is some other algorythm that isbetter than mine for the long int arithemetic? Probably so. Consider that you are doing nothing more than computing the straight product of the numbers using no arithmetic short cuts at all. So here's what comes off the top of my head: Ask yourself the following problem. How many factors of 2 are there in 1000000! ? Certainly every other number is even. But every 4th number has 2 factors of 2, and every 8th number has 3 factors of two in it. So the answer is: f(2) = floor(1000000/2) + floor(1000000/4) + floor(1000000/8) + ... Similarly we can figure out the number of factors of 3s, 5s, 7s, 11s, and all the primes less than 1000000, as f(3), f(5), etc. Then the result you are looking for is: 1000000! = pow(2,f(2))*pow(3,f(3))*pow(5,f(5))*... Now, the question is -- what makes us think this will be any faster? Well, the pow() function can be computed with successive squaring tricks. Squaring faster than straight multiplying because (a*q^r + b) ^ 2 = a^2*q^(2*r) + b^2 + 2*a*b*(q^r). And the resulting big number multiplies that you have to perform here can be accelerated using any number of big number multiply acceleration tricks (Karatsuba, Toom-Cook, or DFTs.). I don't know how much faster, if any, doing things this way would be. If you find a faster way in the literature, I would be interested in know it. This doesn't answer the 'faster' question, but your method is basically the same as my algorithm for computing large factorials in limited size registers, which I published here over 3 years ago. /* compute factorials, extended range on a 32 bit machine this can reach fact(15) without unusual output formats. With the prime table shown overflow occurs at 101. Public domain, by C.B. Falconer. 2003-06-22 */ #include #include #include /* 2 and 5 are handled separately Placing 2 at the end attempts to preserve such factors for use with the 5 factor and exponential notation */ static unsigned char primes[] = {3,7,11,13,17,19,23,29,31,37, 41,43,47,53,57,59,61,67,71, /* add further primes here -->*/ 2,5,0}; static unsigned int primect[sizeof primes]; /* = {0} */ static double fltfact = 1.0; /* ----------------- */ static unsigned long int fact(unsigned int n, unsigned int *zeroes) { unsigned long val; unsigned int i, j, k; #define OFLOW ((ULONG_MAX / j) < val) /* This is a crude mechanism for passing back values */ for (i = 0; i < sizeof primes; i++) primect[i] = 0; for (i = 1, val = 1UL, *zeroes = 0; i <= n; i++) { fltfact *= i; /* approximation */ j = i; /* extract exponent of 10 */ while ((0 == (j % 5)) && (!(val & 1))) { j /= 5; val /= 2; (*zeroes)++; } /* Now try to avoid any overflows */ k = 0; while (primes[k] && OFLOW) { /* remove factors primes[k] */ while (0 == (val % primes[k]) && OFLOW) { val /= primes[k]; ++primect[k]; } while (0 == (j % primes[k]) && OFLOW) { j /= primes[k]; ++primect[k]; } k++; } /* Did we succeed in the avoidance */ if (OFLOW) { #if DEBUG fprintf(stderr, "Overflow at %u, %lue%u * %u\n", i, val, *zeroes, j); #endif val = 0; break; } val *= j; } return val; } /* fact */ /* ----------------- */ int main(int argc, char *argv[]) { unsigned int x, zeroes; unsigned long f; if ((2 == argc) && (1 == sscanf(argv, "%u", &x))) { if (!(f = fact(x, &zeroes))) { fputs("Overflow\n", stderr); return EXIT_FAILURE; } printf("Factorial(%u) == %lu", x, f); if (zeroes) printf("e%u", zeroes); for (x = 0; primes[x]; x++) { if (primect[x]) { printf(" * pow(%d,%d)", primes[x], primect[x]); } } putchar('\n'); printf("or approximately %.0f.\n", fltfact); return 0; } fputs("Usage: fact n\n", stderr); return EXIT_FAILURE; } /* main */ -- Chuck F (cbfalconer at maineline dot net) Available for consulting/temporary embedded and systems. Nov 2 '06 #17

 P: n/a fermineutron wrote: For a while now i have been "playing" with a little C program to compute the factorial of large numbers. You asked this question before, and I gave you the answer. You're doing one digit at a time. Doing it with 32 bits at a time will be at least 400,000,000 times faster. Why,? because with 32 bits you can store numbers up to about 4 billion, AND you don't need to do a divide and a mod each time. Nov 2 '06 #19

 P: n/a Ancient_Hacker wrote: You asked this question before, and I gave you the answer. You're doing one digit at a time. Doing it with 32 bits at a time will be at least 400,000,000 times faster. Why,? because with 32 bits you can store numbers up to about 4 billion, AND you don't need to do a divide and a mod each time. I am doing 3 digits at the time, see the base constant at the top of a file. Theoretically i could use 5 digits, or if to make the code only compatible with C99 hence using 64bit long long int i could increase base even higher. The restraint on the base is that base^2 must fit in largest int type avaiable. Nov 2 '06 #20

 P: n/a >Barry Schwarz wrote: Is there a reason you did not do the work in C directly (and avoid the time spent in malloc and copying *Ans to *C)? I do not work in C directly to allow the caller to use the same argument for one of the multipliers as the result. Doing calculations in C directly will screw up the math. Where do you free Ans? You are causing repeated memory leaks. OOPS, it did not copy over. but i did have it in the code. Nov 2 '06 #21

 P: n/a fermineutron wrote: Ancient_Hacker wrote: You asked this question before, and I gave you the answer. You're doing one digit at a time. Doing it with 32 bits at a time will be at least 400,000,000 times faster. Why,? because with 32 bits you can store numbers up to about 4 billion, AND you don't need to do a divide and a mod each time. I am doing 3 digits at the time Good! Then the speedup will only be by a factor of 400,000 times. I would approach it this way: First go to using base 2. Shouldnt be hard, just change all your 10's to 2's. Then get the code going as well as before, only using base 2. Then you can try stepping up the number of digits from 3 up to whatever will fit in your datatype. Better stop a few bits short the first time to avoid problems with the sign bit. Then notice that all your divides and mod's by 2^digits can be replaced by shifts and masks. Although a good compiler will do this for you. So by using most of the datatype width, and removing all the divs and mods, things should get super speedy. Nov 2 '06 #22

 P: n/a Ancient_Hacker wrote: First go to using base 2. Shouldnt be hard, just change all your 10's to 2's. Then get the code going as well as before, only using base 2. Then you can try stepping up the number of digits from 3 up to whatever will fit in your datatype. Better stop a few bits short the first time to avoid problems with the sign bit. Then notice that all your divides and mod's by 2^digits can be replaced by shifts and masks. Although a good compiler will do this for you. So by using most of the datatype width, and removing all the divs and mods, things should get super speedy. Hmm, i forgot about super speedy shifts, going to rework the code now. I know there are better agorithms for computing factorial then straight up multiplication, but it is the bignum arithemetic that i am mostly curious about so factoriial calculations is just a driver/benchmark. Nov 2 '06 #23

 P: n/a "fermineutron" Then notice that all your divides and mod's by 2^digits can be replacedby shifts and masks. Although a good compiler will do this for you. Hmm, i forgot about super speedy shifts, going to rework the code now. [...] I predict that he will ignore the "Although a good compiler will do this for you" statement, and just manually replace all his multiplications and divisions by shift without first checking whether it's necessary. (I could be wrong, of course.) -- Keith Thompson (The_Other_Keith) ks***@mib.org San Diego Supercomputer Center <* We must do something. This is something. Therefore, we must do this. Nov 2 '06 #25

 P: n/a "fermineutron" >Barry Schwarz wrote:Is there a reason you did not do the work in C directly (and avoid thetime spent in malloc and copying *Ans to *C)? I do not work in C directly to allow the caller to use the same argument for one of the multipliers as the result. Doing calculations in C directly will screw up the math. For those tuning in late, "C" is the name of a variable. I suggest that, like "a", "C" is a really bad variable name. -- Keith Thompson (The_Other_Keith) ks***@mib.org San Diego Supercomputer Center <* We must do something. This is something. Therefore, we must do this. Nov 2 '06 #26

 P: n/a fermineutron wrote: For a while now i have been "playing" with a little C program to compute the factorial of large numbers. What's the largest factorial you've been able to compute with your program? Nov 2 '06 #27

 P: n/a John Smith wrote: What's the largest factorial you've been able to compute with your program? with old version the larges one i did in real time is 150000 after i swiched to base 1024 i have not tried yet, still stuck on printing function, i seem to be unable to come up with algoryth to go from base "base" to base 10 in a long array Standard aproach such as NumInBase10=NumInBaseB*(B/10)^N will not work since N can be several thousand easy. Nov 2 '06 #28

 P: n/a Here is the function i have for printing to the screen an array encoded integer, but it does not work, would someone please fix it. I dont usually ask people to write code for me completely but i fried my brain trying to figure this one out today. Dont know whats wrong with me today, but i just seem to be unable to figure this one out today. here is the code =====================Start paste===================== void AEIprintf(struct AEI *p) { int fieldLength = (int)log10(base); long fieldbase=pow(10,fieldLength); char format; size_t j = p->digits; long dpf, dpf1; sprintf(format, "%%0%dd", fieldLength); dpf = 0; dpf1 = (p->data[j] + dpf)/fieldbase; dpf = (p->data[j] + dpf)%fieldbase; printf("%*d", fieldLength, dpf1); for (; j 0; j--) { dpf1 = (p->data[j] + dpf)/fieldbase; dpf = (p->data[j] + dpf)%fieldbase; printf(format, dpf1); } return; } =====================End Paste===================== Nov 2 '06 #29

 P: n/a can such a printing operation be done at all? I think there may be a way if to 1st reencode the array into a decimal base and then print, but honestly speaking i have no idea. Nov 3 '06 #30

 P: n/a "fermineutron" San Diego Supercomputer Center <* We must do something. This is something. Therefore, we must do this. Nov 3 '06 #31

 P: n/a fermineutron wrote: John Smith wrote: >What's the largest factorial you've been able to compute withyour program? with old version the larges one i did in real time is 150000 after i swiched to base 1024 i have not tried yet, still stuck on printing function, i seem to be unable to come up with algoryth to go from base "base" to base 10 in a long array Standard aproach such as NumInBase10=NumInBaseB*(B/10)^N will not work since N can be several thousand easy. I suspect this may be beyond you, but take a look at dubldabl. -- Chuck F (cbfalconer at maineline dot net) Available for consulting/temporary embedded and systems. Nov 3 '06 #32

 P: n/a CBFalconer wrote: I suspect this may be beyond you, but take a look at dubldabl. Correct me if i am wrong but, I dont think that this will work, because in my case i have array of binary coded numbers but each element in the array corresponds to a base not equal to 2^bits per field. That is, my data looks like this 2^70 2^60 2^50 2^40 2^30 2^20 2^10 2^0 ----------------------------------------------------------------------------------------------------- | 0-1023 | 0-1023 | 0-1023 | 0-1023 | 0-1023 | 0-1023 | 0-1023 | 0-1023 | ----------------------------------------------------------------------------------------------------- |4 bytes |4 bytes |4 bytes |4 bytes |4 bytes |4 bytes |4 bytes |4 bytes | ----------------------------------------------------------------------------------------------------- theoretically the range of each field can be half the bits in the field. This has to be so to allow for an integer datatype to be able to hold a maximum product of 2 fields. so if i were to setup something like: -------------------------------------------------------------------------------------------- | ASCII Memory | Binary Memory | -------------------------------------------------------------------------------------------- and shift from right 2 left using algorythm you pointed me to, shouldn't the unused 2 left bytes of each element in binary memeory screw up the ASII memory? Before you posted the post to which i am replying here, i tried to print the numbers into char array, as follows: base = 1024 ===================Start Code======================= void AEIprintf(struct AEI *p) { int fieldLength = (int)log10(base); long fieldbase=pow(10,fieldLength+1), N=0, carry=0, tbl=0; char format, *buffer; long j = p->digits; long i=0; buffer=malloc(sizeof(struct AEI)*10); if(buffer!=NULL){ while(idata[i]+p->data[i+1]*base)%fieldbase; carry=(carry+p->data[i]+p->data[i+1]*base)/fieldbase; tbl+=sprintf(buffer+tbl,"%ld",N); i++; } printf("\nprinting buffer %d\n", strlen(buffer)); printf("\n%s\n",buffer); }else{ printf("\nCant Allocate memory\n"); } return; } ===================End Code======================== but because the base has integer value in it 1024 the contribution to units decemal field of resulting number in base 10 will come from all elements of array in base 1024 i would have to find (p->data[N])*1024^N which is impossible to do since N is several thousand may be even 10s of thousands. It is also the case that even if i were to requre the field size be half of the bits of the maximun int type, hence allowing to use all bits in the field to store data, to not wory about separate sign field, i casted the array as signed, i know its wastefull, but i have not thought of the way to handle it oherwise, i guess it can be done adding a sign field into the structure. what would you seggest about printing function? The more I am thinking about the more it seems that i should redo it using unsigned ints using full field width, but i have not yet gathereg my bravory to do that. Nov 3 '06 #33

 P: n/a On Thu, 2 Nov 2006, CBFalconer wrote (in response to fermineutron): > I suspect this may be beyond you, but take a look at dubldabl. Just for kicks, I implemented double-dabble from scratch based on your description and added a description and C implementation to Wikipedia: http://en.wikipedia.org/wiki/Double_dabble Any comments on the implementation from the nitpickers? (Other than "check calloc and realloc for NULL", that is. I consciously decided to omit the error-checking in a "reference" implementation, because it isn't relevant to the algorithm itself and would only distract readers who wanted to translate the algorithm to some other language. Remember this is meant to strike a harmonious mean between "real code" and "teaching code".) Suggestions of a better test driver are particularly solicited. Also, CBFalconer, do you know where the name "double-dabble" came from originally, and how long it's been around? Curiously, some online references use the name "double-dabble" to refer to an "algorithm" that's little more than an observation about positional number systems --- see the page history of "Double dabble" for that ridiculousness. Thanks, -Arthur Nov 3 '06 #34

 P: n/a Arthur J. O'Dwyer wrote: On Thu, 2 Nov 2006, CBFalconer wrote (in response to fermineutron): I suspect this may be beyond you, but take a look at dubldabl. Just for kicks, I implemented double-dabble from scratch based on your description and added a description and C implementation to Wikipedia: http://en.wikipedia.org/wiki/Double_dabble Any comments on the implementation from the nitpickers? (Other than "check calloc and realloc for NULL", that is. I consciously decided to omit the error-checking in a "reference" implementation, because it isn't relevant to the algorithm itself and would only distract readers who wanted to translate the algorithm to some other language. Remember this is meant to strike a harmonious mean between "real code" and "teaching code".) Suggestions of a better test driver are particularly solicited. Also, CBFalconer, do you know where the name "double-dabble" came from originally, and how long it's been around? Curiously, some online references use the name "double-dabble" to refer to an "algorithm" that's little more than an observation about positional number systems --- see the page history of "Double dabble" for that ridiculousness. Thanks, -Arthur Thanks Arthur, Your code works great, i was able to use it with no problems. Without YOUR help i would not have gotten it done. Good news is that the multiplication algo that i changed earlier works correctly and your code is what made it possible for me to check that. Thanks again. Nov 3 '06 #35

 P: n/a Arthur, Is there any chance you could add to the wikipedia the C code for reverse transformation of long integers? That is, given a null terminated string containing characters 0-9 how would i transfer it to array of integers in base 2^16 or 2^32? Thanks ahead. Nov 4 '06 #36

This discussion thread is closed

Replies have been disabled for this discussion. 