458,196 Members | 1,729 Online Need help? Post your question and get tips & solutions from a community of 458,196 IT Pros & Developers. It's quick & easy.

# perhaps a stack problem? Long calculations - strange error?

 P: n/a Hi, I've encountered a really, *really*, REALLY strange error :-) I have a for-loop and after 8 runs I get strange results...... I mean: A really strange result.... I'm calculating temperatures. T[j][i] = 20 degrees at all times.... The 2D T-array looks like this: | 2: 3: ..... i-direction goes to the right -> --+---------- 2: | 20 20 3: | 20 20 j-direction goes down (index 1 and 4 = 0, but they don't matter) The code *that works* looks like this: - - - - - - - - - - - - { /* copy 2D (old) temperatures to 1D-array */ count = 0; solved_temp = 0; /* number_of_interior_cells = 4 */ for(j=2, i=1, no=1; no<=(int) number_of_interior_cells; no++) { /* update columns (x-value) */ i++; solved_temp[no] = T[j][i]; solved_temp[no] += ( hx[j][i] * ( T[j][i-1]-T[j][i] ) ) / h0[j][i]; solved_temp[no] += ( hx[j][i+1] * ( T[j][i+1]-T[j][i] ) ) / h0[j][i]; solved_temp[no] += ( hy[j][i] * ( T[j-1][i]-T[j][i] ) ) / h0[j][i]; solved_temp[no] += ( hy[j+1][i] * ( T[j+1][i]-T[j][i] ) ) / h0[j][i]; /* below you see the code that *doesn't work* - isn't it the same??? */ //solved_temp[no] = T[j][i] + ( // hx[j][i] * ( T[j][i-1]-T[j][i] ) + hx[j][i+1] * ( T[j][i+1]-T[j][i] ) + // hy[j][i] * ( T[j-1][i]-T[j][i] ) + hy[j+1][i] * ( T[j+1][i]-T[j][i] ) // ) / h0[j][i]; if(i == nx-1) /* hits east boundary? */ { j++; /* next row */ i=1; /* reset x-coordinate */ } } } The expected result is that solved_temp = 20, solved_temp = 20, solved_temp = 20, solved_temp = 20 at all times. If I swap the above outcommented/commented sections with each other, I get what I expect the first 7 times. On the 8th time I get that the next result for solved_temp = -1.#IND000000000000, but the other results are okay.... What exactly is that -1.#IND000000000000 means? And isn't it exactly the same code??? I'm pretty confused... In case you're wondering, hx or hy is either 0 or 1 as you can see below and all the temperature differences is 0, so it shouldn't matter anyway... h0 = 2 but also that shouldn't change anything since everything gives 0 so I have something which is 0 divided by 2 and that ofcourse should give 0........ It all ends up in (20 + 0) = 20... Printing hx (step = 8): i=1: i=2: i=3: i=4: j=1: -> 0.00000 -> 0.00000 -> 0.00000 -> 0.00000 j=2: -> 0.00000 -> 0.00000 -> 1.00000 -> 0.00000 j=3: -> 0.00000 -> 0.00000 -> 1.00000 -> 0.00000 j=4: -> 0.00000 -> 0.00000 -> 0.00000 -> 0.00000 Printing hy (step = 8): i=1: i=2: i=3: i=4: j=1: -> 0.00000 -> 0.00000 -> 0.00000 -> 0.00000 j=2: -> 0.00000 -> 0.00000 -> 0.00000 -> 0.00000 j=3: -> 0.00000 -> 1.00000 -> 1.00000 -> 0.00000 j=4: -> 0.00000 -> 0.00000 -> 0.00000 -> 0.00000 (only 5 decimal digits shown, but it shouldn't change anything in the big picture I guess).... Why does it work when I calculate solved_temp[no] in 5 individual steps instead of in one long calculation? Even Microsoft Visual Studio 2005 debugger agrees with me, but then I step over the calculation line and find this stupid/strange result......! I hope somebody can explain this really weird behaviour... I hope there's a "natural" explanation like "stack overflow" or whatever....? I was a bit afraid of posting this if I didn't check my code properly but now I've looked at it for about 1-2 hours and when my debugger even agrees with me and I then choose F10 "step over" and get another result, then I think I need professional help :-) It works now (with 5 individual calculations), but I just want to be sure that this error doesn't come back later.... Best regards / Med venlig hilsen Martin Jørgensen -- --------------------------------------------------------------------------- Home of Martin Jørgensen - http://www.martinjoergensen.dk May 7 '06 #1
11 Replies

 P: n/a Martin Joergensen opined: Hi, I've encountered a really, *really*, REALLY strange error :-) I have a for-loop and after 8 runs I get strange results...... I mean: A really strange result.... I'm calculating temperatures. T[j][i] = 20 degrees at all times.... The 2D T-array looks like this: | 2: 3: ..... i-direction goes to the right -> --+---------- 2: | 20 20 3: | 20 20 j-direction goes down (index 1 and 4 = 0, but they don't matter) The code *that works* looks like this: - - - - - - - - - - - - { /* copy 2D (old) temperatures to 1D-array */ count = 0; solved_temp = 0; /* number_of_interior_cells = 4 */ for(j=2, i=1, no=1; no<=(int) number_of_interior_cells; no++) { /* update columns (x-value) */ i++; solved_temp[no] = T[j][i]; solved_temp[no] += ( hx[j][i] * ( T[j][i-1]-T[j][i] ) ) / h0[j][i]; solved_temp[no] += ( hx[j][i+1] * ( T[j][i+1]-T[j][i] ) ) / h0[j][i]; solved_temp[no] += ( hy[j][i] * ( T[j-1][i]-T[j][i] ) ) / h0[j][i]; solved_temp[no] += ( hy[j+1][i] * ( T[j+1][i]-T[j][i] ) ) / h0[j][i]; /* below you see the code that *doesn't work* - isn't it the same??? */ //solved_temp[no] = T[j][i] + ( // hx[j][i] * ( T[j][i-1]-T[j][i] ) + hx[j][i+1] * ( T[j][i+1]-T[j][i] ) + // hy[j][i] * ( T[j-1][i]-T[j][i] ) + hy[j+1][i] * ( T[j+1][i]-T[j][i] ) // ) / h0[j][i]; if(i == nx-1) /* hits east boundary? */ { j++; /* next row */ i=1; /* reset x-coordinate */ } } } The expected result is that solved_temp = 20, solved_temp = 20, solved_temp = 20, solved_temp = 20 at all times. If I swap the above outcommented/commented sections with each other, I get what I expect the first 7 times. On the 8th time I get that the next result for solved_temp = -1.#IND000000000000, but the other results are okay.... What exactly is that -1.#IND000000000000 means? My guess is that your commented out code overflows. The above looks like some IEEE error/overflow/NaN thingy. It's still too much weekend 'round here for a closer look. Also, you should have really posted a compilable example that shows the problem. And isn't it exactly the same code??? I'm pretty confused... Mathematically, it probably is. Computers, unfortunately don't have infinite precision. Therein may lie your undoing. In case you're wondering, hx or hy is either 0 or 1 as you can see below and all the temperature differences is 0, so it shouldn't matter anyway... h0 = 2 but also that shouldn't change anything since everything gives 0 so I have something which is 0 divided by 2 and that ofcourse should give 0........ It all ends up in (20 + 0) = 20... Printing hx (step = 8): i=1: i=2: i=3: i=4: j=1: -> 0.00000 -> 0.00000 -> 0.00000 -> 0.00000 j=2: -> 0.00000 -> 0.00000 -> 1.00000 -> 0.00000 j=3: -> 0.00000 -> 0.00000 -> 1.00000 -> 0.00000 j=4: -> 0.00000 -> 0.00000 -> 0.00000 -> 0.00000 Printing hy (step = 8): i=1: i=2: i=3: i=4: j=1: -> 0.00000 -> 0.00000 -> 0.00000 -> 0.00000 j=2: -> 0.00000 -> 0.00000 -> 0.00000 -> 0.00000 j=3: -> 0.00000 -> 1.00000 -> 1.00000 -> 0.00000 j=4: -> 0.00000 -> 0.00000 -> 0.00000 -> 0.00000 (only 5 decimal digits shown, but it shouldn't change anything in the big picture I guess).... Why does it work when I calculate solved_temp[no] in 5 individual steps instead of in one long calculation? Even Microsoft Visual Studio 2005 debugger agrees with me, but then I step over the calculation line and find this stupid/strange result......! I hope somebody can explain this really weird behaviour... I hope there's a "natural" explanation like "stack overflow" or whatever....? I was a bit afraid of posting this if I didn't check my code properly but now I've looked at it for about 1-2 hours and when my debugger even agrees with me and I then choose F10 "step over" and get another result, then I think I need professional help :-) It works now (with 5 individual calculations), but I just want to be sure that this error doesn't come back later.... I'd also recommend you find a good text on the pitfalls of using finite precision floating point arithmetic. One fairly good link has been posted here a couple of months back. The only thing I remember about it was that it was on the Sun site somewhere. -- Why use Windows, since there is a door? (By fa****@galileo.rhein-neckar.de, Andre Fachat) May 7 '06 #2

 P: n/a Vladimir Oka wrote: I'd also recommend you find a good text on the pitfalls of using finite precision floating point arithmetic. One fairly good link has been posted here a couple of months back. The only thing I remember about it was that it was on the Sun site somewhere. You are probably referring to the paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic" by David Goldberg which is available on Sun's website at: among many other places. Robert Gamble May 7 '06 #3

 P: n/a Vladimir Oka wrote: Martin Joergensen opined: -snip- What exactly is that -1.#IND000000000000 means? My guess is that your commented out code overflows. The above looks like some IEEE error/overflow/NaN thingy. It's still too much weekend 'round here for a closer look. Also, you should have really posted a compilable example that shows the problem. You shouldn't try to code anything yourself... I just tried it but my code is pretty big. I mean: hx, hy and h0 is calculated in each step and dynamically allocated so the smallest possible code would probably still be some hundred lines. I just made a 100 line program that didn't calculate hx and hy and here there was no error.... Hmm. So I guess I have an error somewhere in my code... Still I can't explain the difference between adding 5 numbers and then get the right result vs. doing everything at once... So that's basically why I asked. And when Microsoft Visual studio 2005 debugger agrees with me, then I'm really confused because I couldn't imagine that it would lie to me about anything. -snip- It works now (with 5 individual calculations), but I just want tobe sure that this error doesn't come back later.... I'd also recommend you find a good text on the pitfalls of using finite precision floating point arithmetic. One fairly good link has been posted here a couple of months back. The only thing I remember about it was that it was on the Sun site somewhere. I looked at it... It seems like it takes pretty long time to read it all. Perhaps I'll look at it later. Thanks for the answer. Best regards / Med venlig hilsen Martin Jørgensen -- --------------------------------------------------------------------------- Home of Martin Jørgensen - http://www.martinjoergensen.dk May 7 '06 #4

 P: n/a Robert Gamble opined: Vladimir Oka wrote: I'd also recommend you find a good text on the pitfalls of using finite precision floating point arithmetic. One fairly good link has been posted here a couple of months back. The only thing I remember about it was that it was on the Sun site somewhere. You are probably referring to the paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic" by David Goldberg which is available on Sun's website at: among many other places. Yup, that's the one! Thanks for the link (although I do remember printing it out at the time, and leaving it on the "to read" heap in the office ;-) ). -- Those who make peaceful revolution impossible will make violent revolution inevitable. -- John F. Kennedy May 7 '06 #5

 P: n/a Martin Joergensen wrote: for(j=2, i=1, no=1; no<=(int) number_of_interior_cells; no++) { /* update columns (x-value) */ i++; solved_temp[no] = T[j][i]; Check that T[] and solved_temp[] are allocated properly, remembering that arrays in C are zero-based, ie. you need: int solved_temp[number_of_interior_cells+1]; or int *solved_temp = malloc( sizeof *solved_temp * (number_of_interior_cells+1) ); and similarly for T. Also, your two divisions are different; in general (a+b)/2 is not the same as (a/2) + (b/2). For example if a,b are ints and a=3, b=5, then (a+b)/2 = 4 but a/2 + b/2 = 3. If they are floating point then there may still be minor errors due to rounding errors. And in both cases, the (a+b) might overflow. If you can provide a complete, compilable program, and the input data yo uare using, then people here will be able to reproduce and explain your results. May 8 '06 #7

 P: n/a Vladimir Oka wrote: Martin JÃ¸rgensen opined: -snip- I think once you read it, you may find it easier to imagine "the difference between adding 5 numbers and then get the right result vs. doing everything at once..." And I do mean it in the nicest possible way. I'm still prepared to be wrong, but I wouldn't be surprised that your problem lies in one of the intermediate terms in "everything at once" code overflowing, or misbehaving in a different way. That's also what I think.... Ok, thanks for your time. Have anyone in here experienced anything like that, when there's a difference between adding things together in multiple separate calculations versus in a single (longer) calculation? The worst thing is that by splitting the calculation up in 5 pieces, I thought I could just step into each of them and find the error easily... Since the "error" disappeared now, I can't find "the error".... And this is what's bugging me.... If I could just find the exact place and say: Ha! Got you... But I can't... Best regards / Med venlig hilsen Martin JÃ¸rgensen -- --------------------------------------------------------------------------- Home of Martin JÃ¸rgensen - http://www.martinjoergensen.dk May 8 '06 #8

 P: n/a Martin JÃ¸rgensen opined: Vladimir Oka wrote: Martin JÃ¸rgensen opined: -snip- I think once you read it, you may find it easier to imagine "the difference between adding 5 numbers and then get the right result vs. doing everything at once..." And I do mean it in the nicest possible way. I'm still prepared to be wrong, but I wouldn't be surprised that your problem lies in one of the intermediate terms in "everything at once" code overflowing, or misbehaving in a different way. That's also what I think.... Ok, thanks for your time. Not at all. Have anyone in here experienced anything like that, when there's a difference between adding things together in multiple separate calculations versus in a single (longer) calculation? Yes, in a different life, and a different language (FORTRAN). The worst thing is that by splitting the calculation up in 5 pieces, I thought I could just step into each of them and find the error easily... Since the "error" disappeared now, I can't find "the error".... And this is what's bugging me.... If I could just find the exact place and say: Ha! Got you... But I can't... Sounds to me, you *did* find your error. It's just that you still can't *explain* it properly. Did you try repeating the calculations by hand? -- You're dead, Jim. -- McCoy, "Amok Time", stardate 3372.7 May 8 '06 #9

 P: n/a Vladimir Oka wrote: Martin JÃ¸rgensen opined: -snip- Have anyone in here experienced anything like that, when there's adifference between adding things together in multiple separatecalculations versus in a single (longer) calculation? Yes, in a different life, and a different language (FORTRAN).The worst thing is that by splitting the calculation up in 5 pieces,I thought I could just step into each of them and find the erroreasily...Since the "error" disappeared now, I can't find "the error".... Andthis is what's bugging me.... If I could just find the exact placeand say: Ha! Got you... But I can't... Sounds to me, you *did* find your error. It's just that you still can't *explain* it properly. Did you try repeating the calculations by hand? By hand? It should give 20 degrees and not -1.#IND000000000000. The reason is that all other terms than the 20 gives 0... So there's nothing to "do by hand" really... Visual studio debugger is right. But once I step into the long expression it gives -1.#IND000000000000 instead of 20. When I wrote "the error disappeared now" I meant that instead of using the long expression it works by adding 5 individual smaller sums (swapping commented vs. uncommented code)... So I wouldn't exactly say I found the error - perhaps it comes back another place... But I suspect there's an error somewhere, perhaps in MS visual studio 2005... And perhaps also in my program??? It's totally insane... :-( Everything works like it should under linux.... The same code.... I'm not so experienced that I can understand the output from Valgrind but it seems to me like there's no big errors / memory leaks or the like... Now I'm considering continuing doing my project under linux (it's my bachelor project)... Today I got some help from a more experienced programmer than me so we tried www.ibm.com/software/awdtools/purify and updated it with the latest patch - this gives really strange warnings (lots of printf-errors??? and some "memory leaks") and it was clear that it doesn't work really well with Visual Studio 2005 yet - according to IBM. Best regards / Med venlig hilsen Martin JÃ¸rgensen -- --------------------------------------------------------------------------- Home of Martin JÃ¸rgensen - http://www.martinjoergensen.dk May 8 '06 #10

 P: n/a Old Wolf wrote: Martin Joergensen wrote: for(j=2, i=1, no=1; no<=(int) number_of_interior_cells; no++) { /* update columns (x-value) */ i++; solved_temp[no] = T[j][i]; Check that T[] and solved_temp[] are allocated properly, remembering that arrays in C are zero-based, ie. you need: int solved_temp[number_of_interior_cells+1]; or int *solved_temp = malloc( sizeof *solved_temp * (number_of_interior_cells+1) ); and similarly for T. Yep. But they're both type double (not important though). Also, your two divisions are different; in general (a+b)/2 is not the same as (a/2) + (b/2). For example if a,b are ints and a=3, b=5, then (a+b)/2 = 4 but a/2 + b/2 = 3. Yep, I know. But here's it all "double"-types. If they are floating point then there may still be minor errors due to rounding errors. But not anything that could go from 20 to -1.#IND000000000000, I guess. And in both cases, the (a+b) might overflow. These numbers are far to small for overflow to occur. If you can provide a complete, compilable program, and the input data yo uare using, then people here will be able to reproduce and explain your results. I can't make that. I tried to make a 100 line program (no dynamic allocation and fixed variables). Actually I think it's something that doesn't have to do with the C language itself... Perhaps compiler options/switches or linker options/switches or something... I'll try to see if I can isolate the error or else I'll switch to Linux where it works... Best regards / Med venlig hilsen Martin Jørgensen -- --------------------------------------------------------------------------- Home of Martin Jørgensen - http://www.martinjoergensen.dk May 8 '06 #11

 P: n/a Martin Joergensen wrote: Hi, I've encountered a really, *really*, REALLY strange error :-) I have a for-loop and after 8 runs I get strange results...... I mean: A really strange result.... I'm calculating temperatures. T[j][i] = 20 degrees at all times.... The 2D T-array looks like this: | 2: 3: ..... i-direction goes to the right -> --+---------- 2: | 20 20 3: | 20 20 j-direction goes down (index 1 and 4 = 0, but they don't matter) The code *that works* looks like this: - - - - - - - - - - - - { /* copy 2D (old) temperatures to 1D-array */ count = 0; solved_temp = 0; /* number_of_interior_cells = 4 */ for(j=2, i=1, no=1; no<=(int) number_of_interior_cells; no++) { /* update columns (x-value) */ i++; solved_temp[no] = T[j][i]; solved_temp[no] += ( hx[j][i] * ( T[j][i-1]-T[j][i] ) ) / h0[j][i]; solved_temp[no] += ( hx[j][i+1] * ( T[j][i+1]-T[j][i] ) ) / h0[j][i]; solved_temp[no] += ( hy[j][i] * ( T[j-1][i]-T[j][i] ) ) / h0[j][i]; solved_temp[no] += ( hy[j+1][i] * ( T[j+1][i]-T[j][i] ) ) / h0[j][i]; /* below you see the code that *doesn't work* - isn't it the same??? */ //solved_temp[no] = T[j][i] + ( // hx[j][i] * ( T[j][i-1]-T[j][i] ) + hx[j][i+1] * ( T[j][i+1]-T[j][i] ) + // hy[j][i] * ( T[j-1][i]-T[j][i] ) + hy[j+1][i] * ( T[j+1][i]-T[j][i] ) // ) / h0[j][i]; if(i == nx-1) /* hits east boundary? */ { j++; /* next row */ i=1; /* reset x-coordinate */ } } } The expected result is that solved_temp = 20, solved_temp = 20, solved_temp = 20, solved_temp = 20 at all times. If I swap the above outcommented/commented sections with each other, I get what I expect the first 7 times. On the 8th time I get that the next result for solved_temp = -1.#IND000000000000, but the other results are okay.... -snip- Update: The program works in "RELEASE" mode but not under "DEBUG" mode under the *exact* same conditions with the same source code, with the same input file... And again: it also works under linux (gcc). Does that ring a bell anywhere? I hope it does somewhere, because I'm still lost........ Compiler is Visual studio 2005... Since I guess it's not scrictly standard C anymore I've made xfut and fut to: comp.os.ms-windows.programmer.win32 but I'll also continue to read in comp.lang.c if anyone posts answers (here) there... Best regards / Med venlig hilsen Martin Jørgensen -- --------------------------------------------------------------------------- Home of Martin Jørgensen - http://www.martinjoergensen.dk May 10 '06 #12

### This discussion thread is closed

Replies have been disabled for this discussion. 