
# Hidden read of indeterminate memory

[Annex J.2 Undefined behavior]

- The value of the object allocated by the malloc function is used (7.20.3.3).
- The value of any bytes in a new object allocated by the realloc function beyond the size of the old object are used (7.20.3.4).

Something like this (includes and checks omitted):

    p = malloc(sizeof(*p) * 5);
    p[0] = 1; p[1] = 1; p[2] = 1;
    a = p[4];

is obviously undefined, because p[4] was not initialised. But what happens with the "hidden" read access when realloc'ing the memory:

    p = malloc(sizeof(*p) * 5);
    p[0] = 1; p[1] = 1; p[2] = 1;
    p = realloc(p, sizeof(*p) * 10);

where p[3] and p[4] are moved to the new memory location (when it is a new location, of course)? The same applies to memmove and memcpy.

My assumptions are:

- The values of the bytes are not "used" in the sense of the above definition of undefined behavior.
- The library functions do their work on arrays of bytes, which is always well defined for any object type, even for indeterminate values. OTOH, there is no exception in the above definition: "value of any bytes [...] are used".
- Thinking about it, the same would apply to the padding inside structs, which could only be initialised by accessing the struct as an array of bytes. OTOH, there could be a deeper reason for the existence of calloc()...

What do you think?

Holger

Nov 14 '05 #1
19 Replies

Holger Hasselbach writes:
[some snippage and vertical compression]
> Something like this (include and checkings omitted):
>     p = malloc(sizeof(*p) * 5); p[0] = 1; p[1] = 1; p[2] = 1; a = p[4];
> is obviously undefined, because p[4] was not initialised. But what happens for the "hidden" read access when realloc'ing the memory:
>     p = malloc(sizeof(*p) * 5); p[0] = 1; p[1] = 1; p[2] = 1; p = realloc(p, sizeof(*p) * 10);
> where p[3] and p[4] are moved to the new memory location, when it is a new location, of course? The same applies for memmove and memcpy.

The C standard is, in a way, "of two minds" about bit-patterns in uninitialized memory, including memory from malloc(). (Uninitialized automatic objects have the same problem, along with -- as you mention -- padding inside initialized struct objects.) Reading them produces undefined behavior, yet memcpy() must necessarily be able to copy them.

The solution to this dilemma lies in the properties of "unsigned char". There are no trap representations in "unsigned char", and the bytes that "unsigned char"s occupy -- even if a byte is more than 8 bits long -- cover all the bits in any sub-object. This implies, even if the standard does not say outright (and I have not checked to see whether it does), that even an uninitialized object's undefined-behavior values can be inspected in a defined-behavior manner simply by breaking the object down into its component "unsigned char" bytes. The bit patterns in each such byte never produce undefined behavior.

Then, as long as memcpy() and friends -- including realloc() -- deal with these uninitialized locations "as if" by using unsigned char to copy the bit patterns, no undefined behavior occurs.

Of course, nothing can be said about the objects' undefined values staying the same across such copies. In this particular sense they are sort of Schroedinger's Cats of values: the only way to find out if they are unchanged across a memcpy() is to inspect them, and inspecting them gives undefined behavior, so that you no longer know if an uninspected copy might have changed them after all. (And here we thought there was no quantum physics involved in C programming...! :-) )

--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Nov 14 '05 #2
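The "as if by unsigned char" copy described in this post can be sketched in C. This is only an illustration, not how any real library implements memcpy() or realloc(). The names `byte_copy` and `grow_and_sum` are hypothetical, and the second malloc() merely stands in for a realloc() that has to move the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from the thread): copies n bytes from src to
   dst one unsigned char at a time. The regions must not overlap. */
void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n-- > 0)
        *d++ = *s++;            /* only unsigned char lvalues are read */
    return dst;
}

/* Mimics a realloc() that moves the block: five ints are allocated,
   three are set, and all five are copied into a larger block.
   Returns the sum of the three set elements, or -1 if malloc() fails. */
int grow_and_sum(void)
{
    int *p = malloc(5 * sizeof *p);
    int *q = malloc(10 * sizeof *q);
    int sum = -1;

    if (p != NULL && q != NULL) {
        p[0] = 1; p[1] = 1; p[2] = 1;       /* p[3] and p[4] stay unset */
        byte_copy(q, p, 5 * sizeof *p);     /* copies all five anyway */
        sum = q[0] + q[1] + q[2];           /* never reads q[3] or q[4] */
    }
    free(p);
    free(q);
    return sum;
}
```

When both allocations succeed, grow_and_sum() returns 3. The bytes of p[3] and p[4] are copied, but their values as int are never read.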

"Holger Hasselbach" wrote:
> The library functions do their work on arrays of bytes, which is always well defined for any object type even for indeterminate values. OTOH, there is no exception in the above definition: "value of any bytes [...] are used".

This is the answer. If you are writing memcpy() in C, then it is necessary to cast to a pointer to unsigned char to avoid trap representations. Of course, normally memcpy() won't be implemented in C, and it will be possible to move memory in greater quantities than one byte at a time without problems.

Nov 14 '05 #3

Chris Torek wrote:
> Holger Hasselbach writes:
> [some snippage and vertical compression]
>> Something like this (include and checkings omitted):
>>     p = malloc(sizeof(*p) * 5); p[0] = 1; p[1] = 1; p[2] = 1; a = p[4];
>> is obviously undefined, because p[4] was not initialised. But what happens for the "hidden" read access when realloc'ing the memory:
>>     p = malloc(sizeof(*p) * 5); p[0] = 1; p[1] = 1; p[2] = 1; p = realloc(p, sizeof(*p) * 10);
>> where p[3] and p[4] are moved to the new memory location, when it is a new location, of course? The same applies for memmove and memcpy.
>
> The C standard is, in a way, "of two minds" about bit-patterns in uninitialized memory, including memory from malloc(). (Uninitialized automatic objects have the same problem, along with -- as you mention -- padding inside initialized struct objects.) Reading them produces undefined behavior, yet memcpy() must necessarily be able to copy them.
>
> The solution to this dilemma lies in the properties of "unsigned char". There are no trap representations in "unsigned char", and the bytes that "unsigned char"s are -- even if this is more than 8 bits long -- cover all the bits in any sub-object. This implies, even if the standard does not say outright (and I have not checked to see whether it does), that even an uninitialized object's undefined-behavior values can be inspected in a defined-behavior manner simply by breaking the object down into its component "unsigned char" bytes. The bit patterns in each such byte never produce undefined behavior.

I don't think traps have anything to do with it. If the standard committee doesn't care about what happens when you try to inspect uninitialized objects, then that's all it takes to invoke undefined behavior from trying to inspect an uninitialized object. A mechanism involving other, more familiar types of undefined behavior is not required.

N869 6.7.8 Initialization [#10]: If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.

new.c is undefined behavior, right ?

    /* BEGIN new.c */
    #include <stdio.h>

    int main(void)
    {
        unsigned char byte;

        printf("%u\n", (unsigned)byte);
        return 0;
    }
    /* END new.c */

--
pete

Nov 14 '05 #4

On Wed, 17 Dec 2003 21:02:25 GMT, pete wrote in comp.lang.c:
> I don't think traps have anything to do with it. If the standard committee doesn't care about what happens when you try to inspect uninitialized objects, then that's all that it takes to invoke undefined behavior from trying to inspect an uninitialized object. A mechanism involving other more familiar types of undefined behavior, is not required.
>
> N869 6.7.8 Initialization [#10]: If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
>
> new.c is undefined behavior, right ?
>
>     /* BEGIN new.c */
>     #include <stdio.h>
>
>     int main(void)
>     {
>         unsigned char byte;
>
>         printf("%u\n", (unsigned)byte);
>         return 0;
>     }
>     /* END new.c */

No, it is not. Any object value, even one which is not a valid value representation for the type of object it inhabits, may be inspected freely as individual unsigned chars. Paragraph 4 of 6.2.6.1 (C99) makes this clear, before paragraph 5 introduces the concept of trap representations. And even there in paragraph 5, an exemption is specifically made for lvalues of character type.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq

Nov 14 '05 #5
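The inspection "as individual unsigned chars" that this post refers to might look like the following. It is a minimal sketch; `byte_at` and `round_trip` are hypothetical names, and an initialised unsigned int is used so that the result can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reads byte i of any object through an
   unsigned char lvalue. The caller must keep i below the object size. */
unsigned char byte_at(const void *obj, size_t i)
{
    assert(obj != NULL);
    return ((const unsigned char *)obj)[i];
}

/* Breaks an unsigned int down into its bytes and rebuilds it. */
unsigned int round_trip(unsigned int value)
{
    unsigned char bytes[sizeof value];
    unsigned int copy;
    size_t i;

    for (i = 0; i < sizeof value; i++)
        bytes[i] = byte_at(&value, i);      /* object -> bytes */
    memcpy(&copy, bytes, sizeof copy);      /* bytes -> object */
    return copy;
}
```

Because every byte of the representation is carried over unchanged, round_trip() returns the value it was given.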

On Thu, 18 Dec 2003 11:18:04 GMT, pete wrote:
> I understand you to mean that the use
> of an indeterminate unsigned char value,
> is unspecified behavior.
> Is that right ?

Please trim your posts, and no, you're not right. Using the value of any unsigned char is always defined behaviour.

--
#include _
Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up
Per the FCA, this address may not be added to any commercial mail list

Nov 14 '05 #8

pete wrote:
> Chris Torek wrote:
>> Chris Torek wrote:
>>> The C standard is, in a way, "of two minds" about bit-patterns in
>>> uninitialized memory, including memory from malloc(). ...
>>
>> In article <3F***********@mindspring.com> pete writes:
>>> I don't think traps have anything to do with it.
>>
>> Sure they do.
>
> I understand you to mean that the use of an indeterminate unsigned char value, is unspecified behavior. Is that right ?

Also, are you saying that there is no such thing as an indeterminate unsigned char value ?

--
pete

Nov 14 '05 #10

Kevin D Quitt wrote:
> On Thu, 18 Dec 2003 11:18:04 GMT, pete wrote:
>> I understand you to mean that the use of an indeterminate unsigned char value, is unspecified behavior. Is that right ?
>
> Please trim your posts, and no, you're not right. Using the value of any unsigned char is always defined behaviour.

I think that "unspecified" is right. For instance, consider reading from a union member of type unsigned char after writing to a different member (of type float, say). The bit pattern of the object must yield a valid value when interpreted as unsigned char, but /which/ value depends on the representation that the implementation uses for float.

Jeremy.

Nov 14 '05 #11
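The union case mentioned in this post can be sketched as follows. The union `float_bytes` and both functions are hypothetical names chosen for illustration; an array of unsigned char is used instead of a single member so that every byte of the float is visible.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical union: a float overlaid with its bytes. */
union float_bytes {
    float f;
    unsigned char b[sizeof(float)];
};

/* Writes the float member, then reads the bytes through the other
   member. Each byte is a valid unsigned char value; which value it is
   depends on the float format the implementation uses. */
void float_to_bytes(float value, unsigned char *out)
{
    union float_bytes u;

    assert(out != NULL);
    u.f = value;
    memcpy(out, u.b, sizeof u.b);
}

/* Rebuilds the float from bytes produced by float_to_bytes(). */
float bytes_to_float(const unsigned char *in)
{
    union float_bytes u;

    assert(in != NULL);
    memcpy(u.b, in, sizeof u.b);
    return u.f;
}
```

The individual byte values are not portable, so the sketch avoids printing or comparing them. Only the round trip through the same implementation is expected to give back the original float.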

In article <3F***********@mindspring.com> pete writes:
> I understand you to mean that the use
> of an indeterminate unsigned char value,
> is unspecified behavior.
> Is that right ?

This one I leave to the Tea-Leaf Readers in comp.std.c. :-) It is not a particularly useful thing to do; without some particularly good reason to try to tease "produces unspecified behavior" apart from "obtains (and thus in a sense determines) an indeterminate value", I am simply not going to bother -- particularly since I am still using a C99 draft, and the precise wording may have changed.

--
In-Real-Life: Chris Torek, Wind River Systems

Nov 14 '05 #12

Chris Torek wrote:
> In article <3F***********@mindspring.com> pete writes:
>> I understand you to mean that the use
>> of an indeterminate unsigned char value,
>> is unspecified behavior.
>> Is that right ?
>
> This one I leave to the Tea-Leaf Readers in comp.std.c. :-) It is not a particularly useful thing to do; without some particularly good reason to try to tease "produces unspecified behavior" apart from "obtains (and thus in a sense determines) an indeterminate value", I am simply not going to bother -- particularly since I am still using a C99 draft, and the precise wording may have changed.

According to:

1. the definition of undefined behavior
2. the fact that uninitialized objects have indeterminate value

I don't think that a program is required to reserve storage for an uninitialized object, unless the address of the object is taken.

--
pete

Nov 14 '05 #13

pete writes:
> I understand you to mean that the use of an indeterminate unsigned char value, is unspecified behavior. Is that right ?

If the use of an indeterminate unsigned char value is undefined, instead of unspecified, then memcpy() of a structure type is generally undefined. Because this is an absurdity, I conclude that use of indeterminate unsigned char values is not undefined.

--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}

Nov 14 '05 #14
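The memcpy() of a structure that this argument relies on can be illustrated with a struct that probably contains padding. This is a sketch only: `struct record`, `copy_record` and `copy_preserves_members` are hypothetical, and whether padding exists between the members is up to the implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct: many ABIs insert padding after 'tag' so that
   'count' is suitably aligned, but the layout is not portable. */
struct record {
    char tag;
    long count;
};

/* Copies the whole object representation, padding bytes included. */
void copy_record(struct record *dst, const struct record *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* Sets only the named members, copies the struct, and reports whether
   the named members arrived intact (1) or not (0). */
int copy_preserves_members(void)
{
    struct record src;
    struct record dst;

    src.tag = 'x';
    src.count = 42L;            /* padding bytes, if any, never written */
    copy_record(&dst, &src);
    return dst.tag == 'x' && dst.count == 42L;
}
```

copy_preserves_members() returns 1. The padding bytes are copied along with the members but are never interpreted as values of any other type.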

Ben Pfaff wrote:
> pete writes:
>> I understand you to mean that the use of an indeterminate unsigned char value, is unspecified behavior. Is that right ?
>
> If the use of an indeterminate unsigned char value is undefined, instead of unspecified, then memcpy() of a structure type is generally undefined. Because this is an absurdity, I conclude that use of indeterminate unsigned char values is not undefined.

The body of the definition of memcpy need only obey the rules of C, if it's written in C, and it need not be written in C. If it turns out that memcpy can't be portably written in C, it doesn't matter much. It doesn't change any C programs.

Dan Pop has pointed out that the string functions in the standard library can't be written in portable C. But that only makes a difference if you're trying to write a library string function in portable C. I don't know how important that is; I don't know anybody who is trying to do that.

--
pete

Nov 14 '05 #16

pete writes:
> Ben Pfaff wrote:
>> pete writes:
>>> I understand you to mean that the use of an indeterminate unsigned char value, is unspecified behavior. Is that right ?
>>
>> If the use of an indeterminate unsigned char value is undefined, instead of unspecified, then memcpy() of a structure type is generally undefined. Because this is an absurdity, I conclude that use of indeterminate unsigned char values is not undefined.
>
> The body of the definition of memcpy need only obey the rules of C, if it's written in C, and it need not be written in C. If it turns out that memcpy can't be portably written in C, it doesn't matter much. It doesn't change any C programs.
>
> Dan Pop has pointed out that the string functions in the standard library can't be written in portable C.

Really. Why not?

--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x1f6},*p=
b,x,i=24;for(;p+=!*p;*p/=4)switch(x=*p&3)case 0:{return 0;for(p--;i--;i--)case
2:{i++;if(1)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}

Nov 14 '05 #17

Ben Pfaff wrote:
> pete writes:
>> Ben Pfaff wrote:
>>> pete writes:
>>>> I understand you to mean that the use
>>>> of an indeterminate unsigned char value,
>>>> is unspecified behavior.
>>>> Is that right ?
>>>
>>> If the use of an indeterminate unsigned char value is undefined, instead of unspecified, then memcpy() of a structure type is generally undefined. Because this is an absurdity, I conclude that use of indeterminate unsigned char values is not undefined.
>>
>> The body of the definition of memcpy need only obey the rules of C, if it's written in C, and it need not be written in C. If it turns out that memcpy can't be portably written in C, it doesn't matter much. It doesn't change any C programs.
>>
>> Dan Pop has pointed out that the string functions in the standard library can't be written in portable C.
>
> Really. Why not?

Because, according to the definition of 'string', it's possible for a string to span several unrelated arrays, in which case walking a pointer along such a string would overrun an array boundary.

At the time, Dan Pop was specifically referring to string functions which deal with strings. Those functions take pointers to strings as arguments, instead of pointers to arrays.

--
pete

Nov 14 '05 #18

Jun Woong wrote:
> "pete" wrote in message news:3F***********@mindspring.com...
>> Chris Torek wrote:
>>> In article <3F***********@mindspring.com> pete writes:
>>>> I understand you to mean that the use
>>>> of an indeterminate unsigned char value,
>>>> is unspecified behavior.
>>>> Is that right ?
>
> The undefined behavior caused by the use of an indeterminate value is due to possible trap representations. But the unsigned char is guaranteed to have no trap representation.
>
>>> This one I leave to the Tea-Leaf Readers in comp.std.c. :-)
>>
>> I understand that there are times when the bytes of objects with indeterminate values can be accessed as unsigned char. But if an object declared as unsigned char is not initialized and its address is not taken in the program, then is the program required to reserve storage for the object ?
>
> The program is always required to reserve storage to hold an object USED in it. Are you talking about optimization?
>
>> Does the standard say that the call to printf in indeterminate.c has an argument within the range of unsigned char,
>
> Yes. But your code is not a s.c. program whose output depends on an unspecified value.
>
>> or is the behavior undefined ?
>
> There is no wording to say that it's undefined in C99; C90 says that all uses of an indeterminate value result in undefined behavior, which differs from C99.
>
>>     /* BEGIN indeterminate.c */
>>     #include <stdio.h>
>>
>>     int main(void)
>>     {
>>         unsigned char byte;
>>
>>         printf("%u\n", (unsigned)byte);
>>         return 0;
>>     }
>>     /* END indeterminate.c */

Thank you.

--
pete

Nov 14 '05 #19

Holger Hasselbach writes:
> As far as I understand it now, thanks to the excellent explanations from Chris, the C system of object storage can be expressed in three layers.
>
> Layer 3: Interpretation - Used values
> Layer 2: Representation - Pure binary, unsigned char
> Layer 1: Storage - Hardware

This is certainly a workable (and nice, and concise) model that fits with what the Standard requires. Whether it is the only possible model, i.e., completely isomorphic to what the C99 standard says, I am not sure -- but it is much more *usable* than the C99 wording. :-)

[rest snipped]

[Incidentally, the usual contraction for a ternary digit is "trit". Theoretically, the "most efficient" hardware representation might be to use base e, and 3 is closer to e (3 - 2.71828... < 0.3) than 2 is (2.71828... - 2 > 0.7), so base-3 (trinary or ternary) computers might be "more efficient" than base-2 (binary). It is not clear whether C would be a good language for ternary computing, though.]

--
In-Real-Life: Chris Torek, Wind River Systems

Nov 14 '05 #20
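The difference between Layer 2 (representation) and Layer 3 (interpretation) in this model can be made concrete by comparing two objects both ways. This is a sketch under stated assumptions: `same_representation` and `same_value` are hypothetical names, and double is used only because positive and negative zero make a convenient example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layer 2: compares the raw bytes of two objects of size n. */
int same_representation(const void *a, const void *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, n) == 0;
}

/* Layer 3: compares the interpreted values. */
int same_value(double a, double b)
{
    return a == b;
}
```

Given `double pz = 0.0, nz = -0.0;`, same_value(pz, nz) is 1. On an implementation that uses IEEE 754 doubles, same_representation(&pz, &nz, sizeof pz) is 0, because the sign bit differs, so equal values need not share a representation.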
