Bytes IT Community

casting X* to char*

P: n/a
A colleague asked me something along the lines of the following today.

For some type X he has:

X* px = new X[sz];

Then he wants to convert px to a char* (I'm guessing for the purpose of
serializing the object array).

I can think of three ways to do this:

char* pc = (char*) px;
char* pc = static_cast<char*> (static_cast<void*> (px));
char* pc = reinterpret_cast<char*> (px);

From my reading of the standard it seems that the results of these
casts are all unspecified.

Is this true?

In practice, would these casts ever do anything besides the obvious?

Is there any portable way to access the bytes of an object's representation?

Thanks,
Mark
May 30 '06 #1
33 Replies


P: n/a
Mark P wrote:
A colleague asked me something along the lines of the following today.

For some type X he has:

X* px = new X[sz];

Then he wants to convert px to a char* (I'm guessing for the purpose of
serializing the object array).
What for? Why not provide your own operator<< and operator>> for type X?
Let's consider what happens if type X is composed of primitive types,
containers, pointers and references.

Doesn't it make sense that it's a better solution to stream those
components rather than carrying out a byte-by-byte transfer of the
object? What if the target of the transfer uses a different architecture
(or a different compiler)? Why would you want to have to deal with the
type's padding?

The proposed solutions above and below therefore constitute a good
example of undefined behaviour. Not to mention unnecessary bits being
transferred and having to detect the source and target architectures as
well as writing a new function to handle each possibility.
Why do it the hard way with undefined results when writing a simple
operator rids you of all the headaches?

std::ostream& operator<<(std::ostream& os, const X& x)
{
// stream the relevant components of x into os
return os;
}

and make the above function a friend of type X.
It doesn't get any simpler. And it's portable.

I can think of three ways to do this:

char* pc = (char*) px;
char* pc = static_cast<char*> (static_cast<void*> (px));
char* pc = reinterpret_cast<char*> (px);

From my reading of the standard it seems that the results of these
casts are all unspecified.
I don't see a cast, I see a non-portable hack. The above is in fact
guaranteed to fail. Don't cast unless you develop a healthy respect
for what casts do.

Is this true?

In practice, would these casts ever do anything besides the obvious?

Is there any portable way to access the bytes of an object's
representation?


Of course there is. The key is to access the *relevant* bytes. What is
relevant can very well depend on the requirements and needs.
You are assuming that an object will occupy in memory the sum of the
allocations of its components. If your computer did not rely on
segment+index addressing schemes it would slow to a crawl. In C++ it's
critical to provide code that is transparent to the platform it's running
on. Your code needs to stream the object's components with no regard to
the architecture/platform underneath. That's what C++ is all about.

Let's slap together a dumb example.

#include <iostream>
#include <ostream>

class X
{
char c;
int n;
public:
X() : c(' '), n(0) { }
~X() { }
};

int main()
{
X x;
std::cout << "sizeof(x) = " << sizeof(x) << std::endl;
}

/*
sizeof(x) = 8
*/

Interestingly enough on my system a char is 1 byte and an integer is 4
bytes (your mileage may well vary). So why the size of 8 bytes? Answer:
padding. Why would you ever want to pay an 8 byte transfer when you can
simply stream the char and integer? No hacking required. With a portable
result too.
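
The padding can be measured directly. This is a minimal sketch: the names Sample, padding_bytes and offset_of_n are made up for illustration, the members are public so that offsetof applies, and the actual numbers are implementation-defined.

```cpp
#include <cstddef>   // std::size_t, offsetof

// Hypothetical stand-in for the X above, with public members so that
// offsetof can be used (it requires a standard-layout type).
struct Sample
{
    char c;
    int  n;
};

// Bytes of Sample that belong to no member, i.e. padding.
// Implementation-defined; 3 is typical where int is 4 bytes wide.
inline std::size_t padding_bytes()
{
    return sizeof(Sample) - (sizeof(char) + sizeof(int));
}

// Offset of n from the start of the object. The compiler places n on an
// int-aligned boundary, which is where the padding after c comes from.
inline std::size_t offset_of_n()
{
    return offsetof(Sample, n);
}
```

On a platform matching the one described above, padding_bytes() would come out as 3 and offset_of_n() as 4, but neither value is guaranteed.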

Let's prove it: how about streaming type X objects to your standard
output? After all, it uses a std::ostream too, doesn't it? You can
replace std::cout with any interface that can accept a standard output
stream. All you need is a simple operator<< to stream your precious type
in any fashion you desire.

#include <iostream>
#include <ostream>

class X
{
char c;
int n;
public:
X() : c(' '), n(0) { } // default ctor
X(char c_, int n_) : c(c_), n(n_) { }
~X() { }
/* friend operator<< */
friend std::ostream& operator<<(std::ostream& os, const X& r_x)
{
os << "c = " << r_x.c; // stream the char
os << "; n = " << r_x.n; // stream the integer
return os << std::endl;
}
};

int main()
{
X xa('a', 0);
X xb('b', 1);

std::cout << "xa: " << xa;
std::cout << "xb: " << xb;

}

/*
xa: c = a; n = 0
xb: c = b; n = 1
*/

Are you now seeing the simplicity and power in the design? What if the
private member integer was in fact another class? No problem, write an
op<< for it too. What if I needed a container of 1000 X elements? What
if you needed to stream the whole container of 1000 X elements to
standard output?

Your way you would need hundreds of lines of code. And it would still
not be portable. I can create, load and stream a container of 1000 X
elements in 3 lines of code excluding includes. Yes: 3. Completely
portable and reusable.

hint: std::vector< X > vn(1000); // and std::copy(...) to std::cout
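
One way the hint could be spelled out is sketched below. It assumes the class X from the example above (repeated here so the block stands alone); the helper name stream_all is hypothetical.

```cpp
#include <algorithm>  // std::copy
#include <iterator>   // std::ostream_iterator
#include <ostream>
#include <sstream>    // std::ostringstream, handy for trying this out
#include <string>
#include <vector>

class X
{
    char c;
    int n;
public:
    X() : c(' '), n(0) { }
    X(char c_, int n_) : c(c_), n(n_) { }
    friend std::ostream& operator<<(std::ostream& os, const X& r_x)
    {
        return os << "c = " << r_x.c << "; n = " << r_x.n << '\n';
    }
};

// Streams every element of v to os, one operator<< call per element.
inline void stream_all(std::ostream& os, const std::vector<X>& v)
{
    std::copy(v.begin(), v.end(), std::ostream_iterator<X>(os));
}
```

Calling stream_all(std::cout, vn) with std::vector< X > vn(1000) would print 1000 default-constructed elements; any other std::ostream works the same way.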

May 31 '06 #2

P: n/a

Mark P wrote:
A colleague asked me something along the lines of the following today.

For some type X he has:

X* px = new X[sz];

Then he wants to convert px to a char* (I'm guessing for the purpose of
serializing the object array). .... Is there any portable way to access the bytes of an object's representation?


Yes. The correct way is indeed to cast it to a char*. There is one
catch.
The object must have a POD type. POD is Plain Old Data, which roughly
means any old C object that can be memcpy'd as bytes. Almost all C++
features will make a type non-POD, see the standard or any advanced
book.

Of course, it may be a portable way to *access* the bytes of a POD, but
that still doesn't mean those *bytes* are portable. And the bytes of a
pointer are notoriously unusable later on.
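
A minimal sketch of the memcpy approach follows. It assumes a C++11 compiler, so the POD requirement is checked with std::is_trivially_copyable, a later and slightly broader notion than POD. Record, to_bytes and from_bytes are hypothetical names, and the resulting bytes are only meaningful on the same platform and compiler.

```cpp
#include <cstring>      // std::memcpy
#include <stdexcept>    // std::invalid_argument
#include <type_traits>  // std::is_trivially_copyable (C++11)
#include <vector>

// Hypothetical POD record used only for this illustration.
struct Record
{
    int    id;
    double value;
};

// Copies the object representation of obj (padding included) into a buffer.
template<class T>
std::vector<unsigned char> to_bytes(const T& obj)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copy needs a trivially copyable type");
    std::vector<unsigned char> buf(sizeof(T));
    std::memcpy(buf.data(), &obj, sizeof(T));
    return buf;
}

// Rebuilds an object from bytes produced by to_bytes on the same platform.
template<class T>
T from_bytes(const std::vector<unsigned char>& buf)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copy needs a trivially copyable type");
    if (buf.size() != sizeof(T))
        throw std::invalid_argument("buffer size does not match sizeof(T)");
    T obj;
    std::memcpy(&obj, buf.data(), sizeof(T));
    return obj;
}
```

Instantiating to_bytes with a type such as std::string fails at compile time, which mirrors the catch described above.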

HTH,
Michiel Salters

May 31 '06 #3

P: n/a
Salt_Peter wrote:
Mark P wrote:
A colleague asked me something along the lines of the following today.

For some type X he has:

X* px = new X[sz];

Then he wants to convert px to a char* (I'm guessing for the purpose of
serializing the object array).


What for? Why not provide your own operator<< and operator>> for type X?
Lets consider what happens if type X is composed of primitive types,
containers, pointers and references.

[snip]

Good advice. See also these FAQs:

http://www.parashift.com/c++-faq-lit...alization.html

Cheers! --M

May 31 '06 #4

P: n/a
Mark P posted:

char* pc = (char*) px;

Perfectly okay.

char* pc = static_cast<char*> (static_cast<void*> (px));

Perfectly okay.

char* pc = reinterpret_cast<char*> (px);

Perfectly okay.

From my reading of the standard it seems that the results of these
casts are all unspecified.

Incorrect. EVERY object is made up of bytes, regardless of its type, and
regardless of whether it qualifies as a POD. The following code is
perfectly okay:

#include <string>
#include <iostream>

template<class T>
void PrintObjectBytes( const T &obj )
{
const unsigned char * const p_last_byte =
reinterpret_cast<const unsigned char *>(&obj) + ( sizeof(obj) - 1
);

for( const unsigned char *p = reinterpret_cast<const unsigned char *>
(&obj);
/* Nothing Condition */;
++p )
{
std::cout << static_cast<unsigned>(*p) << '\n';

if ( p == p_last_byte ) break;
}
}
#include <cstdlib>

int main()
{
std::string str("Hello World!");

PrintObjectBytes( str );

std::system("PAUSE");
}
-Tomás
May 31 '06 #5

P: n/a
posted:

Yes. The correct way is indeed to cast it to a char*. There is one
catch.
The object must have a POD type. POD is Plain Old Data, which roughly
means any old C object that can be memcpy'd as bytes. Almost all C++
features will make a type non-POD, see the standard or any advanced
book.

You're incorrect.

Find me the fanciest, most advanced class you can find... and I guarantee
you it's made up of bytes.

A type doesn't have to be a POD in order for you to access its bytes. See
my post elsewhere in thread for an example.
-Tomás
May 31 '06 #6

P: n/a
Tomás wrote:
EVERY object is made up of bytes, regardless of its type, and
regardless of whether it qualifies as a POD.
Sure, but the meaning of those bytes might be different than expected.
For instance, a virtual table might be included or the compiler might
have inserted padding between members. If one is serializing an object
(as the OP indicated), then those bytes are not necessarily meaningful
when unserialized at some later time or on some other machine.
The following code is
perfectly okay:

#include <string>
#include <iostream>

template<class T>
void PrintObjectBytes( const T &obj )
{
const unsigned char * const p_last_byte =
reinterpret_cast<const unsigned char *>(&obj) + ( sizeof(obj) - 1
);

for( const unsigned char *p = reinterpret_cast<const unsigned char *>
(&obj);
/* Nothing Condition */;
++p )
{
std::cout << static_cast<unsigned>(*p) << '\n';

if ( p == p_last_byte ) break;
Use the for-loop condition instead of this line, which unnecessarily
duplicates the functionality of the for-loop construct.
}
}


Cheers! --M

May 31 '06 #7

P: n/a
mlimber posted:

if ( p == p_last_byte ) break;


Use the for-loop condition instead of this line, which unnecessarily
duplicates the functionality of the for-loop construct.

I want the condition to be tested AFTER the loop body, sort of like how you
can have a "do loop" instead of a "while loop".

Alas, C++ doesn't provide a "do for" loop.
-Tomás
May 31 '06 #8

P: n/a
Tomás wrote:
mlimber posted:

if ( p == p_last_byte ) break;

Use the for-loop condition instead of this line, which unnecessarily
duplicates the functionality of the for-loop construct.

I want the condition to be tested AFTER the loop body, sort of like how you
can have a "do loop" instead of a "while loop".

Alas, C++ doesn't provide a "do for" loop.


do {
// loop body
} while (condition);
May 31 '06 #9

P: n/a
red floyd posted:
Tomás wrote:
mlimber posted:

if ( p == p_last_byte ) break;
Use the for-loop condition instead of this line, which unnecessarily
duplicates the functionality of the for-loop construct.

I want the condition to be tested AFTER the loop body, sort of like
how you can have a "do loop" instead of a "while loop".

Alas, C++ doesn't provide a "do for" loop.


do {
// loop body
} while (condition);

Please re-read my previous post before you post another example of "do
loop".
-Tomás

May 31 '06 #10

P: n/a
Tomás wrote:
Mark P posted:

char* pc = (char*) px;

Perfectly okay.

char* pc = static_cast<char*> (static_cast<void*> (px));

Perfectly okay.

char* pc = reinterpret_cast<char*> (px);

Perfectly okay.

From my reading of the standard it seems that the results of these
casts are all unspecified.

Incorrect. EVERY object is made up of bytes, regardless of its type, and
regardless of whether it qualifies as a POD. The following code is
perfectly okay:


[example code snipped]

OK, but what then to make of 5.2.10.7 describing reinterpret_cast, below?

"A pointer to an object can be explicitly converted to a pointer to an
object of different type.65) Except that converting an rvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are object
types and where the alignment requirements of T2 are no stricter than
those of T1) and back to its original type yields the original pointer
value, the result of such a pointer conversion is unspecified."

Doesn't this mean that the converted pointer may not point to the bytes
of the original object? I can't imagine why this would ever happen, but
it seems that the standard permits it.

-Mark
May 31 '06 #11

P: n/a
Salt_Peter wrote:
Mark P wrote:
A colleague asked me something along the lines of the following today.

For some type X he has:

X* px = new X[sz];

Then he wants to convert px to a char* (I'm guessing for the purpose of
serializing the object array).


What for? Why not provide your own operator<< and operator>> for type X?
Lets consider what happens if type X is composed of primitive types,
containers, pointers and references.


char *pc = (char *)px is reasonable for many cases if you want to
manipulate the bytes. In this scenario, X is plain old data type
(POD). Here is an example,
double *px = new double [100];
char *pc = (char *)px, so you can decode the floating point format by
manipulating the raw byte string.

But if the type X is not POD, it looks complicated to access the raw
byte string.
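
The double example might look like the sketch below. The helper name double_bytes_hex is hypothetical, the byte order and floating-point format are platform-dependent, and the two-digits-per-byte layout assumes 8-bit bytes.

```cpp
#include <cstddef>  // std::size_t
#include <iomanip>  // std::setw, std::setfill
#include <sstream>
#include <string>

// Returns the object representation of d as hex digits, two per byte,
// in memory order (assuming 8-bit bytes).
inline std::string double_bytes_hex(double d)
{
    // Reading any object through unsigned char* is allowed.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&d);

    std::ostringstream oss;
    oss << std::hex << std::setfill('0');
    for (std::size_t i = 0; i != sizeof d; ++i)
        oss << std::setw(2) << static_cast<unsigned>(p[i]);
    return oss.str();
}
```

Decoding sign, exponent and mantissa from that string still needs knowledge of the platform's floating-point format and endianness.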

May 31 '06 #12

P: n/a
Tomás wrote:
mlimber posted:

if ( p == p_last_byte ) break;


Use the for-loop condition instead of this line, which unnecessarily
duplicates the functionality of the for-loop construct.

I want the condition to be tested AFTER the loop body, sort of like how you
can have a "do loop" instead of a "while loop".

Alas, C++ doesn't provide a "do for" loop.


I meant something more along these lines:

typedef unsigned char uchar;
const uchar * const end =
reinterpret_cast<const uchar*>(&obj) + sizeof(obj);
for( const uchar*p = reinterpret_cast<const uchar*>(&obj);
p != end;
++p )
{
std::cout << static_cast<unsigned>(*p) << '\n';
}

Cheers! --M

May 31 '06 #13

P: n/a
mlimber posted:
const uchar * const end = reinterpret_cast<const uchar*>(&obj) +
sizeof(obj);

I have a phobia of pointers to "one past the end". (In fact I've a phobia
of pointers which point to anything other than legitimate addresses.)

I know they're not taboo, but they just don't make sense to me.

What happens if an object is located near the "border", right near the end
of memory?

Things are extra hairy if you're dealing with very large objects.

The Standard doesn't say anything about what happens when pointer
arithmetic overflows.

-Tomás
May 31 '06 #14

P: n/a
Mark P posted:
[example code snipped]

OK, but what then to make of 5.2.10.7 describing reinterpret_cast,
below?

"A pointer to an object can be explicitly converted to a pointer to an
object of different type.65) Except that converting an rvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are
object types and where the alignment requirements of T2 are no
stricter than those of T1) and back to its original type yields the
original pointer value, the result of such a pointer conversion is
unspecified."
The reason they mention alignment requirements is as follows:

On a certain system, a char may be 8-Bit... however, the smallest amount
of memory that the CPU can access may be 16 bits. Therefore, a "char*"
would need an extra bit to indicate whether it's the first or last 8
bits.

For that reason the following expression may be false:

sizeof(char*) == sizeof(int*)
And accordingly, the following isn't guaranteed to work:

#include <cassert>

int main()
{
char k[5] = {};

int *p = reinterpret_cast<int*>( k + 1 );

char *pc = reinterpret_cast<char*>( p );

assert( pc == k + 1 );
}

Doesn't this mean that the converted pointer may not point to the
bytes of the original object? I can't imagine why this would ever
happen, but it seems that the standard permits it.

A "char" has the least alignment requirements -- there's no problem.
-Tomás
May 31 '06 #15

P: n/a
Tomás wrote:
mlimber posted:
const uchar * const end = reinterpret_cast<const uchar*>(&obj) +
sizeof(obj);

I have a phobia of pointers to "one past the end". (In fact I've a phobia
of pointers which point to anything other than legitimate addresses.)

I know they're not taboo, but they just don't make sense to me.

What happens if an object is located near the "border", right near the end
of memory?


See below, but the implication seems to be that they cannot be
allocated too close to a "border."
Things are extra hairy if you're dealing with very large objects.
No. The "end" pointer above points to the address immediately following
the object, not sizeof(T) after that point.
The Standard doesn't say anything about what happens when pointer
arithmetic overflows.


Incorrect. 5.7 para. 5 says, "[i]f the expression P points to the last
element of an array object, the expression (P)+1 points one past the
last element of the array object, and if the expression Q points one
past the last element of an array object, the expression (Q)-1 points
to the last element of the array object. If both the pointer operand
and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce
an overflow; otherwise, the behavior is undefined."

Use them without fear!
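
A small sketch of the rule quoted above (the function name sum is hypothetical): the one-past-the-end pointer is formed and compared against, but never dereferenced.

```cpp
#include <cstddef>  // std::size_t

// Sums the n ints in the half-open range [first, first + n).
inline long sum(const int* first, std::size_t n)
{
    // One past the last element: legal to form and compare, not to dereference.
    const int* const last = first + n;

    long total = 0;
    for (const int* p = first; p != last; ++p)
        total += *p;
    return total;
}
```

With n == 0 the loop body never runs, so the empty range needs no special case.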

Cheers! --M

May 31 '06 #16

P: n/a
mlimber posted:
Incorrect. 5.7 para. 5 says, "[i]f the expression P points to the last
element of an array object, the expression (P)+1 points one past the
last element of the array object, and if the expression Q points one
past the last element of an array object, the expression (Q)-1 points
to the last element of the array object. If both the pointer operand
and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce
an overflow; otherwise, the behavior is undefined."

My phobia is based more on disgust than logic.

Let's say we have the following structure which is used extensively
throughout our program:

struct Monkey {

long double settings[64];

unsigned long vars[128];

};

It doesn't seem very C++-ish (or even C-ish) to effectively waste the
last kilobyte or so of memory.

C++ is my favourite programming language because it's efficient in a
hardcore kind of way, but also has advanced, fancy features. Unions are a
brilliant example of such efficiency. Some things disgust me though,
namely "one past the end", and the way in which you can supply "delete"
and "free" with a null pointer... it would have been more efficient to
have:

template<class T>
inline void ndelete( T p ) { if (p) delete p; }
-Tomás
May 31 '06 #17

P: n/a
Tomás wrote:
Mark P posted:
[example code snipped]

OK, but what then to make of 5.2.10.7 describing reinterpret_cast,
below?

"A pointer to an object can be explicitly converted to a pointer to an
object of different type.65) Except that converting an rvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are
object types and where the alignment requirements of T2 are no
stricter than those of T1) and back to its original type yields the
original pointer value, the result of such a pointer conversion is
unspecified."


[alignment details snipped]

Doesn't this mean that the converted pointer may not point to the
bytes of the original object? I can't imagine why this would ever
happen, but it seems that the standard permits it.

A "char" has the least alignment requirements -- there's no problem.


I don't think you're reading this correctly. Compacting some of the
clauses, I can write that section as:

"A pointer to an object can be explicitly converted to a pointer to an
object of different type. Except that [a certain special case does a
special thing], the result of such a pointer conversion is unspecified."

That is, unless you're casting from T1* to T2* and back to T1* (with the
additional proviso about alignment), the result of this conversion is
unspecified.

-Mark
May 31 '06 #18

P: n/a
Tomás wrote:
mlimber posted:
Incorrect. 5.7 para. 5 says, "[i]f the expression P points to the last
element of an array object, the expression (P)+1 points one past the
last element of the array object, and if the expression Q points one
past the last element of an array object, the expression (Q)-1 points
to the last element of the array object. If both the pointer operand
and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce
an overflow; otherwise, the behavior is undefined."

My phobia is based more on disgust than logic.

Let's say we have the following structure which is used extensively
throughout our program:

struct Monkey {

long double settings[64];

unsigned long vars[128];

};

It doesn't seem very C++-ish (or even C-ish) to effectively waste the
last kilobyte or so of memory.


You're not wasting it; you just can't put a Monkey there (apparently).
You could, however, potentially put many other things in that memory.

Cheers! --M

May 31 '06 #19

P: n/a
Tomás wrote:
mlimber posted:
Incorrect. 5.7 para. 5 says, "[i]f the expression P points to the last
[snip]
It doesn't seem very C++-ish (or even C-ish) to effectively waste the
last kilobyte or so of memory.


PS, Footnote 75 in that same section says, "[A]n implementation need
only provide one extra byte (which might overlap another object in the
program) just after the end
of the object in order to satisfy the 'one past the last element'
requirements."

Cheers! --M

May 31 '06 #20

P: n/a
Tomás wrote:
mlimber posted:
const uchar * const end = reinterpret_cast<const uchar*>(&obj) +
sizeof(obj);


I have a phobia of pointers to "one past the end". (In fact I've a phobia
of pointers which point to anything other than legitimate addresses.)


This is ultimately open ranges versus closed ranges: [p, q) versus [p,
q]. Open (strictly, half-open) ranges have a number of advantages over
the closed ranges [which you espouse].

For one, note that open ranges are able to represent the 'empty range'.

When 'p == q' (or begin() == end()), [p, q) represents an empty range.
Iteration over it is naturally avoided; hell, the pointers don't even
have to be valid. Iterating from [0, 0) is perfectly safe.

There is no way, however, to represent the empty range with closed
ranges. Consider constructing your 'p_last_byte' over a range of /zero/
bytes [granted, I'm not certain it could happen in ISO C++]: p + 0 - 1 =
p-1! The range is now [p, p-1]. Not only will your iteration merrily
print out *p, it will find that p != p-1, p+1 != p-1, ... And if p == 0,
p-1 = ~0, so your range becomes [0, ~0).
If you look into the standard algorithms and the way they're used, open
ranges prevent a lot of unnecessary checks. For instance:

v.erase( remove_if(v.begin(), v.end(), X), v.end() );

is perfectly safe thanks to the magic of open ranges. If there are no
elements to be removed, remove_if returns v.end(), and v.erase sees the
empty range [v.end(), v.end()). If v.begin() == v.end(), remove_if
doesn't die, and simply does nothing, returning v.end().

If the STL didn't use open ranges for iterators, that line of code would
require two separate checks, one to ascertain that v was not empty (or
else remove_if would die) and one to determine that remove_if had found
elements to remove (or else erase would die).
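
The erase/remove_if line can be tried on the two edge cases mentioned. This is a sketch with a hypothetical predicate is_negative and wrapper drop_negatives:

```cpp
#include <algorithm>  // std::remove_if
#include <vector>

// Hypothetical predicate for the illustration: true for negative values.
inline bool is_negative(int x) { return x < 0; }

// Removes all negative values from v with the erase-remove idiom.
// Needs no checks for an empty vector or for "nothing matched".
inline void drop_negatives(std::vector<int>& v)
{
    v.erase(std::remove_if(v.begin(), v.end(), is_negative), v.end());
}
```

For an empty vector, and for one with no negative values, remove_if returns v.end() and erase receives the empty range [v.end(), v.end()).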
Jack Saalweachter
May 31 '06 #21

P: n/a
dan2online wrote:
Salt_Peter wrote:
Mark P wrote:
A colleague asked me something along the lines of the following today.

For some type X he has:

X* px = new X[sz];

Then he wants to convert px to a char* (I'm guessing for the purpose of
serializing the object array).

What for? Why not provide your own operator<< and operator>> for type X?
Lets consider what happens if type X is composed of primitive types,
containers, pointers and references.


char *pc = (char *)px is reasonable for many cases if you want to
manipulate the bytes. In this scenario, X is plain old data type
(POD). Here is an example,
double *px = new double [100];
char *pc = (char *)px, so you can decode the floating point format by
manipulating the raw byte string.

But if the type X is not POD, it looks complicated to access the raw
byte string.


I'll say it again since you haven't yet got the picture. Even with a
POD, that raw byte string will often include padding. Consider a complex
POD with components that don't fit nicely together, e.g. 2 chars and a
double.

The use of operator overloading is a far more powerful, efficient,
reusable, portable, maintainable, extensible and safe way to transfer
bits around. It's a win-win bargain and bug free - no pointers involved.
Technically, op<< and op>> are universal in that any type can be
streamed efficiently with all padding stripped away - guaranteed. Any
interface that can accept a std::ostream& or std::istream& will do to
send or swallow. And remember that the overloaded operator is not a
member function, so the POD need not be a class.

Imagine a complex PODA which is a member of a PODB type. There is no
need to write a new function to stream both PODB and its PODA member +
other members. The operator you wrote for PODA will do just fine, you
only need worry about PODB's immediate needs since PODA already knows
how to stream itself (it's an object, not just a bunch of bytes).

Again, if you need a container of POD elements and you require streaming
the entire container's contents, you already have an operator for the
elements. Regardless of whether the container is sequential or not and
irrespective of the padding constraints.

Programming the bit transfer becomes much, much easier, bullet proof and
with a lot less code.
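
The PODA/PODB composition might be sketched as follows. The member layouts of PodA and PodB and the helper to_text are assumptions made for illustration, and the output is a simple space-separated text format.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical types named after the PODA/PODB described above.
struct PodA
{
    char   c;
    double d;
};

struct PodB
{
    int  id;
    PodA a;
};

// Field-by-field text output; padding bytes are never written.
inline std::ostream& operator<<(std::ostream& os, const PodA& a)
{
    return os << a.c << ' ' << a.d;
}

// PodB streams its own field and reuses PodA's operator for the member.
inline std::ostream& operator<<(std::ostream& os, const PodB& b)
{
    return os << b.id << ' ' << b.a;
}

// Convenience helper: renders a PodB into a string.
inline std::string to_text(const PodB& b)
{
    std::ostringstream oss;
    oss << b;
    return oss.str();
}
```

A matching operator>> for each type would read the fields back in the same order.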
May 31 '06 #22

P: n/a
mlimber posted:

PS, Footnote 75 in that same section says, "[A]n implementation need
only provide one extra byte (which might overlap another object in the
program) just after the end
of the object in order to satisfy the 'one past the last element'
requirements."

Now I see : ).
Pointer to one past the end it is!
-Tomás
Jun 1 '06 #23

P: n/a
Mark P posted:

That is, unless you're casting from T1* to T2* and back to T1* (with the
additional proviso about alignment), the result of this conversion is
unspecified.

To be honest, I don't need to read anything from the Standard, because I
know I have to be right (I'm not being arrogant, please bear with me...).
The Laws of Physics and The Laws of Mathematics over-rule anything that's
written in a programming language standard.

Firstly the Standard says that you can convert any pointer type to a
void*, e.g.:

void Func( double *p1, unsigned *p2, char* p3 )
{
void *p;

p = p1;

p = p2;

p = p3;
}

And it also says that you can convert back, and the original address
value will be perfectly preserved. Sample:

int main()
{
double k;

void *p1 = &k;

double *p2 = static_cast<double*>(p1);
*p2 = 45.372;
}
We all know that the smallest thing in C++ is a byte. No structure shall
be 8.5 bytes, or 2 and a third bytes, or one eighth of a byte. Whole bytes
only.

Therefore, by simple logic, one can see that every object is made up of
bytes, and that every object can be accessed as simply an array of bytes.

At the end of the day we're only dealing with chips, and electrical
current, and bits and bytes, there's nothing mysterious.

One thing which puzzled me before is this:
Why was there a void* in C++ at all? A "char*" can reliably store any
address, so why did we need a "void*". If you'd like, here's the thread:
http://groups.google.ie/group/comp.s...d/7da690d52e6df286/86e2383d1f830ddb?tvc=1&q=void*+group%3Acomp.std.c%2B%2B+author%3ATom%C3%A1s&hl=en#86e2383d1f830ddb
-Tomás
Jun 1 '06 #24

P: n/a
Tomás wrote:
Mark P posted:

That is, unless you're casting from T1* to T2* and back to T1* (with the
additional proviso about alignment), the result of this conversion is
unspecified.

To be honest, I don't need to read anything from the Standard, because I
know I have to be right[...]


[snip]

Therefore, by simple logic, one can see that every object is made up of
bytes, and that every object can be accessed as simply an array of bytes.


Perhaps, but that wasn't my point. Would there be anything standard
non-conforming about an implementation which, for X not void and when
casting from void* to X* adds sizeof(X) to the address (modulo the
allowed range of addresses) and when casting from X* to void* subtracts
sizeof(X) from the address (modulo the allowed range of addresses)?

I'll repeat here the section of the standard I quoted earlier, with
added (by me) emphasis on the final clause:

"A pointer to an object can be explicitly converted to a pointer to an
object of different type. Except that converting an rvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are object
types and where the alignment requirements of T2 are no stricter than
those of T1) and back to its original type yields the original pointer
value, *the result of such a pointer conversion is unspecified*."

Is not the result of the single operation of converting a char* to a
void* then unspecified?

-Mark
Jun 1 '06 #25

P: n/a
Mark P posted:

Perhaps, but that wasn't my point. Would there be anything standard
non-conforming about an implementation which, for X not void and when
casting from void* to X* adds sizeof(X) to the address (modulo the
allowed range of addresses) and when casting from X* to void*
subtracts sizeof(X) from the address (modulo the allowed range of
addresses)?

See below.

"A pointer to an object can be explicitly converted to a pointer to an
object of different type.

Example:

double d; /* Source Type */

char *p = reinterpret_cast<char*>(&d); /* Destination Type */

Except that converting an rvalue of type
“pointer to T1”

T1 = double

to the type “pointer to T2”

T2 = char

(where T1 and T2 are
object types and where the alignment requirements of T2 are no
stricter than those of T1)

Nothing has less strict alignment requirements than a char.

and back to its original type yields the
original pointer value

Yippie, we've satisfied the conditions!

,*the result of such a pointer conversion is
unspecified*."


The two lines immediately above refer to when the conditions are NOT
satisfied.

We've gone from:

Strict alignment (double)

to:

Less strict alignment (char)
So we're okay. The Standard is actually giving us PLENTY of slack here!
For instance, the following will work perfectly if a long double has
stricter alignment requirements than an int:

long double ld;

int *p = reinterpret_cast<int*>(&ld);

long double *p2 = reinterpret_cast<long double*>(p);

*p2 = 453.235;
So there you have it: Anything can go to char* and then back to its
original pointer type.
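
The round trip can be checked with a small sketch (the helper name round_trips_via_char is hypothetical). It exercises only the there-and-back case covered by the quoted paragraph and says nothing about what the intermediate char* points to.

```cpp
// Converts a pointer to char* and back, then reports whether the original
// pointer value was recovered. char has the weakest alignment requirement,
// so the round-trip guarantee in the quoted paragraph applies.
template<class T>
bool round_trips_via_char(T* original)
{
    char* as_char = reinterpret_cast<char*>(original);
    T* back = reinterpret_cast<T*>(as_char);
    return back == original;
}
```

For example, round_trips_via_char(&d) with a double d should yield true on a conforming implementation.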

-Tomás
Jun 1 '06 #26

P: n/a
Tomás posted:

"A pointer to an object can be explicitly converted to a pointer to an
object of different type.

Here's a little program I threw together for giving the different pointer
sizes on a given platform. On Windows XP, it gives 4 for every one.
#include <iostream>
#include <cstdlib>
#include <cstring>

/* The following are only used for their types */
#include <string>
#include <vector>
#include <typeinfo>
template<unsigned width>
const char* CentreHoriz( const char* const p_in )
{
/* NB:

(1) Uses static data, so be careful with sequence points.
(2) Doesn't check that string isn't too long.
*/
static char buffer[width + 1]; /* Automatic null terminator */
std::memset( buffer, ' ', width * sizeof(*buffer) );
unsigned const len = std::strlen(p_in);

std::memcpy( buffer + width / 2 - len / 2,
p_in,
len);

return buffer;
}
template<class T>
void PrintRow( const char* const p )
{
std::cout
<< '|'
<< CentreHoriz<36>(p)
<< "|| "
<< sizeof(T)
<< " |\n"

<<
"-------------------------------------------------------------\n";
}


int main()
{
std::cout <<
"=============================================================\n"
"| How much memory does a particular pointer type consume? |\n"
"=============================================================\n"
"| Type || Bytes |\n"
"=============================================================\n";

PrintRow<char*>("char*");
PrintRow<short*>("short*");
PrintRow<int*>("int*");
PrintRow<long*>("long*");
PrintRow<float*>("float*");
PrintRow<double*>("double*");
PrintRow<long double*>("long double*");
PrintRow<bool*>("bool*");
PrintRow<wchar_t*>("wchar_t*");
PrintRow<std::string*>("std::string*");
PrintRow<std::vector<std::string>*>("std::vector<std::string>*");

std::cout << '\n';

std::system("PAUSE");
}

-Tomás

Jun 1 '06 #27

P: n/a
Tomás wrote:
Mark P posted:

Perhaps, but that wasn't my point. Would there be anything standard
non-conforming about an implementation which, for X not void and when
casting from void* to X* adds sizeof(X) to the address (modulo the
allowed range of addresses) and when casting from X* to void*
subtracts sizeof(X) from the address (modulo the allowed range of
addresses)?


[irrelevant example snipped]
So we're okay. The Standard is actually giving us PLENTY of slack here!
[more cuts]

So there you have it: Anything can go to char* and then back to its
original pointer type.


That point has never been the issue even though you keep providing me
with examples to illustrate it. I think you're not understanding the
wording of the standard, so let me quote this yet again:

"A pointer to an object can be explicitly converted to a pointer to an
object of different type. Except that converting an rvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are object
types and where the alignment requirements of T2 are no stricter than
those of T1) and back to its original type yields the original pointer
value, the result of such a pointer conversion is unspecified."

Allow me to rearrange the second sentence without altering its meaning:

"The result of such a pointer conversion is unspecified except [for one
special situation]."

In other words, *all* reinterpret_cast pointer-to-pointer conversions
have unspecified behavior except for one special case where you convert
back and forth between two types and respect alignment.

You keep showing me examples of the special case and I have no
disagreement with you over that case but my point, again, is that for
all other cases the standard states that the result is unspecified. In
particular, the result of the one-way conversion from void* to char* is
unspecified.

Mark
Jun 1 '06 #28

* Mark P:
In other words, *all* reinterpret_cast pointer-to-pointer conversions
have unspecified behavior except for one special case where you convert
back and forth between two types and respect alignment.


Almost, but not quite.

A pointer to a POD-struct object can be converted via a "suitable"
reinterpret_cast to a pointer to the first member of that object (and of
course back), §9.2/17.

Yes, the standard is a bit inconsistent here, and in other places, which
is why it can be good fun to discuss what the standard really means...

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Jun 1 '06 #29

Mark P posted:

In
particular, the result of the one-way conversion from void* to char* is
unspecified.

Are you questioning the legality of the following?:

int main()
{
int val;

void *pvoid = &val; /* This is perfectly okay */
char *pchar = static_cast<char*>(pvoid); /* May get corrupted? */
pvoid = pchar; /* May not be reliable? */

int *p = static_cast<int*>(pvoid);

*p = 7;
}
Maybe the Standard doesn't state in plain English that this is legal...
but in my mind, it doesn't have to.

We all know that a "void*" can store ANY address reliably.

As every object is made up of bytes, we can also assume that a "char*"
can store ANY address reliably.

If two pointer types can store ANY address reliably, then it makes sense
that you can convert back and forth.

There have been many times when contemplating C++ that I thought I had
thought of everything... but then someone points out to me something that
I've overlooked. I can see no reason why a "char*" could not store any
address reliably, but nonetheless, an extra explicit paragraph in the
Standard wouldn't hurt.

All logic and reasoning aside, you could fill a warehouse with code that
stores arbitrary memory addresses in a "char*" (my own code included), so
it just wouldn't be appropriate to make it illegal.

This situation of "everybody's doing it, so we better make it legal" has
propagated elsewhere. In C code, you'd commonly see the following to get
the address of the one-past-last element of an array:

int array[10];

int *p = &array[10];
The line immediately above gets turned into:

int *p = & *(array + 10);
As you can see, an invalid pointer gets dereferenced -- Undefined
Behaviour.

However, so many people do it in their code that the C Standard committee
decided (in C99) that an address-of operator applied directly to the result
of a dereference operator cancels it out, with neither being evaluated.
Therefore the line of code becomes:

int *p = array + 10;
No more undefined behaviour. A good example of "bowing to what everyone
does".

But at the end of the day, I like to look at things from the perspective
of:

No matter how complicated or advanced or fancy a programming language
becomes, it's still built on bits, bytes and CPU instructions. If the
smallest addressable memory unit in C++ is going to be a char, then we
should be able to store ANY legitimate memory address in a "char*".
-Tomás
Jun 1 '06 #30

Tomás wrote:
Mark P posted:

In
particular, the result of the one-way conversion from void* to char* is
unspecified.

Are you questioning the legality of the following?:

int main()
{
int val;

void *pvoid = &val; /* This is perfectly okay */
char *pchar = static_cast<char*>(pvoid); /* May get corrupted? */
pvoid = pchar; /* May not be reliable? */

int *p = static_cast<int*>(pvoid);

*p = 7;
}


The issue is not "legality" in the sense of a well-formed program. The
issue, again, is that the behavior may be unspecified. In your specific
example I believe that this is *not* the case, however, since your
conversion sequence is: int* -> void* -> char* -> void* -> int*. In
particular, your conversion sequence "unwinds" itself and retraces its
steps back to the original type. Thus it falls under the special case
in the section of the standard that I have already quoted 4 times (and
won't repeat again for fear of violating copyright restrictions :) ).

Suppose instead you had offered:

int main ()
{
int val = 0;

void* pv = &val;
char* pc = static_cast<char*>(pv);
int* pi = reinterpret_cast<int*> (pc); // static_cast cannot convert char* to int*

*pi = 7; // now, what is val?
}

However logical it may seem that val should be 7, the standard
nonetheless indicates that the value of val is unspecified.

Maybe the Standard doesn't state in plain English that this is legal...
but in my mind, it doesn't have to.

We all know that a "void*" can store ANY address reliably.
Only you know what you mean by reliably.

As every object is made up of bytes, we can also assume that a "char*"
can store ANY address reliably.
Ditto.

If two pointer types can store ANY address reliably, then it makes sense
that you can convert back and forth.
And where did you get the idea that the standard always makes sense? :)

There have been many times when contemplating C++ that I thought I had
thought of everything... but then someone points out to me something that
I've overlooked. I can see no reason why a "char*" could not store any
address reliably, but nonetheless, an extra explicit paragraph in the
Standard wouldn't hurt.

All logic and reasoning aside, you could fill a warehouse with code that
stores arbitrary memory addresses in a "char*" (my own code included), so
it just wouldn't be appropriate to make it illegal.


It's clearly not illegal. Depending how it's used it may be unspecified
(though it's hard to imagine that an implementation would go out of its
way to make this not work as one would assume).

-Mark
Jun 1 '06 #31


Salt_Peter wrote:
I'll say it again since you haven't yet got the picture. Even with a
POD, that raw byte string will often include padding. Consider a complex
POD with components that don't fit nicely together, i.e. 2 chars and a
double.
How will the padding bytes affect the manipulation of raw byte string?
In many cases, we need to look inside the internal format of the raw
byte string.

The use of operator overloading is a far more powerful, efficient,
reusable, portable, maintainable, extensible and safe way to transfer
bits around. It's a win-win bargain and bug free - no pointers involved.
Technically, op<< and op>> are universal in that any type can be
streamed efficiently with all padding stripped away - guaranteed. Any
interface that can accept a std::ostream& or std::istream& will do to
swallow or send. And remember that the overloaded operator is not a
member function, the POD need not be a class.

It will depend on your application.
Imagine a complex PODA which is a member of a PODB type. There is no
need to write a new function to stream both PODB and its PODA member +
other members. The operator you wrote for PODA will do just fine, you
only need worry about PODB's immediate needs since PODA already knows
how to stream itself (it's an object, not just a bunch of bytes).

Again, if you need a container of POD elements and you require streaming
the entire container's contents, you already have an operator for the
elements. Regardless of whether the container is sequential or not and
irrespective of the padding constraints.

Programming the bit transfer becomes much, much easier, bullet proof and
with a lot less code.


It is true for most cases, but not universal.

Jun 1 '06 #32

Mark P posted:
In your specific example I believe that this is *not* the case,
however, since your conversion sequence is: int* -> void* -> char* ->
void* -> int*.
The point of my code is that I go from T* to char* and then back to T*. I
draw an analogy with other types:

double a = 56.253; /* Here's our original value */

int b = a; /* We store it in a different type */

double c = b; /* Now we bring it back to the original type */

assert( a == c ); /* Will the value have been preserved? */
In the example immediately above, "information will be lost" when we go
from double to int. Even though we finally go back to double, the
"corruption" has already taken place. Now let's look at it with pointers:

int a;

int *pint = &a; /* Here's our original value */

void *pvoid = pint; /* We store it in a different type */

int *pint2 = static_cast<int*>(pvoid); /* We bring it back to the original type */

assert( pint2 == pint ); /* Will the value have been preserved? */
The above code snippet is guaranteed to work because you can store any
object's address in a "void*" and there won't be any "corruption".

As you quoted several times, the Standard also specifies that you can
reliably go from T1* to T2* without "corruption", but only if the
alignment requirements of T2 are no stricter. As "char" is the smallest
and most simple type we have in C++, it should have the least alignment
requirements (if not none). Therefore the conversion from any legitimate
pointer value to "char*" should go off without a hitch. Example:

int a;

int *pint = &a; /* Here's our original value */

char *pchar =
reinterpret_cast<char*>(pint); /* We store it in a different type */

int *pint2 =
reinterpret_cast<int*>(pchar); /* We bring it back to the original type */

assert( pint2 == pint ); /* Will the value have been preserved? */
The above should be perfectly okay.

You have gone on to say, notwithstanding any of the above, that the
conversion from "void*" to "char*" may be unspecified. However, if you
consider that a "void*" (assuming it contains a legitimate address) had
to start off as some other pointer value, you can see how there should be
no problem with going to "char*", given that the original pointer value
would have been able to go directly to "char*". That is to say, if the
following is possible:

T* to char*

Then the following should also be possible:

T* to void* to char*

In particular, your conversion sequence "unwinds"
itself and retraces its steps back to the original type.

But as I demonstrated with my "double" example, the "corruption" has
already taken place.

Suppose instead you had offered:

int main ()
{
int val = 0;

void* pv = &val;

"pv" should hold val's address without any corruption.

char* pc = static_cast<char*>(pv);

"pc" should hold the address stored in pv without any corruption.

int* pi = reinterpret_cast<int*> (pc);

Back to the original type. Shouldn't be any corruption.

*pi = 7; // now, what is val?

Should work perfectly.

However logical it may seem that val should be 7, the standard
nonetheless indicates that the value of val is unspecified.

I suppose we have to decide just how pedantic the Standard has to be.
Should it be enough for us to presume that it works (because there's
about ten voices in my head shouting "For God's sake it works!"), or
should we be thinking, "The Standard has to state it explicitly in plain
English"?

Maybe the Standard doesn't state in plain English that this is
legal... but in my mind, it doesn't have to.

We all know that a "void*" can store ANY address reliably.


Only you know what you mean by reliably.

reliably = no corruption, the original value is preserved perfectly.

If two pointer types can store ANY address reliably, then it makes
sense that you can convert back and forth.


And where did you get the idea that the standard always makes sense?
:)

Sometimes that's the only hope we have.

There have been many times when contemplating C++ that I thought I
had thought of everything... but then someone points out to me
something that I've overlooked. I can see no reason why a "char*"
could not store any address reliably, but nonetheless, an extra
explicit paragraph in the Standard wouldn't hurt.

All logic and reasoning aside, you could fill a warehouse with code
that stores arbitrary memory addresses in a "char*" (my own code
included), so it just wouldn't be appropriate to make it illegal.


It's clearly not illegal. Depending how it's used it may be
unspecified (though it's hard to imagine that an implementation would
go out of its way to make this not work as one would assume).

I would never think twice about any "dangers" of using "char*". I see it
as a "universal pointer type", just like "void*".
-Tomás
Jun 2 '06 #33

Tomás wrote:

As you quoted several times, the Standard also specifies that you can
reliably go from T1* to T2* without "corruption", but only if the
alignment requirements of T2 are no stricter.
No! That is not what it says. Read it again if it's not clear. It
says that you can [reliably] go from T1* to T2* and back to T1* subject
to alignment constraints. The "and back" clause is not optional; the
standard only guarantees the result when both casts are performed. This
does *not* imply that you can, for example, go from T1* to T2* to T3* to
T1*, which was exactly the example of my previous post.

I suppose we have to decide just how pedantic the Standard has to be.
Should it be enough for us to presume that it works (because there's
about ten voices in my head shouting "For God's sake it works!"), or
should we be thinking, "The Standard has to state it explicitly in plain
English"?


Insufficient pedantry of the Standard is not the issue here. In fact I
would argue it's the opposite. The Standard makes a point of stating
that the result of these casts is unspecified. Had the Standard said
nothing I might agree with you that it's been left for sensible people
to infer the obvious, but if the Standard explicitly tells us that the
result is unspecified, then you really can't make a case that we're
meant to infer the *opposite*.

-Mark
Jun 2 '06 #34
