
directly serializing structs

Greetings,

When directly serializing C++ structures to a file with the standard
library functions giving the address of the data and length of
structure using the sizeof operator, do I risk portability because of
different compilers packing structures into different sizes or
components of this structure to different address boundaries (for
example placing in multiples of 4 on a 32bit system)? Once the file is
serialized, does the same code compiled by another compiler or even
the same compiler but a different version carry the risk of not
reading the contents properly?

Thank you

Jun 23 '07 #1


Cagdas Ozgenc wrote:
> Greetings,
>
> When directly serializing C++ structures to a file with the standard
> library functions giving the address of the data and length of
> structure using the sizeof operator, do I risk portability because of
> different compilers packing structures into different sizes or
> components of this structure to different address boundaries (for
> example placing in multiples of 4 on a 32bit system)?

Yes. You also risk portability problems because different compilers (or
platforms) have different formats for individual data items. Byte
ordering for integers often varies across platforms, and floating point
formats can even vary between compilers on the same platform.
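
To see the byte-ordering difference directly, here is a small sketch (C++11; the helper names are hypothetical) that copies the 32-bit value 0x01020304 into a byte array and reports which byte is stored first. A raw `fwrite` of such a value on an x86 machine puts 04 first in the file, whereas a big-endian reader such as SPARC would expect 01 first.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the byte stored at the lowest address of the 32-bit value
// 0x01020304 on this host: 0x04 on little-endian hosts (e.g. x86),
// 0x01 on big-endian hosts (e.g. SPARC).
unsigned first_stored_byte() {
    const std::uint32_t value = 0x01020304u;
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // inspect the object representation
    return bytes[0];
}

// Hypothetical helper built on the probe above.
bool host_is_little_endian() {
    return first_stored_byte() == 0x04;
}
```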

> Once the file is
> serialized, does the same code compiled by another compiler or even
> the same compiler but a different version carry the risk of not
> reading the contents properly?

Yes, see above.

If this is a concern then consider using text; it's much more portable.
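
A minimal sketch of the text approach, assuming a hypothetical `Record` struct with one integer and one floating-point member: each field is written as decimal text and parsed back with the stream operators, so the writer's byte order, padding and integer width do not leak into the file. Doubles are written with `max_digits10` significant digits, which is enough for an exact round trip on implementations with correctly rounded decimal conversion (the common case today).

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <limits>
#include <sstream>

// Hypothetical record type used only for illustration.
struct Record {
    std::int32_t id;
    double value;
};

// Write the fields as whitespace-separated decimal text, one record per line.
void write_text(std::ostream& out, const Record& r) {
    out << r.id << ' '
        << std::setprecision(std::numeric_limits<double>::max_digits10)
        << r.value << '\n';
}

// Read the fields back; returns false if the input is malformed.
bool read_text(std::istream& in, Record& r) {
    return static_cast<bool>(in >> r.id >> r.value);
}
```

The price is a larger, slower file; the gain is that any conforming C++ implementation can read it back.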

> Thank you
Jun 23 '07 #2

Cagdas Ozgenc wrote:
> Greetings,
>
> When directly serializing C++ structures to a file with the standard
> library functions giving the address of the data and length of
> structure using the sizeof operator, do I risk portability because of
> different compilers packing structures into different sizes or
> components of this structure to different address boundaries (for
> example placing in multiples of 4 on a 32bit system)? Once the file is
> serialized, does the same code compiled by another compiler or even
> the same compiler but a different version carry the risk of not
> reading the contents properly?

Yes, it does. Internal layout is implementation defined.
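
One way to see that the layout is up to the implementation is to look at `sizeof` and `offsetof` for a struct whose members have different alignment requirements. The `Sample` struct below is hypothetical. On common 32- and 64-bit compilers the `int` member typically lands at offset 4, leaving padding after the first `char`, but the standard does not fix these numbers. Those padding bytes, with unspecified contents, are exactly what a raw write of `sizeof(Sample)` bytes puts in the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical struct mixing 1-byte members with a wider one.
struct Sample {
    char tag;
    int count;
    char flag;
};

// Number of padding bytes the compiler inserted into Sample.
std::size_t sample_padding_bytes() {
    return sizeof(Sample) - (sizeof(char) + sizeof(int) + sizeof(char));
}

// Print the layout this particular compiler chose.
void print_sample_layout() {
    std::printf("sizeof(Sample)  = %zu\n", sizeof(Sample));
    std::printf("offsetof(tag)   = %zu\n", offsetof(Sample, tag));
    std::printf("offsetof(count) = %zu\n", offsetof(Sample, count));
    std::printf("offsetof(flag)  = %zu\n", offsetof(Sample, flag));
    std::printf("padding bytes   = %zu\n", sample_padding_bytes());
}
```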

--
Ian Collins.
Jun 23 '07 #3


"Cagdas Ozgenc" <ca***********@gmail.com> wrote in message
news:11*********************@w5g2000hsg.googlegroups.com...
> Greetings,
>
> When directly serializing C++ structures to a file with the standard
> library functions giving the address of the data and length of
> structure using the sizeof operator, do I risk portability because of
> different compilers packing structures into different sizes or
> components of this structure to different address boundaries (for
> example placing in multiples of 4 on a 32bit system)? Once the file is
> serialized, does the same code compiled by another compiler or even
> the same compiler but a different version carry the risk of not
> reading the contents properly?

Does your software/application require that much portability? Why struggle
with "write once, compile anywhere" if you're only targeting one platform or
even only one machine, for instance?

I wouldn't call what you described above "serializing" though. To me,
"serializing" has the connotation that you indeed are looking into structs
and the sizes of data members, their padding etc. or using ASN.1/BER for
over the wire transmission, for another example, rather than just doing a
struct-sized write. The recommended practice of streaming everything at
every boundary (disk, wire) seems unnatural and tedious to me also. I guess
a layer at the boundaries that does the streaming on the non-primary
platform and doesn't do anything on the primary platform isn't that bad to
implement.
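
The "streaming" practice mentioned above usually means encoding each field with shifts and masks, so the bytes produced do not depend on the host's byte order and no per-platform layer is needed. A minimal sketch, assuming 8-bit bytes and a little-endian external format chosen purely for illustration; `put_u32_le` and `get_u32_le` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value to a buffer, least significant byte first.
// Shifts operate on values, not on memory, so the output is the same
// on little- and big-endian hosts.
void put_u32_le(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<unsigned char>((v >> (8 * i)) & 0xFFu));
    }
}

// Decode a 32-bit little-endian value starting at buf[pos].
std::uint32_t get_u32_le(const std::vector<unsigned char>& buf, std::size_t pos) {
    assert(pos + 4 <= buf.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(buf[pos + i]) << (8 * i);
    }
    return v;
}
```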

I can think of 3 issues that prevent the "blast struct all over" concept
from working: endianness, padding/alignment, datatype sizes. The first one is
the party spoiler. Guaranteed-width integers help with the last issue.
Byte-aligning data (no padding) is probably available on most compilers (?).
Endianness though, well, there's not much you can do about that to make the
concept work. Luckily, the users of big-endian machines are mostly a
different category from little-endian machine users, so you can just
pick your target users and tailor your software to them. Or else do the
conversions:

struct on Intel going over wire to a Sparc -> no change to struct
struct coming into Sparc from Intel -> convert struct endianness
struct on Sparc going to disk -> convert struct endianness
struct on Sparc going to Intel -> convert struct endianness
struct coming into Intel from Sparc -> no change to struct
struct on Intel going to disk -> no change to struct

(The above scenario assumes platform-independent files are desired. If not,
fewer conversions required).
(Yes, before anyone quips, I do know that "network byte order" is big
endian. There are also more Windows machines than Unix ones.)
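
The "convert struct endianness" step in the table can be sketched as a function that byte-swaps each field of a struct built from fixed-width members. Here the wire and disk format is little-endian, matching the table's choice of Intel as the primary platform. `Packet`, `byteswap16`, `byteswap32` and `to_wire_order` are hypothetical names, and the host check is done at run time rather than through compiler-specific macros. On a little-endian host the function changes nothing, which corresponds to the "no change to struct" rows. Swapping fields does not address padding, which still has to be handled separately.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical message with fixed-width members only.
struct Packet {
    std::uint32_t sequence;
    std::uint16_t length;
    std::uint16_t flags;
};

std::uint32_t byteswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

std::uint16_t byteswap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v >> 8) | (v << 8));
}

bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Convert between host order and the little-endian wire/disk order.
// Swapping is its own inverse, so the same call serves for sending
// and for receiving.
Packet to_wire_order(Packet p) {
    if (!host_is_little_endian()) {
        p.sequence = byteswap32(p.sequence);
        p.length = byteswap16(p.length);
        p.flags = byteswap16(p.flags);
    }
    return p;
}
```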

(Issue 4: size of a byte).

John

Jun 23 '07 #4

On Fri, 22 Jun 2007 22:55:38 -0700, Cagdas Ozgenc <ca***********@gmail.com>
wrote:
> When directly serializing C++ structures to a file with the standard
> library functions giving the address of the data and length of
> structure using the sizeof operator, do I risk portability because of
> different compilers packing structures into different sizes or
> components of this structure to different address boundaries (for
> example placing in multiples of 4 on a 32bit system)? Once the file is
> serialized, does the same code compiled by another compiler or even
> the same compiler but a different version carry the risk of not
> reading the contents properly?

Serialization is a complex issue that has so far eluded a truly general
solution, primarily because the needs of each developer vary so much. There
are several "classes" of techniques, though. They are described quite well in
the C++ FAQ Lite pages:
http://www.parashift.com/c++-faq-lit...alization.html

When I want to do general-purpose, cross-platform, binary-compatible
exchanges, I generally:

1. Pack data structures to the byte (using #pragmas, most times)
2. Use fixed-width integer types
3. Choose an endian representation and provide conversion facilities
4. Use IEEE representation for floating point numbers, else use fixed point
notation
5. Serialize PODs and structs only, not class hierarchies

An adaptation library with conditional compilation switches can be made for
items 1-4 that allows you to encapsulate the compiler- or platform-specific
behaviors.
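
A minimal sketch of items 1, 2 and 4 from the list above, assuming a compiler that accepts `#pragma pack(push, 1)` / `#pragma pack(pop)` (GCC, Clang and MSVC do, but the pragma is not standard C++). The `static_assert`s turn silent layout or representation surprises into compile-time errors, which is the kind of check an adaptation library can keep in one place. `Header` is a hypothetical record; item 3 (byte order) still needs separate conversion code.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <limits>

// Item 4: refuse to build where double is not IEEE 754 (IEC 559).
static_assert(std::numeric_limits<double>::is_iec559,
              "this format assumes IEEE 754 doubles");
static_assert(CHAR_BIT == 8, "this format assumes 8-bit bytes");

// Items 1 and 2: fixed-width members, packed to the byte.
#pragma pack(push, 1)
struct Header {
    std::uint8_t version;
    std::uint16_t count;
    std::uint32_t offset;
};
#pragma pack(pop)

// With packing in effect, the struct is exactly the sum of its members.
static_assert(sizeof(Header) == 1 + 2 + 4, "Header must have no padding");
```

Members of a packed struct may be misaligned, so taking pointers or references to them can be a problem on hardware that traps on misaligned access; reading them by value, as below, is the safe pattern.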

-dr
Jun 23 '07 #5


"Dave Rahardja" <dr****************************@pobox.com> wrote in message
news:9g********************************@4ax.com...
> On Fri, 22 Jun 2007 22:55:38 -0700, Cagdas Ozgenc
> <ca***********@gmail.com>
> wrote:
>> When directly serializing C++ structures to a file with the standard
>> library functions giving the address of the data and length of
>> structure using the sizeof operator, do I risk portability because of
>> different compilers packing structures into different sizes or
>> components of this structure to different address boundaries (for
>> example placing in multiples of 4 on a 32bit system)? Once the file is
>> serialized, does the same code compiled by another compiler or even
>> the same compiler but a different version carry the risk of not
>> reading the contents properly?
>
> Serialization is a complex issue that has so far eluded a truly general
> solution, primarily because the needs of each developer vary so much. There
> are several "classes" of techniques, though. They are described quite well in
> the C++ FAQ Lite pages:
> http://www.parashift.com/c++-faq-lit...alization.html
>
> When I want to do general-purpose, cross-platform, binary-compatible
> exchanges, I generally:
>
> 1. Pack data structures to the byte (using #pragmas, most times)

I think "no padding" may indeed be a feature that a new language could
exploit.

> 2. Use fixed-width integer types
> 3. Choose an endian representation and provide conversion facilities

That's the key one. If there were one gift that the hardware vendors could
give, it would be to standardize endianness. IMO. OK, it's little endian from
now on. Let's move on! LOL! (Oh wait, can I have a standard definition of
"byte" also?)

> 4. Use IEEE representation for floating point numbers, else use fixed point
> notation
> 5. Serialize PODs and structs only, not class hierarchies

By "class hierarchies", I think you mean "derived structs". If there were
more of a guarantee (or I were so assured) that struct B derived from struct A
would be laid out exactly like a struct containing the data members of A
followed immediately by the data members of B, I'd eventually be OK with those
compositions.
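
Since C++11 the standard library can report whether a type has the guaranteed, C-compatible ("standard-layout") form that this kind of composition would need. In the sketch below, `A`, `B` and `Flat` are hypothetical structs. `B` is trivially copyable, so copying its bytes is well defined within one program, but because both the base and the derived struct declare data members, `B` is not standard-layout, and the standard does not promise where the base subobject sits relative to `B`'s own members.

```cpp
#include <cassert>
#include <type_traits>

struct A { int a; };
struct B : A { int b; };   // data members in both base and derived

// Byte-wise copying of B inside one program is fine...
static_assert(std::is_trivially_copyable<B>::value, "B is trivially copyable");
// ...but B is not standard-layout, so its layout is not pinned down.
static_assert(!std::is_standard_layout<B>::value, "B is not standard-layout");

// The flattened equivalent is standard-layout (padding is still up to
// the implementation).
struct Flat { int a; int b; };
static_assert(std::is_standard_layout<Flat>::value, "Flat is standard-layout");
```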

> An adaptation library with conditional compilation switches can be made for
> items 1-4 that allows you to encapsulate the compiler- or platform-specific
> behaviors.

Grouping those hides the "severity" of 3.

Even with your 1-5, all bets are still off because sizeof(char) could be
different somewhere else (right?).
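
For reference, `sizeof(char)` is 1 by definition in both C and C++; what can differ between platforms is the number of bits in that byte, given by `CHAR_BIT` from `<climits>` (at least 8, and larger on some DSPs, for example). A small sketch of checks a binary format might make; `bits_of` is a hypothetical helper.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Always true: sizeof counts in units of char.
static_assert(sizeof(char) == 1, "guaranteed by the language");

// Only a minimum is guaranteed: a byte is CHAR_BIT bits, at least 8.
static_assert(CHAR_BIT >= 8, "guaranteed minimum");

// Number of bits an object of type T occupies on this platform.
template <typename T>
constexpr std::size_t bits_of() {
    return sizeof(T) * CHAR_BIT;
}
```

A format that assumes 8-bit bytes can simply add `static_assert(CHAR_BIT == 8, "...")` so that an unusual platform fails at compile time instead of producing unreadable files.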

John

Jun 24 '07 #6

JohnQ wrote:
> "Dave Rahardja" <dr****************************@pobox.com> wrote in
> message news:9g********************************@4ax.com...
>> On Fri, 22 Jun 2007 22:55:38 -0700, Cagdas Ozgenc
>> <ca***********@gmail.com>
>> wrote:
>>> When directly serializing C++ structures to a file with the standard
>>> library functions giving the address of the data and length of
>>> structure using the sizeof operator, do I risk portability because of
>>> different compilers packing structures into different sizes or
>>> components of this structure to different address boundaries (for
>>> example placing in multiples of 4 on a 32bit system)? Once the file is
>>> serialized, does the same code compiled by another compiler or even
>>> the same compiler but a different version carry the risk of not
>>> reading the contents properly?
>>
>> Serialization is a complex issue that has so far eluded a truly general
>> solution, primarily because the needs of each developer vary so much.
>> There are several "classes" of techniques, though. They are described
>> quite well in the C++ FAQ Lite pages:
>> http://www.parashift.com/c++-faq-lit...alization.html
>>
>> When I want to do general-purpose, cross-platform, binary-compatible
>> exchanges, I generally:
>>
>> 1. Pack data structures to the byte (using #pragmas, most times)
>
> I think "no padding" may indeed be a feature that a new language could
> exploit.

Not if the hardware doesn't support it, or even supports it with a
significant performance hit.
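
When a file format is packed to the byte anyway, the usual way to read it safely on such hardware is not to cast a pointer into the buffer (which can fault or be slow where aligned access is required, and is undefined behavior in C++ regardless) but to `memcpy` the bytes into a properly aligned local variable. A sketch, with `load_u32` as a hypothetical helper returning the value in host byte order:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read a uint32_t stored at any (possibly misaligned) address in a byte buffer.
// memcpy has no alignment requirement; compilers typically turn this into a
// single load on hardware that permits unaligned access.
std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;  // host byte order; convert separately if the format differs
}
```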

--
Ian Collins.
Jun 24 '07 #7

On Jun 23, 7:55 am, Cagdas Ozgenc <cagdas.ozg...@gmail.com> wrote:
> When directly serializing C++ structures to a file with the standard
> library functions giving the address of the data and length of
> structure using the sizeof operator, do I risk portability because of
> different compilers packing structures into different sizes or
> components of this structure to different address boundaries (for
> example placing in multiples of 4 on a 32bit system)? Once the file is
> serialized, does the same code compiled by another compiler or even
> the same compiler but a different version carry the risk of not
> reading the contents properly?

Very much so. Even changing the compile flags can cause
problems. About the only time this works is for temporary
files, which are read and written by the same binary image.
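
For comparison, here is the technique from the original question in the one situation where it is reasonably safe: a scratch file written and read back by the same executable. `Point` and `round_trip_raw` are hypothetical; `std::tmpfile` creates a temporary file that is removed when it is closed. Nothing here makes the bytes readable by a different compiler, compiler version or set of options.

```cpp
#include <cassert>
#include <cstdio>
#include <type_traits>

struct Point {
    int x;
    double y;
};
static_assert(std::is_trivially_copyable<Point>::value,
              "raw byte copies require a trivially copyable type");

// Write the object representation of `in` (padding included) to a
// temporary file and read it back into `out`. Returns true on success.
bool round_trip_raw(const Point& in, Point& out) {
    std::FILE* f = std::tmpfile();
    if (!f) return false;
    bool ok = std::fwrite(&in, sizeof in, 1, f) == 1;
    std::rewind(f);
    ok = ok && std::fread(&out, sizeof out, 1, f) == 1;
    std::fclose(f);
    return ok;
}
```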

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Object-oriented software consulting
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 24 '07 #8

On Jun 23, 11:58 am, "JohnQ" <johnqREMOVETHISprogram...@yahoo.com>
wrote:
> "Cagdas Ozgenc" <cagdas.ozg...@gmail.com> wrote in message
> news:11*********************@w5g2000hsg.googlegroups.com...
>> When directly serializing C++ structures to a file with the standard
>> library functions giving the address of the data and length of
>> structure using the sizeof operator, do I risk portability because of
>> different compilers packing structures into different sizes or
>> components of this structure to different address boundaries (for
>> example placing in multiples of 4 on a 32bit system)? Once the file is
>> serialized, does the same code compiled by another compiler or even
>> the same compiler but a different version carry the risk of not
>> reading the contents properly?
>
> Does your software/application require that much portability? Why struggle
> with "write once, compile anywhere" if you're only targeting one platform or
> even only one machine, for instance?

And one version of one compiler with one set of compiler options.

I guess he's a professional.

[...]
> I can think of 3 issues that prevent the "blast struct all
> over" concept from working: endianness, padding/alignment,
> datatype sizes.

Representation in general. For floating point, it's a real
problem, even today. For integers, there is also at least one
machine on the market which uses 36-bit ones' complement
integers, but it's not very widespread, and many people can
afford to ignore it.
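
On the floating-point side, one common approach (not the only one) is to insist on IEEE 754 and transfer the 64-bit pattern as an integer, which then goes through the same byte-order handling as any other integer. A sketch with hypothetical helper names; `std::numeric_limits<double>::is_iec559` reports whether `double` follows IEEE 754 (IEC 559). It also assumes doubles are stored with the same byte order as 64-bit integers, which holds on common modern hardware but not universally.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");
static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes 64-bit doubles");

// Copy the bit pattern of a double into a 64-bit integer.
std::uint64_t double_to_bits(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Inverse of double_to_bits.
double bits_to_double(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```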

Just be aware of the restriction, and document it, so that some
maintenance programmer in the future doesn't get bitten. And
whatever you do, document all external formats, so a maintenance
programmer has a chance of implementing them on some future
hardware.

> The first one is the party spoiler.

I'd say that the different representations are even worse.
(Note too that "endianness" isn't a good word, since it suggests
two possible arrangements. At least three are widespread.)
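
To make the "more than two arrangements" point concrete, this sketch lists the bytes of the 32-bit value 0x0A0B0C0D in big-endian order, little-endian order and the PDP-11 "middle-endian" order (high 16-bit word first, each word stored low byte first). The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes4 = std::array<std::uint8_t, 4>;

// Byte n (0 = least significant) of a 32-bit value.
std::uint8_t byte_n(std::uint32_t v, int n) {
    return static_cast<std::uint8_t>((v >> (8 * n)) & 0xFFu);
}

Bytes4 big_endian(std::uint32_t v) {
    return {{byte_n(v, 3), byte_n(v, 2), byte_n(v, 1), byte_n(v, 0)}};
}

Bytes4 little_endian(std::uint32_t v) {
    return {{byte_n(v, 0), byte_n(v, 1), byte_n(v, 2), byte_n(v, 3)}};
}

// PDP-11 style: high 16-bit word first, each word stored low byte first.
Bytes4 pdp_endian(std::uint32_t v) {
    return {{byte_n(v, 2), byte_n(v, 3), byte_n(v, 0), byte_n(v, 1)}};
}
```

For 0x0A0B0C0D these give 0A 0B 0C 0D, 0D 0C 0B 0A and 0B 0A 0D 0C respectively.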

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Object-oriented software consulting
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 24 '07 #9


"James Kanze" <ja*********@gmail.com> wrote in message
news:11*********************@p77g2000hsh.googlegroups.com...
> On Jun 23, 11:58 am, "JohnQ" <johnqREMOVETHISprogram...@yahoo.com>
> wrote:
>
> (Note too that "endianness" isn't a good word, since it suggests
> two possible arrangements. At least three are widespread.)

But that one is called "middle ENDIAN" right? If so, that makes "endianness"
seem OK.

John

Jun 26 '07 #10


"Ian Collins" <ia******@hotmail.com> wrote in message
news:5e*************@mid.individual.net...
> JohnQ wrote:
>>
>> "Dave Rahardja" <dr****************************@pobox.com> wrote in
>> message news:9g********************************@4ax.com...
>>> On Fri, 22 Jun 2007 22:55:38 -0700, Cagdas Ozgenc
>>> <ca***********@gmail.com>
>>> wrote:
>>>> When directly serializing C++ structures to a file with the standard
>>>> library functions giving the address of the data and length of
>>>> structure using the sizeof operator, do I risk portability because of
>>>> different compilers packing structures into different sizes or
>>>> components of this structure to different address boundaries (for
>>>> example placing in multiples of 4 on a 32bit system)? Once the file is
>>>> serialized, does the same code compiled by another compiler or even
>>>> the same compiler but a different version carry the risk of not
>>>> reading the contents properly?
>>>
>>> Serialization is a complex issue that has so far eluded a truly general
>>> solution, primarily because the needs of each developer vary so much.
>>> There are several "classes" of techniques, though. They are described
>>> quite well in the C++ FAQ Lite pages:
>>> http://www.parashift.com/c++-faq-lit...alization.html
>>>
>>> When I want to do general-purpose, cross-platform, binary-compatible
>>> exchanges, I generally:
>>>
>>> 1. Pack data structures to the byte (using #pragmas, most times)
>>
>> I think "no padding" may indeed be a feature that a new language could
>> exploit.
> Not if the hardware doesn't support it, or even supports it with a
> significant performance hit.

That would be a nice table to see: CPUs and the supported compiler/language
properties. Writing code that will run on all platforms is a waste of effort
when it is known that the software will never be deployed on those other
platforms. Layering on top of C++ to abstract away what needn't be bothered
with on a daily coding basis is the way to go. Just because C++ is "close to
the hardware" doesn't mean you have to program at that low level all of the
time.

John

Jun 26 '07 #11


"James Kanze" <ja*********@gmail.com> wrote in message
news:11*********************@q69g2000hsb.googlegroups.com...
> On Jun 27, 4:03 pm, Dave Rahardja
> <drahardja_atsign_pobox_dot_...@pobox.com> wrote:
>> On Tue, 26 Jun 2007 03:30:54 -0500, "JohnQ"
>> <johnqREMOVETHISprogram...@yahoo.com> wrote:
>>> "James Kanze" <james.ka...@gmail.com> wrote in message
>>> news:11*********************@p77g2000hsh.googlegroups.com...
>>>> On Jun 23, 11:58 am, "JohnQ" <johnqREMOVETHISprogram...@yahoo.com>
>>>> wrote:
>>>> (Note too that "endianness" isn't a good word, since it suggests
>>>> two possible arrangements. At least three are widespread.)
>>> But that one is called "middle ENDIAN" right? If so, that makes
>>> "endianness" seem OK.
"I've never heard it called anything:-). It just is. (There are
also word addressed machines, where it makes no sense to speak
of "endian".)"

Well if saying "endian" suggests to would-be/will-be hardware designers that
there are only two, that would be a good thing. Even a better thing if they
choose to deprecate the less ubiquitous perversions.

John

Jun 28 '07 #12


"James Kanze" <ja*********@gmail.com> wrote in message
news:11*********************@p77g2000hsh.googlegroups.com...
> On Jun 23, 11:58 am, "JohnQ" <johnqREMOVETHISprogram...@yahoo.com>
> wrote:
>> "Cagdas Ozgenc" <cagdas.ozg...@gmail.com> wrote in message
>> news:11*********************@w5g2000hsg.googlegroups.com...
>>> When directly serializing C++ structures to a file with the standard
>>> library functions giving the address of the data and length of
>>> structure using the sizeof operator, do I risk portability because of
>>> different compilers packing structures into different sizes or
>>> components of this structure to different address boundaries (for
>>> example placing in multiples of 4 on a 32bit system)? Once the file is
>>> serialized, does the same code compiled by another compiler or even
>>> the same compiler but a different version carry the risk of not
>>> reading the contents properly?
>>
>> Does your software/application require that much portability? Why struggle
>> with "write once, compile anywhere" if you're only targeting one platform
>> or even only one machine, for instance?
"And one version of one compiler with one set of compiler options.

I guess he's a professional."

Seems like masochism rather than professionalism.

[...]

"(Note too that "endianness" isn't a good word, since it suggests
two possible arrangements. At least three are widespread.)"

Perhaps you'd like to update http://en.wikipedia.org/wiki/Endianness. (Yes,
they list 3 endian arrangements).

John

Jun 29 '07 #13
