directly serializing structs

Cagdas Ozgenc

Greetings,

When directly serializing C++ structures to a file with the standard
library functions, passing the address of the data and the size of the
structure obtained with the sizeof operator, do I risk portability
problems because different compilers may pack structures into different
sizes or place the members of a structure at different address
boundaries (for example, at multiples of 4 on a 32-bit system)? Once the
file is written, could the same code compiled by another compiler, or
even by a different version of the same compiler, fail to read the
contents properly?

Thank you

Jun 23 '07 #1


John Harrison

Cagdas Ozgenc wrote:

Greetings,

When directly serializing C++ structures to a file with the standard
library functions giving the address of the data and length of
structure using the sizeof operator, do I risk portability because of
different compilers packing structures into different sizes or
components of this structure to different address boundaries (for
example placing in multiples of 4 on a 32bit system)?

Yes. You also risk portability problems because different compilers (or
platforms) may use different formats for individual data items. Byte
ordering for integers often varies across platforms, and floating-point
formats can even vary between different compilers on the same platform.

Once the file is
serialized, does the same code compiled by another compiler or even
the same compiler but a different version carry the risk of not
reading the contents properly?

Yes, see above.

If this is a concern, then consider using text; it's much more portable.
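As a rough illustration of the text approach, the sketch below writes each field of a hypothetical `Record` as whitespace-separated text and reads it back with the stream extraction operators. The type and function names are made up for the example, and it assumes the string field contains no whitespace.

```cpp
#include <cstdint>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>  // std::stringstream, handy for in-memory round trips
#include <string>

// Hypothetical record type, used only for illustration.
struct Record {
    std::int32_t id;
    double weight;
    std::string name;  // assumed to contain no whitespace
};

// Write the fields as whitespace-separated text, one record per line.
void writeText(std::ostream& os, const Record& r) {
    // max_digits10 significant digits are enough for a double to round-trip.
    os.precision(std::numeric_limits<double>::max_digits10);
    os << r.id << ' ' << r.weight << ' ' << r.name << '\n';
}

// Read the fields back in the same order; returns false on malformed input.
bool readText(std::istream& is, Record& r) {
    return static_cast<bool>(is >> r.id >> r.weight >> r.name);
}
```

Because the file then holds only digits, letters and spaces, byte order, padding and integer width on the reading machine stop mattering; the remaining choice is how many digits to print for floating-point values.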


Jun 23 '07 #2

Ian Collins

Cagdas Ozgenc wrote:

Greetings,

[...]

Yes, it does. Internal layout is implementation-defined.
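To see what "implementation-defined" means in practice, the sketch below uses a hypothetical two-member struct and compile-time checks. It asserts only what the language guarantees; the actual size and member offset are left to the compiler.

```cpp
#include <cstddef>  // offsetof, std::size_t
#include <cstdint>

// Hypothetical struct: a 1-byte field followed by a 4-byte field.
struct Sample {
    std::uint8_t  tag;
    std::uint32_t value;
};

// The members need 5 bytes, but the compiler may insert padding after
// `tag` so that `value` is suitably aligned. These numbers can differ
// between compilers, compiler versions and option settings.
constexpr std::size_t kMemberBytes = sizeof(std::uint8_t) + sizeof(std::uint32_t);
constexpr std::size_t kStructBytes = sizeof(Sample);
constexpr std::size_t kValueOffset = offsetof(Sample, value);

// What is guaranteed: the first member is at offset 0, later members come
// after earlier ones, and padding can only make the struct larger.
static_assert(offsetof(Sample, tag) == 0, "first member at offset 0");
static_assert(kValueOffset >= sizeof(std::uint8_t), "declaration order kept");
static_assert(kStructBytes >= kMemberBytes, "padding only adds bytes");
```

On common x86 and x86-64 compilers `sizeof(Sample)` typically comes out as 8 with `value` at offset 4, so a raw write also emits three padding bytes whose contents are unspecified.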

--
Ian Collins.

Jun 23 '07 #3

JohnQ

"Cagdas Ozgenc" <ca***********@gmail.com> wrote in message
news:11*********************@w5g2000hsg.googlegroups.com...

Greetings,

[...]

Does your software/application require that much portability? Why struggle
with "write once, compile anywhere" if you're only targeting one platform or
even only one machine, for instance?

I wouldn't call what you described above "serializing" though. To me,
"serializing" has the connotation that you indeed are looking into structs
and the sizes of data members, their padding etc. or using ASN.1/BER for
over the wire transmission, for another example, rather than just doing a
struct-sized write. The recommended practice of streaming everything at
every boundary (disk, wire) seems unnatural and tedious to me also. I guess
a layer at the boundaries that does the streaming on the non-primary
platform and doesn't do anything on the primary platform isn't that bad to
implement.

I can think of 3 issues that prevent the "blast struct all over" concept
from working: endianness, padding/alignment, and datatype sizes. The first
one is the party spoiler. Guaranteed-width integers help with the last issue.
Byte-aligning data (no padding) is probably available on most compilers (?).
Endianness, though, is something you can't do much about to make the
concept work. Luckily, the users of big-endian machines mostly fall into a
different category from little-endian machine users, so you can just
pick your target users and tailor your software to them. Or else do the
conversions:

struct on Intel going over wire to a Sparc -> no change to struct
struct coming into Sparc from Intel -> convert struct endianness
struct on Sparc going to disk -> convert struct endianness
struct on Sparc going to Intel -> convert struct endianness
struct coming into Intel from Sparc -> no change to struct
struct on Intel going to disk -> no change to struct

(The above scenario assumes platform-independent files are desired. If not,
fewer conversions are required.)
(Yes, before anyone quips, I do know that "network byte order" is big
endian. There are also more Windows machines than Unix ones.)
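The conversion table amounts to picking one byte order (little endian here) for files and the wire and converting only on hosts that differ. A minimal sketch for 32-bit values follows; the function names are hypothetical, and it assumes the host is either big- or little-endian.

```cpp
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 32-bit value (big <-> little endian).
constexpr std::uint32_t byteSwap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Detect at run time whether the host stores the low-order byte first.
inline bool hostIsLittleEndian() {
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Convert between host order and the chosen little-endian file/wire order:
// a no-op on a little-endian host (e.g. Intel), a swap on a big-endian host
// (e.g. Sparc). The same function works in both directions.
inline std::uint32_t hostToLittle32(std::uint32_t v) {
    return hostIsLittleEndian() ? v : byteSwap32(v);
}
```

Converting a whole struct then means applying such a function to each multi-byte member. On newer compilers, C++20's `std::endian` (in `<bit>`) and C++23's `std::byteswap` provide standard building blocks for the same job.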

(Issue 4: size of a byte).

John

Jun 23 '07 #4

Dave Rahardja

On Fri, 22 Jun 2007 22:55:38 -0700, Cagdas Ozgenc <ca***********@gmail.com>
wrote:

[...]

Serialization is a complex issue that has so far eluded a truly general
solution, primarily because the needs of each developer vary so much. There
are several "classes" of techniques, though. They are described quite well in
the C++ FAQ Lite pages:
http://www.parashift.com/c++-faq-lit...alization.html

When I want to do general-purpose, cross-platform, binary-compatible
exchanges, I generally:

1. Pack data structures to the byte (using #pragmas, most times)
2. Use fixed-width integer types
3. Choose an endian representation and provide conversion facilities
4. Use IEEE representation for floating point numbers, else use fixed point
notation
5. Serialize PODs and structs only, not class hierarchies
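Item 1 is usually done with a compiler-specific pragma. The sketch below uses the `#pragma pack(push, 1)` / `#pragma pack(pop)` form, which MSVC, GCC and Clang accept but which is not part of standard C++; `PackedHeader` is a hypothetical type. Packing removes padding only; byte order and number representation still need items 2 to 4.

```cpp
#include <cstdint>

// Compiler-specific: inside this region members get 1-byte alignment,
// i.e. no padding is inserted between or after them.
#pragma pack(push, 1)
struct PackedHeader {          // hypothetical on-disk header
    std::uint8_t  version;
    std::uint32_t length;
    std::uint16_t flags;
};
#pragma pack(pop)              // restore the previous packing setting

// With packing, the size is exactly the sum of the member sizes.
static_assert(sizeof(PackedHeader) == 1 + 4 + 2, "header must be packed");
```

Accessing members of a packed struct may be slower, or need extra code generated by the compiler, on hardware with strict alignment rules.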

An adaptation library with conditional compilation switches can be made for
items 1-4 that allows you to encapsulate the compiler- or platform-specific
behaviors.
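One possible shape for such an adaptation layer is sketched below. Instead of packing the struct, it writes each member of a hypothetical `Point` record field by field into a byte buffer: fixed-width integers in little-endian order built with shifts (so the bytes do not depend on the host's byte order), and floats as their IEEE 754 bit patterns. Compile-time checks stand in for the conditional-compilation switches; all names are illustrative.

```cpp
#include <climits>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Compile-time checks in place of per-platform switches.
static_assert(CHAR_BIT == 8, "8-bit bytes assumed");
static_assert(std::numeric_limits<float>::is_iec559, "IEEE 754 float required");
static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float required");

// Item 3: emit a 32-bit value least significant byte first. Shifts act on
// values, not memory, so the output is the same on any host byte order.
inline void putU32(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<unsigned char>((v >> (8 * i)) & 0xFFu));
}

inline std::uint32_t getU32(const unsigned char* in) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    return v;
}

// Item 4: a float travels as its IEEE 754 bit pattern (assumes float and
// uint32_t share the same byte order on the host, as on common platforms).
inline void putF32(std::vector<unsigned char>& out, float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    putU32(out, bits);
}

inline float getF32(const unsigned char* in) {
    const std::uint32_t bits = getU32(in);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Item 5: a hypothetical plain struct, written member by member so its
// in-memory padding never reaches the file.
struct Point {
    std::uint32_t id;
    float x;
    float y;
};

inline void encode(std::vector<unsigned char>& out, const Point& p) {
    putU32(out, p.id);
    putF32(out, p.x);
    putF32(out, p.y);
}

inline Point decode(const unsigned char* in) {  // expects 12 bytes
    return Point{getU32(in), getF32(in + 4), getF32(in + 8)};
}
```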

-dr

Jun 23 '07 #5

JohnQ

"Dave Rahardja" <dr****************************@pobox.com> wrote in message
news:9g********************************@4ax.com...

On Fri, 22 Jun 2007 22:55:38 -0700, Cagdas Ozgenc
<ca***********@gmail.com>
wrote:

[...]

Serialization is a complex issue that has so far eluded a truly general
solution, primarily because the needs of each developer vary so much. There
are several "classes" of techniques, though. They are described quite well in
the C++ FAQ Lite pages:
http://www.parashift.com/c++-faq-lit...alization.html

When I want to do general-purpose, cross-platform, binary-compatible
exchanges, I generally:

1. Pack data structures to the byte (using #pragmas, most times)

I think "no padding" may indeed be a feature that a new language could
exploit.

2. Use fixed-width integer types
3. Choose an endian representation and provide conversion facilities

That's the key one. If there were one gift that the hardware vendors could
give, it would be to standardize endianness. IMO. OK, it's little endian from
now on. Let's move on! LOL! (Oh wait, can I have a standard definition of
"byte" also?).

4. Use IEEE representation for floating point numbers, else use fixed point
notation
5. Serialize PODs and structs only, not class hierarchies

By "class hierarchies", I think you mean "derived structs". If there were
more of a guarantee (or I were so assured) that struct B derived from struct A
would be laid out exactly like a struct containing the data members of A
followed immediately by the data members of B, I'd eventually be OK with those
compositions.
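Standard C++ does give a layout guarantee of this kind, but for composition rather than derivation. Standard-layout types come with guarantees such as the first member sitting at the very start of the object, and a struct that declares data members in both a base and a derived class does not qualify. The compile-time checks below use hypothetical types `A`, `B` and `C`; note that being standard-layout still says nothing about padding between members.

```cpp
#include <cstdint>
#include <type_traits>

struct A { std::int32_t a; };

// Derivation: data members are declared in both the base and the derived class.
struct B : A { std::int32_t b; };

// Composition: the "A part" is simply the first member.
struct C { A base; std::int32_t b; };

// B is not standard-layout because its data members are split across the
// hierarchy; C is, and it can also be copied byte-wise.
static_assert(!std::is_standard_layout<B>::value, "derived-with-data is not standard-layout");
static_assert(std::is_standard_layout<C>::value, "composition is standard-layout");
static_assert(std::is_trivially_copyable<C>::value, "C can be copied byte-wise");
```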

An adaptation library with conditional compilation switches can be made for
items 1-4 that allows you to encapsulate the compiler- or platform-specific
behaviors.

Grouping those hides the "severity" of 3.

Even with your 1-5, all bets are still off because the number of bits in a
char (CHAR_BIT) could be different somewhere else (right?).

John

Jun 24 '07 #6

Ian Collins

JohnQ wrote:

"Dave Rahardja" <dr****************************@pobox.com> wrote in
message news:9g********************************@4ax.com...
[...]

1. Pack data structures to the byte (using #pragmas, most times)

I think "no padding" may indeed be a feature that a new language could
exploit.

Not if the hardware doesn't support it, or even supports it with a
significant performance hit.
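The alignment cost shows up in practice when reading fields out of a packed or byte-oriented buffer: casting a misaligned address to `std::uint32_t*` and dereferencing it is undefined behaviour in C++ and can trap on strict-alignment CPUs. A common workaround, sketched here with a hypothetical `loadU32` helper, is to copy the bytes with `std::memcpy`, which has no alignment requirement and which compilers typically turn into a plain load where the hardware allows it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read a 32-bit value stored at an arbitrary byte offset in a buffer.
// The result is in host byte order; convert it if the data uses another order.
inline std::uint32_t loadU32(const unsigned char* buf, std::size_t offset) {
    std::uint32_t v;
    std::memcpy(&v, buf + offset, sizeof v);  // safe even when misaligned
    return v;
}
```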

--
Ian Collins.

Jun 24 '07 #7

James Kanze

On Jun 23, 7:55 am, Cagdas Ozgenc <cagdas.ozg...@gmail.com> wrote:

[...]

Very much so. Even changing the compile flags can cause
problems. About the only time this works is for temporary
files, which are read and written by the same binary image.
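For that same-binary case the raw approach is simple. The sketch below, with a hypothetical `Temp` record, dumps its bytes with `std::fwrite` into a file (for instance one from `std::tmpfile`) and reads them back with `std::fread`; the `static_assert` records that byte-wise copying only makes sense for trivially copyable types.

```cpp
#include <cstdio>
#include <type_traits>

// Hypothetical scratch record.
struct Temp {
    int count;
    double total;
};
static_assert(std::is_trivially_copyable<Temp>::value,
              "byte-wise copy requires a trivially copyable type");

// Write the raw bytes of `in` to `f`, rewind, and read them into `out`.
// Reliable only when the same executable (same compiler, same options)
// writes and reads the file.
inline bool roundTripRaw(std::FILE* f, const Temp& in, Temp& out) {
    if (std::fwrite(&in, sizeof in, 1, f) != 1) return false;
    std::rewind(f);  // required between a write and a following read
    return std::fread(&out, sizeof out, 1, f) == 1;
}
```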

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 24 '07 #8

James Kanze

On Jun 23, 11:58 am, "JohnQ" <johnqREMOVETHISprogram...@yahoo.com>
wrote:

"Cagdas Ozgenc" <cagdas.ozg...@gmail.com> wrote in message
news:11*********************@w5g2000hsg.googlegroups.com...
[...]

Does your software/application require that much portability? Why struggle
with "write once, compile anywhere" if you're only targeting one platform or
even only one machine, for instance?

And one version of one compiler with one set of compiler options.

I guess he's a professional.

[...]

I can think of 3 issues that prevent the "blast struct all
over" concept from working: endianness, padding/alignment,
datatype sizes.

Representation in general. For floating point, it's a real
problem, even today. For integers, there is also at least one
machine on the market which uses 36-bit ones' complement
integers, but it's not very widespread, and many people can
afford to ignore it.

Just be aware of the restriction, and document it, so that some
maintenance programmer in the future doesn't get bitten. And
whatever you do, document all external formats, so a maintenance
programmer has a chance of implementing them on some future
hardware.
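As an example of a documented external format, the sketch below states one rule in a comment (signed 32-bit integers, two's complement, most significant byte first) and implements it with shifts and arithmetic only, so neither the host's byte order nor an implementation-defined unsigned-to-signed conversion is involved. The format and the `putI32BE`/`getI32BE` names are assumptions for illustration.

```cpp
#include <cstdint>

// External format (assumed example, to be documented with the file):
//   a signed 32-bit integer is stored as 4 bytes, two's complement,
//   most significant byte first.

inline void putI32BE(unsigned char out[4], std::int32_t v) {
    // Conversion to unsigned is defined modulo 2^32, which yields the
    // two's complement bit pattern.
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    out[0] = static_cast<unsigned char>((u >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((u >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((u >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(u & 0xFFu);
}

inline std::int32_t getI32BE(const unsigned char in[4]) {
    const std::uint32_t u = (std::uint32_t{in[0]} << 24) | (std::uint32_t{in[1]} << 16) |
                            (std::uint32_t{in[2]} << 8) | std::uint32_t{in[3]};
    if (u <= 0x7FFFFFFFu) return static_cast<std::int32_t>(u);
    // Negative value: compute -(~u) - 1 so no out-of-range cast is needed.
    return -static_cast<std::int32_t>(~u) - 1;
}
```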

The first one is the party spoiler.

I'd say that the different representations are even worse.
(Note too that "endianness" isn't a good word, since it suggests
two possible arrangements. At least three are widespread.)
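To make the "at least three" concrete, the sketch below lays out one 32-bit value in big-endian, little-endian and PDP-11-style mixed order (16-bit words stored high word first, with the bytes inside each word stored low byte first). Taking the PDP-11 order as the third widespread arrangement is an assumption, and the function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Most significant byte first.
inline std::array<std::uint8_t, 4> bigEndian(std::uint32_t v) {
    return {std::uint8_t(v >> 24), std::uint8_t(v >> 16), std::uint8_t(v >> 8), std::uint8_t(v)};
}

// Least significant byte first.
inline std::array<std::uint8_t, 4> littleEndian(std::uint32_t v) {
    return {std::uint8_t(v), std::uint8_t(v >> 8), std::uint8_t(v >> 16), std::uint8_t(v >> 24)};
}

// PDP-11 style ("middle-endian"): high 16-bit word first, but each word
// stored low byte first.
inline std::array<std::uint8_t, 4> pdpEndian(std::uint32_t v) {
    return {std::uint8_t(v >> 16), std::uint8_t(v >> 24), std::uint8_t(v), std::uint8_t(v >> 8)};
}
```

For 0x0A0B0C0D these give the bytes 0A 0B 0C 0D, 0D 0C 0B 0A and 0B 0A 0D 0C respectively.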

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 24 '07 #9

JohnQ

"James Kanze" <ja*********@gmail.com> wrote in message
news:11*********************@p77g2000hsh.googlegroups.com...
On Jun 23, 11:58 am, "JohnQ" <johnqREMOVETHISprogram...@yahoo.com>
wrote:

(Note too that "endianness" isn't a good word, since it suggests
two possible arrangements. At least three are widespread.)

But that one is called "middle ENDIAN" right? If so, that makes "endianness"
seem OK.

John

Jun 26 '07 #10

JohnQ

"Ian Collins" <ia******@hotmail.com> wrote in message
news:5e*************@mid.individual.net...

JohnQ wrote:
"Dave Rahardja" <dr****************************@pobox.com> wrote in
message news:9g********************************@4ax.com...
[...]

1. Pack data structures to the byte (using #pragmas, most times)

I think "no padding" may indeed be a feature that a new language could
exploit.

Not if the hardware doesn't support it, or even supports it with a
significant performance hit.

That would be a nice table to see: CPUs and the supported compiler/language
properties. Writing code that will run on all platforms is a waste of effort
when it is known that the software will never be deployed on those other
platforms. Layering on top of C++ to abstract away what needn't be bothered
with on a daily coding basis is the way to go. Just because C++ is "close to
the hardware" doesn't mean you have to program at that low level all of the
time.

John

Jun 26 '07 #11

JohnQ

"James Kanze" <ja*********@gmail.com> wrote in message
news:11*********************@q69g2000hsb.googlegroups.com...
On Jun 27, 4:03 pm, Dave Rahardja
<drahardja_atsign_pobox_dot_...@pobox.com> wrote:

On Tue, 26 Jun 2007 03:30:54 -0500, "JohnQ"
<johnqREMOVETHISprogram...@yahoo.com> wrote:

"James Kanze" <james.ka...@gmail.com> wrote in message
news:11*********************@p77g2000hsh.googlegroups.com...
On Jun 23, 11:58 am, "JohnQ" <johnqREMOVETHISprogram...@yahoo.com>
wrote:

(Note too that "endianness" isn't a good word, since it suggests
two possible arrangements. At least three are widespread.)

But that one is called "middle ENDIAN" right? If so, that makes "endianness"
seem OK.

"I've never heard it called anything:-). It just is. (There are
also word addressed machines, where it makes no sense to speak
of "endian".)"

Well, if saying "endian" suggests to would-be/will-be hardware designers that
there are only two, that would be a good thing. Even better if they chose
to deprecate the less ubiquitous perversions.

John

Jun 28 '07 #12

JohnQ

"James Kanze" <ja*********@gmail.com> wrote in message
news:11*********************@p77g2000hsh.googlegroups.com...
On Jun 23, 11:58 am, "JohnQ" <johnqREMOVETHISprogram...@yahoo.com>
wrote:

"Cagdas Ozgenc" <cagdas.ozg...@gmail.com> wrote in message
news:11*********************@w5g2000hsg.googlegroups.com...
[...]

Does your software/application require that much portability? Why struggle
with "write once, compile anywhere" if you're only targeting one platform or
even only one machine, for instance?

"And one version of one compiler with one set of compiler options.

I guess he's a professional."

Seems like masochism rather than professionalism.

[...]

"(Note too that "endianness" isn't a good word, since it suggests
two possible arrangements. At least three are widespread.)"

Perhaps you'd like to update http://en.wikipedia.org/wiki/Endianness. (Yes,
they list 3 endian arrangements).

John

Jun 29 '07 #13
