serializing an arbitrary data structure into a flat buffer (raw contiguous memory block)

Hi,

I am writing a messaging library which will allow me to send a generic
message structure with custom "payloads".

In many cases, a message must store a non-linear data structure (i.e.
"payload") using pointers. Examples of these are binary trees, hash
tables etc. Thus, the message itself contains only a pointer to the
actual data. When the message is sent to the same processor, these
pointers point to the original locations, which are within the address
space of the same processor. However, when such a message is sent to
other processors, these pointers will point to invalid locations.

I need a way to ``serialize'' (or pack) my message structures into a
contiguous raw memory block (and then be able to de-serialize or
"unpack" them at the other end).

I just need a simple example, using a simple structure that contains
pointers (say a ptr to another struct, or a char*) so that I can build
on from that.

Searches on Google over the last few days have yielded nothing useful.

Thanks

Nov 15 '05 #1

A typical solution would be to store first all the common data for the
message structure, then an "integer" (whichever integral type you
prefer) holding the count of elements contained. Then, for every element,
store its common part, followed by a count of its sub-elements, followed
by every sub-element... Thus you will have no pointers, and all (relevant)
data is contained in the message.

In some cases, when elements have different structures, you may need to
prefix them with a type tag and/or a size field.

For instance:

<Total-size><message structure fields><number of elements>
{for every element}
    <Element size><element type tag><element fields><number of sub-elements>
    {for every sub-element}
        ...
    <some kind of element checksum>
<some kind of total checksum>
Nov 15 '05 #2


Zara wrote:
(...)

Thanks - but this is not what I'm looking for. Your code looks like some
kind of markup language. What I want is a byte stream (i.e. binary data).

For those reading - I am not concerned with endianness and other low
level details (it's not necessary for my purposes).

Nov 15 '05 #3
Alfonso Morra wrote:
(...)

Well, although my instance is full of < and >, it is not a mark-up
language. Suppose this:

struct node {
    char *name;
    struct node *next;
};

struct structure {
    char *string;
    struct node *list;
} my_structure;

and let it be:

my_structure
    "Root node"
    list---------->node 1
                   "It's me"
                   next------------>node 2
                                    "It's I"
                                    next->NULL

the message, in binary, could look something like:

00 24 00 09 52 6f 6f 74 20 6e 6f 64 65 00 02 00
0a 01 00 07 49 74 27 73 20 6d 65 00 09 01 00 06
49 74 27 73 20 49

Which has all of the data contained in it, except the checksums (I don't
feel like putting them in), and it supposes a big-endian, ASCII machine.

Now, if you bother to look at it, you will see that it fits your
specs and is described by my former message.
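
A minimal sketch in C of a packer for a layout of this kind. It assumes 16-bit sizes and counts written most significant byte first, a one-byte type tag of 0x01 for a node (a made-up value), a total size that counts the bytes following the size field, and no checksums. The names put_u16, put_str and pack_structure are illustrative only.

```c
#include <stddef.h>
#include <string.h>

struct node {
    char *name;
    struct node *next;
};

struct structure {
    char *string;
    struct node *list;
};

/* Store a 16-bit value at buf[pos], most significant byte first. */
static size_t put_u16(unsigned char *buf, size_t pos, unsigned value)
{
    buf[pos]     = (unsigned char)((value >> 8) & 0xFF);
    buf[pos + 1] = (unsigned char)(value & 0xFF);
    return pos + 2;
}

/* Store a string as <16-bit length><bytes>, without the terminating '\0'. */
static size_t put_str(unsigned char *buf, size_t pos, const char *s)
{
    size_t len = strlen(s);

    pos = put_u16(buf, pos, (unsigned)len);
    memcpy(buf + pos, s, len);
    return pos + len;
}

/*
 * Pack *s into buf and return the number of bytes written.
 * No bounds checking: the caller must supply a large enough buffer,
 * and every length is assumed to fit in 16 bits.
 */
size_t pack_structure(const struct structure *s, unsigned char *buf)
{
    const struct node *n;
    unsigned count = 0;
    size_t pos = 2;                       /* leave room for the total size */

    pos = put_str(buf, pos, s->string);

    for (n = s->list; n != NULL; n = n->next)
        count++;
    pos = put_u16(buf, pos, count);

    for (n = s->list; n != NULL; n = n->next) {
        size_t size_at = pos;             /* element size is filled in last */

        pos += 2;
        buf[pos++] = 0x01;                /* hypothetical type tag: "node" */
        pos = put_str(buf, pos, n->name);
        put_u16(buf, size_at, (unsigned)(pos - size_at - 2));
    }

    put_u16(buf, 0, (unsigned)(pos - 2)); /* bytes following the size field */
    return pos;
}
```

Calling pack_structure on the three-item example above with a 64-byte buffer fills in the message structure's string first, then the element count, then one size-prefixed record per node. The unpacker would read the same fields in the same order, calling malloc for each string and node as it goes.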

Regards

Nov 15 '05 #4


Zara wrote:
(...)

You've completely lost me now. I have no idea how you arrived at the hex
dump from your two structures. What I'm really after is a simple example
(or a link to a site where I can see an example) of serializing a simple
struct containing pointers. I have searched Google over the last three
days, to no avail.

It does not have to be anything too complicated, just something I can
build on and use as the starting point for serializing my
structures. Although I have a rough idea of what you're doing, I am
unfortunately unable to build on your examples thus far.

Nov 15 '05 #5

In article <dh**********@n wrdmz02.dmz.ncs .ea.ibs-infra.bt.com>, Alfonso Morra <sw***********@ the-ring.com> writes:

In many cases, a message must store a non-linear data structure (i.e.
"payload") using pointers. Examples of these are binary trees, hash
tables etc. Thus, the message itself contains only a pointer to the
actual data. When the message is sent to the same processor, these
pointers point to the original locations, which are within the address
space of the same processor. However, when such a message is sent to
other processors, these pointers will point to invalid locations.

I need a way to ``serialize'' (or pack) my message structures into a
contiguous raw memory block (and then be able to de-serialize or
"unpack" them at the other end).

This is not a trivial problem, but it's not a particularly difficult
one, either.

In some cases the original data structure, or a suitable facsimile,
can be reconstructed from the data alone. This is often the case
with hash tables, sorted lists, binary trees, and so forth. The
sending side simply sends the nodes and the receiving side inserts
them into the appropriate data structure.

In the general case, however, you need a mechanism for preserving
associations between pieces of data. Pointers are such a mechanism,
but they (almost always[1]) represent a system-specific, and usually
process-specific, mapping, and in any case the information that a
portable C program can extract from them is limited.

So the obvious solution is to have your serialization process replace
the pointers with some portable representation of the associations
between items. One very simple approach is to serialize all of the
items into a single block of malloc'd memory (using a pointer to
unsigned char), and replace the pointers with offsets into that
block. The deserializer extracts items, remembering the locations it
has extracted them to, and converts from offsets back into pointers.
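
As a rough illustration of the offset scheme, here is a sketch for a binary tree. The record layout (three 32-bit fields, most significant byte first), the NO_OFFSET marker for a null pointer, and all the names are assumptions made for the example. Values are assumed to fit in 32 bits.

```c
#include <stdlib.h>

struct tree {
    unsigned long value;
    struct tree *left, *right;
};

#define RECORD_SIZE 12u          /* value, left offset, right offset */
#define NO_OFFSET 0xFFFFFFFFuL   /* stands in for a null pointer */

static void put_u32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)((v >> 24) & 0xFF);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}

static unsigned long get_u32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8) | (unsigned long)p[3];
}

static size_t count_nodes(const struct tree *t)
{
    return t ? 1 + count_nodes(t->left) + count_nodes(t->right) : 0;
}

/* Write the subtree at *pos; return the offset of its record. */
static unsigned long pack_node(const struct tree *t, unsigned char *buf,
                               unsigned long *pos)
{
    unsigned long at;

    if (t == NULL)
        return NO_OFFSET;
    at = *pos;
    *pos += RECORD_SIZE;
    put_u32(buf + at, t->value);
    put_u32(buf + at + 4, pack_node(t->left, buf, pos));
    put_u32(buf + at + 8, pack_node(t->right, buf, pos));
    return at;
}

/* Serialize into one malloc'd block; the root's record is at offset 0. */
unsigned char *pack_tree(const struct tree *root, size_t *size)
{
    unsigned long pos = 0;
    unsigned char *buf;

    *size = count_nodes(root) * RECORD_SIZE;
    buf = malloc(*size ? *size : 1);
    if (buf != NULL)
        pack_node(root, buf, &pos);
    return buf;
}

/* Convert an offset back to a pointer; returns 0 if the offset is invalid. */
static int to_pointer(unsigned long off, struct tree *nodes, size_t n,
                      struct tree **out)
{
    if (off == NO_OFFSET) {
        *out = NULL;
        return 1;
    }
    if (off % RECORD_SIZE != 0 || off / RECORD_SIZE >= n)
        return 0;
    *out = &nodes[off / RECORD_SIZE];
    return 1;
}

/* Rebuild the tree in one array; the record at offset k * RECORD_SIZE
   becomes nodes[k], so nodes[0] is the root and free(nodes) frees it all. */
struct tree *unpack_tree(const unsigned char *buf, size_t size)
{
    size_t i, n = size / RECORD_SIZE;
    struct tree *nodes;

    if (n == 0)
        return NULL;
    nodes = malloc(n * sizeof *nodes);
    if (nodes == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        const unsigned char *rec = buf + i * RECORD_SIZE;

        nodes[i].value = get_u32(rec);
        if (!to_pointer(get_u32(rec + 4), nodes, n, &nodes[i].left) ||
            !to_pointer(get_u32(rec + 8), nodes, n, &nodes[i].right)) {
            free(nodes);
            return NULL;
        }
    }
    return nodes;
}
```

Because every record has the same size here, the deserializer can work out where each item will live before reading it. With variable-sized items it would need a table of offsets and addresses instead.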

A better scheme is probably to label each item with a unique
identifier and replace each pointer with the identifier of the object
it points to. This is essentially the same as the "offset" scheme
except that it makes the mapping explicit (offsets are really just
unique IDs). That increases the information available to the
deserializer, which makes it more robust - it's easier for it to
detect malformed input. Transporting data and converting it among
representations are fragile, vulnerable operations, and you want to
make them as robust as possible.

I just need a simple example, using a simple structure that contains
pointers (say a ptr to another struct, or a char*) so that I can build
on from that.


It's difficult to provide a robust, portable, short example, because
this is not a problem that lends itself to short, portable code.
Portable data representations require marshalling and unmarshalling
from and to the local system's representation. Furthermore, to
really handle the general case, you have to keep a map from object
addresses to IDs while serializing (so that each pointer can be
converted to its ID), and a reverse map while deserializing.

Here's an outline for the serializer:

- Walk the data, creating a unique ID for each item and mapping
it to the item's address. You'll have to choose what data structure
to use for the map; a hash table (keyed by address) is an obvious
choice, but might not be worth the overhead and complexity.

- As each item is serialized, prefix it with its ID (and, presumably,
type information and any other metadata your system needs to provide).

- In the serialized representation, replace each pointer field with
the ID of the pointed-to object.

This two-pass approach is simpler than a single pass, which would
have to remember the locations in the serialized data of pointer/ID
fields that referred to objects that hadn't yet been assigned an ID,
so you could fill those in later.
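
A sketch of that two-pass serializer, under several assumptions: every item is reachable through a NULL-terminated next chain, the map is a plain array searched linearly (standing in for the hash table), ID 0 means a null pointer, and the wire format is a 32-bit count followed by 16-byte records of ID, value, next ID and other ID, most significant byte first. All names are made up for the example.

```c
#include <stdlib.h>

struct item {
    unsigned long value;  /* assumed to fit in 32 bits */
    struct item *next;    /* NULL-terminated chain linking every item */
    struct item *other;   /* may point at any item in the chain, or be NULL */
};

static void put_u32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)((v >> 24) & 0xFF);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}

/*
 * Map an address to its ID. map[i] holds the address with ID i + 1.
 * Returns 0 for NULL, and also for an address that was never registered.
 */
static unsigned long id_of(const struct item *p, const struct item **map,
                           size_t n)
{
    size_t i;

    for (i = 0; p != NULL && i < n; i++)
        if (map[i] == p)
            return (unsigned long)i + 1;
    return 0;
}

/* Returns a malloc'd block of *size bytes, or NULL on allocation failure. */
unsigned char *serialize_items(const struct item *head, size_t *size)
{
    const struct item **map;
    const struct item *p;
    unsigned char *buf, *out;
    size_t n = 0, i;

    /* Pass 1: walk the data and give every item an ID. */
    for (p = head; p != NULL; p = p->next)
        n++;
    map = malloc((n ? n : 1) * sizeof *map);
    if (map == NULL)
        return NULL;
    for (p = head, i = 0; p != NULL; p = p->next)
        map[i++] = p;

    /* Pass 2: emit each item, writing IDs where the pointers were. */
    *size = 4 + n * 16;
    buf = malloc(*size);
    if (buf == NULL) {
        free(map);
        return NULL;
    }
    put_u32(buf, (unsigned long)n);
    out = buf + 4;
    for (i = 0; i < n; i++, out += 16) {
        put_u32(out,      (unsigned long)i + 1);
        put_u32(out + 4,  map[i]->value);
        put_u32(out + 8,  id_of(map[i]->next, map, n));
        put_u32(out + 12, id_of(map[i]->other, map, n));
    }
    free(map);
    return buf;
}
```

The other field can point forwards or backwards in the chain. Since every item already has an ID by the time pass 2 runs, neither case needs special handling.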

The deserializer would use a similar two-pass process, first
allocating areas for each item and building a map between area and ID
in the process, then deserializing each item into its area and
setting pointer fields using the map.
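
A matching sketch of the deserializer. It assumes a wire format of a 32-bit count followed by 16-byte records of ID, value, next ID and other ID, most significant byte first, with ID 0 meaning a null pointer. As a simplification it allocates all the items as one array rather than one area per item, and it does not detect duplicate IDs. All names are hypothetical.

```c
#include <stdlib.h>

struct item {
    unsigned long value;
    struct item *next;
    struct item *other;
};

/* One map entry: the ID read from the wire and the area allocated for it. */
struct slot {
    unsigned long id;
    struct item *area;
};

static unsigned long get_u32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8) | (unsigned long)p[3];
}

/* Turn an ID back into a pointer. Returns 0 if the ID is unknown. */
static int find(const struct slot *map, size_t n, unsigned long id,
                struct item **out)
{
    size_t i;

    if (id == 0) {
        *out = NULL;
        return 1;
    }
    for (i = 0; i < n; i++) {
        if (map[i].id == id) {
            *out = map[i].area;
            return 1;
        }
    }
    return 0;
}

/*
 * Returns the first item (release everything with a single free()),
 * or NULL if the input is empty, truncated, or refers to an unknown ID.
 */
struct item *deserialize_items(const unsigned char *buf, size_t size)
{
    struct slot *map;
    struct item *items;
    size_t n, i;
    int ok = 1;

    if (size < 4)
        return NULL;
    n = get_u32(buf);
    if (n == 0 || (size - 4) / 16 < n)
        return NULL;

    /* Pass 1: allocate an area for each item and record ID -> area. */
    items = malloc(n * sizeof *items);
    map = malloc(n * sizeof *map);
    if (items == NULL || map == NULL) {
        free(items);
        free(map);
        return NULL;
    }
    for (i = 0; i < n; i++) {
        map[i].id = get_u32(buf + 4 + i * 16);
        map[i].area = &items[i];
    }

    /* Pass 2: fill in each item, turning IDs back into pointers. */
    for (i = 0; ok && i < n; i++) {
        const unsigned char *rec = buf + 4 + i * 16;

        items[i].value = get_u32(rec + 4);
        ok = find(map, n, get_u32(rec + 8), &items[i].next) &&
             find(map, n, get_u32(rec + 12), &items[i].other);
    }
    free(map);
    if (!ok) {
        free(items);
        return NULL;
    }
    return items;
}
```

Rejecting unknown IDs is the kind of malformed-input check that explicit identifiers make easy.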

[1] There are esoteric architectures which use "fat" pointers that
contain more information than simply an offset into address space,
but that's an implementation detail that's not useful in portable C
programming.

--
Michael Wojcik mi************@ microfocus.com

Against all odds, over a noisy telephone line, tapped by the tax authorities
and the secret police, Alice will happily attempt, with someone she doesn't
trust, whom she can't hear clearly, and who is probably someone else, to
fiddle her tax return and to organise a coup d'etat, while at the same time
minimising the cost of the phone call. -- John Gordon
Nov 15 '05 #6
