endian conversion - composite type



Data stored on a storage device is byte swapped. The data is big
endian and my PC is little. At issue: There's a composite type ( a
header ) at the front of the files that I'm trying to read in. I'm
trying to _simulate_ the endian conversion in code below but I'm just
wondering if there's an ideal way to do this besides what's shown?
Padding produces some interesting results. Notice how the parameter d
is different in the print outs . Serializing the data - at the
present time - is not an option.
An aside: Matlab is my prime analysis tool. With matlab I could pass
a parameter to the fopen call and all's well. I'm trying to write
code to do something similar. Thanks in advance

#include <algorithm> // std::swap (it lives in <utility> since C++11)
#include <cstddef>
#include <cstdio>
#include <iostream>

typedef unsigned char uc_type ;

#define c( x ) ByteSwap( (unsigned char *) &(x), sizeof( x ) )

// Reverse the n bytes starting at b, in place.
void ByteSwap( unsigned char * b, int n )
{
    int i = 0;
    int j = n - 1;
    while ( i < j )
    {
        std::swap( b[ i ], b[ j ] );
        i++, j--;
    }
}

struct foo { // let's try a simple struct
    short a; // works
    short b; // works
    unsigned d ; // introduced padding
    //char test [ 5 ] ; // swap these
    //double dd ;
    //float ar ;
};

// Dump the raw bytes of the struct in memory order.
void showBytes( foo *barp )
{
    std::size_t i;
    unsigned char *cp = (unsigned char *)barp;

    for ( i = 0 ; i < sizeof(*barp) ; ++i ) {
        printf("0x%02X ", (unsigned int)cp[i]);
    }
    std::cout << std::endl;
}

// Print the member values as the host interprets them.
void showBytes( foo& barp )
{
    std::cout << barp.a << std::endl;
    std::cout << barp.b << std::endl;
    std::cout << barp.d << std::endl;
}

int main()
{
    foo bar = {0x0102, 0x0304, 0x2030 };

    showBytes( &bar );
    ByteSwap ( ( unsigned char*) &bar.a, sizeof ( bar.a ) ) ;
    ByteSwap ( ( unsigned char*) &bar.b, sizeof ( bar.b ) ) ;
    ByteSwap ( ( unsigned char*) &bar.d, sizeof ( bar.d ) ) ;

    //showBytes( bar ) ;
    showBytes( &bar );

    return 0;
}
/*
0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00
0x01 0x02 0x03 0x04 0x00 0x00 0x20 0x30
Press any key to continue
*/

Jan 10 '07 #1
ma740988 wrote:
Data stored on a storage device is byte swapped. The data is big
endian and my PC is little. At issue: There's a composite type ( a
header ) at the front of the files that I'm trying to read in. I'm
trying to _simulate_ the endian conversion in code below but I'm just
wondering if there's an ideal way to do this besides what's shown?
The best way to read binary files is to use an unsigned char buffer and
convert from this buffer to the structure you use in the program for that
data. You make the conversion as complex as your portability goals require,
considering endianness, the sign encoding used, and so on.

A bit more code to write at first, but avoids the need to worry about
padding and many other issues.
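
A minimal sketch of that approach, using the header from the original post. The names Header, load_be16, load_be32 and decode_header are made up for illustration, and the file layout (a at offset 0, b at offset 2, d at offset 4, all big endian, no padding) is an assumption about the data, not something the thread confirms.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory form of the header from the original post.
struct Header {
    std::int16_t  a;
    std::int16_t  b;
    std::uint32_t d;
};

// Assemble a big-endian 16-bit value from two consecutive bytes.
inline std::uint16_t load_be16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Assemble a big-endian 32-bit value from four consecutive bytes.
inline std::uint32_t load_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Assumed on-disk size of the header: 2 + 2 + 4 bytes, no padding.
const std::size_t kHeaderSize = 8;

// Decode one header from a raw buffer of at least kHeaderSize bytes.
// The offsets are explicit, so the compiler's padding of Header is irrelevant.
inline Header decode_header(const unsigned char* buf) {
    Header h;
    h.a = static_cast<std::int16_t>(load_be16(buf + 0));
    h.b = static_cast<std::int16_t>(load_be16(buf + 2));
    h.d = load_be32(buf + 4);
    return h;
}
```

Because the values are built with shifts rather than by reinterpreting memory, the same code should give the same result on a little-endian or a big-endian host.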

--
Salu2
Jan 10 '07 #2
Julián Albo wrote:
The best way to read binary files is to use an unsigned char buffer and
convert from this buffer to the structure you use in the program for that
data. You make the conversion as complex as your portability goals require,
considering endianness, the sign encoding used, and so on.

A bit more code to write at first, but avoids the need to worry about
padding and many other issues.
To clarify, the converting code needs to worry about padding inserted in
the byte stream because the source wrote entire structs.

I suggest making it look like a stream filter reading chars from an
underlying stream so you won't ever deal with the buffer and boundary
conditions. Each function to read a particular type needs to a) skip
padding bytes that the source would have inserted to align that type;
b) read and assemble the bytes of the object; c) perhaps do something
really hard for floating-point data using a different representation,
or for bitfield data; d) pick up the value as the correct type and
return it. Sometimes you'll find shortcuts, as when 32 bit data only
needs 16 bit alignment so can be fetched by two calls to the 16 bit
fetcher.

I would add separate functions to mark the beginning and end of each
struct as there is additional padding there not related to the type of
the next member. This will require you to analyze the struct so you
can pass in the alignment the source machine will have assumed for the
struct as a whole. At least you won't have to make every single pad
explicit.

Once, when faced with too much foreign data, I wrote functions to take
a dense character string description of a struct like "ssslccl" and
convert to and from the foreign form, knowing the padding requirements
of both forms.
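
For illustration, a description-driven swapper might be sketched like this. The field codes ('c' = 1 byte, 's' = 2 bytes, 'l' = 4 bytes), the rule that each field is aligned to its own size, and the names field_size and swap_fields are all assumptions for this example; the original functions were not posted.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Size in bytes of one field code (assumed mapping, see the note above).
inline std::size_t field_size(char code) {
    switch (code) {
        case 'c': return 1;
        case 's': return 2;
        case 'l': return 4;
        default:  throw std::invalid_argument("unknown field code");
    }
}

// Reverse the bytes of every field named in `desc`, in place.
// Each field is assumed to be aligned to its own size. Returns the number of
// bytes the struct occupies, including tail padding to the largest alignment.
inline std::size_t swap_fields(unsigned char* buf, std::size_t len,
                               const std::string& desc) {
    std::size_t offset = 0;
    std::size_t max_align = 1;
    for (char code : desc) {
        const std::size_t size = field_size(code);
        max_align = std::max(max_align, size);
        offset = (offset + size - 1) / size * size;  // skip alignment padding
        if (offset + size > len)
            throw std::out_of_range("description runs past the buffer");
        std::reverse(buf + offset, buf + offset + size);
        offset += size;
    }
    return (offset + max_align - 1) / max_align * max_align;
}
```

With the struct from the original post, the description would be "ssl": two shorts followed by one 4-byte unsigned.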

I consider this a defect in the language. I should be able to declare
the interface properties of the struct (padding, byte order, FP format)
in a standard way and let the compiler choose to implement it or reject
it or maybe half-implement it so special functions could be applied to
the members that can't be accessed normally. We do it anyway for device
drivers with memory-mapped I/O and for MMU structures, but fighting the
compiler every step of the way.
Jan 10 '07 #3

ma740988 wrote:
Data stored on a storage device is byte swapped. The data is big
endian and my PC is little. At issue: There's a composite type ( a
header ) at the front of the files that I'm trying to read in. I'm
trying to _simulate_ the endian conversion in code below but I'm just
wondering if there's an ideal way to do this besides what's shown?
Why can't you just call ntohs and ntohl once you have read the data off
your storage device? Your PC is little endian, so ntohs/ntohl won't be
no-ops; they will swap the bytes for you. The only problem you may
encounter is if your composite header packs data into nibbles: each
nibble would need to be swapped manually before you recompose your
header.
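
A small sketch of that suggestion, assuming a POSIX system where ntohs and ntohl are declared in <arpa/inet.h> (on Windows they come from the Winsock headers instead). NetHeader and header_from_big_endian are hypothetical names, and the fields are made unsigned here because ntohs/ntohl work on unsigned values.

```cpp
#include <arpa/inet.h>  // ntohs, ntohl (POSIX)
#include <cstdint>
#include <cstring>

// Hypothetical header with the same field widths as the struct in the
// original post.
struct NetHeader {
    std::uint16_t a;
    std::uint16_t b;
    std::uint32_t d;
};

// Convert a raw big-endian ("network order") header to host order.
// Each field is copied out separately, so the padding of NetHeader does not
// matter; the assumed file layout is a, b, d packed back to back.
inline NetHeader header_from_big_endian(const unsigned char* raw) {
    std::uint16_t a, b;
    std::uint32_t d;
    std::memcpy(&a, raw + 0, sizeof a);
    std::memcpy(&b, raw + 2, sizeof b);
    std::memcpy(&d, raw + 4, sizeof d);

    NetHeader h;
    h.a = ntohs(a);  // swaps on a little-endian host, no-op on big endian
    h.b = ntohs(b);
    h.d = ntohl(d);
    return h;
}
```

These functions only cover 16-bit and 32-bit integers, so doubles, floats and 64-bit values would still need their own handling.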

Jan 10 '07 #4
Robert Mabee wrote:
The best way to read binary files is to use an unsigned char buffer and
convert from this buffer to the structure you use in the program for that
data. You make the conversion as complex as your portability goals require,
considering endianness, the sign encoding used, and so on.

A bit more code to write at first, but avoids the need to worry about
padding and many other issues.

To clarify, the converting code needs to worry about padding inserted in
the byte stream because the source wrote entire structs.
From the reader's point of view this is unimportant. The padding from the
writer's compiler can be treated the same as a FILLER in COBOL: a part of
the organization of the file.
I suggest making it look like a stream filter reading chars from an
underlying stream so you won't ever deal with the buffer and boundary
conditions. Each function to read a particular type needs to a) skip
padding bytes that the source would have inserted to align that type;
It is doable, but it may be difficult to evaluate the padding conditions.
c) perhaps do something really hard for floating-point data using a
different representation, or for bitfield data;
Yes, that is why I said that more or less effort will be needed
depending on the portability goal.
Once, when faced with too much foreign data, I wrote functions to take
a dense character string description of a struct like "ssslccl" and
convert to and from the foreign form, knowing the padding requirements
of both forms.
Some time ago I wrote a program that took a description of the record and
displayed the content of a file according to it. The same can be done
inside a program, or in a program that generates code to be used in the
program that deals with the data.
I consider this a defect in the language. I should be able to declare
the interface properties of the struct (padding, byte order, FP format)
in a standard way and let the compiler choose to implement it or reject
it or maybe half-implement it so special functions could be applied to
the members that can't be accessed normally.
There is no need to add to the language something that is perfectly doable
without direct language support. This is a general design principle of C++.

--
Salu2
Jan 10 '07 #5

Julián Albo wrote:
ma740988 wrote:
Data stored on a storage device is byte swapped. The data is big
endian and my PC is little. At issue: There's a composite type ( a
header ) at the front of the files that I'm trying to read in. I'm
trying to _simulate_ the endian conversion in code below but I'm just
wondering if there's an ideal way to do this besides what's shown?

The best way to read binary files is to use an unsigned char buffer and
convert from this buffer to the structure you use in the program for that
data. You make the conversion as complex as your portability goals require,
considering endianness, the sign encoding used, and so on.
Do you know of/have an example of this anywhere I could peruse?

Jan 11 '07 #6
ma740988 wrote:
The best way to read binary files is to use an unsigned char buffer and
convert from this buffer to the structure you use in the program for that
data. You make the conversion as complex as your portability goals require,
considering endianness, the sign encoding used, and so on.
Do you know of/have an example of this anywhere I could peruse?
I posted some sample code in this group some time ago; you can try to find
it in Google Groups.

--
Salu2
Jan 11 '07 #7
ma740988 wrote:
Data stored on a storage device is byte swapped. The data is big
endian and my PC is little.
foo bar = {0x0102, 0x0304, 0x2030 };

0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00
Is it a memory dump? Are you sure "0x30 0x20 0x00 0x00" is little
endian?

0x2030 == 0x00002030 is not the same as 0x20300000

"0x30 0x20" - the low 16-bit big-endian word - was placed before
"0x00 0x00" - the high 16-bit big-endian word.
It looks like mixed endian (Google says: middle-endian, a.k.a. PDP-endian).
In that case you cannot swap bytes in the same manner as words.

for 0x50607080

big endian is:
word: low byte , high byte
dword: low word, high word

" 0x80, 0x70, 0x60, 0x50 "

little endian must have been:
word: high byte, low byte
dword: high word, low word

" 0x00, 0x00, 0x20, 0x30 "

Use:
#include <netinet/in.h>
htons(), htonl(), ntohs(), ntohl() - POSIX functions.

Jan 14 '07 #8
Grizlyk wrote:

Oops, sorry, I see: I mixed everything up in my poor head with the huge
number of "endians" involved.

I confused your PC's endianness with your data's endianness, and at the
same time swapped the names "little-endian" and "big-endian" for the
byte orders.
ma740988 wrote:
Data stored on a storage device is byte swapped. The data is big
endian and my PC is little.
foo bar = {0x0102, 0x0304, 0x2030 };

0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00

Is it a memory dump? Are you sure "0x30 0x20 0x00 0x00" is little
endian?
Yes, it is correct little-endian data on a little-endian PC.
"0x30 0x20" - the low 16-bit big-endian word - was placed before
"0x00 0x00" - the high 16-bit big-endian word.
No, "0x30 0x20" - the low 16-bit little-endian word - was placed before
"0x00 0x00" - the high 16-bit little-endian word, which is the correct
placement for a little-endian 32-bit dword.
It looks like mixed endian
No, this is wrong
for 0x50607080

big endian is:
word: low byte , high byte
dword: low word, high word

" 0x80, 0x70, 0x60, 0x50 "
No, this is little endian
little endian must have been:
word: high byte, low byte
dword: high word, low word

" 0x00, 0x00, 0x20, 0x30 "
" 0x50, 0x60, 0x70, 0x80 "
No, this is big endian

It seems to me this assignment of the "endians" is more correct. Or not?
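
To make the two byte orders concrete, here is a small sketch that lays out a 32-bit value both ways. The function names are made up for this example.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Bytes of v in big-endian order: most significant byte first.
inline std::array<unsigned char, 4> big_endian_bytes(std::uint32_t v) {
    std::array<unsigned char, 4> out = {{
        static_cast<unsigned char>(v >> 24),
        static_cast<unsigned char>(v >> 16),
        static_cast<unsigned char>(v >> 8),
        static_cast<unsigned char>(v)
    }};
    return out;
}

// Bytes of v in little-endian order: least significant byte first.
inline std::array<unsigned char, 4> little_endian_bytes(std::uint32_t v) {
    std::array<unsigned char, 4> out = {{
        static_cast<unsigned char>(v),
        static_cast<unsigned char>(v >> 8),
        static_cast<unsigned char>(v >> 16),
        static_cast<unsigned char>(v >> 24)
    }};
    return out;
}

// True if the host stores the least significant byte at the lowest address.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

For 0x50607080, big_endian_bytes gives 0x50 0x60 0x70 0x80 and little_endian_bytes gives 0x80 0x70 0x60 0x50, which matches the corrected assignment above.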

Jan 14 '07 #9
