
Q about endian-ness/portability

Joe C
Guest
#1: Jul 22 '05
I have some code that performs bitwise operations on files. I'm trying to
make the code portable on different endian systems. This is not work/school
related...just trying to learn/understand.

My computer is little endian 32-bit (intel, imagine that). My code deals
with binary data in files, and I've been treating all data as native words
(32-bit) for performance reasons. I found out that if I treat the data as
long long (64-bit) I only suffer a 5% performance hit. So...I'm thinking
that making it 64-bit might be a forward-looking way to support future
platforms.
Anyway...I don't have access to a 64-bit Big-endian system, and I want to
make sure I understand how the data is internally represented on such a
system.

Suppose I have a file containing 8-bytes. In Ascii, it contains:
"abcdefgh"
In hex, the file contains:
61 62 63 64 65 66 67 68
The file is then read into memory on my machine (2 different ways) and on a
hypothetical big-endian 64-bit machine. Each system does an operation on
the data then writes the data to a binary file. Will all three files
contain the identical bit-sequence? Thanks.

Case1)
I read this data as binary into a 2-element array of 32-bit words, on my
little-endian machine using something like:
in.read(reinterpret_cast<char*>(array), 8)
after which:
array[0] == 1684234849 == 0x64636261
array[1] == 1751606885 == 0x68676665

I then do the following transformation (rotate "right" 1-bit) and write the
binary output to a file:
int carrybit = array[0] & 1;
array[0] = (array[0] >> 1) | ((array[1] & 1) << 31);
array[1] = (array[1] >> 1) | (carrybit << 31);
ofstream out ("fileout1.dat", ios::binary | ios::out);
char* o = reinterpret_cast<char*>(array);
out.write(o, 8);

Case2)
I read this data as binary into long long variable (64-bit), on my
little-endian machine using something like:
in.read(reinterpret_cast<char*>(&variable), 8)
after which:
variable == 7523094288207667809 == 0x6867666564636261

I then do the following transformation (rotate "right" 1-bit) and write the
binary output to a file:
variable = (variable >> 1) | (variable << 63);
ofstream out ("fileout2.dat", ios::binary | ios::out);
char* o = reinterpret_cast<char*>(&variable);
out.write(o, 8);
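
A minimal sketch for comparing Case 1 and Case 2 as pure value operations, assuming unsigned fixed-width types from <cstdint> (C++11) in place of int and long long. With signed types, right-shifting a negative value or shifting a bit into the sign position is not reliably portable. The helper names rotr1_64, rotr1_2x32 and join_words are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 64-bit value right by one bit.
// Unsigned operands keep both shifts well defined.
std::uint64_t rotr1_64(std::uint64_t v) {
    return (v >> 1) | (v << 63);
}

// Rotate a 64-bit quantity held as two 32-bit words right by one bit,
// mirroring Case 1: word[0] is the low half, word[1] is the high half.
void rotr1_2x32(std::uint32_t word[2]) {
    const std::uint32_t carry = word[0] & 1u;           // bit that wraps to the top
    word[0] = (word[0] >> 1) | ((word[1] & 1u) << 31);  // low half takes high half's low bit
    word[1] = (word[1] >> 1) | (carry << 31);           // high half takes the wrapped bit
}

// Hypothetical helper for comparison: glue the two words into one 64-bit value.
std::uint64_t join_words(const std::uint32_t word[2]) {
    return (static_cast<std::uint64_t>(word[1]) << 32) | word[0];
}
```

For the words read in Case 1, rotr1_2x32 leaves word[0] == 0xB231B130 and word[1] == 0xB433B332, which joined together equal rotr1_64(0x6867666564636261), i.e. 0xB433B332B231B130. So the two little-endian cases agree as long as word[1] is treated as the high half.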

Case3)
I read this data as binary into a 64-bit variable on a hypothetical
big-endian machine using something like:
in.read(reinterpret_cast<char*>(&variable), 8)
after which:
variable == 7017280452245743464 == 0x6162636465666768

I then do the following transformation (rotate "left" 1-bit) and write the
binary output to a file:
variable = (variable << 1) | (variable >> 63);
ofstream out ("fileout3.dat", ios::binary | ios::out);
char* o = reinterpret_cast<char*>(&variable);
out.write(o, 8);

_______________________

The question...do all three files contain identical data, namely(hex):
30 b1 31 b2 32 b3 33 b4
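
One way to check the three cases without big-endian hardware is to model each byte order explicitly with shifts, which behave the same on any host. The sketch below is illustrative only: Bytes, load_le, load_be, store_le, store_be and the case helpers are hypothetical names, and 8-bit bytes are assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes = std::array<unsigned char, 8>;  // 8 bytes in file order

// Interpret the bytes as a little-endian CPU would: first byte is least significant.
std::uint64_t load_le(const Bytes& b) {
    std::uint64_t v = 0;
    for (std::size_t i = 8; i-- > 0;) v = (v << 8) | b[i];
    return v;
}

// Interpret the bytes as a big-endian CPU would: first byte is most significant.
std::uint64_t load_be(const Bytes& b) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < 8; ++i) v = (v << 8) | b[i];
    return v;
}

// Lay a value out in file order, least significant byte first.
Bytes store_le(std::uint64_t v) {
    Bytes b{};
    for (std::size_t i = 0; i < 8; ++i) { b[i] = static_cast<unsigned char>(v & 0xFF); v >>= 8; }
    return b;
}

// Lay a value out in file order, most significant byte first.
Bytes store_be(std::uint64_t v) {
    Bytes b{};
    for (std::size_t i = 8; i-- > 0;) { b[i] = static_cast<unsigned char>(v & 0xFF); v >>= 8; }
    return b;
}

std::uint64_t rotr1(std::uint64_t v) { return (v >> 1) | (v << 63); }
std::uint64_t rotl1(std::uint64_t v) { return (v << 1) | (v >> 63); }

// What each scenario would write, given the bytes read from the input file.
Bytes case2_le_rotr(const Bytes& in) { return store_le(rotr1(load_le(in))); }
Bytes case3_be_rotl(const Bytes& in) { return store_be(rotl1(load_be(in))); }
Bytes be_rotr(const Bytes& in)       { return store_be(rotr1(load_be(in))); }
```

Under this model, Case 2 (and Case 1, which computes the same 64-bit rotation in two halves) writes 30 b1 31 b2 32 b3 33 b4, while Case 3 as written (rotate left on the big-endian value) writes c2 c4 c6 c8 ca cc ce d0. A big-endian rotate right (be_rotr) also gives 30 b1 31 b2 32 b3 33 b4 for "abcdefgh", but only because the low bits of consecutive bytes alternate in this input; for the bytes 01 00 00 00 00 00 00 00 the two byte orders produce different files.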

Thanks for your help.

Joe


Kevin Saff
Guest
#2: Jul 22 '05


"Joe C" <jkc8289@bellsouth.net> wrote in message
news:RYWMb.50837$qC.49288@bignews3.bellsouth.net...
> I have some code that performs bitwise operations on files. I'm trying to
> make the code portable on different endian systems. This is not work/school
> related...just trying to learn/understand.
>
> My computer is little endian 32-bit (intel, imagine that). My code deals
> with binary data in files, and I've been treating all data as native words
> (32-bit) for performance reasons. I found out that if I treat the data as
> long long (64-bit) I only suffer a 5% performance hit. So...I'm thinking
> that making it 64-bit might be a forward-looking way to support future
> platforms.

On the other hand, "long long" is a non-standard extension to C++.

> Anyway...I don't have access to a 64-bit Big-endian system, and I want to
> make sure I understand how the data is internally represented on such a
> system.
>
> Suppose I have a file containing 8-bytes. In Ascii, it contains:
> "abcdefgh"
> In hex, the file contains:
> 61 62 63 64 65 66 67 68
> The file is then read into memory on my machine (2 different ways) and on a
> hypothetical big-endian 64-bit machine. Each system does an operation on
> the data then writes the data to a binary file. Will all three files
> contain the identical bit-sequence? Thanks.

Maybe. In general C++ cannot guarantee that your file is portable.
However, in these cases I think one usually assumes that both computers use
the same char-size, and a set of chars written by one computer can be read
in the same order by the other computer. On different computers, bit
sequences are not required to have the same textual representation, or
signify the same numbers.
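
Under the assumption stated above (same char size, bytes arriving in the same order), a file format can fix its own byte order and build values with shifts instead of copying raw memory. The following is only a sketch: read_u64_le and write_u64_le are hypothetical names, little-endian is an arbitrary choice for the file format, and std::uint64_t assumes a C++11 compiler.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit chars");

// Read 8 bytes and assemble them with the first byte as least significant,
// whatever the host's native byte order is.
// Returns false if fewer than 8 bytes were available.
bool read_u64_le(std::istream& in, std::uint64_t& out) {
    char buf[8];
    if (!in.read(buf, 8)) return false;
    out = 0;
    for (int i = 7; i >= 0; --i)
        out = (out << 8) | static_cast<unsigned char>(buf[i]);
    return true;
}

// Write the value as 8 bytes, least significant byte first.
void write_u64_le(std::ostream& os, std::uint64_t v) {
    char buf[8];
    for (int i = 0; i < 8; ++i) {
        buf[i] = static_cast<char>(static_cast<unsigned char>(v & 0xFF));
        v >>= 8;
    }
    os.write(buf, 8);
}
```

With these helpers, reading "abcdefgh" yields 0x6867666564636261 on any host, and writing that value back reproduces the same eight bytes, so the bitwise work in between no longer depends on the machine's byte order.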

> Case1)
> I read this data as binary into a 2-element array of 32-bit words, on my
> little-endian machine using something like:
> in.read(reinterpret_cast<char*>(array), 8)
> after which:
> array[0] == 1684234849 == 0x64636261
> array[1] == 1751606885 == 0x68676665
>
> I then do the following transformation (rotate "right" 1-bit) and write the
> binary output to a file:
> int carrybit = array[0] & 1;
> array[0] = (array[0] >> 1) | ((array[1] & 1) << 31);
> array[1] = (array[1] >> 1) | (carrybit << 31);
> ofstream out ("fileout1.dat", ios::binary | ios::out);
> char* o = reinterpret_cast<char*>(array);
> out.write(o, 8);
>
> Case2)
> I read this data as binary into long long variable (64-bit), on my
> little-endian machine using something like:
> in.read(reinterpret_cast<char*>(&variable), 8)
> after which:
> variable == 7523094288207667809 == 0x6867666564636261
>
> I then do the following transformation (rotate "right" 1-bit) and write the
> binary output to a file:
> variable = (variable >> 1) | (variable << 63);
> ofstream out ("fileout2.dat", ios::binary | ios::out);
> char* o = reinterpret_cast<char*>(&variable);
> out.write(o, 8);
>
> Case3)
> I read this data as binary into a 64-bit variable on a hypothetical
> big-endian machine using something like:
> in.read(reinterpret_cast<char*>(&variable), 8)
> after which:
> variable == 7017280452245743464 == 0x6162636465666768
>
> I then do the following transformation (rotate "left" 1-bit) and write the
> binary output to a file:
> variable = (variable << 1) | (variable >> 63);
> ofstream out ("fileout3.dat", ios::binary | ios::out);
> char* o = reinterpret_cast<char*>(&variable);
> out.write(o, 8);

Some confusions you might have here:

1) You are confused about the meaning of shift left/right. Left shifting is
always multiplication by two (if possible), right shifting division by two,
regardless of the bit representation.

2) Big-endian vs. little-endian is about the order of BYTES, not the order
of BITS. In fact, since a char is by definition the smallest addressable
unit of memory in C++, it doesn't really make much sense to talk about bit
order. OTOH byte order can be important, especially since IO involves
streaming objects as byte sequences.
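
The two points above can be illustrated with a short sketch: shifts act on the value, and byte order only becomes visible when an object's bytes are inspected in memory order. The function names are hypothetical, and the probe value 0x01020304 is just a convenient pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Shifts operate on the value, so these give the same result on any byte order.
std::uint32_t times_two(std::uint32_t v) { return v << 1; }  // top bit is discarded if set
std::uint32_t half(std::uint32_t v)      { return v >> 1; }  // low bit is discarded

// Return the byte stored at the lowest address of a 32-bit object.
unsigned char first_byte_in_memory(std::uint32_t v) {
    unsigned char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);  // copy the object representation
    return bytes[0];
}

// True if the least significant byte is stored first.
bool host_is_little_endian() {
    return first_byte_in_memory(0x01020304u) == 0x04;
}
```

On a little-endian host first_byte_in_memory(0x01020304u) is 0x04 and on a big-endian host it is 0x01, yet times_two and half return the same values on both.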

> The question...do all three files contain identical data, namely(hex):
> 30 b1 31 b2 32 b3 33 b4

Taking a much easier example, say we have the short (0x0102) saved on the
Intel (as 0x02 0x01). Then "future computer" reads this in as (0x0201).
Whereas the Intel short rotates right to (0x0081), saving as (0x81 0x00); the
"future computer" will rotate left to (0x0402), written (0x04 0x02). OTOH if
the "future computer" rotates right, it arrives at (0x8100), which writes
(0x81 0x00), the same as the Intel.
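
The two-byte example can be replayed on a single machine by modelling both byte orders with shifts. This is a sketch with hypothetical names (Pair, load_le16, load_be16, store_le16, store_be16, rotr1_16, rotl1_16). Note that 0x8100 is what a one-bit rotate produces, as in the original question; a plain right shift of 0x0201 would give 0x0100.

```cpp
#include <cassert>
#include <cstdint>

struct Pair { unsigned char first, second; };  // two bytes in file order

// Little-endian view: first file byte is the low byte.
std::uint16_t load_le16(Pair p) { return static_cast<std::uint16_t>(p.first | (p.second << 8)); }
// Big-endian view: first file byte is the high byte.
std::uint16_t load_be16(Pair p) { return static_cast<std::uint16_t>((p.first << 8) | p.second); }

Pair store_le16(std::uint16_t v) {
    return { static_cast<unsigned char>(v & 0xFF), static_cast<unsigned char>(v >> 8) };
}
Pair store_be16(std::uint16_t v) {
    return { static_cast<unsigned char>(v >> 8), static_cast<unsigned char>(v & 0xFF) };
}

// One-bit rotates; the cast trims the result back to 16 bits after integer promotion.
std::uint16_t rotr1_16(std::uint16_t v) { return static_cast<std::uint16_t>((v >> 1) | (v << 15)); }
std::uint16_t rotl1_16(std::uint16_t v) { return static_cast<std::uint16_t>((v << 1) | (v >> 15)); }
```

Starting from the file bytes 02 01, the little-endian rotate right stores 81 00, the big-endian rotate left stores 04 02, and the big-endian rotate right stores 81 00. With only two bytes, each byte receives the other's low bit under either byte order, so the agreement of the two rotate-right results does not carry over to 4- or 8-byte words in general.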

> Thanks for your help.

It probably isn't worth coding for this until it comes up. At the least
someone would have to compile and test for the new platform, when needed,
anyway. If/when it is needed an entire compatibility layer would probably
need to be added, which is too much work. Doing this compatibility work
will limit your current design, since it will make it much harder to make
needed changes to your binary format - every new feature will need to be
endian-proofed, and this will discourage real improvements.

HTH
--
KCS


Joe C
Guest
#3: Jul 22 '05


"Kevin Saff" <google.com@kevin.saff.net> wrote in message
news:HrHw6r.7pu@news.boeing.com...
> On the other hand, "long long" is a non-standard extension to C++.

Right...but a 64-bit integer data type will surely be available.
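
For readers on newer compilers: since C++11, long long is part of the standard and <cstdint> offers std::uint64_t on platforms that have a 64-bit type. A small sketch under that assumption, with rotr64 as a made-up helper name:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// std::uint64_t is exactly 64 bits wide wherever it is provided (C++11 and later).
static_assert(sizeof(std::uint64_t) * CHAR_BIT == 64, "uint64_t must be 64 bits");

// Rotate right by n bits. Unsigned arithmetic keeps the shifts well defined,
// and n == 0 is handled separately to avoid shifting by the full width.
std::uint64_t rotr64(std::uint64_t v, unsigned n) {
    n &= 63u;  // keep the shift count in range
    return n == 0 ? v : (v >> n) | (v << (64u - n));
}
```

Using an unsigned fixed-width type avoids depending on how a particular compiler spells or sizes its 64-bit integer.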

> Taking a much easier example, say we have the short (0x0102) saved on the
> Intel (as 0x02 0x01). Then "future computer" reads this in as (0x0201).
> Whereas the Intel short rotates right to (0x0081), saving as (0x81 0x00); the
> "future computer" will rotate left to (0x0402), written (0x04 0x02). OTOH if
> the "future computer" rotates right, it arrives at (0x8100), which writes
> (0x81 0x00), the same as the Intel.

Thanks a bunch for this good explanation. My analysis was flawed and you
have shed bright light on the issues.

Joe


EventHelix.com
Guest
#4: Jul 22 '05


The following article should help:

http://www.eventhelix.com/RealtimeMa...ndOrdering.htm

Sandeep
--
http://www.EventHelix.com/EventStudio
EventStudio 2.0 - System Architecture Design CASE Tool