Q about endian-ness/portability

Joe C

I have some code that performs bitwise operations on files. I'm trying to
make the code portable on different endian systems. This is not work/school
related...just trying to learn/understand.

My computer is little endian 32-bit (intel, imagine that). My code deals
with binary data in files, and I've been treating all data as native words
(32-bit) for performance reasons. I found out that if I treat the data as
long long (64-bit) I only suffer a 5% performance hit. So...I'm thinking
that making it 64-bit might be a forward-looking way to support future
platforms.
Anyway...I don't have access to a 64-bit Big-endian system, and I want to
make sure I understand how the data is internally represented on such a
system.

Suppose I have a file containing 8-bytes. In Ascii, it contains:
"abcdefgh"
In hex, the file contains:
61 62 63 64 65 66 67 68
The file is then read into memory on my machine (2 different ways) and on a
hypothetical big-endian 64-bit machine. Each system does an operation on
the data then writes the data to a binary file. Will all three files
contain the identical bit-sequence? Thanks.

Case1)
I read this data as binary into a 2-element array of 32-bit words, on my
little-endian machine using something like:
in.read(reinterpret_cast<char*>(array), 8)
after which:
array[0] == 1684234849 == 0x64636261
array[1] == 1751606885 == 0x68676665

I then do the following transformation (rotate "right" 1-bit) and write the
binary output to a file:
int carrybit = array[0] & 1;
array[0] = (array[0] >> 1) | ((array[1] & 1) << 31);
array[1] = (array[1] >> 1) | (carrybit << 31);
ofstream out ("fileout1.dat", ios::binary | ios::out);
char* o = reinterpret_cast<char*>(array);
out.write(o, 8);

Case2)
I read this data as binary into long long variable (64-bit), on my
little-endian machine using something like:
in.read(reinterpret_cast<char*>(&variable), 8)
after which:
variable == 7523094288207667809 == 0x6867666564636261

I then do the following transformation (rotate "right" 1-bit) and write the
binary output to a file:
variable = (variable >> 1) | (variable << 63);
ofstream out ("fileout2.dat", ios::binary | ios::out);
char* o = reinterpret_cast<char*>(&variable);
out.write(o, 8);

Case3)
I read this data as binary into a 64-bit variable on a hypothetical
big-endian machine using something like:
in.read(reinterpret_cast<char*>(&variable), 8)
after which:
variable == 7017280452245743464 == 0x6162636465666768

I then do the following transformation (rotate "left" 1-bit) and write the
binary output to a file:
variable = (variable << 1) | (variable >> 63);
ofstream out ("fileout3.dat", ios::binary | ios::out);
char* o = reinterpret_cast<char*>(&variable);
out.write(o, 8);

_______________________

The question...do all three files contain identical data, namely(hex):
30 b1 31 b2 32 b3 33 b4

Thanks for your help.

Joe

Jul 22 '05 #1


Kevin Saff

"Joe C" <jk*****@bellsouth.net> wrote in message
news:RY******************@bignews3.bellsouth.net...

> I have some code that performs bitwise operations on files. I'm trying to
> make the code portable on different endian systems. This is not work/school
> related...just trying to learn/understand.
>
> My computer is little endian 32-bit (intel, imagine that). My code deals
> with binary data in files, and I've been treating all data as native words
> (32-bit) for performance reasons. I found out that if I treat the data as
> long long (64-bit) I only suffer a 5% performance hit. So...I'm thinking
> that making it 64-bit might be a forward-looking way to support future
> platforms.

On the other hand, "long long" is a non-standard extension to C++.

> Anyway...I don't have access to a 64-bit Big-endian system, and I want to
> make sure I understand how the data is internally represented on such a
> system.
>
> Suppose I have a file containing 8-bytes. In Ascii, it contains:
> "abcdefgh"
> In hex, the file contains:
> 61 62 63 64 65 66 67 68
> The file is then read into memory on my machine (2 different ways) and on a
> hypothetical big-endian 64-bit machine. Each system does an operation on
> the data then writes the data to a binary file. Will all three files
> contain the identical bit-sequence? Thanks.

Maybe. In general C++ cannot guarantee that your file is portable.
However, in these cases I think one usually assumes that both computers use
the same char-size, and a set of chars written by one computer can be read
in the same order by the other computer. On different computers, bit
sequences are not required to have the same textual representation, or
signify the same numbers.
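
The char-size assumption mentioned above can be made explicit so that a build on an unusual platform fails early. The following is only a sketch: it assumes a C++11 or later compiler (for `static_assert` and `<cstdint>`), and `kWordBytes` is a hypothetical name introduced for illustration.

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::uint64_t (standard since C++11)

// Refuse to compile if a char is not an 8-bit octet.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

// With 8-bit chars, a 64-bit word occupies exactly 8 chars.
static_assert(sizeof(std::uint64_t) == 8, "expected an 8-byte 64-bit type");

// Hypothetical constant: number of file bytes that make up one word.
constexpr unsigned kWordBytes = sizeof(std::uint64_t);
```

This only checks the size of a byte and of the word type. It says nothing about byte order, which is a separate question.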

> Case1)
> I read this data as binary into a 2-element array of 32-bit words, on my
> little-endian machine using something like:
> in.read(reinterpret_cast<char*>(array), 8)
> after which:
> array[0] == 1684234849 == 0x64636261
> array[1] == 1751606885 == 0x68676665
>
> I then do the following transformation (rotate "right" 1-bit) and write the
> binary output to a file:
> int carrybit = array[0] & 1;
> array[0] = (array[0] >> 1) | ((array[1] & 1) << 31);
> array[1] = (array[1] >> 1) | (carrybit << 31);
> ofstream out ("fileout1.dat", ios::binary | ios::out);
> char* o = reinterpret_cast<char*>(array);
> out.write(o, 8);
>
> Case2)
> I read this data as binary into long long variable (64-bit), on my
> little-endian machine using something like:
> in.read(reinterpret_cast<char*>(&variable), 8)
> after which:
> variable == 7523094288207667809 == 0x6867666564636261
>
> I then do the following transformation (rotate "right" 1-bit) and write the
> binary output to a file:
> variable = (variable >> 1) | (variable << 63);
> ofstream out ("fileout2.dat", ios::binary | ios::out);
> char* o = reinterpret_cast<char*>(&variable);
> out.write(o, 8);
>
> Case3)
> I read this data as binary into a 64-bit variable on a hypothetical
> big-endian machine using something like:
> in.read(reinterpret_cast<char*>(&variable), 8)
> after which:
> variable == 7017280452245743464 == 0x6162636465666768
>
> I then do the following transformation (rotate "left" 1-bit) and write the
> binary output to a file:
> variable = (variable << 1) | (variable >> 63);
> ofstream out ("fileout3.dat", ios::binary | ios::out);
> char* o = reinterpret_cast<char*>(&variable);
> out.write(o, 8);

Some confusions you might have here:

1) You are confused about the meaning of shift left/right. Left shifting is
always multiplication by two (if possible), right shifting division by two,
regardless of the bit representation.

2) Big-endian vs. little-endian is about the order of BYTES, not the order
of BITS. In fact, since a char is by definition the smallest addressable
unit of memory in C++, it doesn't really make much sense to talk about bit
order. OTOH byte order can be important, especially since IO involves
streaming objects as byte sequences.
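
Both points can be checked with a small sketch. It assumes a C++11 compiler and a 4-byte `std::uint32_t`; the names `is_little_endian`, `rotr1` and `shift_demo` are hypothetical helpers, not part of the original code. The shifts give the same values on any machine, and only the bytes copied out of memory depend on the host's byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true when the least significant byte is stored first in memory.
bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Rotate a 64-bit value right by one bit; defined on the value, not on memory.
std::uint64_t rotr1(std::uint64_t v) {
    return (v >> 1) | (v << 63);
}

void shift_demo() {
    const std::uint32_t x = 0x0102;

    // Shifts act on the numeric value: the same result on every byte order.
    assert((x << 1) == 0x0204u);  // doubled
    assert((x >> 1) == 0x0081u);  // halved

    // Byte order only shows up when the object is viewed as raw bytes.
    unsigned char bytes[4];
    std::memcpy(bytes, &x, sizeof x);
    if (is_little_endian()) {
        assert(bytes[0] == 0x02 && bytes[1] == 0x01);  // low byte first
    } else {
        assert(bytes[2] == 0x01 && bytes[3] == 0x02);  // low byte last
    }
}
```

Using unsigned types here also sidesteps the implementation-defined behaviour of right-shifting negative signed values.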

> The question...do all three files contain identical data, namely(hex):
> 30 b1 31 b2 32 b3 33 b4

Taking a much easier example, say we have the short (0x0102) saved on the
intel (as 0x02 0x01). Then "future computer" reads this in as (0x0201).
Whereas the intel short rotates right to (0x0081), saving as (0x81 0x00); the
"future computer" will rotate left to (0x0402), written (0x04 0x02). OTOH if
the "future computer" rotates right, it arrives at (0x8100), which writes
(0x81 0x00), the same as the intel. So with a left rotate, your Case 3 file
will differ from the other two.
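
The 16-bit example can be reproduced on a single machine by modelling each byte order explicitly instead of relying on the host. This is a sketch under that assumption; `Bytes2`, `load_le`, `load_be`, `store_le`, `store_be`, `rotr1_16` and `rotl1_16` are hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes2 = std::array<unsigned char, 2>;

// Interpret two file bytes the way a little-endian machine would.
std::uint16_t load_le(const Bytes2& b) {
    return static_cast<std::uint16_t>(b[0] | (b[1] << 8));
}

// Interpret two file bytes the way a big-endian machine would.
std::uint16_t load_be(const Bytes2& b) {
    return static_cast<std::uint16_t>((b[0] << 8) | b[1]);
}

// Lay a value out in file order for each kind of machine.
Bytes2 store_le(std::uint16_t v) {
    return Bytes2{{static_cast<unsigned char>(v & 0xFF),
                   static_cast<unsigned char>(v >> 8)}};
}

Bytes2 store_be(std::uint16_t v) {
    return Bytes2{{static_cast<unsigned char>(v >> 8),
                   static_cast<unsigned char>(v & 0xFF)}};
}

// One-bit rotations of a 16-bit value; the cast drops bits above bit 15.
std::uint16_t rotr1_16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v >> 1) | (v << 15));
}

std::uint16_t rotl1_16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 1) | (v >> 15));
}
```

With the file bytes 02 01, `store_le(rotr1_16(load_le(file)))` gives 81 00, `store_be(rotl1_16(load_be(file)))` gives 04 02, and `store_be(rotr1_16(load_be(file)))` gives 81 00 again. For a two-byte word the rotate-right outputs always agree, because each byte receives the low bit of the only other byte. For wider words the agreement depends on the data, so it should not be relied on in general.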

> Thanks for your help.

It probably isn't worth coding for this until it comes up. At the least
someone would have to compile and test for the new platform, when needed,
anyway. If/when it is needed an entire compatibility layer would probably
need to be added, which is too much work. Doing this compatibility work
will limit your current design, since it will make it much harder to make
needed changes to your binary format - every new feature will need to be
endian-proofed, and this will discourage real improvements.
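
If portability is wanted anyway, one common approach (not something the original code does) is to stop copying raw memory into an integer and instead build the value from individual bytes with shifts. The file then has one fixed byte order regardless of the host. A minimal sketch, assuming the file format is defined as little-endian and a C++11 compiler; `read_u64_le`, `write_u64_le`, `rotr1` and `rotate_sample` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Read 8 bytes and assemble them as a little-endian 64-bit value.
bool read_u64_le(std::istream& in, std::uint64_t& value) {
    unsigned char b[8];
    if (!in.read(reinterpret_cast<char*>(b), 8)) return false;
    value = 0;
    for (int i = 7; i >= 0; --i) {
        value = (value << 8) | b[i];  // b[0] ends up as the lowest byte
    }
    return true;
}

// Write a 64-bit value as 8 bytes, least significant byte first.
void write_u64_le(std::ostream& out, std::uint64_t value) {
    unsigned char b[8];
    for (int i = 0; i < 8; ++i) {
        b[i] = static_cast<unsigned char>(value & 0xFF);
        value >>= 8;
    }
    out.write(reinterpret_cast<const char*>(b), 8);
}

// Rotate right by one bit, as in Case 2 of the question.
std::uint64_t rotr1(std::uint64_t v) {
    return (v >> 1) | (v << 63);
}

// Run the thread's sample data through in-memory streams.
std::string rotate_sample() {
    std::istringstream in("abcdefgh");
    std::ostringstream out;
    std::uint64_t v = 0;
    if (read_u64_le(in, v)) write_u64_le(out, rotr1(v));
    return out.str();
}
```

Because only shifts and masks touch the value, the same source produces the bytes 30 b1 31 b2 32 b3 33 b4 for the sample input on either kind of machine. The cost is a short loop per word instead of a single bulk read.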

HTH
--
KCS

Jul 22 '05 #2

Joe C

"Kevin Saff" <go********@kevin.saff.net> wrote in message
news:Hr********@news.boeing.com...

> On the other hand, "long long" is a non-standard extension to C++.

Right...but a 64-bit integer data type will surely be available.

> Taking a much easier example, say we have the short (0x0102) saved on the
> intel (as 0x02 0x01). Then "future computer" reads this in as (0x0201).
> Whereas the intel short rotates right to (0x0081), saving as (0x81 0x00); the
> "future computer" will rotate left to (0x0402), written (0x04 0x02). OTOH if
> the "future computer" rotates right, it arrives at (0x8100), which writes
> (0x81 0x00), the same as the intel.

Thanks a bunch for this good explanation. My analysis was flawed and you
have shed bright light on the issues.

Joe

Jul 22 '05 #3

EventHelix.com

The following article should help:

http://www.eventhelix.com/RealtimeMa...ndOrdering.htm

Sandeep
--
http://www.EventHelix.com/EventStudio
EventStudio 2.0 - System Architecture Design CASE Tool

Jul 22 '05 #4
