Endian swaps with C++; comments please

/**
* Sample usage:
* unsigned long longvar = 0x12345678;
* unsigned long be_longvar = endian::host_to_big(longvar);
* unsigned short shortvar = 0x1234;
* unsigned short le_shortvar = endian::host_to_little(shortvar);
*/

// for std::reverse:
#include <algorithm>
#include <limits>

// for endian information:
#include <endian.h>
// Linux uses __BYTE_ORDER
// FreeBSD and Apple/Darwin use _BYTE_ORDER
// Some other BSD variants use BYTE_ORDER
#if (defined __BYTE_ORDER && __BYTE_ORDER==__BIG_ENDIAN) || \
(defined _BYTE_ORDER && _BYTE_ORDER== _BIG_ENDIAN) || \
(defined BYTE_ORDER && BYTE_ORDER== BIG_ENDIAN)
#define IS_BIG_ENDIAN 1
#else
#define IS_BIG_ENDIAN 0
#endif

namespace endian {

// This function will copy the supplied value and return a byte-swapped
// version of it. This function may/should be optimized for specific
// architectures when necessary. It may also be necessary to create
// partial specializations for certain types, since the current state
// of this function only allows some fundamental types to be swapped.
template <typename _type>
_type byteswap(_type val) {
  if (std::numeric_limits<_type>::is_specialized &&
      !std::numeric_limits<_type>::is_signed) {
    // Found a type that is specialized and is unsigned.
    switch (sizeof(_type)) {
    case 1:
      return val;
    case 2:
      return ((val & 0x00ff) << 8) | ((val & 0xff00) >> 8);
    case 4:
      return ((val & 0x000000ff) << 24) | ((val & 0x0000ff00) << 8) |
             ((val & 0x00ff0000) >> 8) | ((val & 0xff000000) >> 24);
    }
  }
  // Swap this type using a different/fallback/hacky method:
  unsigned char* v = reinterpret_cast<unsigned char*>(&val);
  std::reverse(v, v + sizeof(_type));
  return val;
}

template <typename _type>
_type host_to_big(_type val) {
  return IS_BIG_ENDIAN ? val : byteswap(val);
}

template <typename _type>
_type host_to_little(_type val) {
  return IS_BIG_ENDIAN ? byteswap(val) : val;
}

template <typename _type>
_type big_to_host(_type val) {
  return IS_BIG_ENDIAN ? val : byteswap(val);
}

template <typename _type>
_type little_to_host(_type val) {
  return IS_BIG_ENDIAN ? byteswap(val) : val;
}

} // end namespace endian

// Don't need this definition anymore:
#undef IS_BIG_ENDIAN

Jan 31 '06 #1

"Aaron Graham" <at******@gmail.com> wrote in message
news:11**********************@g49g2000cwa.googlegroups.com...
/**
* Sample usage:
* unsigned long longvar = 0x12345678;
* unsigned long be_longvar = endian::host_to_big(longvar);
* unsigned short shortvar = 0x1234;
* unsigned short le_shortvar = endian::host_to_little(shortvar);
*/


I've never seen a need to swap actual integer variable values. The only
time I execute any swapping code is when I'm writing out an integer-type
variable to disk (or reading it back), when that data might be read on
another platform. We decided on a standard for all integers in the files,
and all platforms must write (and read) in that format.

So, on each platform, we have read and write functions for the numeric data
types, which stream in/out the data in the order we need.

On the Mac, for example, the read and write functions simply read/write the
bytes from first memory location to last, while on Windows, we read/write
the bytes in reverse order.

This way, there's never a stored numeric variable in memory (aside from
perhaps in a buffer), which we have to worry about the "endianness" of.

-Howard
Jan 31 '06 #2
Aaron Graham wrote:
> [...]
> switch (sizeof(_type)) {
> case 1:
> return val;
> case 2:
> return ((val & 0x00ff) << 8) | ((val & 0xff00) >> 8);

This assumes that 'sizeof' returns the number of octets. It doesn't.
It returns the number of 'bytes'. Please read up on the difference.

> case 4:
> return ((val & 0x000000ff) << 24) | ((val & 0x0000ff00) << 8) |
> ((val & 0x00ff0000) >> 8) | ((val & 0xff000000) >> 24);
> }
> }
> [..]


V
Jan 31 '06 #3
hello,

have a look at:

man htonl

("network" byte order is bigendian)

Jan 31 '06 #4
andrea wrote:
have a look at:

man htonl


I'm already very familiar with it. I was looking for a more general
solution. htonl only works for long, and htons only works for short.
What about 64-bit quantities?
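
For the 64-bit question, a minimal sketch of the same mask-and-shift swap widened to eight octets. It assumes 8-bit bytes and a compiler that provides `<cstdint>`; `byteswap64` is a hypothetical name and is not part of the header posted above.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original post): reverse the eight
// octets of a 64-bit unsigned value using masks and shifts only.
inline std::uint64_t byteswap64(std::uint64_t v) {
  return ((v & 0x00000000000000ffULL) << 56) |
         ((v & 0x000000000000ff00ULL) << 40) |
         ((v & 0x0000000000ff0000ULL) << 24) |
         ((v & 0x00000000ff000000ULL) <<  8) |
         ((v & 0x000000ff00000000ULL) >>  8) |
         ((v & 0x0000ff0000000000ULL) >> 24) |
         ((v & 0x00ff000000000000ULL) >> 40) |
         ((v & 0xff00000000000000ULL) >> 56);
}
```

The same pattern could be added as a `case 8:` in the posted `byteswap`, provided the masks carry a 64-bit suffix so the shifts are not done in a narrower type.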

And #include <netinet/in.h> brings in a lot of baggage (#defines
mostly) that is not desirable in portable C++ code. For instance, if
you #include <netinet/in.h> in vxWorks (and likely other BSD systems),
you get #defines of the following symbols: m_len, m_data, m_type,
m_flags, and many others. You can imagine what kind of problems you
would have trying to port/compile code that uses hungarian notation
(not that I use HN).

Thanks for your suggestion.
Aaron

Jan 31 '06 #5
> This assumes that 'sizeof' returns the number of octets. It doesn't.
> It returns the number of 'bytes'. Please read up on the difference.


I was not familiar with the distinction. I suppose systems that use
differently-sized-bytes would have to port this function, or let it
fall back to std::reverse. I'm not averse to having to port this
function for specific architectures, as long as the porting is highly
localized. Obviously, some architectures have native endian swapping
capabilities in their instruction sets, and it would be best to take
advantage of those as well (as I said in the comments).
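
One way to keep that porting localized is to state the octet assumption where the shift-by-8 fast paths live, so an exotic target fails at compile time instead of swapping the wrong bits. This is only a sketch, assuming C++11 or later for `static_assert`; `byteswap16` is a hypothetical name.

```cpp
#include <climits>
#include <cstdint>

// The masks and shifts below treat a byte as exactly 8 bits. CHAR_BIT is
// the number of bits in a byte on this implementation; sizeof counts bytes.
static_assert(CHAR_BIT == 8, "shift-based byteswap assumes 8-bit bytes");

// Hypothetical helper: exchange the two octets of a 16-bit value.
inline std::uint16_t byteswap16(std::uint16_t v) {
  return static_cast<std::uint16_t>(((v & 0x00ffu) << 8) |
                                    ((v & 0xff00u) >> 8));
}
```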

Thanks for your input.
Aaron

Jan 31 '06 #6
> I've never seen a need to swap actual integer variable values. The only
> time I execute any swapping code is when I'm writing out an integer-type
> variable to disk (or reading it back), when that data might be read on
> another platform. We decided on a standard for all integers in the files,
> and all platforms must write (and read) in that format.
>
> So, on each platform, we have read and write functions for the numeric data
> types, which stream in/out the data in the order we need.


This begs the question a little bit. Somewhere, something has to do
the endian swapping. Besides, I don't always have control over file
formats I read and write. For instance, FLAC files use big endian for
metadata blocks, but the Vorbis comment metadata block uses little
endian internally.

Aaron

Jan 31 '06 #7
andrea wrote:
hello,

have a look at:

man htonl


[redacted]

1. Please do not top post.
2. htonl is a good solution, but it is not part of Standard C++. It is
a POSIX-ism that is implemented practically everywhere, but it's not in
the Standard. As such, it doesn't meet the OP's choice for a standard
C++ only solution. (of course, <endian.h> is also system specific....)
Jan 31 '06 #8
[...]
2. htonl is a good solution, but it is not part of Standard C++. It is
a POSIX-ism that is implemented practically everywhere, but it's not in
the Standard. As such, it doesn't meet the OP's choice for a standard
C++ only solution. (of course, <endian.h> is also system specific....)


htonl really _isn't_ a good solution, because it doesn't do anything on
big-endian machines. What if you're trying to read little-endian data
on a big-endian machine?

I agree that the #include <endian.h> is an ugly wart. Is there a
better way to know endianness at compile time? Is there a better
standard compiler built-in that will give you this information?
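
On the compile-time question, a sketch of one option that avoids `<endian.h>`: GCC and Clang predefine `__BYTE_ORDER__`, `__ORDER_LITTLE_ENDIAN__` and `__ORDER_BIG_ENDIAN__`, so no header is needed on those compilers. Other compilers may not define them, which is why the sketch stops with `#error` rather than guessing. `HOST_IS_BIG_ENDIAN` and `host_is_big_endian` are hypothetical names. C++20 also offers `std::endian` in `<bit>` for toolchains that support it.

```cpp
// Compiler-predefined macros (GCC and Clang); no system header required.
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#  define HOST_IS_BIG_ENDIAN 1
#elif defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#  define HOST_IS_BIG_ENDIAN 0
#else
#  error "Byte order unknown at compile time; define HOST_IS_BIG_ENDIAN manually."
#endif

// Hypothetical constant for use in ordinary expressions such as
// "host_is_big_endian ? val : byteswap(val)".
const bool host_is_big_endian = (HOST_IS_BIG_ENDIAN == 1);
```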

Aaron

Feb 1 '06 #9
Aaron Graham wrote:
[...]
2. htonl is a good solution, but it is not part of Standard C++. It is
a POSIX-ism that is implemented practically everywhere, but it's not in
the Standard. As such, it doesn't meet the OP's choice for a standard
C++ only solution. (of course, <endian.h> is also system specific....)


htonl really _isn't_ a good solution, because it doesn't do anything on
big-endian machines. What if you're trying to read little-endian data
on a big-endian machine?


Oh, good point. I got fixated on putting stuff into network byte order,
and forgot the general byteswap case.

I think you'll have to go compiler dependent and use the appropriate
manifest defines, or specify a command line option (or a custom endian.h
for each target platform).
Feb 1 '06 #10
Aaron Graham wrote:
/**
* Sample usage:
* unsigned long longvar = 0x12345678;
* unsigned long be_longvar = endian::host_to_big(longvar);
* unsigned short shortvar = 0x1234;
* unsigned short le_shortvar = endian::host_to_little(shortvar);
*/


Check this post:
http://groups.google.com/group/comp....97a255f?hl=en&

Usage:

NetworkOrder<int> val = 0x12345678;

int x = val;
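
The linked post is not reproduced here, so the following is only a guess at how a wrapper with that usage could work, not the implementation behind the link. It assumes 8-bit bytes, an integral `T`, and C++11 for `std::make_unsigned`; everything apart from the `NetworkOrder` name and the two usage lines is hypothetical.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical sketch, NOT the class from the linked post: keeps an integer
// as big-endian ("network order") octets and converts on the way in and out.
template <typename T>
class NetworkOrder {
  typedef typename std::make_unsigned<T>::type U;

public:
  NetworkOrder() : bytes_() {}
  NetworkOrder(T value) { store(value); }  // implicit on purpose
  NetworkOrder& operator=(T value) { store(value); return *this; }

  // Rebuild the host value from the stored octets, most significant first.
  operator T() const {
    U u = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
      u = static_cast<U>((u << 8) | bytes_[i]);
    return static_cast<T>(u);
  }

  // The octets exactly as they would be written to a file or socket.
  const unsigned char* data() const { return bytes_; }

private:
  // Split the value into octets with shifts, so host byte order never matters.
  void store(T value) {
    const U u = static_cast<U>(value);
    for (std::size_t i = 0; i < sizeof(T); ++i)
      bytes_[i] = static_cast<unsigned char>(
          (u >> (8 * (sizeof(T) - 1 - i))) & 0xff);
  }

  unsigned char bytes_[sizeof(T)];
};
```

Because the conversion is done with shifts, this sketch needs no endianness macro at all; the cost is a loop per access unless the optimizer collapses it.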

Feb 1 '06 #11
Aaron Graham wrote:
htonl really _isn't_ a good solution, because it doesn't do anything on
big-endian machines. What if you're trying to read little-endian data
on a big-endian machine?

I agree that the #include <endian.h> is an ugly wart. Is there a
better way to know endianness at compile time? Is there a better
standard compiler built-in that will give you this information?


Why do you need to know at compile-time ?

The compiler's optimizer can (and does on compilers I've tested)
eliminate dead code when doing a "run time" endianness check.

This is one of those classic premature optimization issues.

Feb 1 '06 #12

Aaron Graham wrote:
andrea wrote:
have a look at:

man htonl


I'm already very familiar with it. I was looking for a more general
solution. htonl only works for long, and htons only works for short.
What about 64-bit quantities?


Note that on 64-bit linux 8 == sizeof(long). I wonder if htonl operates
on long rather than int32_t.

Feb 1 '06 #13
>> 2. htonl is a good solution, but it is not part of Standard C++. It is
>> a POSIX-ism that is implemented practically everywhere, but it's not in
>> the Standard. As such, it doesn't meet the OP's choice for a standard
>> C++ only solution. (of course, <endian.h> is also system specific....)

Well, it is not the Standard but from your snippet it was clear that you
are working in a unix-like environment...

> htonl really _isn't_ a good solution, because it doesn't do anything on
> big-endian machines. What if you're trying to read little-endian data
> on a big-endian machine?


I understand the desire to generalize the code as much as possible but,
IMHO, one should mainly aim at simplicity and efficiency. Foreseeing the
possibility to read little-endian data could be good for completeness
but I would write the data in bigendian, instead.

bye,
andrea
Feb 1 '06 #14
> Well, it is not the Standard but from your snippet it was clear that you
> are working in a unix-like environment...

Well, I'm not working in a Windows environment, anyway...

> I understand the desire to generalize the code as much as possible but,
> IMHO, one should mainly aim at simplicity and efficiency. Foreseeing the
> possibility to read little-endian data could be good for completeness
> but I would write the data in bigendian, instead.


I think my solution is simple and efficient and general. The compiler
will take care of the optimizations to the point where it's just as
efficient for longs as htonl is (more efficient, if you consider that
byteswap can be optimized for specific architectures).
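
As an illustration of what an architecture-specific version might look like, here is a sketch that uses the GCC/Clang builtin `__builtin_bswap32` where available and falls back to the portable shifts elsewhere. The builtin is a compiler extension, not Standard C++, and `byteswap32` is a hypothetical name; whether it maps to a single instruction depends on the target.

```cpp
#include <cstdint>

// Hypothetical helper: 32-bit swap with a compiler-specific fast path.
inline std::uint32_t byteswap32(std::uint32_t v) {
#if defined(__GNUC__) || defined(__clang__)
  // GCC/Clang extension; typically lowered to the target's swap instruction.
  return __builtin_bswap32(v);
#else
  // Portable fallback, same masks and shifts as the posted 4-byte case.
  return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
         ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
#endif
}
```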

It's not possible to always write files in big-endian, because I don't
dictate the endian-ness of popular file formats. If I write a wma file
using big-endian, for instance, nobody else will be able to read it.

Aaron

Feb 1 '06 #15
> Why do you need to know at compile-time ?

Endian swapping is used in tight loops all the time, and is commonly
used on resource-lean embedded systems (I am often in situations where
both of these points are applicable).

> The compiler's optimizer can (and does on compilers I've tested)
> eliminate dead code when doing a "run time" endianness check.

I'm not sure I understand what you're saying. If the compiler can't
determine at compile-time which branch you're going to be taking, it
can't assume there's any dead code. If you mean that endian-ness
checks that are commonly regarded as "runtime" are actually "compile
time" checks with some compilers, then I think that may be true. But
the most common one:

unsigned x = 1;
return !(*(char*)(&x));

... is not optimized away by gcc, even at the highest optimization
level (at least, not in any of the disassemblies I've looked at).
There's probably a good reason for it, but I don't know what that is.
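
For anyone who wants to repeat the experiment, a self-contained sketch of that runtime probe feeding a conversion routine. `is_big_endian` and `host_to_big32` are hypothetical names, and the sketch assumes 8-bit bytes. Whether the optimizer folds the probe to a constant and drops the unused branch depends on the compiler and version, so the disassembly has to be inspected rather than assumed.

```cpp
#include <cstdint>

// Runtime probe: on a big-endian host the lowest-addressed byte of the
// value 1 is zero. Reading through unsigned char* is permitted aliasing.
inline bool is_big_endian() {
  const unsigned x = 1;
  return *reinterpret_cast<const unsigned char*>(&x) == 0;
}

// Hypothetical wrapper: convert a host-order value to big-endian order,
// choosing the branch at run time (or at compile time if the optimizer
// manages to evaluate the probe).
inline std::uint32_t host_to_big32(std::uint32_t v) {
  if (is_big_endian()) return v;
  return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
         ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}
```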

> This is one of those classic premature optimization issues.


How do you know this? How do you know I'm not attempting to create a
good general solution to a problem where I've determined that endian
swapping is a significant contributor to slow performance?

Aaron

Feb 1 '06 #16
Aaron Graham wrote:
unsigned x = 1;
return !(*(char*)(&x));

... is not optimized away by gcc, even at the highest optimization
level (at least, not in any of the disassemblies I've looked at).
There's probably a good reason for it, but I don't know what that is.


In my investigations, that *was* optimized away including the dead code.

What did you test ?
Feb 1 '06 #17

"Aaron Graham" <at******@gmail.com> wrote in message
news:11**********************@g49g2000cwa.googlegroups.com...
I've never seen a need to swap actual integer variable values. The only
time I execute any swapping code is when I'm writing out an integer-type
variable to disk (or reading it back), when that data might be read on
another platform. We decided on a standard for all integers in the
files,
and all platforms must write (and read) in that format.

So, on each platform, we have read and write functions for the numeric
data
types, which stream in/out the data in the order we need.


This begs the question a little bit. Somewhere, something has to do
the endian swapping. Besides, I don't always have control over file
formats I read and write. For instance, FLAC files use big endian for
metadata blocks, but the Vorbis comment metadata block uses little
endian internally.

Aaron


There doesn't ever have to be any swapping, as such. All data-ordering can
be done while reading and writing. If you know the ordering of the data to
be read or written, code that into your reading and writing routines for the
specific data you're handling. And there's no need to know what your
machine's internal byte-ordering is, since you can use mask&shift (or
multiplication/division) operations, which work the same, regardless of the
internal physical byte-ordering.
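
A minimal sketch of the mask-and-shift approach described above, assuming 8-bit bytes. The function names are hypothetical. Each routine names the byte order of the external data, and none of them needs to know the host's byte order.

```cpp
#include <cstdint>

// Assemble a 32-bit value from four octets stored most significant first.
inline std::uint32_t read_be32(const unsigned char* p) {
  return (static_cast<std::uint32_t>(p[0]) << 24) |
         (static_cast<std::uint32_t>(p[1]) << 16) |
         (static_cast<std::uint32_t>(p[2]) <<  8) |
          static_cast<std::uint32_t>(p[3]);
}

// Assemble a 32-bit value from four octets stored least significant first.
inline std::uint32_t read_le32(const unsigned char* p) {
  return  static_cast<std::uint32_t>(p[0])        |
         (static_cast<std::uint32_t>(p[1]) <<  8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Emit a 32-bit value as four octets, most significant first.
inline void write_be32(unsigned char* p, std::uint32_t v) {
  p[0] = static_cast<unsigned char>((v >> 24) & 0xff);
  p[1] = static_cast<unsigned char>((v >> 16) & 0xff);
  p[2] = static_cast<unsigned char>((v >>  8) & 0xff);
  p[3] = static_cast<unsigned char>(v & 0xff);
}
```

A format that mixes orders, such as the FLAC and Vorbis comment case mentioned earlier in the thread, would call `read_be32` for one field and `read_le32` for the other.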

I'm pretty sure this is covered in the FAQ...?

-Howard

Feb 1 '06 #18
Gianni Mariani wrote:
Aaron Graham wrote:
unsigned x = 1;
return !(*(char*)(&x));

... is not optimized away by gcc, even at the highest optimization
level (at least, not in any of the disassemblies I've looked at).
There's probably a good reason for it, but I don't know what that is.


In my investigations, that *was* optimized away including the dead code.

What did you test ?


Okay, you're right: I tried a couple more compilers that I have
sitting around on one of my dev machines. It seems that gcc 2.95.2
does the optimization, but I can't get it to happen with the latest gcc
4.0.2 for linux-x86. Maybe a bug?

Try this:
#include <stdio.h>
void tell_endian() {
  unsigned x = 1;
  if (*(char*)&x) printf("little endian\n");
  else printf("big endian\n");
}

Doing an objdump of the results of "gcc-4.0.2 -O3 -c -o foo foo.c"
gives me this:

00000000 <tell_endian>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 18 sub $0x18,%esp
6: c7 45 fc 01 00 00 00 movl $0x1,0xfffffffc(%ebp)
d: 80 7d fc 00 cmpb $0x0,0xfffffffc(%ebp)
11: 74 15 je 28 <tell_endian+0x28>
13: 83 ec 0c sub $0xc,%esp
16: 68 00 00 00 00 push $0x0
17: R_386_32 .rodata.str1.1
1b: e8 fc ff ff ff call 1c <tell_endian+0x1c>
1c: R_386_PC32 printf
20: 83 c4 10 add $0x10,%esp
23: c9 leave
24: c3 ret
25: 8d 76 00 lea 0x0(%esi),%esi
28: 83 ec 0c sub $0xc,%esp
2b: 68 1b 00 00 00 push $0x1b
2c: R_386_32 .rodata.str1.1
30: e8 fc ff ff ff call 31 <tell_endian+0x31>
31: R_386_PC32 printf
35: 83 c4 10 add $0x10,%esp
38: c9 leave
39: c3 ret

Feb 1 '06 #19
Aaron Graham wrote:
Gianni Mariani wrote:
Aaron Graham wrote:

unsigned x = 1;
return !(*(char*)(&x));

... is not optimized away by gcc, even at the highest optimization
level (at least, not in any of the disassemblies I've looked at).
There's probably a good reason for it, but I don't know what that is.


In my investigations, that *was* optimized away including the dead code.

What did you test ?

Okay, you're right: I tried a couple more compilers that I have
sitting around on one of my dev machines. It seems that gcc 2.95.2
does the optimization, but I can't get it to happen with the latest gcc
4.0.2 for linux-x86. Maybe a bug?


I changed it to:

bool tell_endian()
{
unsigned x = 1;
return *(char*)&x;
}

g++ 3.4.2 produces:

00000000 <_Z11tell_endianv>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: b8 01 00 00 00 mov $0x1,%eax
8: c9 leave
9: c3 ret
g++ 4.0.0 produces:

0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 10 sub $0x10,%esp
6: c7 45 fc 01 00 00 00 movl $0x1,0xfffffffc(%ebp)
d: 31 c0 xor %eax,%eax
f: 80 7d fc 00 cmpb $0x0,0xfffffffc(%ebp)
13: 0f 95 c0 setne %al
16: c9 leave
17: c3 ret

compile line:
g++ -O3 -c -o endian_test.o endian_test.cpp
Seems like a serious optimizer regression to me.

With g++ 3.4.2 it appears that it creates the right code even on -O1
level optimization.
Feb 2 '06 #20
Gianni Mariani wrote:

I changed it to:

bool tell_endian()
{
unsigned x = 1;
return *(char*)&x;
}

g++ 3.4.2 produces:

00000000 <_Z11tell_endianv>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: b8 01 00 00 00 mov $0x1,%eax
8: c9 leave
9: c3 ret
g++ 4.0.0 produces:

0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 10 sub $0x10,%esp
6: c7 45 fc 01 00 00 00 movl $0x1,0xfffffffc(%ebp)
d: 31 c0 xor %eax,%eax
f: 80 7d fc 00 cmpb $0x0,0xfffffffc(%ebp)
13: 0f 95 c0 setne %al
16: c9 leave
17: c3 ret

compile line:
g++ -O3 -c -o endian_test.o endian_test.cpp
Seems like a serious optimizer regression to me.

With g++ 3.4.2 it appears that it creates the right code even on -O1
level optimization.


I submitted a bug report at gcc bugzilla, #26069. I used your example,
since all my non-4.0.2 compilers are for different architectures.

Aaron

Feb 2 '06 #21
