little endian

raghu

Is it possible to know whether a system is little endian or big endian
by writing a C program? If so, can anyone please give me the idea to
approach...

Thanks a ton.

Regards,
Raghu

Dec 9 '06 #1

Richard Heathfield

raghu said:

Is it possible to know whether a system is little endian or big endian
by writing a C program?

If you're careful, it's possible for it not to matter.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.

Dec 9 '06 #2

jaysome

On Sat, 09 Dec 2006 09:59:44 +0000, Richard Heathfield
<rj*@see.sig.invalid> wrote:

>raghu said:

>Is it possible to know whether a system is little endian or big endian
by writing a C program?

Yes.

>If you're careful, it's possible for it not to matter.

Yes. And some might even exercise poetic license and rephrase your
answer to something like this:

If you're really careful, it's guaranteed for it not to matter.

--
jay

Dec 9 '06 #3

Richard Tobin

In article <11**********************@f1g2000cwa.googlegroups.com>,
raghu <ra*********@gmail.com> wrote:

>Is it possible to know whether a system is little endian or big endian
by writing a C program? If so, can anyone please give me the idea to
approach...

The essence of endianness is the question of what happens when you use
an address to access objects of two different sizes. When you fetch
the smaller object, do you get the big end or the little end of larger
object?

In C, you can legally treat any object as a sequence of bytes, so you
can indeed portably determine the endianness (though it's not guaranteed
to be either big- or little-endian - it could be something stranger).
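A minimal sketch of that byte-wise inspection, assuming unsigned long has no padding bits. The names byte_order and detect_byte_order are made up for illustration. It copies the object representation into an unsigned char array and reports little-endian, big-endian, or "something stranger":

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER };

/* Classify how this machine lays out the bytes of an unsigned long.
   Assumption: unsigned long has no padding bits. */
static enum byte_order detect_byte_order(void)
{
    unsigned long value = 0;
    unsigned char bytes[sizeof value];
    size_t n = sizeof value;
    size_t i;
    int little = 1, big = 1;

    /* Give the i-th least significant byte the value i + 1. */
    for (i = 0; i < n; i++)
        value |= (unsigned long)(i + 1) << (i * CHAR_BIT);

    /* Copy the object representation so it can be examined byte by byte. */
    memcpy(bytes, &value, n);

    for (i = 0; i < n; i++) {
        if (bytes[i] != i + 1)
            little = 0;   /* not in least-significant-first order */
        if (bytes[i] != n - i)
            big = 0;      /* not in most-significant-first order */
    }

    if (little && !big)
        return ORDER_LITTLE;
    if (big && !little)
        return ORDER_BIG;
    return ORDER_OTHER;   /* mixed order, or a one-byte unsigned long */
}
```

The result describes unsigned long only; other types could in principle be laid out differently.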

-- Richard

--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.

Dec 9 '06 #4

huuguanghui

#include <stdio.h>

struct test
{
unsigned short a;
unsigned short b;
};

int main()
{
struct test t = {0, 1};
printf("%d\n", t);
}

if the output is 1, your system is big endian
if the output is 65536, your system is little endian

Dec 9 '06 #5

Barry

<hu*********@gmail.com> wrote in message
news:11**********************@16g2000cwy.googlegroups.com...

#include <stdio.h>

struct test
{
unsigned short a;
unsigned short b;
};

int main()
{
struct test t = {0, 1};
printf("%d\n", t);
}

if the output is 1, your system is big endian
if the output is 65536, your system is little endian

My compiler output is 0, I guess it's un-endian.

Dec 9 '06 #6

Joe Wright

jaysome wrote:

On Sat, 09 Dec 2006 09:59:44 +0000, Richard Heathfield
<rj*@see.sig.invalid> wrote:

>raghu said:

>>Is it possible to know whether a system is little endian or big endian
by writing a C program?

Yes.

>If you're careful, it's possible for it not to matter.

Yes. And some might even exercise poetic license and rephrase your
answer to something like this:

If you're really careful, it's guaranteed for it not to matter.

Hi Jay, tell me how.

I have Standard C programs which must read, write and manipulate .DBF
data files. The .DBF file contains interesting data in 16 and 32 bit
integers in little endian format. My C programs must perform identically
on Sparc (big endian) and x86 (little endian) boxes. How shall I do that
without knowing and caring about endianness of the box?

I agree with Richard that it usually shouldn't matter but even being
really careful, I wonder the nature of your guarantee; Money Back?,
Double Your Money Back? :-)

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Dec 9 '06 #7

Ernie Wright

raghu wrote:

Is it possible to know whether a system is little endian or big
endian by writing a C program? If so, can anyone please give me the
idea to approach...

The following is fairly common:

unsigned int i = 0x04030201;
char *p = ( char * ) &i;

switch ( p[ 0 ] ) {
case 1: /* Intel x86, Windows, little-endian */ ...
case 4: /* Motorola 68K, big-endian */ ...
default: /* something else */ ...
}

This assumes that int is 32 bits, which isn't guaranteed. It won't work
on a Cray, or in most compilers for 16-bit CPUs, for example. But in
most cases you can first examine the values of UINT_MAX, ULONG_MAX and
USHRT_MAX to find a 32-bit integer type and use that type in the test.
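One way the <limits.h> check could look, as a sketch rather than a definitive recipe. The typedef name u32 and the function name lowest_addressed_byte are hypothetical, and the code assumes 8-bit bytes and no padding bits:

```c
#include <limits.h>

/* Pick an unsigned type that is exactly 32 bits wide, if there is one. */
#if UINT_MAX == 0xFFFFFFFF
typedef unsigned int u32;
#elif ULONG_MAX == 0xFFFFFFFF
typedef unsigned long u32;
#elif USHRT_MAX == 0xFFFFFFFF
typedef unsigned short u32;
#else
#error "no 32-bit unsigned integer type on this implementation"
#endif

/* Returns the byte stored at the lowest address of 0x04030201:
   1 on little-endian, 4 on big-endian, another value on mixed layouts.
   Assumption: CHAR_BIT == 8 and u32 has no padding bits. */
static int lowest_addressed_byte(void)
{
    u32 pattern = 0x04030201;
    const unsigned char *p = (const unsigned char *)&pattern;

    return p[0];
}
```

On a C99 implementation, uint32_t from <stdint.h> serves the same purpose where it is provided.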

- Ernie http://home.comcast.net/~erniew

Dec 9 '06 #8

Richard Heathfield

Joe Wright said:

<snip>

I have Standard C programs which must read, write and manipulate .DBF
data files. The .DBF file contains interesting data in 16 and 32 bit
integers in little endian format. My C programs must perform identically
on Sparc (big endian) and x86 (little endian) boxes. How shall I do that
without knowing and caring about endianness of the box?

Do you have 8 bits per byte? Looks like it, from your platform list. So it's
easy - to read a 16-bit integer in little-endian format, first read the
first 8 bits, and then read the second eight bits. Multiply one or other of
them (I can never remember which, but it's not going to take you all day to
find out by a process of elimination) by 256, add the other one, and you're
done. For 32-bit integers, do this rather more often and with bigger
numbers. :-)
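For the record, it is the second byte that gets multiplied by 256 when the file is little-endian. A small sketch, assuming 8-bit bytes; the function name u16_from_le is made up for illustration:

```c
/* Combine two bytes read from a little-endian file into a 16-bit value.
   'first' is the byte read first (least significant), 'second' the byte
   read next (most significant). Assumption: the file uses 8-bit bytes. */
static unsigned int u16_from_le(unsigned char first, unsigned char second)
{
    return (unsigned int)first + (unsigned int)second * 256u;
}
```

Nothing here depends on how the host stores its own integers; only arithmetic on values is involved.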

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.

Dec 9 '06 #9

Ernie Wright

Joe Wright wrote:

I have Standard C programs which must read, write and manipulate .DBF
data files. The .DBF file contains interesting data in 16 and 32 bit
integers in little endian format. My C programs must perform identically
on Sparc (big endian) and x86 (little endian) boxes. How shall I do that
without knowing and caring about endianness of the box?

Something like this.

int getI4_L( FILE *fp )
{
int i, c, n = 0;

for ( i = 0; i < 4; i++ ) {
c = fgetc( fp );
if ( c == EOF )
return handleError( fp );
n |= c << ( i * 8 );
}
return n;
}

To read big-endian 4-byte integers,

int getI4_B( FILE *fp )
{
...
for ( i = 3; i >= 0; i-- ) {

- Ernie http://home.comcast.net/~erniew

Dec 9 '06 #10

CBFalconer

Joe Wright wrote:

jaysome wrote:

.... snip ...

>>
If you're really careful, it's guaranteed for it not to matter.

Hi Jay, tell me how.

I have Standard C programs which must read, write and manipulate
.DBF data files. The .DBF file contains interesting data in 16 and
32 bit integers in little endian format. My C programs must perform
identically on Sparc (big endian) and x86 (little endian) boxes.
How shall I do that without knowing and caring about endianness of
the box?

I agree with Richard that it usually shouldn't matter but even
being really careful, I wonder the nature of your guarantee; Money
Back?, Double Your Money Back? :-)

For example (untested):

/* convert 4 octet little endian to integer */
/* assumes each byte contains one octet */
/* also that UINT_MAX is >= 2 ** 32 - 1 */
/* (else use longs, which will always work) */
unsigned int convert4(const char *s) {
    unsigned int i, val;

    /* byte i of the input carries bits 8*i .. 8*i+7 of the result */
    for (i = val = 0; i < 4; i++)
        val |= (unsigned int)(s[i] & 0xff) << (8 * i);
    return val;
}

If you need signed ints, test the final result for exceeding
INT_MAX. Note that this will work even if unsigned ints contain
padding bits. The endianness of your system doesn't matter.
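The signed case might be handled along these lines. This is a sketch under two assumptions: the file stores 32-bit two's complement values, and the host's long can represent -2147483648. The name s32_from_u32 is hypothetical:

```c
/* Interpret the low 32 bits of u as a two's complement signed value.
   Assumptions: the file format is two's complement, and long can hold
   every value from -2147483648 to 2147483647. */
static long s32_from_u32(unsigned long u)
{
    u &= 0xFFFFFFFFul;                 /* keep only the low 32 bits */
    if (u <= 0x7FFFFFFFul)
        return (long)u;                /* non-negative: value unchanged */

    /* Negative: the value is u - 2^32, computed without overflow. */
    return -(long)(0xFFFFFFFFul - u) - 1;
}
```

The conversion is done with arithmetic instead of a cast, because converting an out-of-range unsigned value to a signed type is implementation-defined.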

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

Dec 9 '06 #11

Bill Pursell

raghu wrote:

Is it possible to know whether a system is little endian or big endian
by writing a C program? If so, can anyone please give me the idea to
approach...

Other people have suggested methods for checking the value
of one of the bytes of a multi-byte value, and that is a reasonable
technique, but for most hosted systems you can also
look in the system headers for something like:

#define BYTE_ORDER LITTLE_ENDIAN

When you're writing the program, if you need to know, you
can tell by doing something like:

#if BYTE_ORDER == LITTLE_ENDIAN
do_little_endian_processing();
#else
do_big_endian_processing();
#endif

(Note that this example assumes that
there is no other byte ordering, which is absurd
since it is perfectly reasonable for someone to
build a machine in which a 4 byte integer is ordered
0,3,1,2, or some other permutation.)
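A compile-time check that leaves room for other orderings could be sketched as follows. It relies on the __BYTE_ORDER__ family of macros that GCC and Clang predefine; these are compiler extensions, not Standard C, and the names HOST_ORDER_NAME and host_order_name are invented for this example:

```c
/* __BYTE_ORDER__ and the __ORDER_*__ macros are predefined by GCC and
   Clang. On compilers without them, every test below is false and the
   result is "unknown". */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define HOST_ORDER_NAME "little-endian"
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define HOST_ORDER_NAME "big-endian"
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_PDP_ENDIAN__
#define HOST_ORDER_NAME "pdp-endian"
#else
#define HOST_ORDER_NAME "unknown"
#endif

/* Report the byte order the compiler claims for the target. */
static const char *host_order_name(void)
{
    return HOST_ORDER_NAME;
}
```

The explicit "unknown" branch avoids silently treating every non-little-endian machine as big-endian.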

Also, consider using the htonl() family of functions.

--
Bill Pursell

Dec 9 '06 #12

Barry Schwarz

On 9 Dec 2006 04:57:58 -0800, "hu*********@gmail.com"
<hu*********@gmail.com> wrote:

>#include <stdio.h>

struct test
{
unsigned short a;

And if the compiler decided to insert some padding?

unsigned short b;
};

int main()
{
struct test t = {0, 1};
printf("%d\n", t);

Why would you assume that a struct is passed to a variadic function
the same way an int would be? Why would you assume 2 *
sizeof(unsigned short) is equal to sizeof(int)?

>}

if the output is 1, your system is big endian
if the output is 65536, your system is little endian

Since you have invoked undefined behavior, you have determined
nothing.
Remove del for email

Dec 9 '06 #13

Joe Wright

Richard Heathfield wrote:

Joe Wright said:

<snip>

>I have Standard C programs which must read, write and manipulate .DBF
data files. The .DBF file contains interesting data in 16 and 32 bit
integers in little endian format. My C programs must perform identically
on Sparc (big endian) and x86 (little endian) boxes. How shall I do that
without knowing and caring about endianness of the box?

Do you have 8 bits per byte? Looks like it, from your platform list. So it's
easy - to read a 16-bit integer in little-endian format, first read the
first 8 bits, and then read the second eight bits. Multiply one or other of
them (I can never remember which, but it's not going to take you all day to
find out by a process of elimination) by 256, add the other one, and you're
done. For 32-bit integers, do this rather more often and with bigger
numbers. :-)

Richard, you misunderstand I think. I can easily convert between Sparc
and x86 integers. The question was "Shall I?" and how to know.

Networking libraries may present us htonl() and ntohl() but, alas, they
are not C.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Dec 9 '06 #14

Eric Sosman

Joe Wright wrote:

Richard Heathfield wrote:
>Joe Wright said:

<snip>

>>I have Standard C programs which must read, write and manipulate .DBF
data files. The .DBF file contains interesting data in 16 and 32 bit
integers in little endian format. My C programs must perform identically
on Sparc (big endian) and x86 (little endian) boxes. How shall I do that
without knowing and caring about endianness of the box?

Do you have 8 bits per byte? Looks like it, from your platform list.
So it's easy - to read a 16-bit integer in little-endian format, first
read the first 8 bits, and then read the second eight bits. Multiply
one or other of them (I can never remember which, but it's not going
to take you all day to find out by a process of elimination) by 256,
add the other one, and you're done. For 32-bit integers, do this
rather more often and with bigger numbers. :-)

Richard, you misunderstand I think. I can easily convert between Sparc
and x86 integers. The question was "Shall I?" and how to know.

You needn't know or care: Richard's method works unchanged
on Little-, Big-, Middle-, and Mixed-Endian platforms. (Extra
credit: Devise a corresponding "endian-oblivious" scheme for
writing integers.)
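One possible answer to the extra-credit question, as a sketch: peel bytes off the value with masks and shifts, least significant first, and write the resulting buffer. The names put_u16_le and put_u32_le are made up here, and 8-bit bytes in the file are assumed:

```c
/* Store the low 16 bits of v in out[], least significant byte first. */
static void put_u16_le(unsigned char out[2], unsigned int v)
{
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
}

/* Store the low 32 bits of v in out[], least significant byte first. */
static void put_u32_le(unsigned char out[4], unsigned long v)
{
    int i;

    for (i = 0; i < 4; i++) {
        out[i] = (unsigned char)(v & 0xFFul);
        v >>= 8;   /* move the next byte into the low position */
    }
}
```

The filled buffer can then be handed to fwrite() as plain bytes; the shifts operate on values, so the host's byte order never enters into it.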

Are you worried about offending the Little Tin God? Given
that there's I/O going on, it's dollars to doughnuts that your
CPU is loafing anyhow and could afford to do a half-dozen FFTs
between reads without slowing anything down.

--
Eric Sosman
es*****@acm-dot-org.invalid

Dec 9 '06 #15

P.J. Plauger

"Bill Pursell" <bi**********@gmail.com> wrote in message
news:11**********************@80g2000cwy.googlegroups.com...

raghu wrote:
>Is it possible to know whether a system is little endian or big endian
by writing a C program? If so, can anyone please give me the idea to
approach...

Other people have suggested methods for checking the value
of one of the bytes of a multi-byte value, and that is a reasonable
technique, but for most hosted systems you can also
look in the system headers for something like:

#define BYTE_ORDER LITTLE_ENDIAN

When you're writing the program, if you need to know, you
can tell by doing something like:

#if BYTE_ORDER == LITTLE_ENDIAN
do_little_endian_processing();
#else
do_big_endian_processing();
#endif

(Note that this example assumes that
there is no other byte ordering, which is absurd
since it is perfectly reasonable for someone to
build a machine in which a 4 byte integer is ordered
0,3,1,2, or some other permutation.)

At a stretch, yes. Nevertheless, practically all computers today
use either {0 1 2 3} or {3 2 1 0}. Neverthe-nevertheless, the
original PDP-11 C compiler represented 32-bit integers as
{2 3 0 1}.

Also, consider using the htonl() family of functions.

Exactly. The presenting problem is assembling bytes from a
stream to make an internal integer, and/or emitting bytes
to a stream given an internal integer. You don't need to
know the internal properties of the host architecture to
do that job.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

Dec 9 '06 #16

Stephen Sprunk

"Joe Wright" <jo********@comcast.net> wrote in message
news:y5******************************@comcast.com...

Richard, you misunderstand I think. I can easily convert between Sparc
and x86 integers. The question was "Shall I?" and how to know.

You don't need to know. The problem is reading from the file as a
multi-byte object and then trying to determine if you need to adjust it.
The solution is reading one byte at a time, so that what you do with the
bytes afterwards doesn't vary between platforms.

Networking libraries may present us htonl() and ntohl() but, alas,
they are not C.

Even if available, htonl() and ntohl() are the exact opposite of what
the OP wants. They convert between system endianness and "network byte
order", which is big endian. Since his files are stored in
little-endian format, these functions will manage to give him the wrong
results on _every_ platform.

If you're looking for a fast, non-portable solution, I'd look for
something like Linux's <endian.h> and <byteswap.h>.

The shift-and-add method is the portable way, but it's likely to be much
slower unless the compiler has a specific optimization to detect and
replace that code pattern.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking
--
Posted via a free Usenet account from http://www.teranews.com

Dec 9 '06 #17

dick

easy.

short int a=0x1234;

if( (*(char*)&a)=='\x12')
{
// big endian
}
else
{
// little endian
}

good enough?

raghu wrote:

Is it possible to know whether a system is little endian or big endian
by writing a C program? If so, can anyone please give me the idea to
approach...

Thanks a ton.

Regards,
Raghu

Dec 9 '06 #18

Keith Thompson

"dick" <di***********@hotmail.com> writes:

raghu wrote:
>Is it possible to know whether a system is little endian or big endian
by writing a C program? If so, can anyone please give me the idea to
approach...

easy.

short int a=0x1234;

if( (*(char*)&a)=='\x12')
{
// big endian
}
else
{
// little endian
}

good enough?

Please don't top-post. Read the following:

http://www.caliburn.nl/topposting.html
http://www.cpax.org.uk/prg/writings/topposting.php

And since "Please don't top-post" appears to be the most regularly
posted advice in this newsgroup these days, I'll also advise you to
follow a newsgroup for a while before posting. (Of course, anyone who
needs this advice won't read it -- sigh.)

No, it's not good enough in general. You're assuming that short is 2
bytes, and that a byte is 8 bits. Neither is guaranteed by the
language. I've worked on systems where short is 32 or 64 bits.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Dec 9 '06 #19

dick

easy.

short int a=0x1234;

if( (*(char*)&a)!='\x34')
{
// big endian
}

else
{
// little endian

}

good enough?

Keith Thompson wrote:

"dick" <di***********@hotmail.com> writes:
raghu wrote:
Is it possible to know whether a system is little endian or big endian
by writing a C program? If so, can anyone please give me the idea to
approach...
easy.

short int a=0x1234;

if( (*(char*)&a)=='\x12')
{
// big endian
}
else
{
// little endian
}

good enough?

Please don't top-post. Read the following:

http://www.caliburn.nl/topposting.html
http://www.cpax.org.uk/prg/writings/topposting.php

And since "Please don't top-post" appears to be the most regularly
posted advice in this newsgroup these days, I'll also advise you to
follow a newsgroup for a while before posting. (Of course, anyone who
needs this advice won't read it -- sigh.)

No, it's not good enough in general. You're assuming that short is 2
bytes, and that a byte is 8 bits. Neither is guaranteed by the
language. I've worked on systems where short is 32 or 64 bits.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Dec 9 '06 #20

Keith Thompson

"dick" <di***********@hotmail.com> writes:

easy.

[snip]

good enough?

No.

As I wrote before:

>Please don't top-post. Read the following:

http://www.caliburn.nl/topposting.html
http://www.cpax.org.uk/prg/writings/topposting.php

Please read it this time.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Dec 10 '06 #21

dick

Keith Thompson wrote:

"dick" <di***********@hotmail.com> writes:
easy.
[snip]
good enough?

No.

As I wrote before:

Please don't top-post. Read the following:

http://www.caliburn.nl/topposting.html
http://www.cpax.org.uk/prg/writings/topposting.php

Please read it this time.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

easy.

short int a=0x0001;

if( (*(char*)&a)>'\000')
{
// little endian
}
else
{
// big endian
}

good enough?

Dec 10 '06 #22

Joe Wright

Eric Sosman wrote:

Joe Wright wrote:
>Richard Heathfield wrote:
>>Joe Wright said:

<snip>

I have Standard C programs which must read, write and manipulate .DBF
data files. The .DBF file contains interesting data in 16 and 32 bit
integers in little endian format. My C programs must perform
identically
on Sparc (big endian) and x86 (little endian) boxes. How shall I do
that
without knowing and caring about endianness of the box?

Do you have 8 bits per byte? Looks like it, from your platform list.
So it's easy - to read a 16-bit integer in little-endian format,
first read the first 8 bits, and then read the second eight bits.
Multiply one or other of them (I can never remember which, but it's
not going to take you all day to find out by a process of
elimination) by 256, add the other one, and you're done. For 32-bit
integers, do this rather more often and with bigger numbers. :-)

Richard, you misunderstand I think. I can easily convert between Sparc
and x86 integers. The question was "Shall I?" and how to know.

You needn't know or care: Richard's method works unchanged
on Little-, Big-, Middle-, and Mixed-Endian platforms. (Extra
credit: Devise a corresponding "endian-oblivious" scheme for
writing integers.)

Are you worried about offending the Little Tin God? Given
that there's I/O going on, it's dollars to doughnuts that your
CPU is loafing anyhow and could afford to do a half-dozen FFTs
between reads without slowing anything down.

I fear I am being misunderstood. Consider that the first 32 bytes of a
.DBF file have a structure like this..

typedef struct {
uchar version; /* 00 0x03 or 0x83 (with .dbt file) */
uchar date[3]; /* 01 Date YY MM DD in binary */
ulong numrecs; /* 04 Number of records in data file */
ushort hdrlen; /* 08 Offset to first record */
ushort reclen; /* 0A Length of each record */
uchar reserved[20];/* 0C Balance of 32 bytes */
} HEADER;

The ulong (unsigned long [32]) and ushort (unsigned short [16]) are
little endian. The .DBF is native format of dBASE II, dBASE III, FoxPro,
Clipper and other database management systems born on Intel processors.

The values are stored Little Endian!

In manipulating the .DBF we have to read the various numrecs, hdrlen and
reclen values into a HEADER structure. A straightforward fread().

If numrecs is ten, those four bytes will be 0A000000. All well and good
on x86 hardware. But what about Sparc hardware? Our numrecs is still
four bytes but its value is now 167 million and more.

Clearly my C programs must know whether they are running on big or
little endian hardware so that they know the value of 0A000000.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Dec 10 '06 #23

Chris Torek

In article <Y7******************************@comcast.com>
Joe Wright <jo********@comcast.net> wrote:

>I fear I am being misunderstood. Consider that the first 32 bytes of a
.DBF file have a structure like this..

typedef struct {
uchar version; /* 00 0x03 or 0x83 (with .dbt file) */
uchar date[3]; /* 01 Date YY MM DD in binary */
ulong numrecs; /* 04 Number of records in data file */
ushort hdrlen; /* 08 Offset to first record */
ushort reclen; /* 0A Length of each record */
uchar reserved[20];/* 0C Balance of 32 bytes */
} HEADER;

The ulong (unsigned long [32]) and ushort (unsigned short [16]) are
little endian. The .DBF is native format of dBASE II, dBASE III, FoxPro,
Clipper and other database management systems born on Intel processors.

The values are stored Little Endian!

In manipulating the .DBF we have to read the various numrecs, hdrlen and
reclen values into a HEADER structure. A straightforward fread().

"Doctor, doctor, it hurts when I do this!"

"Well, don't do that, then."

:-)

Seriously, the above is a classic example of "what not to do". Using
fread() and fwrite() on the raw internal data structures not
only makes you endian-dependent, it also makes you alignment- and
size-dependent. The code will fail horribly on machines where
"ulong" is 8 bytes instead of 4. Even "ushort" is 8 bytes on a Cray
(assuming "ushort" is an alias for "unsigned short" -- there is no
2-byte data type).

To read or write a ".DBF" file, "don't do that, then". Instead of:

result = fread(&header, sizeof header, 1, fp);
if (result != 1) ... handle error ...

-- which is admittedly very short -- use the more verbose:

unsigned char buf[32];

result = fread(buf, sizeof buf, 1, fp);
if (result != 1) ... handle error ...
/* optionally, check buf[0] here */
header.version = buf[0];
header.date[0] = buf[1];
header.date[1] = buf[2];
header.date[2] = buf[3];
header.numrecs = buf[4] + ((ulong)buf[5] << 8) +
((ulong)buf[6] << 16) + ((ulong)buf[7] << 24);
header.hdrlen = buf[8] + ((unsigned)buf[9] << 8);
header.reclen = buf[10] + ((unsigned)buf[11] << 8);
/* ignore or copy "spare" bytes as desired */

>If numrecs is ten, those four bytes will be 0A000000. All well and good
on x86 hardware. But what about Sparc hardware? Our numrecs is still
four bytes but its value is now 167 million and more.

The above works fine on SPARC, Cray, x86-64, and even the hardware
that has not been invented yet that will come out in four years.

>Clearly my C programs must know whether they are running on big or
little endian hardware so that they know the value of 0A000000.

Or, maybe not. :-)

Endianness is a problem only if you let someone else take apart
and put together your data into sub-units like "bytes" (unsigned
char, in C). If you do it "manually", instead of having the machine
do it for you inside fread() and fwrite(), *you* control the details.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Dec 10 '06 #24

Richard Heathfield

Joe Wright said:

<snip>

I fear I am being misunderstood.

No, you're just stuck in a mind-rut. (As are we all, from time to time.)

Consider that the first 32 bytes of a

.DBF file have a structure like this..

typedef struct {
uchar version; /* 00 0x03 or 0x83 (with .dbt file) */
uchar date[3]; /* 01 Date YY MM DD in binary */
ulong numrecs; /* 04 Number of records in data file */
ushort hdrlen; /* 08 Offset to first record */
ushort reclen; /* 0A Length of each record */
uchar reserved[20];/* 0C Balance of 32 bytes */
} HEADER;

Now consider the possibility that the first 32 bytes of a .DBF file are
represented on the disk like this:

unsigned char firstbyte;
unsigned char secondbyte;
unsigned char thirdbyte;
......etc

>
The ulong (unsigned long [32]) and ushort (unsigned short [16]) are
little endian. The .DBF is native format of dBASE II, dBASE III, FoxPro,
Clipper and other database management systems born on Intel processors.

The values are stored Little Endian!

No, the values stored are just bytes.

In manipulating the .DBF we have to read the various numrecs, hdrlen and
reclen values into a HEADER structure. A straightforward fread().

Stop right there. Rewind. Re-read. This time, byte by byte, assembling
aggregate (multi-byte) values "manually".

If numrecs is ten, those four bytes will be 0A000000. All well and good
on x86 hardware. But what about Sparc hardware? Our numrecs is still
four bytes but its value is now 167 million and more.

No, you read the first byte: 0A. Okay, store that in your unsigned long, and
now read the second byte, and multiply by 256, and add in. 0000000A +
00000000 = 0000000A. Now read the third byte, and multiply by 256^2, and
add in. That's 0000000A + 00000000 which is 0000000A. Now read the fourth
byte, multiply by 256^3, and add in. That's 0000000A + 00000000 = 0000000A,
which is the correct answer, irrespective of what end your CPU is ianing.

Clearly my C programs must know whether they are running on big or
little endian hardware so that they know the value of 0A000000.

What matters is the file's endianism, not the hardware's endianism. If your
file's ends are the other way about, your reader needs to be the other way
about, too. In fact, the number of different readers you need = number of
different integer formats (endianisms, signs) * number of different integer
sizes. That's always assuming you share with the file generating program a
common notion of the number of bits in a byte, of course!
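To illustrate the point that the reader follows the file's byte order, here is a sketch of a single routine parameterized by size and by file order. The names file_order and read_uint are hypothetical, and 8-bit bytes in the file are assumed:

```c
#include <stddef.h>

enum file_order { FILE_LITTLE_ENDIAN, FILE_BIG_ENDIAN };

/* Assemble n bytes (n <= 4) starting at p into an unsigned value,
   according to the byte order used by the file. The host's own byte
   order is never consulted. Assumption: the file uses 8-bit bytes. */
static unsigned long read_uint(const unsigned char *p, size_t n,
                               enum file_order order)
{
    unsigned long v = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Consume bytes from most significant to least significant. */
        size_t idx = (order == FILE_BIG_ENDIAN) ? i : n - 1 - i;

        v = (v << 8) | (p[idx] & 0xFFu);
    }
    return v;
}
```

With the four bytes 0A 00 00 00, the little-endian reading gives 10 and the big-endian reading gives 167772160, on any host.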

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.

Dec 10 '06 #25

CBFalconer

Joe Wright wrote:

>

.... snip ...

>
I fear I am being misunderstood. Consider that the first 32 bytes
of a .DBF file have a structure like this..

typedef struct {
uchar version; /* 00 0x03 or 0x83 (with .dbt file) */
uchar date[3]; /* 01 Date YY MM DD in binary */
ulong numrecs; /* 04 Number of records in data file */
ushort hdrlen; /* 08 Offset to first record */
ushort reclen; /* 0A Length of each record */
uchar reserved[20];/* 0C Balance of 32 bytes */
} HEADER;

The ulong (unsigned long [32]) and ushort (unsigned short [16]) are
little endian. The .DBF is native format of dBASE II, dBASE III, FoxPro,
Clipper and other database management systems born on Intel processors.

The values are stored Little Endian!

The point is that the file arrangement is fixed. Get rid of that
definition and define the fields as arrays of unsigned char, which
you know to be describing little endian values of various lengths
in some places. Then use the methods that have been described here
to make local endian independent conversions to and from those
fields. Remember that C file i/o routines are always reading and
writing sequences of bytes.
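A sketch of what such a definition might look like. RAW_HEADER and the accessor names are invented for this example; the field names mirror the HEADER quoted above. A struct made only of unsigned char arrays has no alignment reason to contain padding, but the Standard does not strictly forbid it, so checking that sizeof(RAW_HEADER) is 32 is a sensible precaution:

```c
/* The 32-byte header described purely as bytes. */
typedef struct {
    unsigned char version[1];
    unsigned char date[3];
    unsigned char numrecs[4];   /* little-endian in the file */
    unsigned char hdrlen[2];    /* little-endian in the file */
    unsigned char reclen[2];    /* little-endian in the file */
    unsigned char reserved[20];
} RAW_HEADER;

/* Number of records, decoded from its four little-endian bytes. */
static unsigned long header_numrecs(const RAW_HEADER *h)
{
    return  (unsigned long)h->numrecs[0]
         | ((unsigned long)h->numrecs[1] << 8)
         | ((unsigned long)h->numrecs[2] << 16)
         | ((unsigned long)h->numrecs[3] << 24);
}

/* Offset to the first record, decoded from two little-endian bytes. */
static unsigned int header_hdrlen(const RAW_HEADER *h)
{
    return (unsigned int)h->hdrlen[0] | ((unsigned int)h->hdrlen[1] << 8);
}

/* Length of each record, decoded from two little-endian bytes. */
static unsigned int header_reclen(const RAW_HEADER *h)
{
    return (unsigned int)h->reclen[0] | ((unsigned int)h->reclen[1] << 8);
}
```

The multi-byte fields are only ever touched through the accessors, so no code is tempted to read them as native integers.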

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

Dec 10 '06 #26

Joe Wright

Richard Heathfield wrote:

Joe Wright said:

<snip>

>I fear I am being misunderstood.

No, you're just stuck in a mind-rut. (As are we all, from time to time.)

Consider that the first 32 bytes of a
>.DBF file have a structure like this..

typedef struct {
uchar version; /* 00 0x03 or 0x83 (with .dbt file) */
uchar date[3]; /* 01 Date YY MM DD in binary */
ulong numrecs; /* 04 Number of records in data file */
ushort hdrlen; /* 08 Offset to first record */
ushort reclen; /* 0A Length of each record */
uchar reserved[20];/* 0C Balance of 32 bytes */
} HEADER;

Now consider the possibility that the first 32 bytes of a .DBF file are
represented on the disk like this:

unsigned char firstbyte;
unsigned char secondbyte;
unsigned char thirdbyte;
.....etc

>The ulong (unsigned long [32]) and ushort (unsigned short [16]) are
little endian. The .DBF is native format of dBASE II, dBASE III, FoxPro,
Clipper and other database management systems born on Intel processors.

The values are stored Little Endian!

No, the values stored are just bytes.

>In manipulating the .DBF we have to read the various numrecs, hdrlen and
reclen values into a HEADER structure. A straightforward fread().

Stop right there. Rewind. Re-read. This time, byte by byte, assembling
aggregate (multi-byte) values "manually".

>If numrecs is ten, those four bytes will be 0A000000. All well and good
on x86 hardware. But what about Sparc hardware? Our numrecs is still
four bytes but its value is now 167 million and more.

No, you read the first byte: 0A. Okay, store that in your unsigned long, and
now read the second byte, and multiply by 256, and add in. 0000000A +
00000000 = 0000000A. Now read the third byte, and multiply by 256^2, and
add in. That's 0000000A + 00000000 which is 0000000A. Now read the fourth
byte, multiply by 256^3, and add in. That's 0000000A + 00000000 = 0000000A,
which is the correct answer, irrespective of what end your CPU is ianing.

>Clearly my C programs must know whether they are running on big or
little endian hardware so that they know the value of 0A000000.

What matters is the file's endianism, not the hardware's endianism. If your
file's ends are the other way about, your reader needs to be the other way
about, too. In fact, the number of different readers you need = number of
different integer formats (endianisms, signs) * number of different integer
sizes. That's always assuming you share with the file generating program a
common notion of the number of bits in a byte, of course!

Thanks. I see the rut now and I'm climbing out.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Dec 10 '06 #27

christian.bau

Joe Wright wrote:

Hi Jay, tell me how.

I have Standard C programs which must read, write and manipulate .DBF
data files. The .DBF file contains interesting data in 16 and 32 bit
integers in little endian format. My C programs must perform identically
on Sparc (big endian) and x86 (little endian) boxes. How shall I do that
without knowing and caring about endianness of the box?

Let's say you have an array

unsigned char data [6];

of eight-bit bytes which contains one 16 bit and one 32 bit unsigned
integer in little-endian format. You want to get the values into two
variables

unsigned int x1;
unsigned long x2; // x1 and x2 are guaranteed to be big enough

x1 = data [0] + ((unsigned int) data [1] << 8);
x2 = data [2] + (data [3] << 8) + (((unsigned long) data [4]) << 16) +
(((unsigned long) data [5]) << 24);

This works even if your box has some weird mixed-endian format.

Dec 10 '06 #28

Richard Bos

"dick" <di***********@hotmail.com> wrote:

easy.

short int a=0x0001;

if( (*(char*)&a)>'\000')
{
// little endian
}
else
{
// big endian
}

good enough?

No. System has 16-bit chars, 16-bit shorts, 32-bit ints, and is
big-endian for chars and shorts within ints. Program falls over.

There is no general solution.

Richard

Dec 11 '06 #29

Kenneth Brody

raghu wrote:

>
Is it possible to know whether a system is little endian or big endian
by writing a C program? If so, can anyone please give me the idea to
approach...

[...]

Well, ignoring the "why should it matter" angle...

Overlay an int with an unsigned char array, store 0x11223344
(assuming 32-bit ints here) and examine it.

And don't forget that there are more than "little endian" and "big
endian" systems out there. I seem to recall that some form of
VAXen use an inside-out type order, where each 16-bit "word" is
stored little-endian, but each "word" in a 32-bit "dword" is
stored big-endian. (ie: 0x11223344 would be stored in the order
"22 11 44 33".)

Back to the "why should it matter" angle, I have the same issue.
I manage a program for which certain modules need to know the byte
ordering -- such as the program which converts the data files from
one byte order to another to move data between platforms. I wrote
a simple program which does the above (and more, as I need to know
the padding used as well) and dumps the unsigned char array to
stdout for me to examine. From there, I tweak the header file used
by the program to specify which byte order and padding is used
natively. (This could probably be automated nowadays, like many
gnu "configure" scripts do, but this predates gnu, and it's
probably overkill for this particular need.)

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody        | www.hvcomputer.com | #include              |
| kenbrody/at\spamcop.net | www.fptech.com     |    <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>

Dec 11 '06 #30

sjdevnull

Kenneth Brody wrote:

raghu wrote:

Is it possible to know whether a system is little endian or big endian
by writing a C program? If so, can anyone please give me the idea to
approach...
[...]

Well, ignoring the "why should it matter" angle...

Overlay an int with an unsigned char array, store 0x11223344
(assuming 32-bit ints here) and examine it.

And don't forget that there are more than "little endian" and "big
endian" systems out there. I seem to recall that some form of
VAXen use an inside-out type order, where each 16-bit "word" is
stored little-endian, but each "word" in a 32-bit "dword" is
stored big-endian. (ie: 0x11223344 would be stored in the order
"22 11 44 33".)

It was the PDP-11 (and probably other PDPs).

http://www.idiap.ch/~formaz/doc/glib...er-macros.html
says:

"Finally, to complicate matters, some other processors store the bytes
in a rather curious order known as PDP-endian. For a 4-byte word, the
3rd most significant byte is stored first, then the 4th, then the 1st
and finally the 2nd."

Which I think is equivalent to what you said.

Dec 11 '06 #31

dick

My solution is: do not buy this machine. do not buy this compiler.

Richard Bos wrote:

"dick" <di***********@hotmail.com> wrote:

easy.

short int a=0x0001;

if( (*(char*)&a)>'\000')
{
// little endian
}
else
{
// big endian
}

good enough?

No. System has 16-bit chars, 16-bit shorts, 32-bit ints, and is
big-endian for chars and shorts within ints. Program falls over.

There is no general solution.

Richard

Dec 12 '06 #32

Ian Collins

dick wrote:

My solution is: do not buy this machine. do not buy this compiler.

and do not top post...

Richard Bos wrote:

>>"dick" <di***********@hotmail.com> wrote:

>>>easy.

short int a=0x0001;

if( (*(char*)&a)>'\000')
{
// little endian
}
else
{
// big endian
}

good enough?

No. System has 16-bit chars, 16-bit shorts, 32-bit ints, and is
big-endian for chars and shorts within ints. Program falls over.

There is no general solution.

Richard

--
Ian Collins.

Dec 12 '06 #33

Dave Thompson

On Sat, 09 Dec 2006 11:48:00 -0500, CBFalconer <cb********@yahoo.com>
wrote:
<snip>

For example (untested):

Apparently.

/* convert 4 octet little endian to integer */
/* assumes each byte contains one octet */
/* also that UINT_MAX is >= 2 ** 32 */
/* (else use longs, which will always work) */
unsigned int convert4(const char *s) {

If you make s const unsigned char * (and assume byte=octet as already)
you don't need to mask below. Or if you want to keep plain char for
the convenience of (the) caller(s), copy s to say slocal and use that.

unsigned int i, val;

for (i = val = 0; i < 4; i++)
val = val * 256 + ((s[i] & 0ffh) << (8 * i));

You want _either_ val = val * 256 + s[i]
/* big-endian, run i from 3 downto 0 for littleendian */

_or_ val = val + (s[i] << 8*i)
/* littleendian, 8*(3-i) for bigendian */
except you actually need (unsigned int)s[i] because otherwise the
shift is done in _signed_ int possibly 1+31-bit and overflow is UB.

And you can use | instead of +, which I consider clearer in at least
the second case, which emphasizes the bit-representation of numbers;
plus there it doesn't need the parentheses for grouping.

- David.Thompson1 at worldnet.att.net

Dec 26 '06 #34
