Bytes IT Community

Binary files, little&big endian setting bits

Hi, i know this is an old question (sorry)

but its a different problem, i need to write a binary file as follows

00000011
00000000
00000000
00000101
00000000
11111111
11111111
00000000

program will be compiled in Microsoft Visual C++

was thinking of just writing it as chars (afaik chars are the only
unsigned int thats only 1 byte) so basicly i'll be writing
3,0,0,5,0,256,256,0

question is if i write a file like that will it come out as the bits
above, does VC++ write little or big endian and other than endian
issues if it doesn't come out as above, why not??
Nov 14 '05 #1
11 Replies



"Steve" <st********@nuigalway.ie> wrote in message
news:28**************************@posting.google.c om...
> Hi, i know this is an old question (sorry)

Off topic, too...

> question is if i write a file like that will it come out as the bits
> above, does VC++ write little or big endian and other than endian
> issues if it doesn't come out as above, why not??


The Intel 80x86 processors are little-endian regardless of your compiler:
16-bit and 32-bit words are stored least significant byte first, so their
bytes will not be "in order" in the file. When you write individual bytes,
however, you should have no endianness problems.

See
http://www.cs.umass.edu/~verts/cs32/endian.html
http://www.rdrop.com/~cary/html/endian_faq.html
for details.
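
To see the word-versus-byte difference concretely, here is a minimal sketch.
It assumes a C99 compiler with <stdint.h>; the helper names
host_bytes_of_u32 and host_is_little_endian are made up for this
illustration.

```c
#include <stdint.h>
#include <string.h>

/* Copies the in-memory bytes of a 32-bit value into out[0..3],
   in the order the host stores them. */
void host_bytes_of_u32(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 if the host stores the least significant byte first,
   0 otherwise. */
int host_is_little_endian(void)
{
    unsigned char b[4];
    host_bytes_of_u32(0x01020304u, b);
    return b[0] == 0x04;
}
```

On an x86 machine, host_bytes_of_u32(0x01020304, b) should leave
04 03 02 01 in b, which is the order fwrite() of the whole word would
put into a file. Bytes written one at a time are not affected.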
Nov 14 '05 #2

Steve <st********@nuigalway.ie> wrote:
> but its a different problem, i need to write a binary file as follows
>
> 00000011
> 00000000
> 00000000
> 00000101
> 00000000
> 11111111
> 11111111
> 00000000
>
> program will be compiled in Microsoft Visual C++

That should be irrelevant here; if you need something VC++-specific, you
should ask in an MS-related newsgroup.

> was thinking of just writing it as chars (afaik chars are the only
> unsigned int thats only 1 byte)

Sorry, but a char isn't necessarily a single byte (i.e. 8 bits) - a
char can have different numbers of bits on different architectures.
See the macro CHAR_BIT in <limits.h>, which tells you how many bits a
char has.

> so basicly i'll be writing
> 3,0,0,5,0,256,256,0
>
> question is if i write a file like that will it come out as the bits
> above, does VC++ write little or big endian and other than endian
> issues if it doesn't come out as above, why not??


When you write single bytes endianness isn't an issue at all - it
only becomes a problem when you write out data with a size larger
than a byte.
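
A minimal sketch of writing the eight bytes from the question, one
unsigned char each. It assumes CHAR_BIT is 8 and uses 255 for the two
all-ones bytes, since that is the largest value eight bits can hold;
write_pattern is a hypothetical helper name. The "wb" mode matters on
Windows, where text mode would translate newline bytes on output.

```c
#include <stdio.h>

/* Writes the eight byte values from the question to `path`.
   Returns 0 on success, -1 on failure. */
int write_pattern(const char *path)
{
    static const unsigned char pattern[8] = { 3, 0, 0, 5, 0, 255, 255, 0 };
    FILE *fp = fopen(path, "wb");   /* "b": no text-mode translation */
    size_t written;

    if (fp == NULL)
        return -1;
    written = fwrite(pattern, 1, sizeof pattern, fp);
    if (fclose(fp) != 0 || written != sizeof pattern)
        return -1;
    return 0;
}
```

Because each element is written as a single byte, the file contents do
not depend on the endianness of the machine that runs this.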
Regards, Jens
--
\ Jens Thoms Toerring ___ Je***********@physik.fu-berlin.de
\__________________________ http://www.toerring.de
Nov 14 '05 #3

Steve wrote:
Hi, i know this is an old question (sorry)

but its a different problem, i need to write a binary file as follows

00000011
00000000
00000000
00000101
00000000
11111111
11111111
00000000

program will be compiled in Microsoft Visual C++

was thinking of just writing it as chars (afaik chars are the only
unsigned int thats only 1 byte) so basicly i'll be writing
3,0,0,5,0,256,256,0

question is if i write a file like that will it come out as the bits
above, does VC++ write little or big endian and other than endian
issues if it doesn't come out as above, why not??


I think you're asking about the order in which the
individual bits of each byte will be written: will the
first bit of the 3 be the high-order zero or the low-
order one?

To begin with, there may not *be* any order at all.
For example, suppose the output is sent to a parallel
interface that presents all eight bits simultaneously:
which bit is "first among equals" when they all march
in line abreast? The individual bits may not even
exist as discrete units: consider writing to a modem
that encodes many bits in each signal transition, or
one that uses data compression and winds up transmitting
2.71828 bits to encode the eight you presented. At the
C language level -- and even at the machine language
level, for most machines -- the byte is an indivisible
unit of I/O, and since it's indivisible the "order" of
its components cannot be discerned.

The question does eventually arise, at the level of
the medium on which the data is stored or through which
it is transmitted. And here, each storage device or
transmission medium has its own standards for the encoding
of these "indivisible" bytes. Some, like serial interfaces,
will indeed "split the atom" and transmit the individual
bits in a specified order. Others, like SCSI controllers,
designate specific signal lines for specific bits. Still
others, like card punches (anybody remember punched cards?)
will produce a pattern of holes that encode the character
designated by 3; this pattern will probably not have any
obvious relation to the original eight bits.

But you needn't worry about this unless you're the
person charged with implementing the electrical interface
to the storage or transmission medium. It is the job of
that interface to accept the serialized bits or the SCSI
signals or the holes in a punched card and to reconstitute
the indivisible byte value from them. As a programmer you
almost never care about the details (unless, perhaps, you're
writing diagnostic code that wants to produce specified
patterns in the signal lines to detect cross-talk, or that
sort of thing). You write out a 3, and it's the business
of the various media through which that 3 travels to ensure
that a 3 comes out at the other end. No huhu, cobber.

Where you *do* need to worry about endianness issues
is when you're dealing with multi-byte data objects: the
low-level media take care of individual bytes for you, but
you're responsible for arranging those bytes into larger
structures. Different systems have different conventions
for such arrangements, and that's why you can't just use
`fwrite(&int_value, sizeof int_value, 1, stream)' to send
an integer from one system to another. But once you've
settled on an "exchange format" that specifies the order
and meaning of the individual bytes, all you need to do is
decompose your larger objects into those bytes before
writing them, and reassemble the bytes into the larger
objects when reading. The actual form of the bytes "in
flight" is not your problem.
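
A minimal sketch of that decompose-and-reassemble step, assuming the
agreed exchange format is four 8-bit bytes with the most significant
byte first. The names put_u32_be and get_u32_be are hypothetical, and
the byte order is just one possible choice.

```c
/* Splits the low 32 bits of `value` into four bytes,
   most significant first. */
void put_u32_be(unsigned long value, unsigned char out[4])
{
    out[0] = (unsigned char)((value >> 24) & 0xFF);
    out[1] = (unsigned char)((value >> 16) & 0xFF);
    out[2] = (unsigned char)((value >> 8) & 0xFF);
    out[3] = (unsigned char)(value & 0xFF);
}

/* Reassembles a value from four bytes laid out by put_u32_be().
   Only the low eight bits of each element are used. */
unsigned long get_u32_be(const unsigned char in[4])
{
    return ((unsigned long)(in[0] & 0xFF) << 24)
         | ((unsigned long)(in[1] & 0xFF) << 16)
         | ((unsigned long)(in[2] & 0xFF) << 8)
         |  (unsigned long)(in[3] & 0xFF);
}
```

The shifts work on values, not on memory, so the same code gives the
same bytes on little-endian and big-endian hosts. The resulting array
can then be passed to fwrite() one byte per element.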

The only possible worry you might have with byte-by-
byte data exchange is if the machines use different byte
sizes: Exchanging data between machines with 8-bit and
9-bit bytes, for instance, can be tricky. But if you're
dealing with a common byte size, all is well.

--
Er*********@sun.com

Nov 14 '05 #4

In article <news:28**************************@posting.google. com>
... i need to write a binary file as follows
00000011
00000000
00000000
00000101
00000000
11111111
11111111
00000000
... (afaik chars are the only unsigned int thats only 1 byte) so
basicly i'll be writing 3,0,0,5,0,256,256,0


Actually, eight 1 bits, treated as an unsigned char, represents the
value 255, not 256.

Eric Sosman has already addressed the (lack of) endianness that
occurs when 8-bit units are your atomic level of input/output.

I want to point out that in C, "byte" and "char" mean the same
thing, which is not necessarily "8 bits" -- but it probably does
not matter, in part because you are unlikely to have a 9- or 32-bit
"char" system in the first place, and in part because those have
to deal with the rest of the world.
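
A small sketch of both points, with hypothetical helper names to_uchar
and low_octet. Converting to unsigned char reduces the value modulo
UCHAR_MAX + 1, so on a system with 8-bit chars 256 becomes 0 rather than
eight 1 bits; masking with 0xFF keeps exactly eight bits whatever
CHAR_BIT happens to be.

```c
#include <limits.h>

/* Converts to unsigned char; the result is value modulo UCHAR_MAX + 1. */
unsigned char to_uchar(unsigned int value)
{
    return (unsigned char)value;
}

/* Keeps only the low eight bits, independent of CHAR_BIT. */
unsigned int low_octet(unsigned int value)
{
    return value & 0xFFu;
}
```

With these, to_uchar(255) is 255 on every conforming implementation,
while to_uchar(256) is 0 only where CHAR_BIT is 8.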

And then I just had to write this... :-)

Bits in the C

When using a protocol over a net
(like TCP/IP or one I forget)
Where the number of bits has got to be eight
The Standards for C won't keep the things straight:
A char that is un-signed has got enough bits
But it might have too many, giving you fits!

A byte is a char, and a char is a byte
Eight bits is common, but nine is in sight
Digital Signalling Processors? Whew!
Here you may find there's a whole thirty-two!

When external formats on you are imposed
The trick to remember (while staying composed):
The C system's "bytes" may well be too big
But this does not mean you must give up the jig
To talk to another, the box you are on
Must have SOME way for them to begone
("Them" being pesky extraneous bits)
It just is not Standard, the part that omits
Some high order zeros of values between
Oh oh and eff eff (and hex sure is keen!).

To hold the right values, a char that is un-signed
Will do the trick nicely, I think you will find.
Who cares if it's bigger than strictly required?
The values you need will never get mired.
The eight bits you want won't get overtired
And values you need will never get mired!

Perhaps, with some more work and a good rousing tune, this might
even make a Gilbert & Sullivan pastiche. :-)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (4039.22'N, 11150.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #5

On Thu, 28 Oct 2004 11:56:16 -0400, Eric Sosman <er*********@sun.com>
wrote:
<snip>
[Endianness] does eventually arise, at the level of
the medium on which the data is stored or through which
it is transmitted. <snip> Some, like serial interfaces,
will indeed "split the atom" and transmit the individual
bits in a specified order. Others, like SCSI controllers,
designate specific signal lines for specific bits. Still
others, like card punches (anybody remember punched cards?)
will produce a pattern of holes that encode the character
designated by 3; this pattern will probably not have any
obvious relation to the original eight bits.

If the bits were EBCDIC, it certainly does bear a relation obvious to
anyone who thinks a bit about it (and knows the BCDIC history); even
for ASCII significant chunks of the translation to and from EBCDIC
(and thus card aka Hollerith) are systematic.

(Otherwise concur.)

Now, if you want an octet-parallel interface people will probably have
trouble remembering, how about IEEE-488 (IIRC) GPIB nee HPIB? <G>
- David.Thompson1 at worldnet.att.net
Nov 14 '05 #6

On Mon, 01 Nov 2004 08:14:12 GMT
Dave Thompson <da*************@worldnet.att.net> wrote:

<snip>
Now, if you want an octet-parallel interface people will probably have
trouble remembering, how about IEEE-488 (IIRC) GPIB nee HPIB? <G>


I think it was originally HPIB (Hewlett-Packard Interface Bus), then GPIB
and IEEE-488 came along as later names for it.

I've made plenty of use of it in the past talking to DSO, DMM...

I also did some low level hacking around with it trying to detect if kit
was connected without crashing the program doing the check or locking up
the bus. All in HP Pascal.

So I have absolutely no trouble remembering it and know where there is
kit still making use of it. :-)
--
Flash Gordon
Sometimes I think shooting would be far too good for some people.
Although my email address says spam, it is real and I read it.
Nov 14 '05 #7

<Je***********@physik.fu-berlin.de> wrote in message
news:2u*************@uni-berlin.de...
Steve <st********@nuigalway.ie> wrote:
but its a different problem, i need to write a binary file as follows
00000011
00000000
00000000
00000101
00000000
11111111
11111111
00000000

program will be compiled in Microsoft Visual C++


Should be irrelevant here, if you need something VC++ specific you
better should ask in some MS related newsgroup.
was thinking of just writing it as chars (afaik chars are the only
unsigned int thats only 1 byte)


Steve, note that type 'char' might or might not be unsigned.
This is defined by the implementation. If you want to ensure
an unsigned type, explicitly say so:

unsigned char c;
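
A minimal sketch of how to check this on a given implementation.
CHAR_MIN from <limits.h> is 0 when plain char is unsigned and negative
when it is signed; the function names here are hypothetical.

```c
#include <limits.h>

/* Returns 1 if plain char is a signed type on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Widens a byte held in an unsigned char: always in 0..UCHAR_MAX,
   never negative. */
unsigned int byte_value(unsigned char c)
{
    return c;
}
```

Holding file bytes in unsigned char means a byte of all 1 bits reads
back as a positive value rather than as a possibly negative one.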
> Sorry, but a char isn't necessarily a single byte

Actually, yes it is.

See ISO 9899: 3.6, 3.7.1, 5.2.1/3.

> (i.e. 8 bits) - a
> char can have different numbers of bits on different architectures.


As can a byte. "byte equals eight bits" is a very common
misconception.

-Mike
Nov 14 '05 #8


"Chris Torek" <no****@torek.net> wrote in message
news:cl*********@news3.newsguy.com...
In article <news:28**************************@posting.google. com> And then I just had to write this... :-)

Bits in the C


I love it! Printed and pasted on my office wall.

Thanks.

-Mike
Nov 14 '05 #9

"Mike Wahler" <mk******@mkwahler.net> wrote:
(i.e. 8 bits) - a
char can have different numbers of bits on different architectures.


As can a byte. "byte equals eight bits" is a very common
misconception.


OOI, how many bits are there in a kilobyte, if 1 byte is 32 bits?
Should I start referring to file sizes in bits to avoid confusion?
Nov 14 '05 #10


"Old Wolf" <ol*****@inspire.net.nz> wrote in message
news:84**************************@posting.google.c om...
"Mike Wahler" <mk******@mkwahler.net> wrote:
> > > (i.e. 8 bits) - a
> > > char can have different numbers of bits on different architectures.
> >
> > As can a byte. "byte equals eight bits" is a very common
> > misconception.
>
> OOI, how many bits are there in a kilobyte, if 1 byte is 32 bits?

1024 * 32

> Should I start referring to file sizes in bits to avoid confusion?

If you need to be that precise in your specification, yes.

-Mike
Nov 14 '05 #11

On Tue, 02 Nov 2004 21:30:28 GMT, Mike Wahler
<mk******@mkwahler.net> wrote:
"Old Wolf" <ol*****@inspire.net.nz> wrote in message
news:84**************************@posting.google.c om...
"Mike Wahler" <mk******@mkwahler.net> wrote:
>
> >(i.e. 8 bits) - a
> > char can have different numbers of bits on different architectures.
>
> As can a byte. "byte equals eight bits" is a very common
> misconception.


> > OOI, how many bits are there in a kilobyte, if 1 byte is 32 bits?
>
> 1024 * 32
>
> > Should I start referring to file sizes in bits to avoid confusion?
>
> If you need to be that precise in your specification, yes.


That's the reason why communications specifications use the term
'octet', defined as being exactly 8 bits, because they need to be
specific about how many of them are used for fields. They also specify
the order of them (and order of bits if that is significant) to be
totally precise (big- and little-endian confusion is a major cause of
programming errors in comms software). I often define an explicit type
'octet' in my code (the same as uint8_t in C99, but not all compilers
are C99 and have stdint.h yet).
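
A sketch of what such a definition might look like, not the poster's
actual code. It assumes 8-bit chars (and refuses to compile otherwise),
uses uint8_t when the compiler reports C99, and falls back to unsigned
char elsewhere. The helper put_u16_be and the big-endian field order are
made up for the example.

```c
#include <limits.h>

#if CHAR_BIT != 8
#error "this sketch assumes 8-bit chars"
#endif

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>
typedef uint8_t octet;          /* C99: exactly 8 bits */
#else
typedef unsigned char octet;    /* pre-C99 fallback; fine since CHAR_BIT == 8 */
#endif

/* Encodes a 16-bit field as two octets, most significant first
   (the order is an assumption chosen for this example). */
void put_u16_be(unsigned int value, octet out[2])
{
    out[0] = (octet)((value >> 8) & 0xFFu);
    out[1] = (octet)(value & 0xFFu);
}
```

Spelling the type as octet documents that the field is exactly 8 bits
wide, and the explicit shifts fix the byte order independently of the
host.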

Chris C
Nov 14 '05 #12
