Binary files, little&big endian setting bits

Steve

Hi, I know this is an old question (sorry),

but it's a different problem: I need to write a binary file as follows

00000011
00000000
00000000
00000101
00000000
11111111
11111111
00000000

The program will be compiled in Microsoft Visual C++.

I was thinking of just writing it as chars (AFAIK char is the only
unsigned integer type that's only 1 byte), so basically I'll be writing
3,0,0,5,0,256,256,0

The question is: if I write a file like that, will it come out as the bits
above? Does VC++ write little or big endian? And other than endian
issues, if it doesn't come out as above, why not?

Nov 14 '05 #1

dandelion

"Steve" <st********@nuigalway.ie> wrote in message
news:28**************************@posting.google.com...

> Hi, I know this is an old question (sorry)

Off topic, too...

> The question is: if I write a file like that, will it come out as the bits
> above? Does VC++ write little or big endian? And other than endian
> issues, if it doesn't come out as above, why not?

The Intel 80x86 processors are little endian regardless of your compiler:
the bytes of 16-bit and 32-bit words are stored least significant byte
first, so they will not appear "in order" in the file. When writing single
bytes, however, you should have no endianness problems.

See
http://www.cs.umass.edu/~verts/cs32/endian.html
http://www.rdrop.com/~cary/html/endian_faq.html
for details.
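
The byte order described above can be observed directly. The following is a minimal sketch, assuming a C99 compiler that provides <stdint.h> and 8-bit chars; the function names are made up for illustration and do not come from the thread.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory representation of a 32-bit value into out[],
   lowest address first. */
void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 when the least significant byte sits at the lowest
   address (little endian), 0 otherwise. */
int is_little_endian(void)
{
    unsigned char b[4];
    bytes_in_memory(0x01020304u, b);
    return b[0] == 0x04;
}
```

On a little-endian machine such as an x86, bytes_in_memory(0x01020304u, b) leaves b holding 04 03 02 01; on a big-endian machine it holds 01 02 03 04. Writing the word with a single fwrite would put the bytes in the file in that same order.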

Nov 14 '05 #2

Jens.Toerring

Steve <st********@nuigalway.ie> wrote:

> but it's a different problem: I need to write a binary file as follows
>
> 00000011
> 00000000
> 00000000
> 00000101
> 00000000
> 11111111
> 11111111
> 00000000
>
> The program will be compiled in Microsoft Visual C++.

That should be irrelevant here; if you need something VC++ specific, you
should ask in an MS-related newsgroup.

> I was thinking of just writing it as chars (AFAIK char is the only
> unsigned integer type that's only 1 byte)

Sorry, but a char isn't necessarily a single byte (i.e. 8 bits) - a
char can have different numbers of bits on different architectures.
See the macro CHAR_BIT in <limits.h>, which tells you how many bits a
char has.
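
As a quick check of the CHAR_BIT macro mentioned above, here is a small sketch. The function names are illustrative only; CHAR_BIT and UCHAR_MAX are the standard macros from <limits.h>.

```c
#include <limits.h>
#include <stdio.h>

/* Number of bits in a char on this implementation. */
int bits_per_char(void)
{
    return CHAR_BIT;
}

/* Print the width of char and the largest unsigned char value. */
void report_char_width(void)
{
    printf("CHAR_BIT  = %d\n", CHAR_BIT);
    printf("UCHAR_MAX = %u\n", (unsigned)UCHAR_MAX);
}
```

The C standard guarantees that CHAR_BIT is at least 8, so an unsigned char can always hold the values 0 through 255.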

> so basically I'll be writing
> 3,0,0,5,0,256,256,0
>
> The question is: if I write a file like that, will it come out as the bits
> above? Does VC++ write little or big endian? And other than endian
> issues, if it doesn't come out as above, why not?

When you write single bytes endianness isn't an issue at all - it
only becomes a problem when you write out data with a size larger
than a byte.
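
Writing the eight bytes one at a time might look like the sketch below. It is only one possible approach: the function name and the idea of passing a file path are assumptions, and the bit pattern 11111111 is written as 0xFF (decimal 255). The file is opened in binary mode ("wb") so that no newline translation takes place on Windows.

```c
#include <stdio.h>

/* The eight byte values from the question; 0xFF is 11111111. */
static const unsigned char pattern[8] = {
    0x03, 0x00, 0x00, 0x05, 0x00, 0xFF, 0xFF, 0x00
};

/* Write the pattern to `path` in binary mode.
   Returns 0 on success, -1 on any I/O failure. */
int write_pattern(const char *path)
{
    FILE *fp = fopen(path, "wb");
    size_t written;

    if (fp == NULL)
        return -1;
    written = fwrite(pattern, 1, sizeof pattern, fp);
    if (fclose(fp) != 0 || written != sizeof pattern)
        return -1;
    return 0;
}
```

Because each element is a single byte, the file holds the values in array order on any machine with 8-bit chars, whatever its endianness.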
Regards, Jens
--
\ Jens Thoms Toerring ___ Je***********@physik.fu-berlin.de
\__________________________ http://www.toerring.de

Nov 14 '05 #3

Eric Sosman

Steve wrote:

> Hi, I know this is an old question (sorry),
>
> but it's a different problem: I need to write a binary file as follows
>
> 00000011
> 00000000
> 00000000
> 00000101
> 00000000
> 11111111
> 11111111
> 00000000
>
> The program will be compiled in Microsoft Visual C++.
>
> I was thinking of just writing it as chars (AFAIK char is the only
> unsigned integer type that's only 1 byte), so basically I'll be writing
> 3,0,0,5,0,256,256,0
>
> The question is: if I write a file like that, will it come out as the bits
> above? Does VC++ write little or big endian? And other than endian
> issues, if it doesn't come out as above, why not?

I think you're asking about the order in which the
individual bits of each byte will be written: will the
first bit of the 3 be the high-order zero or the low-
order one?

To begin with, there may not *be* any order at all.
For example, suppose the output is sent to a parallel
interface that presents all eight bits simultaneously:
which bit is "first among equals" when they all march
in line abreast? The individual bits may not even
exist as discrete units: Consider writing to a modem
that encodes many bits in each signal transition, or
which uses data compression and winds up transmitting
2.71828 bits to encode the eight you presented? At the
C language level -- and even at the machine language
level, for most machines -- the byte is an indivisible
unit of I/O, and since it's indivisible the "order" of
its components cannot be discerned.

The question does eventually arise, at the level of
the medium on which the data is stored or through which
it is transmitted. And here, each storage device or
transmission medium has its own standards for the encoding
of these "indivisible" bytes. Some, like serial interfaces,
will indeed "split the atom" and transmit the individual
bits in a specified order. Others, like SCSI controllers,
designate specific signal lines for specific bits. Still
others, like card punches (anybody remember punched cards?)
will produce a pattern of holes that encode the character
designated by 3; this pattern will probably not have any
obvious relation to the original eight bits.

But you needn't worry about this unless you're the
person charged with implementing the electrical interface
to the storage or transmission medium. It is the job of
that interface to accept the serialized bits or the SCSI
signals or the holes in a punched card and to reconstitute
the indivisible byte value from them. As a programmer you
almost never care about the details (unless, perhaps, you're
writing diagnostic code that wants to produce specified
patterns in the signal lines to detect cross-talk, or that
sort of thing). You write out a 3, and it's the business
of the various media through which that 3 travels to ensure
that a 3 comes out at the other end. No huhu, cobber.

Where you *do* need to worry about endianness issues
is when you're dealing with multi-byte data objects: the
low-level media take care of individual bytes for you, but
you're responsible for arranging those bytes into larger
structures. Different systems have different conventions
for such arrangements, and that's why you can't just use
`fwrite(&int_value, sizeof int_value, 1, stream)' to send
an integer from one system to another. But once you've
settled on an "exchange format" that specifies the order
and meaning of the individual bytes, all you need to do is
decompose your larger objects into those bytes before
writing them, and reassemble the bytes into the larger
objects when reading. The actual form of the bytes "in
flight" is not your problem.
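
The decompose-and-reassemble step can be sketched as follows. The exchange format chosen here (a 32-bit unsigned value as four octets, most significant first) is an assumption made for illustration, as are the function names; any agreed order works as long as both sides use the same one.

```c
/* Store the low 32 bits of `value` into out[0..3],
   most significant octet first. */
void put_u32_be(unsigned char out[4], unsigned long value)
{
    out[0] = (unsigned char)((value >> 24) & 0xFFu);
    out[1] = (unsigned char)((value >> 16) & 0xFFu);
    out[2] = (unsigned char)((value >> 8) & 0xFFu);
    out[3] = (unsigned char)(value & 0xFFu);
}

/* Rebuild the value from four octets stored most significant first. */
unsigned long get_u32_be(const unsigned char in[4])
{
    return ((unsigned long)(in[0] & 0xFFu) << 24)
         | ((unsigned long)(in[1] & 0xFFu) << 16)
         | ((unsigned long)(in[2] & 0xFFu) << 8)
         |  (unsigned long)(in[3] & 0xFFu);
}
```

The sender would fill a 4-byte buffer with put_u32_be and write it with fwrite(buf, 1, 4, stream); the receiver reads four bytes and calls get_u32_be. The shifts operate on values, not on memory layout, so the result is the same on little- and big-endian hosts.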

The only possible worry you might have with byte-by-
byte data exchange is if the machines use different byte
sizes: Exchanging data between machines with 8-bit and
9-bit bytes, for instance, can be tricky. But if you're
dealing with a common byte size, all is well.

--
Er*********@sun.com

Nov 14 '05 #4

Chris Torek

In article <news:28**************************@posting.google.com>

> ... I need to write a binary file as follows
> 00000011
> 00000000
> 00000000
> 00000101
> 00000000
> 11111111
> 11111111
> 00000000
> ... (AFAIK char is the only unsigned integer type that's only 1 byte) so
> basically I'll be writing 3,0,0,5,0,256,256,0

Actually, eight 1 bits, treated as an unsigned char, represent the
value 255, not 256.

Eric Sosman has already addressed the (lack of) endianness that
occurs when 8-bit units are your atomic level of input/output.

I want to point out that in C, "byte" and "char" mean the same
thing, which is not necessarily "8 bits" -- but it probably does
not matter, in part because you are unlikely to have a 9- or 32-bit
"char" system in the first place, and in part because those have
to deal with the rest of the world.

And then I just had to write this... :-)

Bits in the C

When using a protocol over a net
(like TCP/IP or one I forget)
Where the number of bits has got to be eight
The Standards for C won't keep the things straight:
A char that is un-signed has got enough bits
But it might have too many, giving you fits!

A byte is a char, and a char is a byte
Eight bits is common, but nine is in sight
Digital Signalling Processors? Whew!
Here you may find there's a whole thirty-two!

When external formats on you are imposed
The trick to remember (while staying composed):
The C system's "bytes" may well be too big
But this does not mean you must give up the jig
To talk to another, the box you are on
Must have SOME way for them to begone
("Them" being pesky extraneous bits)
It just is not Standard, the part that omits
Some high order zeros of values between
Oh oh and eff eff (and hex sure is keen!).

To hold the right values, a char that is un-signed
Will do the trick nicely, I think you will find.
Who cares if it's bigger than strictly required?
The values you need will never get mired.
The eight bits you want won't get overtired
And values you need will never get mired!

Perhaps, with some more work and a good rousing tune, this might
even make a Gilbert & Sullivan pastiche. :-)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Nov 14 '05 #5

Dave Thompson

On Thu, 28 Oct 2004 11:56:16 -0400, Eric Sosman <er*********@sun.com>
wrote:
<snip>

> [Endianness] does eventually arise, at the level of
> the medium on which the data is stored or through which
> it is transmitted. <snip> Some, like serial interfaces,
> will indeed "split the atom" and transmit the individual
> bits in a specified order. Others, like SCSI controllers,
> designate specific signal lines for specific bits. Still
> others, like card punches (anybody remember punched cards?)
> will produce a pattern of holes that encode the character
> designated by 3; this pattern will probably not have any
> obvious relation to the original eight bits.

If the bits were EBCDIC, the pattern certainly does bear a relation
obvious to anyone who thinks a bit about it (and knows the BCDIC
history); even for ASCII, significant chunks of the translation to and
from EBCDIC (and thus card code, aka Hollerith) are systematic.

(Otherwise concur.)

Now, if you want an octet-parallel interface people will probably have
trouble remembering, how about IEEE-488 (IIRC) GPIB nee HPIB? <G>
- David.Thompson1 at worldnet.att.net

Nov 14 '05 #6

Flash Gordon

On Mon, 01 Nov 2004 08:14:12 GMT
Dave Thompson <da*************@worldnet.att.net> wrote:

<snip>

> Now, if you want an octet-parallel interface people will probably have
> trouble remembering, how about IEEE-488 (IIRC) GPIB nee HPIB? <G>

I think it was originally HPIB (Hewlett-Packard Interface Bus), then GPIB
and IEEE-488 came along as later names for it.

I've made plenty of use of it in the past talking to DSO, DMM...

I also did some low level hacking around with it trying to detect if kit
was connected without crashing the program doing the check or locking up
the bus. All in HP Pascal.

So I have absolutely no trouble remembering it and know where there is
kit still making use of it. :-)
--
Flash Gordon
Sometimes I think shooting would be far too good for some people.
Although my email address says spam, it is real and I read it.

Nov 14 '05 #7

Mike Wahler

<Je***********@physik.fu-berlin.de> wrote in message
news:2u*************@uni-berlin.de...

> Steve <st********@nuigalway.ie> wrote:
>> but it's a different problem: I need to write a binary file as follows
>> 00000011
>> 00000000
>> 00000000
>> 00000101
>> 00000000
>> 11111111
>> 11111111
>> 00000000
>>
>> The program will be compiled in Microsoft Visual C++.
>
> That should be irrelevant here; if you need something VC++ specific, you
> should ask in an MS-related newsgroup.
>
>> I was thinking of just writing it as chars (AFAIK char is the only
>> unsigned integer type that's only 1 byte)

Steve, note that type 'char' might or might not be unsigned.
This is defined by the implementation. If you want to ensure
an unsigned type, explicitly say so:

unsigned char c;

> Sorry, but a char isn't necessarily a single byte

Actually, yes it is.

See ISO 9899: 3.6, 3.7.1, 5.2.1/3.

> (i.e. 8 bits) - a
> char can have different numbers of bits on different architectures.

As can a byte. "byte equals eight bits" is a very common
misconception.
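
Both points above (plain char may or may not be signed, and a char is one byte by definition) can be checked on a given implementation with a sketch like this one. The function names are hypothetical; CHAR_MIN comes from <limits.h>.

```c
#include <limits.h>

/* 1 if plain char is a signed type on this implementation, 0 if not. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Read a stored char as a value in 0..UCHAR_MAX, regardless of
   whether plain char is signed. */
unsigned int byte_value(char c)
{
    return (unsigned char)c;
}
```

sizeof(char) is 1 on every conforming implementation, since sizeof counts in units of char. Declaring buffers as unsigned char, as suggested above, avoids the need for a conversion like byte_value when the data are raw byte values rather than text.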

-Mike

Nov 14 '05 #8

Mike Wahler

"Chris Torek" <no****@torek.net> wrote in message
news:cl*********@news3.newsguy.com...

> In article <news:28**************************@posting.google.com>
> And then I just had to write this... :-)
>
> Bits in the C

I love it! Printed and pasted on my office wall.

Thanks.

-Mike

Nov 14 '05 #9

Old Wolf

"Mike Wahler" <mk******@mkwahler.net> wrote:

>> (i.e. 8 bits) - a
>> char can have different numbers of bits on different architectures.
>
> As can a byte. "byte equals eight bits" is a very common
> misconception.

OOI, how many bits are there in a kilobyte, if 1 byte is 32 bits?
Should I start referring to file sizes in bits to avoid confusion?

Nov 14 '05 #10

Mike Wahler

"Old Wolf" <ol*****@inspire.net.nz> wrote in message
news:84**************************@posting.google.com...

> "Mike Wahler" <mk******@mkwahler.net> wrote:
>>> (i.e. 8 bits) - a
>>> char can have different numbers of bits on different architectures.
>>
>> As can a byte. "byte equals eight bits" is a very common
>> misconception.
>
> OOI, how many bits are there in a kilobyte, if 1 byte is 32 bits?

1024 * 32

> Should I start referring to file sizes in bits to avoid confusion?

If you need to be that precise in your specification, yes.

-Mike

Nov 14 '05 #11

Chris Croughton

On Tue, 02 Nov 2004 21:30:28 GMT, Mike Wahler
<mk******@mkwahler.net> wrote:

"Old Wolf" <ol*****@inspire.net.nz> wrote in message
news:84**************************@posting.google.com...
>> "Mike Wahler" <mk******@mkwahler.net> wrote:
>>>> (i.e. 8 bits) - a
>>>> char can have different numbers of bits on different architectures.
>>>
>>> As can a byte. "byte equals eight bits" is a very common
>>> misconception.
>>
>> OOI, how many bits are there in a kilobyte, if 1 byte is 32 bits?
>
> 1024 * 32
>
>> Should I start referring to file sizes in bits to avoid confusion?
>
> If you need to be that precise in your specification, yes.

That's the reason why communications specifications use the term
'octet', defined as being exactly 8 bits, because they need to be
specific about how many of them are used for fields. They also specify
the order of them (and order of bits if that is significant) to be
totally precise (big- and little-endian confusion is a major cause of
programming errors in comms software). I often define an explicit type
'octet' in my code (the same as uint8_t in C99, but not all compilers
are C99 and have stdint.h yet).
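
One way such an 'octet' type might be defined is sketched below. This is an illustration, not necessarily the definition used by the poster: it picks uint8_t when a C99 <stdint.h> provides it and falls back to unsigned char otherwise, and the helper octet_value is a made-up name.

```c
#include <limits.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>
#endif

#ifdef UINT8_MAX
typedef uint8_t octet;        /* exactly 8 bits when available */
#else
typedef unsigned char octet;  /* at least 8 bits; may be wider */
#endif

/* Keep only the low eight bits, so the result is a valid octet
   value even where unsigned char is wider than 8 bits. */
unsigned int octet_value(unsigned int v)
{
    return v & 0xFFu;
}
```

With the fallback typedef the type can hold more than eight bits on unusual hardware, so masking with 0xFF before storing or transmitting keeps values in the 0 to 255 range that the protocol expects.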

Chris C

Nov 14 '05 #12
