Bytes IT Community

differance between binary file and ascii file

vim
hello everybody
Plz tell the differance between binary file and ascii
file...............
Thanks
in advance
vim

May 13 '06 #1
68 Replies


vim said:
hello everybody
Plz tell the differance between binary file and ascii
file...............


Well, that's really the wrong question.

The right question is: "what is the difference between a stream opened in
binary mode, and a stream opened in text mode?"

Let's deal with the easy one first. When you associate a binary stream with
a file, the data flows in from the file, through the stream, unmodified
(or, if you're writing, it flows out, through the stream, to the file,
unmodified). It's just a raw stream of bytes, to do with as you will.

Okay, now the hard one. When you associate a /text/ stream with a file, you
are assuming the convention that the data comprises zero or more lines,
where each line is composed of 0 or more bytes followed by a newline marker
of some kind.

The newline marker defined by C is '\n'.

On Unix, this agrees well with the marker actually used by longstanding
convention on that system.

On CP/M and derivatives (such as Q-DOS and that point-and-click adventure
game it spawned), the marker is '\r' and '\n', in that order.

On the Mac, it's just plain '\r'.

On the mainframe - well, you /really/ don't want to know.

All this is a bit of a nuisance, and it would be nice if we didn't have to
bother with such niceties when processing plain ol' text. And so, when you
are reading from a text stream, the standard library performs any necessary
conversions on incoming data, to force-map the newline marker into a nice
simple '\n'. And when you are writing to the stream, the standard library
looks for '\n' characters and replaces them with the byte or bytes used for
marking newlines on the particular system on which the program is running.

So, when you are writing your code, you can just pretend that the newline
marker is '\n', and - to all intents and purposes - so it is! So you don't
have to mess about with detecting whether you're running on a Mac or a mini
or a mainframe - you can just assume a '\n' delimiter and let the standard
library worry about the underlying representation.

If you don't /want/ the system to do this, open the file in binary mode. But
then managing the newline stuff all falls to you instead.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
May 13 '06 #2

"vim" wrote:
Plz tell the differance between binary file and ascii
file...............


Here is a long and recent discussion on the subject.

http://tinyurl.com/s737d
May 13 '06 #3

vim
thanks a lot

May 13 '06 #4


vim wrote:
hello everybody
Plz tell the differance between binary file and ascii
Mate, you are in big trouble now. You just used 'silly' 'Plz'.
And this question is off-topic for someone.

If we come to answer. Every file is binary but when you open ASCII
file, you will see bytes representing elements of ASCII set. Every
element of ASCII set can be stored in char in C programming language.
And ASCII char is one byte and one byte is eight bits. And one bit only
can be 0 or 1. You will find more if you google for "binary
arithmetic".
file...............
Thanks
in advance
vim


May 13 '06 #5

Parahat Melayev said:

vim wrote:
hello everybody
Plz tell the differance between binary file and ascii
Mate, you are in big trouble now.


No, he isn't.
You just used 'silly' 'Plz'.
Yes, he did.
And this question is off-topic for someone.
No, it isn't. But in some ways, your answer is.
If we come to answer. Every file is binary
The C Standard does not guarantee this.
but when you open ASCII file,
The C Standard does not specify the concept "ASCII file".
you will see bytes representing elements of ASCII set. Every
element of ASCII set can be stored in char in C programming language.
And ASCII char is one byte and one byte is eight bits.


C does not specify that one byte is eight bits wide - although it does
specify that one byte is *at least* eight bits wide.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
May 13 '06 #6

Parahat Melayev wrote:
vim wrote:
hello everybody
Plz tell the differance between binary file and ascii
Mate, you are in big trouble now. You just used 'silly' 'Plz'.


It *is* a silly abbreviation. While nobody is going to be in big
trouble, consistent use of such incomprehensible English will simply
result in the poster being ignored.
And this question is off-topic for someone.
Who said it was off-topic? Certainly it's perfectly within topic as far
as I'm aware.
And ASCII char is one byte and one byte is eight bits.
A C byte is not always eight bits in size. Though 8-bit bytes are
common on PCs, other sizes are possible, and indeed are prevalent, on
Mainframes, DSPs etc. A char is garunteed by the standard to be atleast
8 bits but it could be more.
You will find more if you google for "binary arithmetic".


What has binary arithmetic got to do with C streams?

May 13 '06 #7


Richard Heathfield wrote:
[When] you
are reading from a text stream, the standard library performs any necessary
conversions on incoming data, to force-map the newline marker into a nice
simple '\n'. And when you are writing to the stream, the standard library
looks for '\n' characters and replaces them with the byte or bytes used for
marking newlines on the particular system on which the program is running.

From this explanation, I am under the impression that stdin is
therefore opened in binary mode, since I find I have to explicitly deal
with '\r's to ensure that redirected input from text files works. For
example, I once wrote an rtrim function to remove trailing whitespace
from a input line assumed to have come from stdin (being part of a K&R
exercise, I didn't give myself the luxury of using things like
isspace(); the code quality is in any event not the focus of my
curiosity :-)

#include <stdio.h>

/* rtrim: removes trailing whitespace from s, re-attaches '\n' if
   necessary */
void rtrim(char s[], int len)
{
    int i, newline;

    newline = 0;
    i = len - 1;
    /* i >= 0 guards against running off the front of an
       all-whitespace line */
    while (i >= 0 && (s[i] == '\t' || s[i] == ' '
                      || s[i] == '\n' || s[i] == '\r')) {
        if (s[i] == '\n')
            newline = 1;
        --i;
    }

    if (newline && i > 0)
        s[++i] = '\n';
    s[++i] = '\0';
}

I added the check for '\r' as an afterthought in order to make the code
more portable (since it broke under Windows) - the input line I was
passing to rtrim was being read from stdin using getchar() and was
simply a line terminated by a sole '\n' character. The '\r's of course
cropped up when I redirected Windows text files to stdin.

So again, it seems stdin is opened in binary mode, and not text mode,
since the newlines don't get converted to a single '\n'. Does the
Standard make any statement about the default mode stdin opens in (and
for that matter, stdout and stderr), and is it possible or worthwhile
to explicitly put stdin into text mode if you know that you are going
to deal with text input exclusively?

Or it may well be the case I am missing something more fundamental
here....

Mike S
---
"[BASIC] programmers...are mentally mutilated beyond hope of
regeneration. " - Dijkstra

May 14 '06 #8

Mike S said:

Richard Heathfield wrote:
[When] you
are reading from a text stream, the standard library performs any
necessary conversions on incoming data, to force-map the newline marker
into a nice simple '\n'. And when you are writing to the stream, the
standard library looks for '\n' characters and replaces them with the
byte or bytes used for marking newlines on the particular system on which
the program is running.

From this explanation, I am under the impression that stdin is

therefore opened in binary mode, since I find I have to explicitly deal
with '\r's to ensure that redirected input from text files works.


No, that almost certainly means simply that you've got a Windows text file
on a Linux system. Linux doesn't know, bless it, that you've been sullying
its filesystem with foreign muck. :-)

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
May 14 '06 #9

santosh wrote:
Parahat Melayev wrote:
vim wrote:
> hello everybody
> Plz tell the differance between binary file and ascii
Mate, you are in big trouble now. You just used 'silly' 'Plz'.


It *is* a silly abbreviation. While nobody is going to be in big
trouble, consistent use of such incomprehensible English will simply
result the poster being ignored.
And this question is off-topic for someone.


Who said it was off-topic? Certainly it's perfectly within topic as far
as I'm aware.


This newsgroup is for C language related questions. The OP is asking a more
generalized question.

The OP may find help here -
http://en.wikipedia.org/wiki/Binary_arithmetic
And ASCII char is one byte and one byte is eight bits.
A C byte is not always eight bits in size. Though 8-bit bytes are
common on PCs, other sizes are possible, and indeed are prevalent, on
Mainframes, DSPs etc.


A byte is always 8 bits by definition!!! On older CDC computers, for
example, there was a "character" of 6 bits but it was never referred to as
a "byte".

A char is garunteed by the standard to be atleast
8 bits but it could be more.
A character is not some arbitrary size. A character is either one Byte (ie
8 bits) or in the case of Unicode it is two Bytes (ie 16 bits). BTW - the
word is "guaranteed".
You will find more if you google for "binary arithmetic".


What has binary arithmetic got to do with C streams?


Don't you know??
Alan

May 14 '06 #10

Alan said:
This newsgroup is for C language related questions. The OP is asking a
more generalized question.
No, he was asking for an explanation of binary mode and text mode in C
streams.

The OP may find help here -
http://en.wikipedia.org/wiki/Binary_arithmetic
Unlikely, since he was not asking about binary arithmetic.
A byte is always 8 bits by definition!!!
Not true in C. I've used a system with 32-bit bytes, and I'm by no means the
only one here who has done so.
A char is garunteed by the standard to be atleast
8 bits but it could be more.


A character is not some arbitrary size.


It is exactly CHAR_BIT bits wide, and CHAR_BIT is at least 8 but can be
more.
A character is either one Byte (ie
8 bits) or in the case of Unicode it is two Bytes (ie 16 bits).
No, a character is always exactly one byte in size. If a Unicode glyph
representation won't fit in a single byte, then it won't fit in a character
either, but will have to make do with a "wide character".
BTW - the word is "guaranteed".


Careful - those who live by the spelling flame will die by the spelling
flame.
You will find more if you google for "binary arithmetic".


What has binary arithmetic got to do with C streams?


Don't you know??


No. Please explain the connection.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
May 14 '06 #11

Richard Heathfield wrote:
Parahat Melayev said:

vim wrote:
hello everybody
Plz tell the differance between binary file and ascii


Mate, you are in big trouble now.


No, he isn't.
You just used 'silly' 'Plz'.


Yes, he did.
And this question is off-topic for someone.


No, it isn't. But in some ways, your answer is.
If we come to answer. Every file is binary


The C Standard does not guarantee this.
but when you open ASCII file,


The C Standard does not specify the concept "ASCII file".
you will see bytes representing elements of ASCII set. Every
element of ASCII set can be stored in char in C programming language.
And ASCII char is one byte and one byte is eight bits.


C does not specify that one byte is eight bits wide - although it does
specify that one byte is *at least* eight bits wide.


I think you mean character not byte

"Byte: a group of eight binary digits, often used to represent one
character". The Concise Oxford Dictionary.

"Byte: a group of eight binary digits processed as a unit by a computer and
used to represent an alphanumeric character". Merriam-Webster Dictionary.

We have to use some standard definition of words otherwise we will fall into
a morass of misunderstanding.

Alan

May 14 '06 #12

Alan said:
Richard Heathfield wrote:
C does not specify that one byte is eight bits wide - although it does
specify that one byte is *at least* eight bits wide.
I think you mean character not byte


Well, you're wrong. I meant byte.
"Byte: a group of eight binary digits, often used to represent one
character". The Concise Oxford Dictionary.
"byte: addressable unit of data storage large enough to hold any member of
the basic character set of the execution environment.
Note 1: It is possible to express the address of each individual byte of an
object uniquely.
Note 2: A byte is composed of a contiguous sequence of bits, the number of
which is implementation-defined." - ISO/IEC 9899:1999

Authoritative technical definitions trump dictionary definitions.
We have to use some standard definition of words otherwise we will fall
into a morass of misunderstanding.


That's why we have an International C Standard, which defines "byte" very
precisely.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
May 14 '06 #13

jjf

Alan wrote:
Richard Heathfield wrote:

C does not specify that one byte is eight bits wide - although it does
specify that one byte is *at least* eight bits wide.
I think you mean character not byte


You think incorrectly.
"Byte: a group of eight binary digits, often used to represent one
character". The Concise Oxford Dictionary.

"Byte: a group of eight binary digits processed as a unit by a computer and
used to represent an alphanumeric character". Merriam-Webster Dictionary.
It's surprising that these two authorities should both be wrong, but
they are. A byte is a group of bits, the number defined by the context
in which the term is used. That's why the International Standardization
groups had to invent the word 'octet' to mean a byte of 8 bits.
We have to use some standard definition of words otherwise we will fall into
a morass of misunderstanding.


Indeed. In the C context that standard definition is provided by the C
Standard. A byte is defined to be the same size as a char, which is an
implementation-defined size greater than 7 bits.

May 14 '06 #14

jjf

Alan wrote:
santosh wrote:

A C byte is not always eight bits in size. Though 8-bit bytes are
common on PCs, other sizes are possible, and indeed are prevalent, on
Mainframes, DSPs etc.


A byte is always 8 bits by definition!!! On older CDC computers, for
example, there was a "character" of 6 bits but it was never referred to as
a "byte".


Nonsense. A byte is a group of bits of a size defined by its context.
I've worked on systems which had bytes of 6 bits. That couldn't be used
as a byte in C of course.
A char is garunteed by the standard to be atleast
8 bits but it could be more.


A character is not some arbitrary size. A character is either one Byte (ie
8 bits) or in the case of Unicode it is two Bytes (ie 16 bits). BTW - the
word is "guaranteed".


The size of a character is defined by its character set definition.
ASCII characters are 7 bits; 8859-1 characters are 8 bits; Unicode
characters are 21 bits (assuming you use the single-word
representation).
You will find more if you google for "binary arithmetic".


What has binary arithmetic got to do with C streams?


Don't you know??


I don't; an explanation would be welcome.

May 14 '06 #15

Richard Heathfield <in*****@invalid.invalid> writes:
Alan said:
This newsgroup is for C language related questions. The OP is asking a
more generalized question.


No, he was asking for an explanation of binary mode and text mode in C
streams.


Perhaps, but he didn't say so. The original question was:

| hello everybody
| Plz tell the differance between binary file and ascii
| file...............

I don't think we can necessarily assume he was asking about C streams
(though he should have been, given that he posted the question here).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
May 14 '06 #16

jj*@bcs.org.uk wrote:
Alan wrote:


<snip>
We have to use some standard definition of words otherwise we will fall into
a morass of misunderstanding.


Indeed. In the C context that standard definition is provided by the C
Standard. A byte is defined to be the same size as a char, which is an
implementation-defined size greater than 7 bits.


Greater than 8 bits. Of course, as 8 is greater than 7 it is also
greater than 7 bits ;-)
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

May 14 '06 #17

Mike S wrote:
Does the
Standard make any statement about the default mode stdin opens in (and
for that matter, stdout and stderr),


Yes, they are all text mode.

--
pete
May 14 '06 #18

Flash Gordon said:
jj*@bcs.org.uk wrote:
Alan wrote:


<snip>
We have to use some standard definition of words otherwise we will fall
into a morass of misunderstanding.


Indeed. In the C context that standard definition is provided by the C
Standard. A byte is defined to be the same size as a char, which is an
implementation-defined size greater than 7 bits.


Greater than 8 bits.


No, greater than 7 bits. It is legal for CHAR_BIT to be 8, as you really
ought to know. Please make sure your "corrections" are correct before
posting.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
May 14 '06 #19

In article <wL********************@bt.com>, Richard Heathfield
<in*****@invalid.invalid> writes
Parahat Melayev said:

vim wrote:
hello everybody
Plz tell the differance between binary file and ascii


Mate, you are in big trouble now.


No, he isn't.
You just used 'silly' 'Plz'.


Yes, he did.
And this question is off-topic for someone.


No, it isn't. But in some ways, your answer is.
If we come to answer. Every file is binary


The C Standard does not guarantee this.
but when you open ASCII file,


The C Standard does not specify the concept "ASCII file".
you will see bytes representing elements of ASCII set. Every
element of ASCII set can be stored in char in C programming language.
And ASCII char is one byte and one byte is eight bits.


C does not specify that one byte is eight bits wide - although it does
specify that one byte is *at least* eight bits wide.


Even though on some architectures it isn't 8 bits (or 7 for that matter)
though fortunately that is more historical than current.

--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills Staffs England /\/\/\/\/
/\/\/ ch***@phaedsys.org www.phaedsys.org \/\/\
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

May 14 '06 #20

In article <44***********************@news.optusnet.com.au> , Alan
<i.****@octopus.com.au> writes
Richard Heathfield wrote:
Parahat Melayev said:

vim wrote:
hello everybody
Plz tell the differance between binary file and ascii

Mate, you are in big trouble now.


No, he isn't.
You just used 'silly' 'Plz'.


Yes, he did.
And this question is off-topic for someone.


No, it isn't. But in some ways, your answer is.
If we come to answer. Every file is binary


The C Standard does not guarantee this.
but when you open ASCII file,


The C Standard does not specify the concept "ASCII file".
you will see bytes representing elements of ASCII set. Every
element of ASCII set can be stored in char in C programming language.
And ASCII char is one byte and one byte is eight bits.


C does not specify that one byte is eight bits wide - although it does
specify that one byte is *at least* eight bits wide.


I think you mean character not byte

"Byte: a group of eight binary digits, often used to represent one
character". The Concise Oxford Dictionary.

"Byte: a group of eight binary digits processed as a unit by a computer and
used to represent an alphanumeric character". Merriam-Webster Dictionary.

We have to use some standard definition of words otherwise we will fall into
a morass of misunderstanding.


Unfortunately both are wrong... A byte AFAIK is the smallest amount you
can address on a MCU/CPU. Now 95% of the time it is 8 bits but in the
past it has been many other things.

--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills Staffs England /\/\/\/\/
/\/\/ ch***@phaedsys.org www.phaedsys.org \/\/\
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

May 14 '06 #21

In article <44***********************@news.optusnet.com.au> , Alan
<i.****@octopus.com.au> writes
santosh wrote:
Parahat Melayev wrote:
vim wrote:
> hello everybody
> Plz tell the differance between binary file and ascii

Mate, you are in big trouble now. You just used 'silly' 'Plz'.
It *is* a silly abbreviation. While nobody is going to be in big
trouble, consistent use of such incomprehensible English will simply
result in the poster being ignored.
And this question is off-topic for someone.


Who said it was off-topic? Certainly it's perfectly within topic as far
as I'm aware.


This newsgroup is for C language related questions. The OP is asking a more
generalized question.


We keep having this argument. SOME users want to limit the scope of this
NG and other want to widen it. As there is no charter the majority will
eventually prevail.

And ASCII char is one byte and one byte is eight bits.


A C byte is not always eight bits in size. Though 8-bit bytes are
common on PCs, other sizes are possible, and indeed are prevalent, on
Mainframes, DSPs etc.


A byte is always 8 bits by definition!!!


I don't think so. My father, who has been in computing since 1952, would
not agree with you. As you say....
On older CDC computers, for
example,
Actually many other "older" computers, that is, pre the mid-to-late 1980s,
when the PC and other 8-bit "home" computers came to prominence. A byte
width of all sorts of sizes was used previously.

Bytes were *usually* but not always the same size as a single character
of the character set being used.

there was a "character" of 6 bits but it was never referred to as
a "byte".


Characters were all sorts of sizes (I have a comms program that would
handle 4 to 9 bits)
A char is garunteed by the standard to be atleast
8 bits but it could be more.


A character is not some arbitrary size. A character is either one Byte (ie
8 bits) or in the case of Unicode it is two Bytes (ie 16 bits). BTW - the
word is "guaranteed".


I thought a char could hold a character. Some systems use less than 8
bits and some more but, in the past these were not always multiples of 8
bits.

Because the size of a byte is variable, we used OCTET for 8 bits in the
comms industry, where PCs are not common.
--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills Staffs England /\/\/\/\/
/\/\/ ch***@phaedsys.org www.phaedsys.org \/\/\
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

May 14 '06 #22

Chris Hills wrote:
Because the size of a byte is variable, we used OCTET for 8 bits in the
comms industry, where PCs are not common.


In C, which is the topic here of this newsgroup,
the "size" of a type, is measured in bytes.

The "width" of a byte varies according to the implementation.

CHAR_BIT from limits.h, is the width of a byte.

--
pete
May 14 '06 #23

Richard Heathfield wrote:
Flash Gordon said:
jj*@bcs.org.uk wrote:
Alan wrote:

<snip>
We have to use some standard definition of words otherwise we will fall
into a morass of misunderstanding.
Indeed. In the C context that standard definition is provided by the C
Standard. A byte is defined to be the same size as a char, which is an
implementation-defined size greater than 7 bits.

Greater than 8 bits.


No, greater than 7 bits. It is legal for CHAR_BIT to be 8, as you really
ought to know. Please make sure your "corrections" are correct before
posting.


Obviously too early on a Sunday morning. I'm so used to seeing >=8 I
misread it.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc
May 14 '06 #24

Chris Hills posted:

Unfortunately both are wrong... A byte AFAIK is the smallest amount you
can address on a MCU/CPU. Now 95% of the time it is 8 bits but in the
past it has been many other things.


Incorrect.

On a particular system, it's okay if:

A) The smallest amount of memory addressable is 16 bits.
B) A char is 8 bits. (And thus a byte is 8 bits).

To facilitate this on this particular system, a char* holds more info than
an int*, because it also specifies whether it wants "the first 8 bits or
the last 8 bits".

-Tomás
May 14 '06 #25

"Richard Heathfield" writes:

"Byte: a group of eight binary digits, often used to represent one
character". The Concise Oxford Dictionary.


"byte: addressable unit of data storage large enough to hold any member of
the basic character set of the execution environment.
Note 1: It is possible to express the address of each individual byte of an
object uniquely.
Note 2: A byte is composed of a contiguous sequence of bits, the number of
which is implementation-defined." - ISO/IEC 9899:1999

Authoritative technical definitions trump dictionary definitions.
We have to use some standard definition of words otherwise we will fall
into a morass of misunderstanding.


That's why we have an International C Standard, which defines "byte" very
precisely.


I'll be damned! In Note 2, they defined byte very precisely as a word that
simply means a collection of contiguous bits. They took a widely used
word, that meant something to hundreds of thousands of people and redefined
it to mean something entirely different.

There are about 30 definitions of byte that make the cut on google, and the
*vast* majority say a byte is eight bits.

http://tinyurl.com/j79j4

That's simply appalling! Now the world needs a committee to define a word
for eight contiguous bits. How about naming it in honor of the clown who
got that inserted into the standard?

Historically, the smallest addressable unit of storage was a character.
They seem to have gotten tangled up and ignored the distinction between a
character and a character code, and ignored the fact that they were different
things. I think this made-up example from history is right: The IBM 7094
has a six-bit character. The character code is BCD.

Note that addressable does not imply that a single character can be read
from memory, it only means there are hardware instructions to do something
useful at this level.
May 14 '06 #26

Tomás said:
Chris Hills posted:

Unfortunately both are wrong... A byte AFAIK is the smallest amount you
can address on a MCU/CPU. Now 95% of the time it is 8 bits but in the
past it has been many other things.


Incorrect.

On a particular system, it's okay if:

A) The smallest amount of memory addressable is 16 bits.
B) A char is 8 bits. (And thus a byte is 8 bits).


Well, the underlying system can be like that, but if so, a conforming
implementation must either set CHAR_BIT as 16 or do some magic to make it
appear as if octets are addressable individually.

Chris is perfectly correct, from a C perspective.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
May 14 '06 #27

"osmium" <r1********@comcast.net> wrote in message
news:4c*************@individual.net...
That's why we have an International C Standard, which defines "byte" very
precisely.


I'll be damned! In Note 2, they defined byte very precisely as a word
that simply means a collection of contiguous bits. They took a widely
used word, that meant something to hundreds of thousands of people and
redefined it to mean something entirely different.

There are about 30 definitions of byte that make the cut on google, and
the *vast* majority say a byte is eight bits.


We forgot to do a web search before we chose that terminology in 1983.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
May 14 '06 #28

"P.J. Plauger" writes:
I'll be damned! In Note 2, they defined byte very precisely as a word
that simply means a collection of contiguous bits. They took a widely
used word, that meant something to hundreds of thousands of people and
redefined it to mean something entirely different.

There are about 30 definitions of byte that make the cut on google, and
the *vast* majority say a byte is eight bits.


We forgot to do a web search before we chose that terminology in 1983.


I appreciate your sarcasm and have no desire to argue with anyone - and most
certainly not with you.

But wasn't the word byte pretty much introduced into the world by the IBM
360 in 1964 or thereabouts?
May 14 '06 #29

osmium said:
"P.J. Plauger" writes:
I'll be damned! In Note 2, they defined byte very precisely as a word
that simply means a collection of contiguous bits. They took a widely
used word, that meant something to hundreds of thousands of people and
redefined it to mean something entirely different.

There are about 30 definitions of byte that make the cut on google, and
the *vast* majority say a byte is eight bits.


We forgot to do a web search before we chose that terminology in 1983.


I appreciate your sarcasm and have no desire to argue with anyone - and
most certainly not with you.

But wasn't the word byte pretty much introduced into the world by the IBM
360 in 1964 or thereabouts?


Knuth says that the 8-bit "standardisation" happened in around 1975 or so.
By then, C was already well under way, and dmr was almost certainly
accustomed to using the word in its non-"standard" sense.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
May 14 '06 #30

osmium wrote:
But wasn't the word byte pretty much introduced into the world by the IBM
360 in 1964 or thereabouts?


No. The word "byte" was common long before 1964, and with many machines
when no one had ever hallucinated a 360. In fact, it is from 1956,
coined by Werner Buchholz.
At least you got the company right. Otherwise, you have wrong
the date: 1956, not 1964
the computer: IBM Stretch, not IBM 360
You probably got the size wrong, too. Its original incarnation was 1 to
6 bits, not 8. The DEC family, especially the PDP-6 and -10, nicely
extended this to 1 to 36 bits.
May 14 '06 #31

"Tomás" <NU**@NULL.NULL> writes:
Chris Hills posted:
Unfortunately both are wrong... A byte AFAIK is the smallest amount you
can address on a MCU/CPU. Now 95% of the time it is 8 bits but in the
past it has been many other things.


Incorrect.

On a particular system, it's okay if:

A) The smallest amount of memory addressable is 16 bits.
B) A char is 8 bits. (And thus a byte is 8 bits).

To facilitate this on this particular system, a char* holds more info than
an int*, because it also specifies whether it wants "the first 8 bits or
the last 8 bits".


Then an 8-bit byte is addressable (with a little extra effort by the
compiler).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
May 14 '06 #32

P: n/a
Chris Hills <ch***@phaedsys.org> writes:
In article <44***********************@news.optusnet.com.au> , Alan
<i.****@octopus.com.au> writes

[...]
This newsgroup is for C language related questions. The OP is asking a more
generalized question.


We keep having this argument. SOME users want to limit the scope of this
NG and other want to widen it. As there is no charter the majority will
eventually prevail.


There is currently exactly one newsgroup where we can discuss standard
C without getting bogged down in system-specific or otherwise
irrelevant details. (comp.lang.c.moderated is too slow, and
comp.std.c has a different purpose.) Some users want us to have *no*
such newsgroup at all. It's possible that they'll eventually prevail,
but I sincerely hope they don't.

Chris, I've asked you a couple of questions on this topic, and you've
never answered them.

You mentioned other (non-Usenet) forums where people discuss the C
programming language. I'd like to take a look. Where are they?

If you think the current topicality guidelines should be changed,
*how* do you think they should be changed? Can you give specific
examples of things you think should be considered topical that
currently aren't? If there were a charter for this newsgroup, what
should it say?

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
May 14 '06 #33

P: n/a
"osmium" <r1********@comcast.net> writes:
[...]
I'll be damned! In Note 2, they defined byte very precisely as a word that
simply means a collection of contiguous bits. They took a widely used
word, that meant something to hundreds of thousands of people and redefined
it to mean something entirely different.

There are about 30 definitions of byte that make the cut on google, and the
*vast* majority say a byte is eight bits.

http://tinyurl.com/j79j4

That's simply appalling! Now the world needs a committee to define a word
for eight contiguous bits. How about naming it in honor of the clown who
got that inserted into the standard?

Historically, the smallest addressable unit of storage was a character.
They seem to have gotten tangled up and ignored the distinction between a
character and a character code, and ignored the fact that they were different
things. I think this made up example from history is right: The IBM 7094
has a six-bit character. The character code is BCD.


No, the term "byte" did not originally mean exactly 8 bits.

<http://www.catb.org/~esr/jargon/html/B/byte.html> says:

byte: /bi:t/, n.

[techspeak] A unit of memory or data equal to the amount used to
represent one character; on modern architectures this is
invariably 8 bits. Some older architectures used byte for
quantities of 6, 7, or (especially) 9 bits, and the PDP-10
supported bytes that were actually bitfields of 1 to 36 bits!
These usages are now obsolete, killed off by universal adoption of
power-of-2 word sizes.

Historical note: The term was coined by Werner Buchholz in 1956
during the early design phase for the IBM Stretch computer;
originally it was described as 1 to 6 bits (typical I/O equipment
of the period used 6-bit chunks of information). The move to an
8-bit byte happened in late 1956, and this size was later adopted
and promulgated as a standard by the System/360. The word was
coined by mutating the word .bite. so it would not be accidentally
misspelled as bit. See also nybble.

(I would dispute the use of the word "invariably".)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
May 14 '06 #34

P: n/a
In article <ln************@nuthaus.mib.org>, Keith Thompson <kst-
u@mib.org> writes
Chris Hills <ch***@phaedsys.org> writes:
In article <44***********************@news.optusnet.com.au> , Alan
<i.****@octopus.com.au> writes[...]
This newsgroup is for C language related questions. The OP is asking a more
generalized question.


We keep having this argument. SOME users want to limit the scope of this
NG and other want to widen it. As there is no charter the majority will
eventually prevail.


There is currently exactly one newsgroup where we can discuss standard
C without getting bogged down in system-specific or otherwise
irrelevant details. (comp.lang.c.moderated is too slow, and
comp.std.c has a different purpose.) Some users want us to have *no*
such newsgroup at all. It's possible that they'll eventually prevail,
but I sincerely hope they don't.

Chris, I've asked you a couple of questions on this topic, and you've
never answered them.


I have them marked for reply and feel guilty that I have not done so. It
needs thinking about for specific answers, and I have been busy this week
and did not want to just dash off a quick reply.
You mentioned other (non-Usenet) forums where people discuss the C
programming language. I'd like to take a look. Where are they?
I found some on both Yahoo and Google. The problem is it is very easy on
Yahoo to start a group compared to usenet. So some people never get past
the web interface of Yahoo to the usenet groups.

It came as quite a surprise to me that there were so many c, c++, 8051
and 8-bit MCU Yahoo groups, whose members have no idea that there is a
comp.arch.embedded. Eventually the people using some of the usenet
technical groups are going to fade away and not be replaced.
If you think the current topicality guidelines should be changed,
*how* do you think they should be changed? Can you give specific
examples of things you think should be considered topical that
currently aren't? If there were a charter for this newsgroup, what
should it say?


That is a good idea.... can we draw up a charter? It would mean people
have to think about it. I did not reply to your "specific" question as
it needed some thought and I did not want to do an off the cuff reply.

The problem is it is difficult to be specific on vaguely loosening a
tight spec. :-)


--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills Staffs England /\/\/\/\/
/\/\/ ch***@phaedsys.org www.phaedsys.org \/\/\
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

May 14 '06 #35

P: n/a
"Chris Hills" wrote:
Keith Thompson wrote:

If you think the current topicality guidelines should be changed,
*how* do you think they should be changed? Can you give specific
examples of things you think should be considered topical that
currently aren't? If there were a charter for this newsgroup, what
should it say?


That is a good idea.... can we draw up a charter? It would mean people
have to think about it. I did not reply to your "specific" question as
it needed some thought and I did not want to do an off the cuff reply.


<I *think* that pruning is right>

My guess is that the question is more along the lines of "If I gave you a
million dollars, what would you do with it?".
May 14 '06 #36

P: n/a
"osmium" <r1********@comcast.net> wrote in message
news:4c*************@individual.net...
"P.J. Plauger" writes:
I'll be damned! In Note 2, they defined byte very precisely as a word
that simply means a collection of contiguous bits. They took a widely
used word, that meant something to hundreds of thousands of people and
redefined it to mean something entirely different.

There are about 30 definitions of byte that make the cut on google, and
the *vast* majority say a byte is eight bits.


We forgot to do a web search before we chose that terminology in 1983.


I appreciate your sarcasm and have no desire to argue with anyone - and
most certainly not with you.

But wasn't the word byte pretty much introduced into the world by the IBM
360 in 1964 or thereabouts?


As others have pointed out in detail, byte is an old word. IBM's System/360,
introduced in the early 1960s, began the modern trend to eight-bit bytes
and byte-resolution arithmetic. Nevertheless, in the 1980s it was still not
uncommon to refer to "an x-bit byte machine", where x assumed quite a few
different values. Thus, the C Standard hardly plowed any new ground in
this area, and certainly didn't defy common parlance of the time.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
May 14 '06 #37

P: n/a
Chris Hills wrote:
Keith Thompson <ks***@mib.org> writes
.... snip ...
If you think the current topicality guidelines should be changed,
*how* do you think they should be changed? Can you give specific
examples of things you think should be considered topical that
currently aren't? If there were a charter for this newsgroup,
what should it say?


That is a good idea.... can we draw up a charter? It would mean
people have to think about it. I did not reply to your "specific"
question as it needed some thought and I did not want to do an off
the cuff reply.

The problem is it is difficult to be specific on vaguely loosening
a tight spec. :-)


If you are going to make a proposal then I suggest you start from
"the tight spec". Remember that even now things involving POSIX
are adequately housed on comp.unix.programming (or such) and
windows (cursed be its name) on microsoft.*.

One loosening that might be worthwhile is advice on how to make
various almost-C compilers hew to the various standards, and their
failings.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
May 14 '06 #38

P: n/a
In article <4c*************@individual.net>, osmium
<r1********@comcast.net> writes
"Chris Hills" wrote:
Keith Thompson wrote:

If you think the current topicality guidelines should be changed,
*how* do you think they should be changed? Can you give specific
examples of things you think should be considered topical that
currently aren't? If there were a charter for this newsgroup, what
should it say?


That is a good idea.... can we draw up a charter? It would mean people
have to think about it. I did not reply to your "specific" question as
it needed some thought and I did not want to do an off the cuff reply.


<I *think* that pruning is right>

My guess is that the question is more along the lines of "If I gave you a
million dollars, what would you do with it?".


500K on wine, women and parties. Then just fritter the rest away :-)

--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills Staffs England /\/\/\/\/
/\/\/ ch***@phaedsys.org www.phaedsys.org \/\/\
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

May 14 '06 #39

P: n/a

Richard Heathfield wrote:
No, that almost certainly means simply that you've got a Windows text file
on a Linux system. Linux doesn't know, bless it, that you've been sullying
its filesystem with foreign muck. :-)


Sorry, I should have explained myself more clearly. At the moment I am
running on Windows with a Windows port of gcc. But, before I get
off-topic with environment specs, my real question is simply: does the
Standard require that stdin, stdout, and stderr be opened in a known
mode or is this detail left to the compiler? I ask because if I compile
the following (say as test.exe):

#include <stdio.h>

/* count occurrences of '\r' in the input stream */
int main()
{
int c, count;
count = 0;
while ((c = getchar()) != EOF)
if(c == '\r')
++count;
printf("counted %d \\r's in input.\n", count);
return 0;
}

and I run the program with itself as input ("test < test.c"), the
result (on my machine) is

counted 12 \r's in input.

However, as I understand it, any '\r\n' sequences in the input stream
should have been mapped to '\n'. The only literature I can find
relating to this is what Microsoft has to say about how their C
compilers open stdin (they say it is opened in text mode in their
compilers). But what does the ANSI/ISO Standard say about what mode
stdin, stdout, and stderr are opened in? Maybe my compiler is just
misbehaving...

Mike S

May 14 '06 #40

P: n/a
"Keith Thompson" <ks***@mib.org> wrote
"Tomás" <NU**@NULL.NULL> writes:
To facilitate this on this particular system, a char* holds more info
than
an int*, because it also specifies whether it wants "the first 8 bits or
the last 8 bits".


Then an 8-bit byte is addressible (with a little extra effort by the
compiler).

That's where the whole language begins to break down. If pointers no longer
represent machine addresses then we've lost the primary design goal of C,
which is to allow the programmer direct access to the computer's memory.
Then you've got the problem that every void pointer has to be an extended
pointer, so conversions cause code to be executed. Then most registers hold
exactly an address, so do void pointers fit in registers? So perfectly
unexceptional code with nothing to do with octets but passing things around
in void *s hits the treacle.
--
www.personal.leeds.ac.uk/~bgy1mm
May 14 '06 #41

P: n/a
In article <6b******************************@bt.com> in*****@invalid.invalid writes:
osmium said:

....
But wasn't the word byte pretty much introduced into the world by the IBM
360 in 1964 or thereabouts?


Knuth says that the 8-bit "standardisation" happened in around 1975 or so.


Indeed. And the word byte was indeed introduced by IBM, but originally
was intended for 6 bit quantities. With the 360 the new bytes became 8 bits.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
May 14 '06 #42

P: n/a
On Sun, 14 May 2006 14:32:11 UTC, "osmium" <r1********@comcast.net>
wrote:
I'll be damned! In Note 2, they defined byte very precisely as a word that
simply means a collection of contiguous bits. They took a widely used
word, that meant something to hundreds of thousands of people and redefined
it to mean something entirely different.

There are about 30 definitions of byte that make the cut on google, and the
*vast* majority say a byte is eight bits.

http://tinyurl.com/j79j4

That's simply appalling! Now the world needs a committee to define a word
for eight contiguous bits. How about naming it in honor of the clown who
got that inserted into the standard?

Historically, the smallest addressable unit of storage was a character.
Not always. The smallest addressable unit of storage was a word holding 12
decimal digits. That word was interpreted by the CPU as:
- a command:
  - 4 decimal digit command word
  - 8 decimal digit memory address of target/source data,
    to read/add/sub/store to/from accu from/to memory
  - 8 decimal digit decimal constant to add/sub to/from accu
- a 12 decimal digit numerical value
- up to 6 ASCII bytes, high to low order, when accessed with a
  command specified as transferring text.

At that time there was no standard.
They seem to have gotten tangled up and ignored the distinction between a
character and a character code, and ignored the fact that they were different
things. I think this made up example from history is right: The IBM 7094
has a six-bit character. The character code is BCD.

Note that addressable does not imply that a single character can be read
from memory, it only means there are hardware instructions to do something
useful at this level.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2 Deutsch ist da!
May 15 '06 #43

P: n/a
In article <44***************@yahoo.com>, CBFalconer
<cb********@yahoo.com> writes
Chris Hills wrote:
Keith Thompson <ks***@mib.org> writes
... snip ...
If you think the current topicality guidelines should be changed,
*how* do you think they should be changed? Can you give specific
examples of things you think should be considered topical that
currently aren't? If there were a charter for this newsgroup,
what should it say?


That is a good idea.... can we draw up a charter? It would mean
people have to think about it. I did not replay to your "specific"
question as it needed some thought and I did not want to do an off
the cuff reply.

The problem is it is difficult to be specific on vaguely loosening
a tight spec. :-)


If you are going to make a proposal then I suggest you start from
"the tight spec". Remember that even now things involving POSIX
are adequately housed on comp.unix.programming (or such) and
windows (cursed be its name) on microsoft.*.

One loosening that might be worthwhile is advice on how to make
various almost-C compilers hew to the various standards,


Why would they want to? The job of standards is to standardise industry
practice, not to go off on a whim and expect the industry to follow. That
is the problem with C at the moment: it has gone in a direction the
industry does not want to follow.
and their failings.


Others have commented on the failings of the standards including people
on the standards panels.


--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills Staffs England /\/\/\/\/
/\/\/ ch***@phaedsys.org www.phaedsys.org \/\/\
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

May 15 '06 #44

P: n/a
On Sun, 14 May 2006 07:32:11 -0700,
osmium <r1********@comcast.net> wrote
in Msg. <4c*************@individual.net>
That's simply appalling! Now the world needs a committee to define a word
for eight contiguous bits.


That word already exists: octet.

From wikipedia:

In computer technology and networking, an octet is a
group of 8 bits.

[...]

However, the size of a byte is determined by the architecture of a
particular computer system: some old computers had 9, 10, or 12-bit
bytes, while others had bytes as small as 5 or 6 bits. An octet is
always exactly 8 bits. As a result, computer networking standards
almost exclusively use "octet" to refer to the 8-bit quantity.

robert
May 15 '06 #45

P: n/a
Richard Heathfield <in*****@invalid.invalid> wrote:
vim said:
hello everybody
Plz tell the differance between binary file and ascii
file...............
Well, that's really the wrong question.

The right question is: "what is the difference between a stream opened in
binary mode, and a stream opened in text mode?"

[snipped explanations of line translation]
If you don't /want/ the system to do this, open the file in binary mode. But
then managing the newline stuff all falls to you instead.

Why is there the text mode in the first place? All operations valid
for text streams seem to be valid for binary ones, too. Text streams
are more difficult to handle (eg. you can't calculate offsets, there's
some extra undefinededness). Apart from system compatibility, is there
any advantage to opening files in text mode?

--
Stan Tobias
mailx `echo si***@FamOuS.BedBuG.pAlS.INVALID | sed s/[[:upper:]]//g`
May 15 '06 #46

P: n/a
Mike S wrote:
Richard Heathfield wrote:
No, that almost certainly means simply that you've got a Windows text file
on a Linux system. Linux doesn't know, bless it, that you've been sullying
its filesystem with foreign muck. :-)


Sorry, I should have explained myself more clearly. At the moment I am
running on Windows with a Windows port of gcc. But, before I get
off-topic with environment specs, my real question is simply: does the
Standard require that stdin, stdout, and stderr be opened in a known
mode or is this detail left to the compiler? I ask because if I compile
the following (say as test.exe):

#include <stdio.h>

/* count occurrences of '\r' in the input stream */
int main()
{
int c, count;
count = 0;
while ((c = getchar()) != EOF)
if(c == '\r')
++count;
printf("counted %d \\r's in input.\n", count);
return 0;
}

and I run the program with itself as input ("test < test.c"), the
result (on my machine) is

counted 12 \r's in input.

However, as I understand it, any '\r\n' sequences in the input stream
should have been mapped to '\n'. The only literature I can find
relating to this is what Microsoft has to say about how their C
compilers open stdin (they say it is opened in text mode in their
compilers). But what does the ANSI/ISO Standard say about what mode
stdin, stdout, and stderr are opened in? Maybe my compiler is just
misbehaving...

Mike S

At my house your program says..

counted 0 \r's in input.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
May 15 '06 #47

P: n/a
"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> wrote in message
news:4c*************@individual.net...
Richard Heathfield <in*****@invalid.invalid> wrote:
vim said:
hello everybody
Plz tell the differance between binary file and ascii
file...............


Well, that's really the wrong question.

The right question is: "what is the difference between a stream opened in
binary mode, and a stream opened in text mode?"

[snipped explanations of line translation]

If you don't /want/ the system to do this, open the file in binary mode.
But
then managing the newline stuff all falls to you instead.

Why is there the text mode in the first place? All operations valid
for text streams seem to be valid for binary ones, too. Text streams
are more difficult to handle (e.g. you can't calculate offsets, there's
some extra undefinedness). Apart from system compatibility, is there
any advantage to opening files in text mode?


System compatibility is a damned important reason. Every system has its
own convention for representing text files, as created by text editors
and consumed by other text-processing programs. If that convention doesn't
match the C convention -- zero or more lines of arbitrary length, each
terminated by a newline -- somebody has to do some mapping. Whitesmiths,
Ltd. introduced the text/binary dichotomy in 1978 when porting C to
dozens of non-Unix systems, and other companies did much the same thing
in the coming years. It was a slam dunk to put it in the draft C Standard
begun in 1983.

If you try to live with just binary mode, then every program either has
to map text files for itself or tolerate a broad assortment of rules for
delimiting text lines. There's precedent for the latter approach too
(see, for example, Java), but Unix gives a powerful precedent for
having a uniform internal convention for representing text streams.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
May 15 '06 #48

P: n/a
S.Tobias wrote:
Richard Heathfield <in*****@invalid.invalid> wrote:
vim said:
hello everybody
Plz tell the differance between binary file and ascii
file...............

Well, that's really the wrong question.

The right question is: "what is the difference between a stream opened in
binary mode, and a stream opened in text mode?"

[snipped explanations of line translation]
If you don't /want/ the system to do this, open the file in binary mode. But
then managing the newline stuff all falls to you instead.

Why is there the text mode in the first place? All operations valid
for text streams seem to be valid for binary ones, too. Text streams
are more difficult to handle (e.g. you can't calculate offsets, there's
some extra undefinedness). Apart from system compatibility, is there
any advantage to opening files in text mode?


Without text streams, how could you write a C program that is
guaranteed to produce a valid text file on whatever system you run the
program on? Historically systems have used rather more schemes than just
terminating lines with CR, CRLF or LF; some have used some form of
record format, e.g. the first couple of bytes on a line saying how long
the line is.

Who is to say that in the future a system might not choose to encode the
file type as a mime header? Then the system might not even let you open
a text file as a binary file, or open a binary file as a text file, and
such restrictions could be useful. Also, on such a system, if you
created a file as a binary file the normal text editor of the system
might refuse to open it!

So the compatibility aspect is pretty major.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc
May 15 '06 #49

P: n/a
Richard Heathfield wrote:
[...]
The newline marker defined by C is '\n'. [... How various systems "really" store EOL in a file ...] On the mainframe - well, you /really/ don't want to know.

[...]

I recall using a "mainframe" (okay, really a micro version of a mainframe)
that didn't have any "end of line marker" in text files. Instead, each
line started with two bytes containing the length of the line, and the
contents would be padded to an even number of bytes. So, a text file with
two lines -- "hi" and "there" -- would actually contain this sequence of
bytes:

0x02, 0x00, 'h', 'i', 0x05, 0x00, 't', 'h', 'e', 'r', 'e', 0x00

This is also a perfect example why you can't pass arbitrary values to
fseek().

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>
May 15 '06 #50
