Binary or Ascii Text?

Hi, everyone. I have a question. How can I identify whether a file is a
binary file or an ASCII text file? For instance, I wrote a piece of
code and saved it as "Test.c". I knew it was an ASCII text file. Then
after compilation, I got a "Test" file and it was a binary executable
file. The problem is, I know the type of those two files in my mind
because I carried out the compilation myself, but how can I make the
computer know the type of a given file by writing code in C? Files are
all saved as 0's and 1's. What's the difference?

Please help me, thanks.

Mar 31 '06
"P.J. Plauger" wrote:
> "CBFalconer" <cb********@yahoo.com> wrote in message
>
> .... snip ...
>
>> In this case you are being unfair to Microsoft (yes, I know it's
>> hard to do). C is the offbeat animal here. Text lines were
>> terminated with cr/lf for many moons before C decided to ignore the
>> cr,
>
> You mean, before Unix developed a uniform notation for text streams,
> both inside and outside the program, and C built it into its runtime
> library.

Pascal is pretty well contemporaneous with C and Unix, and had/has
a well defined concept of files and streams. It doesn't make any
assumptions about line termination characters etc. The world is
not a Unix machine.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
Apr 1 '06 #21
"P.J. Plauger" <pj*@dinkumware.com> writes:
> "osmium" <r1********@comcast.net> wrote in message
> news:49************@individual.net...
>> "P.J. Plauger" writes:
>>>> So it was a concession to ASCII.
>>>
>>> Not really.
>>
>> I realized that back space might enter into that too, but thought there
>> might have been problems with that considering the physical nature of
>> actual drum printers, chain printers and so on.
>
> Dunno how CR would be any better off than BS if that was the case.

I had a dot-matrix printer once (Okidata ML520) that would
overheat and stop (until it cooled down) if you sent it too much
text that contained lots of backspaces to do
character-by-character bold or underline. That kind of thing
made the printhead go back and forth incredibly rapidly, and it
just wasn't designed for that.

On the other hand, using CR didn't cause a problem because it
didn't make the printhead reverse direction any more often than
normal.
--
Ben Pfaff
email: bl*@cs.stanford.edu
web: http://benpfaff.org
Apr 1 '06 #22
"CBFalconer" <cb********@yahoo.com> wrote in message
news:44***************@yahoo.com...
> "P.J. Plauger" wrote:
>> "CBFalconer" <cb********@yahoo.com> wrote in message
>> ... snip ...
>>
>>> In this case you are being unfair to Microsoft (yes, I know it's
>>> hard to do). C is the offbeat animal here. Text lines were
>>> terminated with cr/lf for many moons before C decided to ignore the
>>> cr,
>>
>> You mean, before Unix developed a uniform notation for text streams,
>> both inside and outside the program, and C built it into its runtime
>> library.
>
> Pascal is pretty well contemporaneous with C and Unix, and had/has
> a well defined concept of files and streams. It doesn't make any
> assumptions about line termination characters etc.

Right, and it's a damn poor model, with terrible lookahead properties.
Kernighan and I had to really work at imposing decent primitives atop
it. It is no accident that the model hasn't survived.

> The world is
> not a Unix machine.

Actually, it is. Compare the operating systems of today with those
of 35 years ago and you'll see how ubiquitous the basic design
decisions of Unix have become. Line terminators are at least now
always embedded characters in a stream -- gone are padding blanks
and structured files -- if not always the same terminators. And
C is certainly ubiquitous, with its simple rules for mapping
C-style text streams to and from text files.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Apr 2 '06 #23
"Ben Pfaff" <bl*@cs.stanford.edu> wrote in message
news:87************@benpfaff.org...
> "P.J. Plauger" <pj*@dinkumware.com> writes:
>> "osmium" <r1********@comcast.net> wrote in message
>> news:49************@individual.net...
>>> "P.J. Plauger" writes:
>>>>> So it was a concession to ASCII.
>>>>
>>>> Not really.
>>>
>>> I realized that back space might enter into that too, but thought there
>>> might have been problems with that considering the physical nature of
>>> actual drum printers, chain printers and so on.
>>
>> Dunno how CR would be any better off than BS if that was the case.
>
> I had a dot-matrix printer once (Okidata ML520) that would
> overheat and stop (until it cooled down) if you sent it too much
> text that contained lots of backspaces to do
> character-by-character bold or underline. That kind of thing
> made the printhead go back and forth incredibly rapidly, and it
> just wasn't designed for that.
>
> On the other hand, using CR didn't cause a problem because it
> didn't make the printhead reverse direction any more often than
> normal.

Okay, you've made a case for why a good printer *driver* might
rewrite the stream you send it (as practically every smart device
did in Unix and does in today's systems). The issue we've been
discussing is the *linguistics* of text streams. And the point
was that either CR or BS is sufficient to describe overstrikes.
ASCII doesn't have any thermal attributes.
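
To make the CR-versus-BS point concrete, here is a minimal sketch of the two ways an underlined word can be written into a plain character stream. The function names are invented for this illustration, and the only assumption about the output device is that it overstrikes rather than erases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Underline by overstriking each character in turn: char, BS, '_'.
   Returns the number of chars written, or -1 if out is too small. */
int underline_with_bs(const char *word, char *out, size_t outsz)
{
    size_t n, i, k = 0;

    assert(word != NULL && out != NULL);
    n = strlen(word);
    if (3 * n + 1 > outsz)
        return -1;
    for (i = 0; i < n; i++) {
        out[k++] = word[i];
        out[k++] = '\b';    /* step back one column */
        out[k++] = '_';     /* print over the same position */
    }
    out[k] = '\0';
    return (int)k;
}

/* Underline with a second pass: the whole word, CR, then underscores.
   Returns the number of chars written, or -1 if out is too small. */
int underline_with_cr(const char *word, char *out, size_t outsz)
{
    size_t n;

    assert(word != NULL && out != NULL);
    n = strlen(word);
    if (2 * n + 2 > outsz)
        return -1;
    memcpy(out, word, n);
    out[n] = '\r';              /* back to column 1, same line */
    memset(out + n + 1, '_', n);
    out[2 * n + 1] = '\0';
    return (int)(2 * n + 1);
}
```

Both encodings describe the same printed result. The BS form reverses the head once per character, the CR form once per line, which is the mechanical difference the Okidata anecdote is about.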

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Apr 2 '06 #24
"P.J. Plauger" wrote:
> "CBFalconer" <cb********@yahoo.com> wrote in message
>> "P.J. Plauger" wrote:
>>> "CBFalconer" <cb********@yahoo.com> wrote in message
>>> ... snip ...
>>>
>>>> In this case you are being unfair to Microsoft (yes, I know it's
>>>> hard to do). C is the offbeat animal here. Text lines were
>>>> terminated with cr/lf for many moons before C decided to ignore the
>>>> cr,
>>>
>>> You mean, before Unix developed a uniform notation for text streams,
>>> both inside and outside the program, and C built it into its runtime
>>> library.
>>
>> Pascal is pretty well contemporaneous with C and Unix, and had/has
>> a well defined concept of files and streams. It doesn't make any
>> assumptions about line termination characters etc.
>
> Right, and it's a damn poor model, with terrible lookahead properties.
> Kernighan and I had to really work at imposing decent primitives atop
> it. It is no accident that the model hasn't survived.

I probably shouldn't get into this :-) but people have been
misunderstanding Pascal i/o for generations now. With the use of
lazy i/o there is no problem with interactive operation, and
prompting can be handled with a prompt function (equivalent to
writeln, but without the line advance) or by detection of
interactive pairs to force buffer flushing.

Meanwhile there are none of the problems associated with
interactive scanf and other routines, because the C stream is never
sure whether the field terminating char has been used or is still
in the stream. With Pascal, it is in the stream. With Pascal, we
always have one char. lookahead.

Granted, we can build the equivalent set in C, but that requires
the discipline to not use many existing functions, or to follow
them with an almost universal ungetc. What we can't get is the
convenience of the shorthand usage of read(ln) and write(ln),
although the C++ mechanisms make an ugly attempt at it.

>> The world is
>> not a Unix machine.
>
> Actually, it is. Compare the operating systems of today with those
> of 35 years ago and you'll see how ubiquitous the basic design
> decisions of Unix have become. Line terminators are at least now
> always embedded characters in a stream -- gone are padding blanks
> and structured files -- if not always the same terminators. And
> C is certainly ubiquitous, with its simple rules for mapping
> C-style text streams to and from text files.

Granted the Unix philosophy has simplified file systems. This is
not necessarily good, since the old systems all had reasons for
existing. Many of those reasons have been subsumed into much
higher performance levels at the storage level, but that is
something like approving of gui bloat because cpus are faster.

Apr 2 '06 #25
"CBFalconer" <cb********@yahoo.com> wrote in message
news:44***************@yahoo.com...
> "P.J. Plauger" wrote:
>> "CBFalconer" <cb********@yahoo.com> wrote in message
>>> "P.J. Plauger" wrote:
>>>> "CBFalconer" <cb********@yahoo.com> wrote in message
>>>>
>>>> ... snip ...
>>>>
>>>>> In this case you are being unfair to Microsoft (yes, I know it's
>>>>> hard to do). C is the offbeat animal here. Text lines were
>>>>> terminated with cr/lf for many moons before C decided to ignore the
>>>>> cr,
>>>>
>>>> You mean, before Unix developed a uniform notation for text streams,
>>>> both inside and outside the program, and C built it into its runtime
>>>> library.
>>>
>>> Pascal is pretty well contemporaneous with C and Unix, and had/has
>>> a well defined concept of files and streams. It doesn't make any
>>> assumptions about line termination characters etc.
>>
>> Right, and it's a damn poor model, with terrible lookahead properties.
>> Kernighan and I had to really work at imposing decent primitives atop
>> it. It is no accident that the model hasn't survived.
>
> I probably shouldn't get into this :-) but

You're probably right.

> people have been
> misunderstanding Pascal i/o for generations now.

That may be, but I don't. I've written tens of thousands of lines
of Pascal and hundreds of thousands of lines of C over the past
few decades. I've written essays on the various design principles
of parsing with various degrees of lookahead. I've written or
coauthored textbooks on the subject. In short, I've *thought*
about this topic for longer than the average reader of this
newsgroup has been alive. I think I understand it.

> With the use of
> lazy i/o there is no problem with interactive operation, and
> prompting can be handled with a prompt function (equivalent to
> writeln, but without the line advance) or by detection of
> interactive pairs to force buffer flushing.

Yes, you can get around the problems. The only problem is that you
*have* to get around the problems.

> Meanwhile there are none of the problems associated with
> interactive scanf and other routines, because the C stream is never
> sure whether the field terminating char has been used or is still
> in the stream.

Not true. It's precisely, and usefully, defined.

> With Pascal, it is in the stream.

Not always true.

> With Pascal, we
> always have one char. lookahead.

And with C. You never need more than one char lookahead, by design.

> Granted, we can build the equivalent set in C, but that requires
> the discipline to not use many existing functions, or to follow
> them with an almost universal ungetc. What we can't get is the
> convenience of the shorthand usage of read(ln) and write(ln),
> although the C++ mechanisms make an ugly attempt at it.

I agree that, beyond a point, this becomes a matter of aesthetics.
I won't argue that. What I will observe is natural selection at
work. The C I/O model has survived and thrives. The Pascal model
is marginalized if not dead.

>>> The world is
>>> not a Unix machine.
>>
>> Actually, it is. Compare the operating systems of today with those
>> of 35 years ago and you'll see how ubiquitous the basic design
>> decisions of Unix have become. Line terminators are at least now
>> always embedded characters in a stream -- gone are padding blanks
>> and structured files -- if not always the same terminators. And
>> C is certainly ubiquitous, with its simple rules for mapping
>> C-style text streams to and from text files.
>
> Granted the Unix philosophy has simplified file systems. This is
> not necessarily good, since the old systems all had reasons for
> existing.

Yes, they did. Lots of them. In all sorts of directions. And they
haven't survived. Coincidence? I don't think so.

> Many of those reasons have been subsumed into much
> higher performance levels at the storage level, but that is
> something like approving of gui bloat because cpus are faster.

No, it's something like adapting the total software package to the
needs of current hardware. I see no overall bloat in how buffering
is distributed today vs. 30 years ago. But I do see a significant
simplification of I/O as seen by the user over that same period.

Item: One of the seven looseleaf binders that came with RSX-11M was
titled "Preparing for I/O." There is no Unix equivalent. (Or DOS,
or Linux, or ...) You don't set up file control blocks and I/O
control blocks; you just call open, close, read, and write.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Apr 2 '06 #26
In article <49************@individual.net> "osmium" <r1********@comcast.net> writes:
> ....
>> Now that we know ASCII text only uses 7 bits of a byte and the first bit
>> is always set to 0, I wonder if I could write a program to get a
>> fixed length of a given file (for example, the first 1024 bytes),
>> store it in an unsigned char array, and check whether there are any
>> elements greater than 0x7F. If there are, the file can be judged to be
>> a binary file.
>>
>> However, the disadvantage of the above method is that it cannot handle
>> multi-byte characters. Take a Japanese character in UTF-8, for
>> example: it may be encoded as three bytes, and some of them may be
>> greater than 0x7F. In that case, my method will make no sense.

No, it cannot handle other encodings, but that was not what you asked for.
Note also that files that consist of pure ASCII codes can be binary.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Apr 3 '06 #27
"P.J. Plauger" wrote:
> "CBFalconer" <cb********@yahoo.com> wrote in message
> .... snip ...
>> Meanwhile there are none of the problems associated with
>> interactive scanf and other routines, because the C stream is never
>> sure whether the field terminating char has been used or is still
>> in the stream.
>
> Not true. It's precisely, and usefully, defined.
>
>> With Pascal, it is in the stream.
>
> Not always true.
>
>> With Pascal, we
>> always have one char. lookahead.
>
> And with C. You never need more than one char lookahead, by design.


Now we can get this off a language war and onto pure C. My problem
with C, and the usual library, is the absence of sane and clear
methods for interactive i/o. To illustrate, the user wants to
output a prompt and receive an integer. How to do it?

The new user's first choice is probably scanf. He forgets to check
the error return. And, even worse, what gets entered is:

<programmed prompt>: 1234 x<cr>

and this is being handled by:

printf("<programmed prompt>:"); fflush(stdout);
scanf("%d", &i);

which gets the first entry, but falls all over itself when the
sequence is called again. The usual advice is to read full lines,
i.e.

printf("<programmed prompt>:"); fflush(stdout);
fgets(buffer, BUFSZ, stdin);
i = strtol(buffer, &errptr, 10);

which brings in the extraneous buffer, a magical BUFSZ derived by
gazing at the ceiling, prayer, and incense sticks, not to mention
errptr. So I consider that solution unclean by my standards. (Of
course they can use my ggets for consistent whole line treatment).
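
For reference, the read-a-full-line advice with the checks filled in might look like the sketch below. `read_int_line` is a name invented here, and `BUFSZ` is set to an arbitrary 64, which is exactly the kind of ceiling-gazing constant complained about above. `buffer` and `errptr` are the names used in the snippet.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFSZ 64   /* arbitrary; long lines are truncated and discarded */

/* Reads one whole line from f and parses a leading base-10 integer.
   Returns 1 on success, 0 if the line holds no usable int,
   and EOF when no line could be read. */
int read_int_line(FILE *f, int *out)
{
    char buffer[BUFSZ];
    char *errptr;
    long v;
    int ch;

    assert(f != NULL && out != NULL);
    if (fgets(buffer, sizeof buffer, f) == NULL)
        return EOF;
    /* If the line did not fit, drop the rest of it so that the
       next call starts on a fresh line. */
    if (strchr(buffer, '\n') == NULL) {
        while ((ch = getc(f)) != EOF && ch != '\n')
            continue;
    }
    errno = 0;
    v = strtol(buffer, &errptr, 10);
    if (errptr == buffer)                    /* no digits at all */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                            /* does not fit in an int */
    *out = (int)v;
    return 1;
}
```

With the input `1234 x` followed by a newline, this yields 1234 and throws the trailing ` x` away with the rest of the line, so a second call starts cleanly on the next line.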

So instead we write a baby routine that inputs from a stream with
getc, skips leading blanks (and possibly blank lines), and ungets
the field termination char. We combine that with my favorite
flushln:

while ((EOF != (ch = getc(f))) && ('\n' != ch)) continue;

and the birds twitter, the sun shines, etc. UNTIL somebody calls
some other input routine and doesn't have the discipline to define
a quiescent i/o state and ensure that that state is achieved at
each input. That in turn leads to calling the flushln twice, and
discarding perfectly usable (and possibly needed) input.
Alternatively it leads to newbies calling fflush(stdin) and similar
curses.
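
A rough sketch of that pair of routines, and of how the double flushln goes wrong, is given below. `read_uint` is an invented name and a deliberately small stand-in for the "baby routine": unsigned decimal only, no overflow check. `flushln` is the one-liner from above wrapped in a function.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Discards the rest of the current line, including its '\n'.
   Returns the last value read: '\n' or EOF. */
int flushln(FILE *f)
{
    int ch;

    assert(f != NULL);
    while ((EOF != (ch = getc(f))) && ('\n' != ch))
        continue;
    return ch;
}

/* Skips leading blanks and blank lines, reads an unsigned decimal
   number with getc, and ungets the field terminating char.
   Returns 1 on success, 0 if the next non-blank char is not a digit
   (it is pushed back), and EOF at end of input. No overflow check. */
int read_uint(FILE *f, unsigned *out)
{
    int ch;
    unsigned v = 0;

    assert(f != NULL && out != NULL);
    do {
        ch = getc(f);
    } while (ch != EOF && isspace(ch));
    if (ch == EOF)
        return EOF;
    if (!isdigit(ch)) {
        ungetc(ch, f);
        return 0;
    }
    do {
        v = 10 * v + (unsigned)(ch - '0');
        ch = getc(f);
    } while (ch != EOF && isdigit(ch));
    if (ch != EOF)
        ungetc(ch, f);      /* terminator stays in the stream */
    *out = v;
    return 1;
}
```

After `read_uint` the stream sits just before the terminator, so one `flushln` finishes the line. If some other routine has already consumed the newline, the same `flushln` silently eats the whole next line, which is the hazard described above.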

This is what I mean by saying that C doesn't provide the one char
lookahead in the right place, i.e. the i/o, where it can't be lost.

It would help if C provided the ability to detect "the last
character used was a '\n'", which would enable the above flushln to
avoid discarding extra lines. However that won't happen. It would
probably also suffice to define fflush usage on input streams.
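
Standard C offers no such query, but the idea can be approximated in user code by routing all input through a wrapper that remembers the last character consumed. The sketch below is purely hypothetical: `struct tracked_in`, `tgetc` and `tflushln` are invented names, and it only works if nothing reads from or ungets to the underlying FILE behind the wrapper's back.

```c
#include <assert.h>
#include <stdio.h>

/* A stream plus the last character consumed from it.
   Initialise last to '\n' so a fresh stream counts as "at line start". */
struct tracked_in {
    FILE *f;
    int last;
};

/* getc that records what it returned (EOF is not recorded). */
int tgetc(struct tracked_in *in)
{
    int ch;

    assert(in != NULL && in->f != NULL);
    ch = getc(in->f);
    if (ch != EOF)
        in->last = ch;
    return ch;
}

/* flushln that does nothing when the last char used was a '\n',
   so calling it twice in a row cannot discard an extra line. */
void tflushln(struct tracked_in *in)
{
    int ch;

    assert(in != NULL);
    if (in->last == '\n')
        return;
    while ((EOF != (ch = tgetc(in))) && ('\n' != ch))
        continue;
}
```

This still depends on the discipline the post is worried about, since a single stray `getc` or `scanf` on the underlying stream makes `last` stale.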

Apr 3 '06 #28
On Sat, 1 Apr 2006 16:00:58 UTC, "osmium" <r1********@comcast.net>
wrote:
> "P.J. Plauger" writes:
>>> So it was a concession to ASCII.
>>
>> Not really.
>
> I realized that back space might enter into that too, but thought there
> might have been problems with that considering the physical nature of actual
> drum printers, chain printers and so on.
>
> So are you saying that the initial release of the ASCII standard said that
> LF was to do line feed AND carriage return? What was the point then, of
> having them as separate codes? Unfortunately I don't have the text that
> goes with my pre-historic ASCII chart, only a single page showing the
> glyphs.


In the late 60s and early 70s there was no device of the kind known
today as a screen. There were line printers, punch card and paper tape
readers and writers, and TTY devices combining a keyboard, a punched
paper tape reader and writer, and a character printer. That printer
was able to use single control chars like
- cr - carriage return - point print unit back to column 1
- lf - linefeed - feed paper to next line
- ff - formfeed - feed paper to next page stop on
the control ribbon
- backspace - one fixed character position back on same
line
- backline - page one line back

Some of these devices were dumb enough to print the next character
even before the device was able to reach character position 1.
So to get a clean printout you had to do cr before lf to hold the
device until lf was done.

Anyway, to get a new line you had to send lf, or the print head would
put the char at the position it was in at the time it got the order to
print it.

On mainframes the TTY was mostly configured to perform a cr whenever
it got an lf, to optimise the programs and save one character of
text (memory was scarce and expensive, even though the machines were
able to multitask). The upcoming microprocessors (mostly home-brewed
by widely different manufacturers) were limited in multitasking to the
different hardware levels (mostly 16) the CPU was able to control,
and were designed more primitively. They required even dumber TTYs or
more intelligent customer-built I/O devices.

At the time C was created, a typical computer was a mainframe with
- a lot of punch card readers as program input
- a lot of magnetic tape devices as data store
- 1 or more punch card writer(s)
- some paper tape readers and writers
- one or more line printers (the first music devices :-)
for developers)
- later on, a large number of removable hard disks
- 1 TTY as operator console

No wonder that the C runtime was not created to handle user input well
but is ideal for handling computer-designed input like punch cards.

The upcoming microprocessors were designed to control machines, having
only
- special devices to control machines
- paper tape punchers and readers
- magnetic tape writers
- seldom line printers
- TTY as operator console.

Ages later they moved into offices, along with other kinds of special
devices and TTY-like devices as user input/output devices.

Modern GUIs are proprietary anyway and do not use the C runtime for
user-oriented I/O.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2 German is here!
Apr 9 '06 #29
On Sat, 1 Apr 2006 07:33:13 -0800, "osmium" <r1********@comcast.net>
wrote:
> "Joe Wright" writes:
>> Text mode implemented in C is a concession to Microsoft.
>
> I hate Microsoft too. But that is not the case.
>
> The ASCII code was designed to allow a second pass at printing to produce
> some of the accents used with the latin alphabet. Early copies of ASCII
> show both the circumflex and tilde in the superior position to make this

Are you sure? The far-dominant early ASCII (64-graphic = uppercase
only) devices, Teletype 33 and 35, had uparrow and backarrow. The
earliest revision of the standard document I looked at, IIRC 1968 or
so, added tilde along with lowercase and described circumflex and
underscore as changed precisely so they could be used as modifiers. It
also gave NL as an acceptable alternate meaning of x0A but not the
primary one. (There was the same ambiguity over whether VT and FF
included CR or not, but those were already less important then, and
now have nearly vanished.) And of course ASCII was originally intended
and used only as an "American" (meaning US) standard.

> work. And also, to make it work, a line feed had to have no side effects,
> such as advancing the medium. I believe the ASCII code has been jiggered
> with to redefine CR and LF since the original specification, but I have no
> actual proof.
>
> So it was a concession to ASCII.

I would say to ASCII as commonly used, _and_ to other non-Unix and
record-oriented filesystems still pretty important in the 1980s.

- David.Thompson1 at worldnet.att.net
Apr 16 '06 #30
