how to tell text from binary file

How can a binary file be distinguished from a text file on Windows?

Obviously I want a way that is more sophisticated than just looking at the
dot extension in the filename.

I want to write code that processes all text files in a directory but leaves
binary files alone.

--
http://www.florencesoft.com
Aug 28 '06 #1
4 Replies


Florence wrote:
How can a binary file be distinguished from a text file on Windows?

Well, ALL files, including text files, are stored as binary on the
computer, and you can read either type either way. If you read a text
file in binary mode, you'll get the ASCII values of its characters;
similarly, you'll get some insane-looking text if you read a binary
file in text mode. E.g., this is what I get if I open a .ICO (binary)
file in Notepad (which obviously reads in "text mode"):

JFIF    C 
 %# , #&')*)-0-(0%()( C

If you check the ASCII values of these characters, many of them will
be above 130 or below 32 (space), because of course this is NOT text!
So make a little function that reads a file in text mode and checks
the values of the first 150-200 characters. If most, or even some, of
them have weird values like 1-30 or 130+, then it's a binary file,
because text files are mostly not going to have these values. Look up
an ASCII chart for all the values.
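
A rough sketch of that check in Python, in case it helps. The function name `looks_binary`, the `sample_size` parameter and the 10% cutoff `max_odd_fraction` are all made up for illustration (the suggestion above only says "most or even some"), and the file is opened in binary mode rather than text mode so the raw byte values come through unchanged:

```python
def looks_binary(path, sample_size=200, max_odd_fraction=0.10):
    """Guess whether a file is binary by sampling its first bytes.

    The 10% cutoff is an assumption for illustration, not a tested value.
    """
    # Binary mode: no newline translation or decoding, just raw bytes.
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if not sample:
        return False  # an empty file has nothing that looks binary
    # "Weird" values as described above: below 32 (space) or above 130.
    odd = sum(1 for b in sample if b < 32 or b > 130)
    return odd / len(sample) > max_odd_fraction
```

Note that this literal version also counts tabs and line endings as weird, so a text file with very short lines could be misclassified.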

That was the best I could come up with =)

Good Luck!

Gideon

Aug 29 '06 #2

You may wish to allow for chars 9, 10 and 13 in the text (tab, line
feed and carriage return). ASCII doesn't define chars over 127, but
they do occur in many code pages/encodings, and the file is still
"text". Ditto Unicode, etc.
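
Something along these lines might work as a starting point, in Python. The names `looks_like_text`, `sample_size` and `max_control_fraction` are hypothetical, the 5% cutoff is a guess, and the NUL-byte and UTF-16 byte-order-mark checks are common conventions I am assuming here rather than anything required:

```python
import codecs

# Tab, line feed and carriage return are ordinary in text files.
TEXT_CONTROLS = {9, 10, 13}
# Byte-order marks that identify UTF-16 text, which is full of NUL bytes.
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def looks_like_text(path, sample_size=4096, max_control_fraction=0.05):
    """Heuristic guess; the 5% cutoff is an assumption, not a standard."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if not sample:
        return True  # treat an empty file as (trivially) text
    if sample.startswith(UTF16_BOMS):
        return True  # Unicode text announced by a byte-order mark
    if 0 in sample:
        return False  # a NUL byte is rare in 8-bit text
    # Bytes above 127 are not counted: they may be valid characters in a
    # code page or in UTF-8. Only unexpected control codes count.
    controls = sum(1 for b in sample if b < 32 and b not in TEXT_CONTROLS)
    return controls / len(sample) <= max_control_fraction
```

It is still only a guess: UTF-16 text saved without a byte-order mark would be rejected by the NUL check, for example.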

Marc

Aug 29 '06 #5
