file read, binary or text mode

what is the difference?

if I open a text file in binary (rb) mode, it doesn't matter... the read()
output is the same.

Jul 18 '05 #1
Guyon Morée wrote:
what is the difference?

if I open a text file in binary (rb) mode, it doesn't matter... the read()
output is the same.


If you are on Linux that's the case... or under other
conditions. Maybe describing your platform and showing
an example of what you're trying to do would be helpful.

-Peter
Jul 18 '05 #2
On 2004-09-24, Guyon Morée <gumuz@NO_looze_SPAM.net> wrote:
what is the difference?
42?
if I open a text file in binary (rb) mode, it doesn't matter... the read()
output is the same.


OK...

--
Grant Edwards (grante at visi.com)
Yow! They collapsed... like nuns in the street... they had no teenappeal!
Jul 18 '05 #3
"Guyon Morée" <gumuz@NO_looze_SPAM.net> wrote in
news:41**********************@news.nl.uu.net:
what is the difference?

if I open a text file in binary (rb) mode, it doesn't matter... the
read() output is the same.


"rb" and "r" on a text file are the same if your text file has ASCII
characters (8 bit), but not for Unicode characters (16 bit).
In short, if you are sure that your file is ONLY text, use "r"; otherwise always
use "rb". And "r" doesn't read control characters other than "\n", "\t", etc.

Jul 18 '05 #4
ok, i have huffman encoding code.

this is actually built for text, but because python can also read a binary
file as a string, this applies equally well :)

but, i was just wondering if this gives any problems if I use text-mode read
for the binary files and vice versa.

If I understand correctly now, using binary mode is _always_ safe, right?
"Peter Hansen" <pe***@engcorp.com> wrote in message
news:Ma********************@powergate.ca...
Guyon Morée wrote:
what is the difference?

if I open a text file in binary (rb) mode, it doesn't matter... the read() output is the same.


If you are on Linux that's the case... or under other
conditions. Maybe describing your platform and showing
an example of what you're trying to do would be helpful.

-Peter

Jul 18 '05 #5
Guyon Morée wrote:
what is the difference?
On Unix/Linux, none.

On Windows, binary mode is just that while text mode translates "\r\n"
(or "\n\r", I always forget) to "\n" on input and vice-versa on output.

I don't know about other platforms.
if I open a text file in binary (rb) mode, it doesn't matter... the read()
output is the same.


Depends on your platform, and the format of the text file (Unix, Windows
or other platform style line endings).

--
"Codito ergo sum"
Roel Schroeven
Jul 18 '05 #6
On 2004-09-24, Guyon Morée <gumuz@NO_looze_SPAM.net> wrote:
ok, i have huffman encoding code.
You should open the file in binary.
this is actually built for text,
All of the Huffman encoding implementations I've seen output
binary, but I'll take your word for it.
but because python can also
read a binary file as a string, this applies equally well :)
If the file contains printable text with cr/nl, nl, or cr line
endings, then open it in text mode. Otherwise open it in
binary mode.
but, i was just wondering if this gives any problems if I use
text-mode read for the binary files and vice versa.
Yes, it will give you problems.
If I understand correctly now, using binary mode is _always_ safe, right?


No.

If it's text, open it in text mode. That way the line endings
are handled properly.

--
Grant Edwards (grante at visi.com)
Yow! I think I'll do BOTH if I can get RESIDUALS!!
Jul 18 '05 #7
Guyon Morée wrote:
ok, i have huffman encoding code.

this is actually built for text, but because python can also read a binary
file as a string, this applies equally well :)

but, i was just wondering if this gives any problems if I use text-mode read
for the binary files and vice versa.

If I understand correctly now, using binary mode is _always_ safe, right?


You're not helping a whole lot here. What platform are you using?
I'll assume from the headers in your message that it's Windows.
If that's true, then forget about text and binary and ASCII for
a moment, and just consider this.

If you open a file on Windows using "r" or "rt" or the default (which
is "r"), then when you read the file any occurrences of the byte
sequence 13 followed by 10 (that is, CR LF or \r\n or whatever you want
to call it) will be replaced as the file is read by just the 10, or the
LF, or the \n, or whatever you want to call it.

If you use "rb" instead of just "r" or the default, then this
translation will not occur and you will retrieve all bytes in
the file just as they are stored there.
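
A minimal sketch of the two read modes described above. It assumes a current Python 3, where text mode applies this newline translation on every platform rather than only on Windows, so the effect should be visible on Linux as well. The file name `sample.txt` and the temporary directory are made up for the illustration.

```python
import os
import tempfile

# Hypothetical sample file: two lines with Windows-style CR LF endings.
raw = b"first line\r\nsecond line\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Binary write: the bytes are stored untouched.
    with open(path, "wb") as f:
        f.write(raw)

    # Binary read: every byte comes back, including the CR (13).
    with open(path, "rb") as f:
        as_binary = f.read()

    # Text read: each CR LF pair is replaced by a single LF (10).
    with open(path, "r", encoding="ascii") as f:
        as_text = f.read()

print(repr(as_binary))
print(repr(as_text))
```

Comparing the two printed values shows which bytes the text-mode read dropped.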

It's up to you to pick the behaviour you need. Saying it's
"huffman encoding code" doesn't really help, since that doesn't
refer to any universal standard representation of data. It
seems likely that it's binary (i.e. the translation provided by
not using "rb" is undesirable), but nobody here knows where you
got that file or what it contains.

And in case that doesn't answer the questions above: (1) yes,
it can definitely give problems reading text files as binary
and vice versa, and (2) binary mode applies whenever "b" is
used on Windows, and not otherwise, so if you save a file without
using "wb" you will get the same translation as above but in
the reverse direction (LF or \n gets turned into CR LF or \r\n
on output).

-Peter
Jul 18 '05 #8
Guyon Morée wrote:
ok, i have huffman encoding code.

this is actually built for text, but because python can also read a binary
file as a string, this applies equally well :)

but, i was just wondering if this gives any problems if I use text-mode read
for the binary files and vice versa.

If I understand correctly now, using binary mode is _always_ safe, right?


It's safe in the sense that everything goes out exactly as it came in.
For example, gzip uses binary mode even when compressing text files. The
files may be text, but gzip doesn't care about that. It doesn't care
about words, sentences and line endings, but it does care about
representing exactly the bytes that are in the file.

Editors, diff, wc, ... use text mode.
cp, tar, gzip, ... use binary mode.
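
To illustrate the "everything goes out exactly as it came in" point, here is a small sketch using the standard gzip module on in-memory bytes. The sample data is invented; it deliberately mixes line endings and includes a 0x1a byte, which a translating text mode could alter.

```python
import gzip

# Invented sample: CR LF, a bare LF, and a 0x1a (Ctrl-Z) byte.
original = b"line one\r\nline two\nodd byte: \x1a end\r\n"

# gzip works on bytes only, so no newline translation can happen.
compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

print(restored == original)
```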

--
"Codito ergo sum"
Roel Schroeven
Jul 18 '05 #9

"Askari" <as****@addressNonValide.com> wrote in message
news:Xn**********************************@207.35.1 77.135...
"Guyon Morée" <gumuz@NO_looze_SPAM.net> wrote in
news:41**********************@news.nl.uu.net:

"rb" and "r" on a text file are the same if your text file has ASCII
characters (8 bit), but not for Unicode characters (16 bit).
In short, if you are sure that your file is ONLY text, use "r"; otherwise always
use "rb". And "r" doesn't read control characters other than "\n", "\t", etc.


Newbies, ignore this confusion.

On Windows, text mode autoconverts \r\n to \n on input and vice versa on
output. I believe that that is all the difference. Period.

Terry J. Reedy

Jul 18 '05 #10
"Terry Reedy" <tj*****@udel.edu> writes:

Newbies, ignore this confusion.

On Windows, text mode autoconverts \r\n to \n on input and vice versa on
output. I believe that that is all the difference. Period.


That's not quite the case. As always windows sucks big time:

$ cat bla.py
open("b.txt", "w").write("bla\x1a")
print len(open("b.txt", "rb").read())
open("b.txt", "a+")
print len(open("b.txt", "rb").read())

ralf@CRACK ~
$ python bla.py
4
3
The last character gets stripped if it's 0x1a when opening a file for
appending in text mode. I remember this from a posting on the metakit
mailing list. The poor guy corrupted his databases while he wanted to
check for write access:
http://www.equi4.com/pipermail/metak...er/001497.html

- Ralf

--
brainbot technologies ag
boppstrasse 64 . 55118 mainz . germany
fon +49 6131 211639-1 . fax +49 6131 211639-2
http://brainbot.com/ mailto:ra**@brainbot.com
Jul 18 '05 #11
Ralf Schmitt wrote:
"Terry Reedy" <tj*****@udel.edu> writes:
On Windows, text mode autoconverts \r\n to \n on input and vice versa on
output. I believe that that is all the difference. Period.
That's not quite the case. As always windows sucks big time:

[snip example with ^Z]
The last character gets stripped if it's 0x1a when opening a file for
appending in text mode.


Good point. Note for the picky: it doesn't just get stripped... it
*is* the last character, even if there's data following. Or to
be blunt, ^Z (byte value 26) is treated as EOF on Windows when not
using binary mode to read files.

I suspect Terry and others (including me) overlooked this because
^Z is pretty much obsolete, and since few applications *write*
^Z as the last character of text files any more, almost nobody
bothers to remember that text mode is slightly more complicated
than just the CR LF to LF conversion and back.
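
The behaviour described here can be imitated in a few lines. This is only an emulation written for illustration: `legacy_text_read` is a hypothetical helper, not a library function, and as far as I know a current Python 3 does not apply the ^Z rule itself when reading text files.

```python
def legacy_text_read(data: bytes) -> bytes:
    """Imitate an old Windows text-mode read on raw file bytes.

    Hypothetical helper for illustration only.
    """
    # Everything from the first ^Z (byte 26, 0x1a) onward is treated as
    # being past end-of-file, even if more data follows.
    eof = data.find(b"\x1a")
    if eof != -1:
        data = data[:eof]
    # CR LF pairs are collapsed to a single LF.
    return data.replace(b"\r\n", b"\n")


stored = b"visible\r\n\x1ahidden\r\n"
print(legacy_text_read(stored))
```

Reading the same bytes in binary mode would return all of `stored`, including the part after the ^Z.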

-Peter
Jul 18 '05 #12
On 2004-09-24, Peter Hansen <pe***@engcorp.com> wrote:
Good point. Note for the picky: it doesn't just get stripped... it
*is* the last character, even if there's data following. Or to
be blunt, ^Z (byte value 26) is treated as EOF on Windows when not
using binary mode to read files.


<history>

That's because CP/M allocated file space in blocks and only
kept track of the length of the file in blocks. It was common
practice to mark the end of the "real" data in a text file with
a ^Z (IIRC, this was done by the application writing to the
file). Otherwise, you had no way of knowing _where_ in that
last block the data actually ended.

The original MS/PC-DOS was basically a CP/M clone.

I presume CP/M copied that behavior from RSX-11 or RT-11, but
that's just an educated guess.

</history>

--
Grant Edwards (grante at visi.com)
Yow! My mind is making ashtrays in Dayton...
Jul 18 '05 #13
Terry Reedy wrote:
"Askari" <as****@addressNonValide.com> wrote in message
news:Xn**********************************@207.35.1 77.135...
"Guyon Morée" <gumuz@NO_looze_SPAM.net> wrote in
news:41**********************@news.nl.uu.net:

"rb" and "r" on a text file are the same if your text file has ASCII
characters (8 bit), but not for Unicode characters (16 bit).
In short, if you are sure that your file is ONLY text, use "r"; otherwise always
use "rb". And "r" doesn't read control characters other than "\n", "\t", etc.

Newbies, ignore this confusion.

On Windows, text mode autoconverts \r\n to \n on input and vice versa on
output. I believe that that is all the difference. Period.


It's the main difference, but not the only thing. From the MSDN
documentation on fopen:

"t

Open in text (translated) mode. In this mode, CTRL+Z is interpreted as
an end-of-file character on input. In files opened for reading/writing
with "a+", fopen checks for a CTRL+Z at the end of the file and removes
it, if possible. This is done because using fseek and ftell to move
within a file that ends with a CTRL+Z, may cause fseek to behave
improperly near the end of the file.

Also, in text mode, carriage return–linefeed combinations are translated
into single linefeeds on input, and linefeed characters are translated
to carriage return–linefeed combinations on output. When a Unicode
stream-I/O function operates in text mode (the default), the source or
destination stream is assumed to be a sequence of multibyte characters.
Therefore, the Unicode stream-input functions convert multibyte
characters to wide characters (as if by a call to the mbtowc function).
For the same reason, the Unicode stream-output functions convert wide
characters to multibyte characters (as if by a call to the wctomb
function)."

So there's
- the line endings translation
- the issue of CTRL-Z as end of file that gets stripped (CTRL-Z is
decimal 26 or hex 1a, consistent with Ralf's mail)
- the Unicode issue, which I frankly don't understand

--
"Codito ergo sum"
Roel Schroeven
Jul 18 '05 #14
"Roel Schroeven" <rs****************@fastmail.fm> wrote in message
news:Oj***********************@phobos.telenet-ops.be...
It's safe in the sense that everything goes out exactly as it came in.
For example, gzip uses binary mode even when compressing text files. The
files may be text, but gzip doesn't care about that. It doesn't care
about words, sentences and line endings, but it does care about
representing exactly the bytes that are in the file.


I think the following is the same question from another angle.
I have a .zip archive of compressed files that
I want to decompress. Using the zipfile module,
I tried
import zipfile
z = zipfile.ZipFile('local.zip')
for zname in z.namelist():
    localtxtfile = 'c:/puthere/' + zname
    f = open(localtxtfile, 'w')
    f.write(z.read(zname))
    f.close()

The original files were all plain text,
created on an unspecified platform.
The files I decompressed this way contained
*two successive* carriage returns
(ASCII 13) at the end of each line.
If I change 'w' to 'wb' I get only one
carriage return at the end of each line.

Why is this extra carriage return added?
My original guess was that using 'w' instead
of 'wb' would be the right action, since the
platform for the original files is unspecified
and the original files are known to be plain text.

Thanks,
Alan Isaac
Jul 18 '05 #15
Alan G Isaac wrote:
"Roel Schroeven" <rs****************@fastmail.fm> wrote in message
news:Oj***********************@phobos.telenet-ops.be...
It's safe in the sense that everything goes out exactly as it came in.
For example, gzip uses binary mode even when compressing text files. The
files may be text, but gzip doesn't care about that. It doesn't care
about words, sentences and line endings, but it does care about
representing exactly the bytes that are in the file.
I think the following is the same question from another angle.


I think you should consider the same answer from this angle. ;)
I have a .zip archive of compressed files that
I want to decompress. Using the zipfile module,
I tried
import zipfile
z = zipfile.ZipFile('local.zip')
for zname in z.namelist():
    localtxtfile = 'c:/puthere/' + zname
    f = open(localtxtfile, 'w')
    f.write(z.read(zname))
    f.close()

The original files were all plain text,
created on an unspecified platform.
Are you sure the platform is unspecified? You can find out the platform
by doing z.getinfo(zname).create_system and then *yuck* looking up
the ID number you get against the list in
<http://www.pkware.com/company/standards/appnote/>.
The files I decompressed this way contained
*two successive* carriage returns
(ASCII 13) at the end of each line.
If I change 'w' to 'wb' I get only one
carriage return at the end of each line.

Why is this extra carriage return added?


I imagine the file in the archive was created on a DOS-type system,
where the line ending is \r\n. That's what you read in. When you write
it out in "w" mode the \n is expanded to \r\n without checking to see if
there is already a \r beforehand. So you get \r\r\n.

Essentially you should consider the archive file to be read in "rb"
mode. Writing in "w" mode instead of "wb" mode will give you extra
carriage returns.
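
A minimal sketch of the extraction loop with the output opened in "wb". The in-memory archive, the member name `notes.txt`, and the temporary output directory are assumptions standing in for the original `local.zip` and `c:/puthere/`.

```python
import io
import os
import tempfile
import zipfile

# Stand-in archive holding one text file with DOS-style CR LF endings.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as z:
    z.writestr("notes.txt", b"alpha\r\nbeta\r\n")

with tempfile.TemporaryDirectory() as outdir:
    with zipfile.ZipFile(archive) as z:
        for zname in z.namelist():
            target = os.path.join(outdir, zname)
            # z.read() returns the stored bytes; "wb" writes them unchanged,
            # so each line keeps exactly one carriage return.
            with open(target, "wb") as f:
                f.write(z.read(zname))

    # Read the result back in binary mode to inspect the exact bytes.
    with open(os.path.join(outdir, "notes.txt"), "rb") as f:
        extracted = f.read()

print(repr(extracted))
```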

If you want to be able to get "universal newline" input from your
zipfile, consider piping input through this generator and using "w" mode:

http://aspn.activestate.com/ASPN/Coo.../Recipe/286165

Then you should get the correct line ending for a text file without
regard to the current platform or the one where the archive was created.
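
The linked recipe is not reproduced here. As a rough stand-in, the following sketch shows one way such a generator could look; `universal_lines` is a hypothetical name, and the function works on bytes as returned by `z.read()`.

```python
def universal_lines(data: bytes):
    """Yield each line of data with its ending normalised to LF.

    Hypothetical helper; handles CR LF, bare CR and bare LF endings.
    """
    # bytes.splitlines() breaks on LF, CR and CR LF; keepends=True keeps
    # the terminator so an unterminated last line stays unterminated.
    for line in data.splitlines(keepends=True):
        if line.endswith(b"\r\n"):
            yield line[:-2] + b"\n"
        elif line.endswith(b"\r"):
            yield line[:-1] + b"\n"
        else:
            yield line


mixed = b"dos\r\nold mac\runix\nno newline"
print(b"".join(universal_lines(mixed)))
```

Once the endings are normalised to a single LF, decoding the result and writing it in text mode lets the platform supply its own line ending without doubling anything.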
--
Michael Hoffman
Jul 18 '05 #16
"Alan G Isaac" <ai****@american.edu> wrote:

I think the following is the same question from another angle.
I have a .zip archive of compressed files that
I want to decompress. Using the zipfile module,
I tried
import zipfile
z = zipfile.ZipFile('local.zip')
for zname in z.namelist():
    localtxtfile = 'c:/puthere/' + zname
    f = open(localtxtfile, 'w')
    f.write(z.read(zname))
    f.close()

The original files were all plain text,
created on an unspecified platform.
Not true. They were in plain text, created on a DOS/Windows platform.
The files I decompressed this way contained
*two successive* carriage returns
(ASCII 13) at the end of each line.
If I change 'w' to 'wb' I get only one
carriage return at the end of each line.

Why is this extra carriage return added?
Because the original file inside the zip file contained \r\n. z.read
returns you those exact bytes. When you write "\r\n" to a text file in
Windows, the \r is written as \r, and the \n is written as \r\n. Thus, you
end up with \r\r\n.
My original guess was that using 'w' instead
of 'wb' would be the right action, since the
platform for the original files is unspecified
and the original files are known to be plain text.


No. If you do not know what your buffer contains, you should always use
'wb' so that those contents are not altered.

That's the real lesson: when you write using 'w' or 'wt', the buffer is
changed on the way out. You only want that if you know exactly what you
are writing.
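
The doubling can be reproduced on any platform with a current Python 3 by asking for Windows-style output explicitly. In this sketch the `newline="\r\n"` argument stands in for the Windows text mode discussed in the thread; the file names and sample text are invented.

```python
import os
import tempfile

# Text as it came out of the archive: already has CR LF endings.
payload = "alpha\r\nbeta\r\n"

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text_mode.txt")
    bin_path = os.path.join(tmp, "binary_mode.txt")

    # Text mode with CR LF output: every "\n" written becomes "\r\n",
    # without checking whether a "\r" is already in front of it.
    with open(text_path, "w", encoding="ascii", newline="\r\n") as f:
        f.write(payload)

    # Binary mode: the bytes are written exactly as given.
    with open(bin_path, "wb") as f:
        f.write(payload.encode("ascii"))

    with open(text_path, "rb") as f:
        via_text = f.read()
    with open(bin_path, "rb") as f:
        via_binary = f.read()

print(repr(via_text))
print(repr(via_binary))
```

The text-mode file ends each line with two carriage returns and a line feed, while the binary-mode file matches the buffer byte for byte.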
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Jul 18 '05 #17

"Michael Hoffman" <m.*********************************@example.com > wrote in
message news:cj**********@pegasus.csx.cam.ac.uk...
I imagine the file in the archive was created on a DOS-type system,
where the line ending is \r\n. That's what you read in. When you write
it out in "w" mode the \n is expanded to \r\n without checking to see if
there is already a \r beforehand. So you get \r\r\n.
Thanks; that addresses my basic misconception about writing in textmode.
I had thought that writing in textmode produced a platform specific
conversion of the text written, but I now understand that this only affects
how \n is written.
If you want to be able to get "universal newline" input from your
zipfile, consider piping input through this generator and using "w" mode:
http://aspn.activestate.com/ASPN/Coo.../Recipe/286165


Very helpful.

Thanks,
Alan Isaac
Jul 18 '05 #18
