Reading from text files

In the course of playing around with file input and output, I came
across some behavior that is not quite intuitive. I created a simple
text file, test.txt, which contains only 3 lines, and which I expect
will have 5 characters (the digits 1, 2, and 3, and two newline
characters, the first after 1 and the second after 2). Here it is in
all its glory:
1
2
3

However, when I read it using open() and then view it using
file.seek(0); file.read(); file.tell() I get:
'1\n2\n3'
7L

Python thinks there are 7 characters in the file! If I type
file.seek(1); file.read() OR file.seek(2); file.read() I get
'\n2\n3'

but file.seek(3); file.read()

gives me what I expected to get with file.seek(2); file.read()
'2\n3'

It appears that Python sometimes counts each of the newline escape
sequences as 2 separate characters and at other times as 1 indivisible
character. What is the appropriate way to think about these
characters?

Thomas Philips
Jul 18 '05 #1
2 Replies


There are three solutions to this problem:
1. Don't use Windows
2. Only use offsets with file.seek() that were returned by file.tell()
3. Open the file in binary mode

When a file is opened as a text file, Windows stores "\n" as the
two-byte sequence "\r\n" (carriage return, line feed) when writing,
and transforms that two-byte sequence back into "\n" when reading.
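
A minimal sketch of that translation, assuming Python 3 (the thread itself predates it) and a throwaway temporary file. The helper names write_crlf_sample and read_both_ways are made up for illustration. The bytes are written in binary mode so the result is the same on any platform:

```python
import os
import tempfile


def write_crlf_sample(path):
    # Write exactly the bytes Windows text mode would store for "1\n2\n3".
    with open(path, "wb") as f:
        f.write(b"1\r\n2\r\n3")


def read_both_ways(path):
    # Binary mode returns the bytes on disk, untouched.
    with open(path, "rb") as f:
        raw = f.read()
    # Text mode (default newline handling) turns each "\r\n" into "\n".
    with open(path, "r", encoding="ascii") as f:
        text = f.read()
    return raw, text


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "test.txt")
    write_crlf_sample(path)
    raw, text = read_both_ways(path)
    print(len(raw), repr(raw))    # 7 bytes on disk
    print(len(text), repr(text))  # 5 characters after translation
```

The 7 reported by file.tell() is the size on disk; the 5 characters are what the program sees after translation.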

file.seek() on Windows only knows about raw byte offsets, though, so
if you know the first line of a file is "a\n", you can't seek to 2 to
get to the second line, because that line actually starts at byte 3
(the value .tell() would return after you read the first line).
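
Solutions 2 and 3 can be sketched roughly as follows, again assuming Python 3. The function names line_offsets, read_line_at and read_binary_from are hypothetical helpers, not part of any library. Note that readline() is used rather than iterating over the file, since tell() is not available while iterating a text file:

```python
def line_offsets(path):
    # Collect the tell() value at the start of every line (solution 2).
    offsets = []
    with open(path, "r", encoding="ascii") as f:
        while True:
            pos = f.tell()
            line = f.readline()
            if not line:
                break
            offsets.append(pos)
    return offsets


def read_line_at(path, offset):
    # Only pass offsets that tell() returned for this same file.
    with open(path, "r", encoding="ascii") as f:
        f.seek(offset)
        return f.readline()


def read_binary_from(path, offset):
    # Solution 3: in binary mode offsets are plain byte counts and
    # no newline translation happens.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read()
```

For a file holding the bytes "a\r\nb\r\n", read_binary_from(path, 3) starts at "b", while read_line_at() should only be given values that line_offsets() produced.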
Jeff

Jul 18 '05 #2

If you want to actually "see" what is in the file do a directory listing and
dump the file in hex.

On DOS/Windows do a 'dir test.txt' command and inspect the size of the file.
Then, do a 'debug test.txt' command. At the prompt, enter the 'r' command
and press enter. Examine the CX register. It will hold the size of the
file (in hexadecimal). Then do a 'd' command to dump the bytes out and you
can see exactly what is in the file.

On UNIX/Linux use 'ls -l test.txt' to see the directory listing containing
the size of the file. Use something like 'od -Ax -x test.txt' to see the
contents of the file. If that command does not produce something you like,
use 'man od' to find the parameters with which you are more comfortable.
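
The same check can also be done from Python itself. This is only a sketch, assuming Python 3.8 or later (for the separator argument to bytes.hex()); hex_dump is a made-up helper name:

```python
import os


def hex_dump(path):
    # Size as the directory listing would report it.
    size = os.path.getsize(path)
    # Binary mode shows the bytes exactly as stored, with no translation.
    with open(path, "rb") as f:
        data = f.read()
    return size, data.hex(" ")
```

For the 3-line test.txt written on Windows, this would be expected to report a size of 7, with the pair 0d 0a (carriage return, line feed) after each of the first two digits.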
Jul 18 '05 #3
