Reading a text file backwards

Jay
I have a very large text file (being read by a CGI script on a web server),
and I get memory errors when I try to read the whole file into a list of
strings. The problem is, I want to read the file backwards, starting with
the last line.

Previously, I did:

myfile = open('myfile.txt', 'r')
mylines = myfile.readlines()
myfile.close()
for line in range(len(mylines)-1, -1, -1):
    # do something with mylines[line]

This, however, caused a MemoryError, so I want to do something like:

myfile = open('myfile.txt', 'r')
for line in myfile:
    # do something with line
myfile.close()

Only, I want to iterate backwards, starting with the last line of the file.
Can anybody suggest a simple way of doing this? Do I need to jump around
with myfile.seek() and use myfile.readline() ?


Jul 18 '05 #1
Jay,

Try this:

myfile = open('myfile.txt', 'r')
mylines = myfile.readlines()
myfile.close()
mylines.reverse()

Rick



Jul 18 '05 #2
Jay wrote:
Only, I want to iterate backwards, starting with the last line of the file.
Can anybody suggest a simple way of doing this? Do I need to jump around
with myfile.seek() and use myfile.readline() ?


Python Cookbook has a recipe. Or two.

http://aspn.activestate.com/ASPN/Coo.../Recipe/276149
http://aspn.activestate.com/ASPN/Coo.../Recipe/120686

I've not looked at them to judge the quality.

Another approach is to read the lines forwards and save
the starting offset of each line. Then iterate backwards
through the offsets, seeking to each one and reading a line.

def find_offsets(infile):
    offsets = []
    offset = 0
    for line in infile:
        offsets.append(offset)
        offset += len(line)
    return offsets

def iter_backwards(infile):
    # make sure it's seekable and at the start
    infile.seek(0)
    offsets = find_offsets(infile)
    for offset in offsets[::-1]:
        infile.seek(offset)
        yield infile.readline()

for line in iter_backwards(open("spam.py")):
    print(repr(line))

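This won't work on MS Windows because of the
'\r\n' -> '\n' conversion: in text mode len(line) is one
less than the number of bytes the line takes up on disk,
so the computed offsets drift. You would instead need
something like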

def find_offsets(infile):
    offsets = []
    while 1:
        offset = infile.tell()
        if not infile.readline():
            break
        offsets.append(offset)
    return offsets

Just submitted this solution to the cookbook.
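
For readers on Python 3, here is a minimal sketch of the same offset idea. It assumes the file is opened in binary mode ('rb'), so that tell() and seek() deal in plain byte offsets and no newline translation takes place. The function names mirror the ones above; the BytesIO sample data is made up for illustration. Note that the offsets list still holds one integer per line, which is far smaller than the lines themselves but not free.

```python
import io


def find_offsets(infile):
    """Return the byte offset at which each line of a binary file starts."""
    offsets = []
    infile.seek(0)
    while True:
        offset = infile.tell()
        if not infile.readline():
            break  # readline() returns b"" only at end of file
        offsets.append(offset)
    return offsets


def iter_backwards(infile):
    """Yield the lines of a seekable binary file, last line first."""
    for offset in reversed(find_offsets(infile)):
        infile.seek(offset)
        yield infile.readline()


# Hypothetical sample with Windows-style line endings.
sample = io.BytesIO(b"first\r\nsecond\r\nthird\r\n")
for raw_line in iter_backwards(sample):
    print(repr(raw_line))
```

With a real file this would be called as iter_backwards(open('myfile.txt', 'rb')); the lines come back as bytes and can be decoded one at a time if text is needed.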

Andrew
da***@dalkescientific.com
Jul 18 '05 #3
Rick Holbert <ho******@dma.org> wrote:
: Jay,

: Try this:

: myfile = open('myfile.txt', 'r')
: mylines = myfile.readlines()
: myfile.close()
: mylines.reverse()
Hi Rick,

But this probably won't work for Jay: he's running into memory issues
because the file's too large to hold in memory at once. The point is
to avoid readlines().

Here's a generator that tries to iterate backwards across a file. We
first get the file positions of each newline, and then afterwards
start going through the offsets.

###

def backfileiter(myfile):
    """Iterates the lines of a file, but in reverse order."""
    myfile.seek(0)
    offsets = _getLineOffsets(myfile)
    myfile.seek(0)
    offsets.reverse()
    for i in offsets:
        myfile.seek(i+1)
        line = myfile.readline()
        # A file that ends with a newline leaves nothing after the last
        # offset; skip that empty read instead of yielding ''.
        if line:
            yield line

def _getLineOffsets(myfile):
    """Return a list of offsets where newlines are located."""
    offsets = [-1]
    i = 0
    while True:
        byte = myfile.read(1)
        if not byte:
            break
        elif byte == '\n':
            offsets.append(i)
        i += 1
    return offsets
###

For example:

###
>>> from StringIO import StringIO
>>> f = StringIO("""
... hello world
... this
... is a
... test""")
>>> f.seek(0)
>>> for line in backfileiter(f): print repr(line)
...
'test'
'is a\n'
'this\n'
'hello world\n'
'\n'
###

Hope this helps!
Jul 18 '05 #4
It's just shifting the burden perhaps, but if you're on a Unix system
you should be able to use tac(1) to reverse your file a bit faster:

import os
for line in os.popen('tac myfile.txt'):
    # do something with the line
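
On a current Python the same idea could be sketched with the subprocess module instead of os.popen. This is only an illustration under a few assumptions: the helper name tac_lines is made up here, and it assumes a GNU tac binary is on PATH either as tac or, as on some BSD and macOS installs of GNU coreutils, as gtac. Passing the filename as a list element avoids going through the shell.

```python
import shutil
import subprocess


def tac_lines(path):
    """Yield the lines of `path` (as bytes) in reverse order, using tac."""
    # Look for the executable under either of its common names.
    tac = shutil.which("tac") or shutil.which("gtac")
    if tac is None:
        raise RuntimeError("no tac or gtac executable found on PATH")
    with subprocess.Popen([tac, path], stdout=subprocess.PIPE) as proc:
        # The pipe is consumed lazily, so the file is never held in memory.
        for line in proc.stdout:
            yield line
    # Leaving the with block waits for tac, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, [tac, path])
```

Usage would look like: for line in tac_lines('myfile.txt'): followed by whatever needs doing with each line.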

Jul 18 '05 #5
Graham Fawcett wrote:
It's just shifting the burden perhaps, but if you're on a Unix system
you should be able to use tac(1) to reverse your file a bit faster:


Huh. Hadn't heard of that one. It's not installed
on my OS X box. It's on my FreeBSD account as gtac.
Ah, but it is available on a Linux account.

Andrew
da***@dalkescientific.com
Jul 18 '05 #6
On Thu, 30 Sep 2004 17:41:14 -0700, Graham Fawcett wrote:
It's just shifting the burden perhaps, but if you're on a Unix system
you should be able to use tac(1) to reverse your file a bit faster:

import os
for line in os.popen('tac myfile.txt'):
    # do something with the line


It probably isn't shifting the burden; they probably do it right.

Doing it right involves reading the file in chunks backwards, and scanning
backwards for newlines, but getting it right when lines cross boundaries,
while perhaps not *hard*, is exactly the kind of tricky programming it is
best to do once... preferably somebody else's once. :-)

Done this way, the file never has to be read twice, which matters because
the first pass alone can take a while.
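
A minimal Python 3 sketch of that chunked approach, not a claim about how tac itself is written. It assumes a seekable file opened in binary mode; the name reverse_lines and the chunk_size parameter are made up for illustration. The boundary problem is handled by holding back the first line of each chunk, since its beginning may lie in the chunk before it.

```python
import io
import os


def reverse_lines(infile, chunk_size=8192):
    """Yield the lines of a seekable binary file, last line first."""
    infile.seek(0, os.SEEK_END)
    position = infile.tell()
    tail = b""  # start of a line that may continue into an earlier chunk
    while position > 0:
        read_size = min(chunk_size, position)
        position -= read_size
        infile.seek(position)
        buffer = infile.read(read_size) + tail
        pieces = buffer.split(b"\n")
        # Every piece except the last was followed by a newline; restore it.
        lines = [piece + b"\n" for piece in pieces[:-1]]
        if pieces[-1]:
            lines.append(pieces[-1])
        # The first line may have begun before this chunk, so hold it back.
        tail = lines[0]
        for line in reversed(lines[1:]):
            yield line
    if tail:
        yield tail  # whatever is left is the first line of the file


# A deliberately tiny chunk size forces lines to straddle chunk boundaries.
sample = io.BytesIO(b"hello world\nthis\nis a\ntest")
for line in reverse_lines(sample, chunk_size=4):
    print(repr(line))
```

Memory use is bounded by the chunk size plus the longest line, and each byte of the file is read once.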
Jul 18 '05 #7
Andrew Dalke <ad****@mindspring.com> writes:
It's just shifting the burden perhaps, but if you're on a Unix system
you should be able to use tac(1) to reverse your file a bit faster:


Huh. Hadn't heard of that one. It's not installed
on my OS X box. It's on my FreeBSD account as gtac.
Ah, but it is available on a Linux account.


You can try tail(1).
Jul 18 '05 #8
