Bytes | Software Development & Data Engineering Community

Reading a text file backwards

Jay
I have a very large text file (being read by a CGI script on a web server),
and I get memory errors when I try to read the whole file into a list of
strings. The problem is, I want to read the file backwards, starting with
the last line.

Previously, I did:

myfile = open('myfile.txt', 'r')
mylines = myfile.readlines()
myfile.close()
for line in range(len(mylines)-1, -1, -1):
    # do something with mylines[line]
    pass

This, however, caused a MemoryError, so I want to do something like:

myfile = open('myfile.txt', 'r')
for line in myfile:
    # do something with line
    pass
myfile.close()

Only, I want to iterate backwards, starting with the last line of the file.
Can anybody suggest a simple way of doing this? Do I need to jump around
with myfile.seek() and use myfile.readline()?


Jul 18 '05 #1
Jay,

Try this:

myfile = open('myfile.txt', 'r')
mylines = myfile.readlines()
myfile.close()
mylines.reverse()

Rick


Jul 18 '05 #2
Jay wrote:
Only, I want to iterate backwards, starting with the last line of the file.
Can anybody suggest a simple way of doing this? Do I need to jump around
with myfile.seek() and use myfile.readline() ?


Python Cookbook has a recipe. Or two.

http://aspn.activestate.com/ASPN/Coo.../Recipe/276149
http://aspn.activestate.com/ASPN/Coo.../Recipe/120686

I've not looked at them to judge their quality.

Another approach is to read the lines forwards and save the position
where each line starts. Then iterate backwards through the positions,
seek to each one and read a line.

def find_offsets(infile):
    offsets = []
    offset = 0
    for line in infile:
        offsets.append(offset)
        offset += len(line)
    return offsets

def iter_backwards(infile):
    # make sure it's seekable and at the start
    infile.seek(0)
    offsets = find_offsets(infile)
    for offset in offsets[::-1]:
        infile.seek(offset)
        yield infile.readline()

for line in iter_backwards(open("spam.py")):
    print(repr(line))

This won't work on MS Windows because of the
'\r\n' -> '\n' conversion in text mode: len(line)
no longer matches the number of bytes on disk.
You would instead need something like:

def find_offsets(infile):
    offsets = []
    while 1:
        offset = infile.tell()
        if not infile.readline():
            break
        offsets.append(offset)
    return offsets

Just submitted this solution to the cookbook.
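In Python 3 the same offset-index idea can sidestep the Windows problem by opening the file in binary mode ("rb"): no newline translation happens, so len(line) is an exact byte count. This is only an illustrative sketch, not the recipe submitted to the cookbook; the names line_start_offsets and iter_backwards_binary are made up here, and storing the offsets in an array("q") instead of a list is an assumption aimed at keeping the index compact for a very large file.

```python
import array


def line_start_offsets(f):
    """Return the byte offset at which each line of `f` starts.

    Assumes `f` was opened in binary mode ("rb"), so len(line) is an exact
    byte count and no '\r\n' -> '\n' translation can skew the offsets.
    """
    offsets = array.array("q")  # one 8-byte integer per line
    pos = 0
    for line in f:
        offsets.append(pos)
        pos += len(line)
    return offsets


def iter_backwards_binary(f):
    """Yield the lines of binary file `f` from last to first, as bytes."""
    f.seek(0)
    offsets = line_start_offsets(f)
    for start in reversed(offsets):
        f.seek(start)
        yield f.readline()
```

Lines come back as bytes, so decode each one (for example line.decode("utf-8"), if the file is UTF-8) before treating it as text. The index still costs 8 bytes per line: far less than holding every line in memory, but not free.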

Andrew
da***@dalkescientific.com
Jul 18 '05 #3
Rick Holbert <ho******@dma.org> wrote:
: Jay,

: Try this:

: myfile = open('myfile.txt', 'r')
: mylines = myfile.readlines()
: myfile.close()
: mylines.reverse()
Hi Rick,

But this probably won't work for Jay: he's running into memory issues
because the file's too large to hold in memory at once. The point is
to avoid readlines().

Here's a generator that tries to iterate backwards across a file. We
first record the file position of each newline, and then walk through
those offsets in reverse, seeking just past each one and reading a line.

###

def backfileiter(myfile):
    """Iterates the lines of a file, but in reverse order."""
    myfile.seek(0)
    offsets = _getLineOffsets(myfile)
    myfile.seek(0)
    offsets.reverse()
    for i in offsets:
        myfile.seek(i+1)
        line = myfile.readline()
        if line:  # skip the empty read just past a trailing newline
            yield line

def _getLineOffsets(myfile):
    """Return a list of offsets where newlines are located."""
    offsets = [-1]  # -1 stands for "just before the first line"
    i = 0
    while True:
        byte = myfile.read(1)
        if not byte:
            break
        elif byte == '\n':
            offsets.append(i)
        i += 1
    return offsets
###

For example:

###
>>> from io import StringIO
>>> f = StringIO("""
... hello world
... this
... is a
... test""")
>>> for line in backfileiter(f): print(repr(line))
...
'test'
'is a\n'
'this\n'
'hello world\n'
'\n'
###
Hope this helps!
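Reading one character at a time with read(1), as _getLineOffsets does, is slow on a very large file. One alternative, sketched here under the assumption of Python 3 and a file that fits in the process address space (a real limit on 32-bit builds), is to memory-map the file with the standard mmap module and walk backwards with mmap.rfind(). The operating system pages data in on demand and no offset list is built at all. The name iter_lines_mmap is hypothetical.

```python
import mmap


def iter_lines_mmap(path):
    """Yield the lines of the file at `path` from last to first, as bytes."""
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # mmap refuses to map an empty file; there is nothing to yield
            return
        with mm:
            end = len(mm)
            # A newline in the final byte belongs to the last line, so the
            # search for the previous line's newline starts just before it.
            search_end = end - 1 if mm[end - 1:end] == b"\n" else end
            while end > 0:
                nl = mm.rfind(b"\n", 0, search_end)
                start = nl + 1  # rfind returns -1 when no newline is left
                yield mm[start:end]
                end = start
                search_end = start - 1
```

Like Danny's version, each yielded line keeps its trailing newline; unlike it, lines come back as bytes and need decoding.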
Jul 18 '05 #4
It's just shifting the burden perhaps, but if you're on a Unix system
you should be able to use tac(1) to reverse your file a bit faster:

import os
for line in os.popen('tac myfile.txt'):
    # do something with the line
    pass
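In Python 3, os.popen still works, but subprocess is the more usual way to run an external command. A minimal sketch, assuming a tac executable is on the PATH (GNU coreutils provides one; some systems lack it or install it under another name); the function name tac_lines is hypothetical. tac's output is read line by line from the pipe, so Python never holds the whole file.

```python
import subprocess


def tac_lines(path):
    """Yield the lines of `path` in reverse order by running tac(1).

    Assumes a `tac` executable is on the PATH. Lines are bytes and keep
    their trailing newline.
    """
    with subprocess.Popen(["tac", path], stdout=subprocess.PIPE) as proc:
        for line in proc.stdout:
            yield line
    # Leaving the `with` block waited for tac, so returncode is now set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, ["tac", path])
```

One quirk: GNU tac treats the newline as a line terminator, so if the file does not end with a newline, its last line comes out glued to the front of the line before it. If tac is missing, Popen raises FileNotFoundError when the generator is first advanced.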

Jul 18 '05 #5
Graham Fawcett wrote:
It's just shifting the burden perhaps, but if you're on a Unix system
you should be able to use tac(1) to reverse your file a bit faster:


Huh. Hadn't heard of that one. It's not installed
on my OS X box. It's on my FreeBSD account as gtac.
Ah, but it is available on a Linux account.

Andrew
da***@dalkescientific.com
Jul 18 '05 #6
On Thu, 30 Sep 2004 17:41:14 -0700, Graham Fawcett wrote:
It's just shifting the burden perhaps, but if you're on a Unix system
you should be able to use tac(1) to reverse your file a bit faster:

It probably isn't shifting the burden; tac probably does it right.

Doing it right involves reading the file backwards in chunks and scanning
each chunk backwards for newlines. Getting it right when lines cross chunk
boundaries, while perhaps not *hard*, is exactly the kind of tricky
programming it is best to do once... preferably somebody else's once. :-)

This way you don't read the file twice, and on a large file that first pass
can take a while.
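The chunked approach described above can be sketched in Python 3 as follows. This is an illustration only, not how tac is actually written; the name reverse_lines and the 64 KiB default block size are arbitrary choices. It reads fixed-size blocks starting from the end of a binary-mode file, carries any partial line over into the next (earlier) block, and yields complete lines last to first, so the file is read once and memory stays around one block plus the longest line.

```python
import os


def reverse_lines(f, block_size=64 * 1024):
    """Yield the lines of binary file `f` from last to first, as bytes.

    Each line keeps its trailing b"\n"; only the file's final line may
    lack one.
    """
    pos = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
    tail = b""  # partial line whose start may lie in an earlier block
    while pos > 0:
        step = min(block_size, pos)
        pos -= step
        f.seek(pos)
        buf = f.read(step) + tail
        # A line is only known to be complete once the newline ending the
        # line before it is found, so search strictly before the current
        # line's own last byte.
        end = len(buf)
        nl = buf.rfind(b"\n", 0, end - 1)
        while nl != -1:
            yield buf[nl + 1:end]
            end = nl + 1
            nl = buf.rfind(b"\n", 0, end - 1)
        # buf[:end] is still incomplete: its start may be in an earlier block.
        tail = buf[:end]
    if tail:
        yield tail  # the first line of the file
```

Usage would be along the lines of opening the file with open("myfile.txt", "rb") and looping over reverse_lines(f). Because it splits only on b"\n", Windows "\r\n" endings come through intact, and decoding to text is left to the caller.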
Jul 18 '05 #7
Andrew Dalke <ad****@mindspring.com> writes:
It's just shifting the burden perhaps, but if you're on a Unix system
you should be able to use tac(1) to reverse your file a bit faster:


Huh. Hadn't heard of that one. It's not installed
on my OS X box. It's on my FreeBSD account as gtac.
Ah, but it is available on a Linux account.


You can try tail(1): on the BSDs and OS X, tail -r prints a file's lines in
reverse order.
Jul 18 '05 #8

Similar topics

Amy L.: Is there a way through .net to read a very large text file (400MB+) backwards line by line. In system.io the filestream class has a "seek" method but the only read method requires you to know how...

Job Lot: I have tab delimited text file which gets populated on daily basis via automated process. New entry is written at the bottom. I need to create a utility which makes a copy of this file with 10 most...

Shahid Juma: Hi, I have a text file which I would like to read from the end and display only a certain number of records. Is there any way of doing this? Thanks, Shahid

tkpmep: I have a text file with many hundreds of lines of data. The data of interest to me, however, resides at the bottom of the file, in the last 20 lines. Right now, I read the entire file and discard...

Matt DeFoor: I have some log files that I'm working with that look like this: 1000000000 3456 1234 1000000001 3456 1235 1000020002 3456 1223 1000203044 3456 986 etc. I'm trying to read the file...

Rajorshi Biswas: Hi folks, Suppose I have a large (1 GB) text file which I want to read in reverse. The number of characters I want to read at a time is insignificant. I'm confused as to how best to do it. Upon...

Neil Patel: I have a log file that puts the most recent record at the bottom of the file. Each line is delimited by a \r\n Does anyone know how to seek to the end of the file and start reading backwards?

stoogots2: I have written a Windows App in C# that needs to read a text file over the network, starting from the end of the file and reading backwards toward the beginning (looking for the last occurrence of a...