Reading a text file backwards

Jay
I have a very large text file (being read by a CGI script on a web server),
and I get memory errors when I try to read the whole file into a list of
strings. The problem is, I want to read the file backwards, starting with
the last line.

Previously, I did:

myfile = open('myfile.txt', 'r')
mylines = myfile.readlines()
myfile.close()
for line in range(len(mylines)-1, -1, -1):
    # do something with mylines[line]

This, however, caused a MemoryError, so I want to do something like:

myfile = open('myfile.txt', 'r')
for line in myfile:
    # do something with line
myfile.close()

Only, I want to iterate backwards, starting with the last line of the file.
Can anybody suggest a simple way of doing this? Do I need to jump around
with myfile.seek() and use myfile.readline() ?


Jul 18 '05 #1
Jay,

Try this:

myfile = open('myfile.txt', 'r')
mylines = myfile.readlines()
myfile.close()
mylines.reverse()

Rick

Jul 18 '05 #2
Jay wrote:
: Only, I want to iterate backwards, starting with the last line of the file.
: Can anybody suggest a simple way of doing this? Do I need to jump around
: with myfile.seek() and use myfile.readline() ?


Python Cookbook has a recipe. Or two.

http://aspn.activestate.com/ASPN/Coo.../Recipe/276149
http://aspn.activestate.com/ASPN/Coo.../Recipe/120686

I haven't looked at them closely enough to judge their quality.

Another approach is to read the lines forwards and save
each line's starting position. Then iterate backwards
through the positions, seeking to each one and reading a line.

def find_offsets(infile):
    # record the offset at which each line starts
    offsets = []
    offset = 0
    for line in infile:
        offsets.append(offset)
        offset += len(line)
    return offsets

def iter_backwards(infile):
    # make sure it's seekable and at the start
    infile.seek(0)
    offsets = find_offsets(infile)
    for offset in offsets[::-1]:
        infile.seek(offset)
        yield infile.readline()

for line in iter_backwards(open("spam.py")):
    print(repr(line))

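This won't work on MS Windows because of the
'\r\n' -> '\n' conversion in text mode: len(line)
no longer matches the number of bytes on disk, so the
computed offsets drift. You would instead need
something like: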

def find_offsets(infile):
    offsets = []
    while True:
        # ask the file object for the real position of each line start
        offset = infile.tell()
        if not infile.readline():
            break
        offsets.append(offset)
    return offsets

Just submitted this solution to the cookbook.
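
For readers on a current Python, here is a minimal sketch of the same tell()-based idea. It is an adaptation, not the cookbook submission itself: it assumes the file is opened in binary mode ("rb"), so that offsets are plain byte positions on every platform and lines come back as bytes. The name demo and the use of io.BytesIO as a stand-in for a real file are illustrative assumptions.

```python
import io

def find_offsets(infile):
    """Return the byte offset at which each line of a binary file starts."""
    offsets = []
    infile.seek(0)
    while True:
        offset = infile.tell()
        if not infile.readline():
            break
        offsets.append(offset)
    return offsets

def iter_backwards(infile):
    """Yield the lines of a seekable binary file, last line first."""
    for offset in reversed(find_offsets(infile)):
        infile.seek(offset)
        yield infile.readline()

# In-memory stand-in for open("myfile.txt", "rb"); note the Windows line endings.
demo = io.BytesIO(b"first\r\nsecond\r\nthird")
lines = list(iter_backwards(demo))
```

This still keeps one integer per line in memory and reads the file twice, but it never holds the line contents themselves.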

Andrew
da***@dalkescientific.com
Jul 18 '05 #3
Rick Holbert <ho******@dma.org> wrote:
: Jay,

: Try this:

: myfile = open('myfile.txt', 'r')
: mylines = myfile.readlines()
: myfile.close()
: mylines.reverse()
Hi Rick,

But this probably won't work for Jay: he's running into memory issues
because the file's too large to hold in memory at once. The point is
to avoid readlines().

Here's a generator that tries to iterate backwards across a file. We
first record the file position of each newline, and then walk through
those offsets in reverse, reading one line at each.

###

def backfileiter(myfile):
    """Iterates the lines of a file, but in reverse order."""
    myfile.seek(0)
    offsets = _getLineOffsets(myfile)
    myfile.seek(0)
    offsets.reverse()
    for i in offsets:
        myfile.seek(i+1)
        yield myfile.readline()

def _getLineOffsets(myfile):
    """Return a list of offsets where newlines are located."""
    offsets = [-1]
    i = 0
    while True:
        byte = myfile.read(1)
        if not byte:
            break
        elif byte == '\n':
            offsets.append(i)
        i += 1
    return offsets
###

For example:

###
>>> from StringIO import StringIO
>>> f = StringIO("""
... hello world
... this
... is a
... test""")
>>> f.seek(0)
>>> for line in backfileiter(f): print repr(line)
...
'test'
'is a\n'
'this\n'
'hello world\n'
'\n'
###
Hope this helps!
Jul 18 '05 #4
It's just shifting the burden perhaps, but if you're on a Unix system
you should be able to use tac(1) to reverse your file a bit faster:

import os
for line in os.popen('tac myfile.txt'):
    #do something with the line
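
A rough sketch of the same idea using the subprocess module, which avoids passing the filename through a shell. It assumes a tac executable is available on PATH (GNU coreutils provides one); the function name iter_tac is made up for this example, and lines are yielded as bytes.

```python
import shutil
import subprocess

def iter_tac(path):
    """Yield the lines of `path` in reverse order by running the external tac program."""
    # Assumption: tac is installed; fail with a clear message if it is not.
    if shutil.which("tac") is None:
        raise RuntimeError("tac not found on PATH")
    proc = subprocess.Popen(["tac", path], stdout=subprocess.PIPE)
    try:
        for line in proc.stdout:
            yield line
    finally:
        # Release the pipe and reap the child process.
        proc.stdout.close()
        proc.wait()
```

Because the lines are streamed from the pipe, the Python side only holds one line at a time.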

Jul 18 '05 #5
Graham Fawcett wrote:
: It's just shifting the burden perhaps, but if you're on a Unix system
: you should be able to use tac(1) to reverse your file a bit faster:


Huh. Hadn't heard of that one. It's not installed
on my OS X box. It's on my FreeBSD account as gtac.
Ah, but it is available on a Linux account.

Andrew
da***@dalkescientific.com
Jul 18 '05 #6
On Thu, 30 Sep 2004 17:41:14 -0700, Graham Fawcett wrote:
: It's just shifting the burden perhaps, but if you're on a Unix system
: you should be able to use tac(1) to reverse your file a bit faster:
:
: import os
: for line in os.popen('tac myfile.txt'):
:     #do something with the line


It probably isn't shifting the burden; they probably do it right.

Doing it right involves reading the file in chunks backwards, and scanning
backwards for newlines, but getting it right when lines cross boundaries,
while perhaps not *hard*, is exactly the kind of tricky programming it is
best to do once... preferably somebody else's once. :-)

This way you don't have to read the file twice; for a large file, the
first pass alone can take a while.
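
A minimal sketch of the chunked approach described above, offered as one possible way to handle lines that cross chunk boundaries rather than as what tac actually does internally. It assumes a seekable file opened in binary mode; reverse_lines, chunk_size, and the demo data are hypothetical names chosen for the example.

```python
import io
import os

def reverse_lines(infile, chunk_size=8192):
    """Yield the lines of a seekable binary file from last to first.

    The file is read in fixed-size chunks starting at the end, so memory
    use is bounded by roughly chunk_size plus the longest line.
    """
    infile.seek(0, os.SEEK_END)
    position = infile.tell()
    pending = b""  # line whose beginning may lie in an earlier chunk
    while position > 0:
        read_size = min(chunk_size, position)
        position -= read_size
        infile.seek(position)
        buffer = infile.read(read_size) + pending
        pieces = buffer.split(b"\n")
        # split() drops the newline from every piece but the last; put it back.
        lines = [piece + b"\n" for piece in pieces[:-1]]
        if pieces[-1]:
            lines.append(pieces[-1])
        # Unless we reached the start of the file, the first line may be
        # incomplete, so hold it back until the preceding chunk is read.
        pending = lines.pop(0) if position > 0 else b""
        for line in reversed(lines):
            yield line

# In-memory stand-in for open("myfile.txt", "rb"); a tiny chunk size
# forces lines to straddle chunk boundaries.
demo = io.BytesIO(b"hello world\nthis\nis a\ntest")
result = list(reverse_lines(demo, chunk_size=4))
```

The file is read only once, and no per-line index is kept.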
Jul 18 '05 #7
Andrew Dalke <ad****@mindspring.com> writes:
: : It's just shifting the burden perhaps, but if you're on a Unix system
: : you should be able to use tac(1) to reverse your file a bit faster:
:
: Huh. Hadn't heard of that one. It's not installed
: on my OS X box. It's on my FreeBSD account as gtac.
: Ah, but it is available on a Linux account.


You can try tail(1).
Jul 18 '05 #8
