Difference between readlines() and iterating on a file object?

Hi,

Can anyone tell me what the difference is between

for line in file.readlines( ):

and

for line in file:

where file is a file object returned from an open( ) call?

I thought that they did the same thing, but the code I am using it in has
this line called more than once on the same file object, and the second time
it is run gives different results for each.

What is the difference in implementation?

Cheers

Rich
Jul 18 '05 #1
On Fri, 13 Aug 2004, Richard wrote:
> Can anyone tell me what the difference is between
>
> for line in file.readlines( ):
>
> and
>
> for line in file:
>
> where file is a file object returned from an open( ) call?

The first form slurps every line in the file into a list, and then goes
through each item in the list in turn.

The second form skips the middleman, and simply goes through each line of
the file in turn (no interim list is created). In this context, file is
acting as an iterator. Because no list is created, this form consumes far
less memory on large files and is usually at least as fast, which generally
makes it the more efficient choice compared with .readlines().
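
A minimal sketch of the two forms side by side, assuming Python 3. The temporary file and its contents are made up purely for illustration:

```python
import os
import tempfile

# Hypothetical sample file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("one\ntwo\nthree\n")

# Form 1: readlines() builds the complete list before the loop starts.
with open(path) as f:
    all_lines = f.readlines()
    from_list = []
    for line in all_lines:
        from_list.append(line)

# Form 2: the file object hands out one line per iteration.
# The list here exists only so the two results can be compared.
with open(path) as f:
    from_iter = []
    for line in f:
        from_iter.append(line)

print(type(all_lines).__name__, from_list == from_iter)
```

Both loops see the same lines; the difference is only in when the data is read and how much of it is held in memory at once.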

> I thought that they did the same thing, but the code I am using it in has
> this line called more than once on the same file object, and the second time
> it is run gives different results for each.

Assuming you don't prematurely exit the for loop or access the file in
another manner while looping, both forms should give identical results.
Otherwise...

> What is the difference in implementation?

Because the first form slurps everything in at once, repeated calls to it
(with no intervening seek()s) will always return an empty list, whether
the for loop was stopped prematurely or not.

On the other hand, since the second form only reads one line at a time
(using file.next()), if the for loop is stopped prematurely (e.g. via
break), subsequent invocations will pick up right where the previous one
left off.
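
As a rough illustration of that difference, here is a small sketch assuming Python 3 and a made-up four-line temporary file. The variable names are hypothetical:

```python
import os
import tempfile

# Hypothetical sample file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("a\nb\nc\nd\n")

# readlines(): the first call consumes the whole file, even if the
# loop over the resulting list stops early.
with open(path) as f:
    for line in f.readlines():
        break
    second_readlines = f.readlines()  # nothing left without a seek(0)

# Iteration: after a break, a second loop continues with the next line.
with open(path) as f:
    for line in f:
        break
    remaining = [line for line in f]

print(second_readlines, remaining)
```

Calling f.seek(0) before the second pass makes either form start again from the top of the file.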

Hope this helps.

Jul 18 '05 #2
"Richard" <ri******@hmgcc.gov.uk> wrote in
news:41********@mail.hmgcc.gov.uk:
> Hi,
>
> Can anyone tell me what the difference is between
>
> for line in file.readlines( ):

reads the entire file into memory and splits it up into a list of lines,
then iterates over the list. If you break from the loop, tough: you've lost
any lines that were read but you didn't handle.

> and
>
> for line in file:

reads part of the file and strips off one line at a time. Never creates a
list. Reads more only when it runs out of the block it read. If you break
from the loop you can do another 'for line in file' and get the remaining
lines.

> where file is a file object returned from an open( ) call?

Jul 18 '05 #3
Christopher T King <sq******@WPI.EDU> wrote:
> Assuming you don't prematurely exit the for loop or access the file in
> another manner while looping, both forms should give identical results.
> Otherwise...


Well, there is a corner case if some external process is writing to the
file while you're reading it. The "in file.readlines():" version will
get a snapshot of the file at the time you read it, while the "in file:"
version will do a sequence of reads over time.

Not that I think this is what's going on in the OP's case, but it's
something to be aware of.
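
A small sketch of that corner case, assuming Python 3. The "external process" is simulated by a hypothetical helper, append_line(), running in the same process, and whether a late write is picked up in practice depends on buffering and the operating system:

```python
import os
import tempfile

# Hypothetical file that "grows" while it is being read.
path = os.path.join(tempfile.mkdtemp(), "growing.txt")

def write_initial():
    with open(path, "w") as f:
        f.write("1\n2\n3\n")

def append_line(text):
    # Stands in for the external writer.
    with open(path, "a") as w:
        w.write(text)

# Snapshot: the list is fixed once readlines() has returned.
write_initial()
with open(path) as f:
    snapshot = f.readlines()
    append_line("late\n")  # arrives after the snapshot was taken

# Sequence of reads: data appended mid-loop can still be picked up.
write_initial()
streamed = []
with open(path) as f:
    for line in f:
        if not streamed:
            append_line("late\n")  # arrives while the loop is running
        streamed.append(line)

print(len(snapshot), len(streamed))
```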
Jul 18 '05 #4
Duncan Booth <du**********@invalid.invalid> writes:
"Richard" <ri******@hmgcc.gov.uk> wrote in
news:41********@mail.hmgcc.gov.uk:
>> Hi,
>>
>> Can anyone tell me what the difference is between
>>
>> for line in file.readlines( ):
>
> reads the entire file into memory and splits it up into a list of lines,
> then iterates over the list. If you break from the loop, tough: you've lost
> any lines that were read but you didn't handle.
>
>> and
>>
>> for line in file:
>
> reads part of the file and strips off one line at a time. Never creates a
> list. Reads more only when it runs out of the block it read. If you break
> from the loop you can do another 'for line in file' and get the remaining
> lines.


But this last part only works the way you expect in 2.3, I think.

Cheers,
mwh

--
Ability to type on a computer terminal is no guarantee of sanity,
intelligence, or common sense.
-- Gene Spafford's Axiom #2 of Usenet
Jul 18 '05 #5
Duncan Booth wrote:
"Richard" <ri******@hmgcc.gov.uk> wrote in
news:41********@mail.hmgcc.gov.uk:

>> Hi,
>>
>> Can anyone tell me what the difference is between
>>
>> for line in file.readlines( ):
>
> reads the entire file into memory and splits it up into a list of lines,
> then iterates over the list. If you break from the loop, tough: you've lost
> any lines that were read but you didn't handle.
>
>> and
>>
>> for line in file:
>
> reads part of the file and strips off one line at a time. Never creates a
> list. Reads more only when it runs out of the block it read. If you break
> from the loop you can do another 'for line in file' and get the remaining
> lines.


However, one thing that bit me was that you can't use f.tell() to get the
position of the current line in the file. If you use "for line in
fileobject:" and the first line of the loop body is fileobject.tell(), that
will return the read-ahead position (the end of the file, for a small file)
and not the position of the next line, because the iterator reads ahead in
blocks. Might be a bit counter-intuitive.
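
The behaviour described here is that of the Python 2 file object, whose iterator uses a hidden read-ahead buffer. In Python 3 text mode, calling tell() while iterating raises OSError instead of returning a misleading offset. A minimal sketch, assuming Python 3 and a made-up sample file, of the usual workaround of driving the loop with readline() so that tell() stays usable:

```python
import os
import tempfile

# Hypothetical sample file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# Python 3 refuses tell() in the middle of "for line in f".
with open(path) as f:
    for line in f:
        try:
            f.tell()
            tell_failed = False
        except OSError:
            tell_failed = True
        break

# readline() does not disable tell(), so line offsets can be recorded.
offsets = []
with open(path) as f:
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            break
        offsets.append((pos, line))

    # A recorded offset can be handed back to seek() later.
    f.seek(offsets[1][0])
    second_again = f.readline()

print(tell_failed, second_again)
```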

I am learning to be a better Python programmer and I have written this
small program to parse mailbox files and display emails which match the
specified text. Any comments on this will be appreciated. I know I can read
the whole file using readlines(), but I'm not sure if that is a good idea?
Batigol:~/pgrep hari$ cat pgrep.py
import sys

hits = {}     # email number -> [start offset, end offset]
lines = {}    # email number -> list of matching lines
count = 0     # number of emails with at least one match
emailstart = "From -"

def build(f, pattern):
    # Scan the mailbox once, recording offsets of emails containing pattern.
    global count, hits, lines

    f.seek(0)
    start_email = 0
    end_email = 0
    pointers = []
    str_matched = []
    found = 0

    line = f.readline()

    while line != '':
        if line.find(emailstart) != -1:
            # Start of Mail
            start_email = f.tell()
            if found == 1:
                #print "From - inside found "
                pointers.append(end_email)
                found = 0
                hits[count] = pointers
                lines[count] = str_matched
                count += 1
                pointers = []
                str_matched = []

        if line.find(pattern) != -1:
            # Found string
            #print "Found string: "
            #print "count", count
            if len(pointers) == 0:
                pointers.append(start_email)
                found = 1
            str_matched.append(line)
            #lines[count] = line

        end_email = f.tell()
        line = f.readline()

    # A match in the last email has no following "From -" line to close
    # it, so record it here; otherwise it would be silently dropped.
    if found == 1:
        pointers.append(end_email)
        hits[count] = pointers
        lines[count] = str_matched
        count += 1

def display(f):
    # List the matches, then print the email the user picks.
    global count, hits, lines

    if count == 0:
        sys.stdout.write("Not found! \n")
        sys.stdout.flush()
        sys.exit(0)

    sys.stdout.write("#: Line Contents\n")
    for i in range(count):
        for j in range(len(lines[i])):
            choice = "%s: %s" % (i, lines[i][j])
            sys.stdout.write(choice)

    sys.stdout.write("Enter # of email to display: ")
    sys.stdout.flush()
    reply = sys.stdin.readline()
    try:
        i = int(reply.strip())
        f.seek(hits[i][0])
        while f.tell() != hits[i][1]:
            sys.stdout.write(f.readline())
    except (ValueError, KeyError):
        # Not a number, or not one of the listed emails.
        sys.stderr.write("Invalid choice\n")

    sys.stdout.flush()

if __name__ == "__main__":
    try:
        # open() replaces the old file() builtin.
        f = open(sys.argv[1], "r")
    except (IOError, IndexError):
        sys.stdout.write("Error opening file\n")
        sys.exit(1)

    build(f, sys.argv[2])
    response = 'n'
    #print response
    while response == 'n':
        display(f)
        sys.stdout.write("Do you want to quit, y or n? ")
        sys.stdout.flush()
        response = sys.stdin.readline().strip()

    f.close()
    sys.exit(0)

Thanks,

Hari

Jul 18 '05 #6
