mixing for x in file: and file.readline

At one time, mixing for x in file and readline was dangerous. For
example:

    for line in file:
        # read some lines from a file, then break
        break
    nextline = file.readline()  # bad

would not do what a naive user might expect because the file iterator
buffered data and readline did not read from that buffer. Hence the call
to readline might unexpectedly skip some lines.

I stumbled across this the hard way, but am wondering if it's still
present in Python 2.3. I thought I'd seen it documented recently, but
looking through the description of the file object in the Python Library
Reference, I didn't see it.

Anyone know if it's still an issue? If so, anyone have any idea how hard
it would be to fix? I'm willing to work on a patch, but would probably
need some help. And if experts have already determined it's too hard,
and are willing to explain, I'd love some idea of why that is.

-- Russell
Jul 18 '05 #1
"Russell E. Owen" <ro***@cesmail.net> writes:
At one time, mixing for x in file and readline was dangerous. For
example:

    for line in file:
        # read some lines from a file, then break
        break
    nextline = file.readline()  # bad

would not do what a naive user might expect because the file iterator
buffered data and readline did not read from that buffer. Hence the call
to readline might unexpectedly skip some lines.

I stumbled across this the hard way, but am wondering if it's still
present in Python 2.3. I thought I'd seen it documented recently, but
looking through the description of the file object in the Python Library
Reference, I didn't see it.
There was a thread-fragment about this a while back. See the message
from Steven Taschuk a few messages past this one:

http://www.google.com/groups?hl=en&l...3D30%26hl%3Den

http://tinyurl.com/n2cc
Anyone know if it's still an issue? If so, anyone have any idea how hard

[...]

Was fixed in 2.3, maybe in 2.2.3 also (not sure).
John
Jul 18 '05 #2
On Thu, Sep 11, 2003 at 01:54:53PM -0700, Russell E. Owen wrote:
At one time, mixing for x in file and readline was dangerous. For
example:

    for line in file:
        # read some lines from a file, then break
        break
    nextline = file.readline()  # bad

would not do what a naive user might expect because the file iterator
buffered data and readline did not read from that buffer. Hence the call
to readline might unexpectedly skip some lines.

I stumbled across this the hard way, but am wondering if it's still
present in Python 2.3.
Yes.

After you start reading a file with 'for' or iter() the current file
position is undefined unless you continue to the end of the file. This
means that once you start you shouldn't use the read(), readline() or
tell() methods unless you first seek() to a well-defined position.

The readline() and read() methods use the buffered I/O operations supplied
by the underlying C library. You can safely intermix read() and readline()
as well as tell()ing and seek()ing around without encountering any
unexpected behavior. You can even mix read operations on the same file
from Python code and stdio calls from an extension module (after getting
the FILE* object using PyFile_AsFile).

File iteration uses its own buffering for performance. Guido has declared
that "for line in fileobj:" should always be the fastest way to read an
entire file line by line. You just can't do that with the crappy stdio
implementations out there without adding your own buffering layer. Once
you do that it is out of sync with the FILE* object's idea of the current
file position.

In Python 2.2 if you break in the middle of the loop the temporary
iterator object (xreadlines) is lost along with its readahead buffer,
leaving you at an unknown file position. The only things you can do are
to close the file or seek. In Python 2.3 the file object IS an iterator
(rather than HAS an iterator) so while the current file position is
undefined from a read/readline/tell point of view the iterator state is
still consistent so you can immediately use it in another for loop to
continue from the same position or even call its next() method directly.
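
A minimal sketch of the behaviour described for Python 2.3, written for a current Python 3 interpreter (an assumption; in 2.3 the call would be spelled f.next() rather than next(f)). The file name and its contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample data: a two-line header followed by a body.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("header 1\nheader 2\nbody 1\nbody 2\nbody 3\n")

header, body = [], []
with open(path) as f:
    for line in f:
        header.append(line)
        if len(header) == 2:
            break  # leave the loop early; the file object keeps its iterator state

    # Pull exactly one more line through the same iterator.
    first_body = next(f)

    # A second loop resumes where the iterator left off.
    for line in f:
        body.append(line)

print(header, first_body, body)
```

The point of the sketch is that every read after the break goes through the iterator, never through read(), readline() or tell().
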
Anyone know if it's still an issue? If so, anyone have any idea how hard
it would be to fix? I'm willing to work on a patch, but would probably
need some help. And if experts have already determined it's too hard,
and are willing to explain, I'd love some idea of why that is.


Really fixing it amounts to reimplementing the entire I/O layer of
Python with a different strategy and thoroughly testing on multiple
platforms.

It's possible to hide the problem in most cases by making read and
readline use the iteration readahead buffer if it's attached to the file
object and stdio if it isn't. I don't think it's a good idea. It will
require some hairy code and seems susceptible to subtle bugs and
corner cases.

Another alternative is to make read and readline fail noisily after
iteration starts (unless cleared by seek()).
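
One way the "fail noisily" idea could look, sketched as a wrapper class for a current Python 3 interpreter. The class name StrictLineFile and the choice of ValueError are assumptions made for illustration; this is not how the file object itself is implemented.

```python
import io


class StrictLineFile:
    """Hypothetical wrapper: refuses read()/readline() once iteration has begun."""

    def __init__(self, fileobj):
        self._f = fileobj
        self._iterating = False

    def __iter__(self):
        return self

    def __next__(self):
        self._iterating = True
        return next(self._f)

    def _check(self, name):
        if self._iterating:
            raise ValueError(
                "%s() called after iteration started; call seek() first" % name
            )

    def read(self, size=-1):
        self._check("read")
        return self._f.read(size)

    def readline(self):
        self._check("readline")
        return self._f.readline()

    def seek(self, offset, whence=0):
        self._iterating = False  # seek() re-establishes a defined position
        return self._f.seek(offset, whence)


f = StrictLineFile(io.StringIO("a\nb\nc\n"))
for line in f:
    break  # consume one line through the iterator, then stop

try:
    f.readline()
    failed = False
except ValueError:
    failed = True

f.seek(0)
after_seek = f.readline()
print(failed, repr(after_seek))
```
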

Oren

Jul 18 '05 #3
Oren Tirosh <or*******@hishome.net> writes:
On Thu, Sep 11, 2003 at 01:54:53PM -0700, Russell E. Owen wrote:
At one time, mixing for x in file and readline was dangerous. For
example:
[...] Yes. [...] In Python 2.2 if you break in the middle of the loop the temporary
iterator object (xreadlines) is lost along with its readahead buffer,
leaving you at an unknown file position. The only things you can do are
to close the file or seek. In Python 2.3 the file object IS an iterator
(rather than HAS an iterator) so while the current file position is
undefined from a read/readline/tell point of view the iterator state is
still consistent so you can immediately use it in another for loop to
continue from the same position or even call its next() method directly.

[...]

Oh, sorry for the misinformation -- I thought the repeated-iteration
and mixing-iteration-with-readline issues were the same, but clearly
not.
John
Jul 18 '05 #4
In article <ma**********************************@python.org>,
Oren Tirosh <or*******@hishome.net> wrote:
On Thu, Sep 11, 2003 at 01:54:53PM -0700, Russell E. Owen wrote:
At one time, mixing for x in file and readline was dangerous. For
example:

    for line in file:
        # read some lines from a file, then break
        break
    nextline = file.readline()  # bad

would not do what a naive user might expect because the file iterator
buffered data and readline did not read from that buffer. Hence the call
to readline might unexpectedly skip some lines...

(Oren points out that it's still a problem in Python 2.3 and after some
interesting and gory detail goes on to say...)
Really fixing it amounts to reimplementing the entire I/O layer of
Python with a different strategy and thoroughly testing on multiple
platforms.

It's possible to hide the problem in most cases by making read and
readline use the iteration readahead buffer if it's attached to the file
object and stdio if it isn't. I don't think it's a good idea. It will
require some hairy code and seems susceptible to subtle bugs and
corner cases.
I agree that fixing read would probably be too messy to justify.

But it seems to me that a simple reimplementation of readline() would
work fine:

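    def readline(self):
        # The iterator signals end of file with StopIteration;
        # readline() signals it with an empty string.
        try:
            return self.next()
        except StopIteration:
            return ""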

That's basically the way I ended up working around the problem (but I
didn't try to modify any classes). I do see two issues with that fix:
- existing code (if any) that mixes readline and read would be harmed
- it may not be efficient enough (even implemented in C)
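
A sketch of the same workaround done without modifying any classes, written for a current Python 3 interpreter (where the iterator method is reached through next(it) rather than it.next()). The name IterLineReader is hypothetical.

```python
import io


class IterLineReader:
    """Hypothetical helper: readline() and iteration share one iterator."""

    def __init__(self, fileobj):
        self._it = iter(fileobj)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def readline(self):
        # Same contract as file.readline(): an empty string signals end of file.
        try:
            return next(self._it)
        except StopIteration:
            return ""


reader = IterLineReader(io.StringIO("one\ntwo\nthree\n"))
for line in reader:
    first = line
    break
second = reader.readline()  # continues right after the line the loop returned
rest = list(reader)
at_eof = reader.readline()
print(repr(first), repr(second), rest, repr(at_eof))
```

Because both paths draw from one iterator, no line is skipped when the loop is abandoned. The wrapper deliberately offers no read() or tell(), which is where the first issue listed above would show up.
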
Another alternative is to make read and readline fail noisily after
iteration starts (unless cleared by seek())


If readline cannot be fixed, this might be worth doing since I think
it's a common thing to want to mix readline and iteration. If read is
the only issue, I suspect adding a warning to the documentation for file
method "read" would suffice.

I'm wondering where the problem is discussed in the manual. I'm pretty
sure I saw it recently, but when I read about file methods I saw nothing
about it.

-- Russell
Jul 18 '05 #5
In article <ow************************@nntp6.u.washington.edu>,
"Russell E. Owen" <ow**@astro.washington.edu> wrote:
In article <ma**********************************@python.org>,
Oren Tirosh <or*******@hishome.net> wrote:
Another alternative is to make read and readline fail noisily after
iteration starts (unless cleared by seek())


The seek workaround turns out to be very challenging, unless I'm missing
something. seek(0, 1) doesn't do anything -- no surprise, but it was
worth a try. Apparently the right thing is
seek(-n, 1) where n = # of characters in the iterator's buffer
but I haven't found any way of querying that information.

(The thought of using absolute positioning is appalling -- one would
have to keep track of how many characters had been returned by the
iterator).
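
For what it's worth, the bookkeeping for absolute positioning might look roughly like this. It is a sketch for a current Python 3 interpreter and assumes the file is opened in binary mode and iterated from offset 0, so that len(line) is an exact byte count; in text mode with newline translation the lengths need not match the bytes on disk. The file name and contents are made up.

```python
import os
import tempfile

# Hypothetical sample data.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(b"alpha\nbeta\ngamma\ndelta\n")

with open(path, "rb") as f:
    consumed = 0  # bytes handed out by the iterator so far
    for line in f:
        consumed += len(line)
        if line.startswith(b"beta"):
            break

    f.seek(consumed)          # absolute seek to just past the last line returned
    next_line = f.readline()  # now safe: the position is well defined again

print(consumed, next_line)
```
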

A possible fix for read is to have it automatically do the seek
mentioned above (if the iteration buffer is nonempty). That'd work for
readline as well, but I still prefer the idea of having it use the
iterator -- it seems a lot simpler.

Comments?

-- Russell
Jul 18 '05 #6
On Fri, Sep 12, 2003 at 10:57:47AM -0700, Russell E. Owen wrote:
....
It's possible to hide the problem in most cases by making read and
readline use the iteration readahead buffer if it's attached to the file
object and stdio if it isn't. I don't think it's a good idea. It will
require some hairy code and seems susceptible to subtle bugs and
corner cases.
I agree that fixing read would probably be too messy to justify.

But it seems to me that a simple reimplementation of readline() would
work fine:

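    def readline(self):
        # The iterator signals end of file with StopIteration;
        # readline() signals it with an empty string.
        try:
            return self.next()
        except StopIteration:
            return ""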

That's basically the way I ended up working around the problem (but I
didn't try to modify any classes). I do see two issues with that fix:
- existing code (if any) that mixes readline and read would be harmed
- it may not be efficient enough (even implemented in C)


It will be very efficient. In fact, it will be faster than the current
readline implementation because it will use the readahead buffer. But
the problem is more than just mixing readline() and read(). Mixing
readline() and tell() will also be broken. It is valid (and useful) to
read a file line by line, store a tell() offset and later seek() back to
the same line. It works even if the file is in text mode doing CRLF->LF
conversions.
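
A small sketch of that readline()/tell()/seek() pattern, written for a current Python 3 interpreter. Note that it uses readline() only, never a for loop over the file; the file name and contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample data.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("first\nsecond\nthird\n")

offsets = []
with open(path) as f:
    while True:
        pos = f.tell()  # position of the line about to be read
        line = f.readline()
        if not line:
            break
        offsets.append(pos)

    f.seek(offsets[1])  # jump back to the start of the second line
    again = f.readline()

print(len(offsets), repr(again))
```

The stored offsets are only ever passed back to seek(); in text mode they should be treated as opaque values rather than byte counts.
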
Another alternative is to make read and readline fail noisily after
iteration starts (unless cleared by seek())


If readline cannot be fixed, this might be worth doing since I think
it's a common thing to want to mix readline and iteration. If read is
the only issue, I suspect adding a warning to the documentation for file
method "read" would suffice.


The problem is that it will work on, say, Python 2.3.1 but fail silently
on earlier versions. Why not just use next() instead of readline()?
Because catching StopIteration takes a little more typing than checking
an empty string?
I'm wondering where the problem is discussed in the manual. I'm pretty
sure I saw it recently, but when I read about file methods I saw nothing
about it.


I believe it's not documented clearly enough. Docpatch time?

Oren

Jul 18 '05 #7
