
looping over a big file

Hi,

I've a couple of questions regarding the processing of a big text file
(16MB).

1) how does python handle:
    for line in big_file:
is big_file all read into memory or one line is read at a time or a buffer
is used or ...?

2) is it possible to advance lines within the loop? The following doesn't
work:
    for line in big_file:
        line_after = big_file.readline()

The readline() call (file pointer) is "out of sync" with the loop, which
suggests big_file is not read one line at a time in the loop.

Thanks,
Fernando Martins
Jul 21 '05 #1
7 Replies


"martian" <no****@hetnet.nl> wrote:
1) how does python handle:
    for line in big_file:
is big_file all read into memory or one line is read at a time or a buffer
is used or ...?


The "right" way to do this is:

    for line in file("filename"):
        whatever

The file object returned by file() acts as an iterator. Each time through
the loop, another line is read and returned (I'm sure there is some
block-level buffering going on at a low level).

2) is it possible to advance lines within the loop? The following doesn't
work:
    for line in big_file:
        line_after = big_file.readline()


You probably want something like:

    for line in file("filename"):
        if skipThisLine:
            continue
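
For anyone reading this later, here is a minimal sketch of the same skip-with-continue idea, written for Python 3, where open() replaces the file() builtin used above. The function name count_data_lines, the rule of skipping lines that start with "#", and the temporary sample file are all assumptions made up for the illustration; they are not from the original posts.

```python
import os
import tempfile


def count_data_lines(big_file):
    """Count lines that do not start with '#', one line per loop pass."""
    kept = 0
    for line in big_file:            # the file object yields one line at a time
        if line.startswith("#"):     # hypothetical "skip this line" condition
            continue                 # move straight on to the next line
        kept += 1
    return kept


# Build a small throwaway file so the sketch runs on its own.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("# header\nfirst\n# note\nsecond\n")

big_file = open(path)
try:
    print(count_data_lines(big_file))
finally:
    big_file.close()
    os.remove(path)
```

The loop never holds more than the current line, so the same pattern should behave the same way on a 16MB file as on this four-line sample.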
Jul 21 '05 #2

Roy Smith <ro*@panix.com> writes:
The "right" way to do this is:

    for line in file("filename"):
        whatever

The file object returned by file() acts as an iterator. Each time through
the loop, another line is read and returned (I'm sure there is some
block-level buffering going on at a low level).


I disagree. That's the *convenient* way to do it, and perfectly
acceptable in many situations. But not all Python interpreters will
close the file when the for loop ends. Likewise, if an exception is
raised during processing, the file may not get closed properly. Those
things may matter to you, in which case the "right" way is:

    data = open("filename")
    try:
        for line in data:
            whatever
    finally:
        data.close()
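
A note for later readers: Python 2.5 and newer (including Python 3) have a with statement that gives the same close-on-exit guarantee as the try/finally above. The sketch below assumes Python 3; the function name longest_line_length and the temporary sample file are invented for the example.

```python
import os
import tempfile


def longest_line_length(path):
    """Return the length of the longest line in the file at `path`."""
    longest = 0
    with open(path) as data:         # closed when the block exits, even on error
        for line in data:
            longest = max(longest, len(line.rstrip("\n")))
    return longest


# Hypothetical sample file, created only so the sketch is self-contained.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("short\na much longer line\n")

print(longest_line_length(sample_path))
os.remove(sample_path)
```

The try/finally form is still valid; the with form is simply shorter and harder to get wrong.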

Guido has made a pronouncement on open vs. file. I think he prefers
open for opening files, and file for type testing, but I may well be
wrong. I don't think it's critical.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jul 21 '05 #3

Mike Meyer wrote:
Guido has made a pronouncement on open vs. file. I think he prefers
open for opening files, and file for type testing, but may well be
wrong. I don't think it's critical.


He has said that open() may be used for things other than files in the
future. So if you want to be sure you're opening a file, use file().

<wink>
--
Michael Hoffman
Jul 21 '05 #4

Michael Hoffman wrote:
Mike Meyer wrote:
Guido has made a pronouncement on open vs. file. I think he prefers
open for opening files, and file for type testing, but may well be
wrong. I don't think it's critical.


He has said that open() may be used for things other than files in the
future. So if you want to be sure you're opening a file, use file().


Probably this is the same sort of thing as "if you want to be sure your
function is working with an integer, you have to test whether it is an
integer" (or use a statically typed language).

Which is advice that is generally rebutted around here with comments
about "duck typing" (as in, if it acts like an integer, then stop
worrying about what it actually is).

If open() can ever return things other than files, it seems likely it
will do so only under conditions that make it pretty much safe to assume
that existing code will continue to operate "as expected" (note: not
"always with a file").

I'm not going to try to picture just how this might happen, but I could
imagine, for example, some kind of support for protocol prefixes (à la
"http:" or "ftp:"), or perhaps some sort of support for encrypted or
compressed data. Or maybe it would require a prior call to some
function to enable the magic that lets open() return non-files.

If any of that is reasonable, then using open() is actually the better
approach to ensuring your code "does the right thing" in the future, and
"file" should still be used in the rare case where you actually want to
test whether something is a particular type of thing.

-Peter
Jul 21 '05 #5

On Sunday 03 July 2005 08:28 pm, Peter Hansen wrote:
If open() can ever return things other than files, it seems likely it
will do so only under conditions that make it pretty much safe to assume
that existing code will continue to operate "as expected" (note: not
"always with a file").


WHEN it returns things other than files. Like a StringIO object,
which can be quite handy. True, it won't be a "big file", but it'd
be nice if the same code would tolerate it. I've used this with
e.g. PIL quite a bit when working with Zope, because it isn't
really desirable to have to write the file out to disk and read
it back when you've already got it in memory.
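
To make the duck-typing point concrete, here is a small sketch, assuming Python 3, where the in-memory class lives at io.StringIO. The function count_words is a made-up example; the only thing it asks of its argument is that looping over it yields lines.

```python
import io


def count_words(lines):
    """Count whitespace-separated words in anything that yields lines."""
    total = 0
    for line in lines:               # works for real files and file-like objects
        total += len(line.split())
    return total


# An in-memory "file": no disk access involved.
in_memory = io.StringIO("one two\nthree\n")
print(count_words(in_memory))
```

Because the function never checks the type of its argument, the same code should accept an object returned by open(), a StringIO, or even a plain list of strings.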

Quack! ;-)
Terry
--
Terry Hancock ( hancock at anansispaceworks.com )
Anansi Spaceworks http://www.anansispaceworks.com

Jul 21 '05 #6

Jp Calderone wrote:
    fileIter = iter(big_file)
    for line in fileIter:
        line_after = fileIter.next()

Don't mix iterating with any other file methods, since it will confuse the buffering scheme.


Isn't a file an iterable already?

[GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> foo = open('sample.txt')
>>> bar = iter(foo)
>>> bar is foo
True
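
For reference, a hedged sketch of how advancing inside the loop might look in Python 3, where the builtin next() replaces the .next() method. It sticks to the iterator protocol and never calls readline(), in line with the advice quoted above. The function name pair_lines and the in-memory sample are assumptions for the illustration.

```python
import io


def pair_lines(big_file):
    """Return (line, line_after) pairs, consuming two lines per loop pass."""
    pairs = []
    file_iter = iter(big_file)       # for a file object this is the file itself
    for line in file_iter:
        # Advance the same iterator; None marks a missing partner line.
        line_after = next(file_iter, None)
        if line_after is not None:
            line_after = line_after.rstrip("\n")
        pairs.append((line.rstrip("\n"), line_after))
    return pairs


sample = io.StringIO("a\nb\nc\n")    # stands in for an open text file
print(iter(sample) is sample)
print(pair_lines(sample))
```

Passing a default to next() avoids a StopIteration escaping from inside the loop when the file has an odd number of lines.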

Jul 21 '05 #7

Sorry, I lost the first line when pasting:
Python 2.4.1 (#1, Jun 21 2005, 12:38:55)
:/

Jul 21 '05 #8
