looping over a big file

Hi,

I've a couple of questions regarding the processing of a big text file
(16MB).

1) How does Python handle:

    for line in big_file:

Is big_file read into memory all at once, is one line read at a time, is a
buffer used, or ...?

2) Is it possible to advance lines within the loop? The following doesn't
work:

    for line in big_file:
        line_after = big_file.readline()

The readline function (file pointer) is "out of sync" with the loop (and
this suggests big_file is not read one line at a time in the loop).

Thanks,
Fernando Martins
Jul 21 '05 #1
"martian" <no****@hetnet.nl> wrote:
1) how does python handle:
for line in big_file:
is big_file all read into memory or one line is read at a time or a buffer
is used or ...?


The "right" way to do this is:

    for line in file("filename"):
        whatever

The file object returned by file() acts as an iterator. Each time through
the loop, another line is read and returned (I'm sure there is some
block-level buffering going on at a low level).
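
A minimal sketch of the line-by-line behaviour described above, written for current Python 3, where `file()` no longer exists and `open()` is used instead. The helper name `count_matching_lines` and the sample data are made up for illustration. The loop holds one line at a time, while the I/O layer does its own block buffering underneath.

```python
import os
import tempfile


def count_matching_lines(path, needle):
    """Count lines containing `needle` without loading the whole file."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # the file object yields one line per iteration
            if needle in line:
                count += 1
    return count


# Build a small throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("alpha\nbeta\nalpha beta\n")
    sample_path = tmp.name

matches = count_matching_lines(sample_path, "alpha")
os.remove(sample_path)
print(matches)
```

The same loop works unchanged on a 16MB file, since memory use depends on the longest line and the buffer size rather than on the file size.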
2) is it possible to advance lines within the loop? The following doesn't
work:
    for line in big_file:
        line_after = big_file.readline()


You probably want something like:

    for line in file("filename"):
        if skipThisLine:
            continue
Jul 21 '05 #2
Roy Smith <ro*@panix.com> writes:
The "right" way to do this is:

    for line in file("filename"):
        whatever

The file object returned by file() acts as an iterator. Each time through
the loop, another line is read and returned (I'm sure there is some
block-level buffering going on at a low level).


I disagree. That's the *convenient* way to do it, and perfectly
acceptable in many situations. But not all Python interpreters will
close the file when for loop ends. Likewise, if you get an exception
during the processing, the file may not get closed properly. Those
things may matter to you, in which case the "right" way is:

    data = open("filename")
    try:
        for line in data:
            whatever
    finally:
        data.close()
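
For readers on later Python versions: the `with` statement (added in Python 2.5, after this thread) expresses the same try/finally pattern more compactly. The sketch below is an illustration under that assumption, not part of the original post. The temporary file, the "bad" marker line, and the variable names are invented so the example can show the file being closed even when processing raises.

```python
import os
import tempfile

# Throwaway input file; the second line triggers a simulated failure.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("ok\nbad\nnever reached\n")
    sample_path = tmp.name

handle_seen = None
try:
    with open(sample_path, encoding="utf-8") as data:
        handle_seen = data  # keep a reference so we can inspect it afterwards
        for line in data:
            if line.startswith("bad"):
                raise ValueError("simulated processing failure")
except ValueError:
    pass

# The with block closed the file on the way out, despite the exception.
closed_after_error = handle_seen.closed
os.remove(sample_path)
print(closed_after_error)
```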

Guido has made a pronouncement on open vs. file. I think he prefers
open for opening files, and file for type testing, but I may well be
wrong. I don't think it's critical.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jul 21 '05 #3
Mike Meyer wrote:
Roy Smith <ro*@panix.com> writes:
The "right" way to do this is:

    for line in file("filename"):
        whatever

The file object returned by file() acts as an iterator. Each time through
the loop, another line is read and returned (I'm sure there is some
block-level buffering going on at a low level).

I disagree. That's the *convenient* way to do it, and perfectly
acceptable in many situations. But not all Python interpreters will
close the file when for loop ends. Likewise, if you get an exception
during the processing, the file may not get closed properly. Those
things may matter to you, in which case the "right" way is:

    data = open("filename")
    try:
        for line in data:
            whatever
    finally:
        data.close()

Guido has made a pronouncement on open vs. file. I think he prefers
open for opening files, and file for type testing, but I may well be
wrong. I don't think it's critical.


He has said that open() may be used for things other than files in the
future. So if you want to be sure you're opening a file, use file().

<wink>
--
Michael Hoffman
Jul 21 '05 #4
Michael Hoffman wrote:
Mike Meyer wrote:
Guido has made a pronouncement on open vs. file. I think he prefers
open for opening files, and file for type testing, but I may well be
wrong. I don't think it's critical.


He has said that open() may be used for things other than files in the
future. So if you want to be sure you're opening a file, use file().


Probably this is the same sort of thing as "if you want to be sure your
function is working with an integer, you have to test whether it is an
integer" (or use a statically typed language).

Which is advice that is generally rebutted around here with comments
about "duck typing" (as in, if it acts like an integer, then stop
worrying about what it actually is).

If open() can ever return things other than files, it seems likely it
will do so only under conditions that make it pretty much safe to assume
that existing code will continue to operate "as expected" (note: not
"always with a file").

I'm not going to try to picture just how this might happen, but I could
imagine, for example, some kind of support for protocol prefixes (à la
"http:" or "ftp:"), or perhaps some sort of support for encrypted or
compressed data. Or maybe it would require a prior call to some
function to enable the magic that lets open() return non-files.

If any of that is reasonable, then using open() is actually the better
approach to ensuring your code "does the right thing" in the future, and
"file" should still be used in the rare case where you actually want to
test whether something is a particular type of thing.

-Peter
Jul 21 '05 #5
On Sunday 03 July 2005 08:28 pm, Peter Hansen wrote:
If open() can ever return things other than files, it seems likely it
will do so only under conditions that make it pretty much safe to assume
that existing code will continue to operate "as expected" (note: not
"always with a file").


WHEN it returns things other than files. Like a StringIO object,
which can be quite handy. True, it won't be a "big file", but it'd
be nice if the same code would tolerate it. I've used this with
e.g. PIL quite a bit when working with Zope, because it isn't
really desirable to have to write the file out to disk and read
it back when you've already got it in memory.
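
A small sketch of the duck-typing point, in current Python 3 where StringIO lives in the `io` module. The function `count_nonblank` is a hypothetical example: it only assumes its argument can be iterated line by line, so an open file, a StringIO, or a plain list all work.

```python
import io


def count_nonblank(lines):
    """Accept any iterable of text lines and count the non-blank ones."""
    return sum(1 for line in lines if line.strip())


# An in-memory "file": no disk round trip needed.
in_memory = io.StringIO("first\n\nsecond\n")
nonblank = count_nonblank(in_memory)

# A plain list of strings is accepted by the same code.
from_list = count_nonblank(["a\n", "\n"])

print(nonblank, from_list)
```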

Quack! ;-)
Terry
--
Terry Hancock ( hancock at anansispaceworks.com )
Anansi Spaceworks http://www.anansispaceworks.com

Jul 21 '05 #6
Jp Calderone wrote:
    fileIter = iter(big_file)
    for line in fileIter:
        line_after = fileIter.next()

Don't mix iterating with any other file methods, since it will confuse the buffering scheme.
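
In Python 3 the `.next()` method became the built-in `next()`. The following is a sketch of the same idea under that assumption. The generator `pair_with_following` is a made-up helper, and `io.StringIO` stands in for the big file. Passing a default to `next()` avoids a StopIteration when the input has an odd number of lines.

```python
import io


def pair_with_following(lines):
    """Yield (line, line_after) pairs, consuming two lines per pass."""
    line_iter = iter(lines)
    for line in line_iter:
        # next() pulls from the same iterator the for loop uses,
        # so the loop and the look-ahead stay in sync.
        following = next(line_iter, None)
        yield (
            line.rstrip("\n"),
            following.rstrip("\n") if following is not None else None,
        )


big_file = io.StringIO("a\nb\nc\n")
pairs = list(pair_with_following(big_file))
print(pairs)
```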


Isn't a file an iterable already?

[GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> foo = open('sample.txt')
>>> bar = iter(foo)
>>> bar is foo
True
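
The same check can be reproduced in current Python 3. This is a sketch with an invented temporary file: it shows that `iter()` hands back the file object itself, so the explicit `iter()` call is harmless but redundant, and that `next()` on either name advances the one shared position.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("one\ntwo\n")
    sample_path = tmp.name

with open(sample_path, encoding="utf-8") as foo:
    bar = iter(foo)
    same_object = bar is foo  # the file is its own iterator
    first = next(bar)
    second = next(foo)  # continues where bar left off

os.remove(sample_path)
print(same_object, repr(first), repr(second))
```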

Jul 21 '05 #7
Sorry, I lost the first line when pasting:
Python 2.4.1 (#1, Jun 21 2005, 12:38:55)
:/

Jul 21 '05 #8
