looping over a big file

Hi,

I've a couple of questions regarding the processing of a big text file
(16MB).

1) How does Python handle:

    for line in big_file:

Is big_file read into memory all at once, is one line read at a time, is a
buffer used, or ...?

2) Is it possible to advance lines within the loop? The following doesn't
work:

    for line in big_file:
        line_after = big_file.readline()

The readline() function (file pointer) is "out of sync" with the loop (and
this suggests big_file is not read one line at a time in the loop).

Thanks,
Fernando Martins

Jul 21 '05 #1

Roy Smith

"martian" <no****@hetnet.nl> wrote:

> 1) how does python handle:
>     for line in big_file:
> is big_file all read into memory or one line is read at a time or a buffer
> is used or ...?

The "right" way to do this is:

    for line in file("filename"):
        whatever

The file object returned by file() acts as an iterator. Each time through
the loop, another line is read and returned (I'm sure there is some
block-level buffering going on at a low level).
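
A minimal sketch of the lazy iteration described above, assuming Python 3, where the file() built-in no longer exists and open() is used instead. The helper count_lines and the temporary sample file are illustrative names, not part of the original question.

```python
import os
import tempfile


def count_lines(path):
    """Count lines by iterating over the file object, one line at a time."""
    count = 0
    with open(path) as big_file:
        # The file object is its own iterator: each pass through the loop
        # asks it for the next line, so the whole file is never held in
        # memory at once.
        for line in big_file:
            count += 1
    return count


# Build a small throwaway file so the sketch runs on its own.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("first\nsecond\nthird\n")

line_total = count_lines(sample_path)
os.remove(sample_path)
```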

> 2) is it possible to advance lines within the loop? The following doesn't
> work:
>     for line in big_file:
>         line_after = big_file.readline()

You probably want something like:

    for line in file("filename"):
        if skipThisLine:
            continue

Jul 21 '05 #2

Mike Meyer

Roy Smith <ro*@panix.com> writes:

> The "right" way to do this is:
>
>     for line in file("filename"):
>         whatever
>
> The file object returned by file() acts as an iterator. Each time through
> the loop, another line is read and returned (I'm sure there is some
> block-level buffering going on at a low level).

I disagree. That's the *convenient* way to do it, and perfectly
acceptable in many situations. But not all Python interpreters will
close the file when the for loop ends. Likewise, if you get an exception
during the processing, the file may not get closed properly. Those
things may matter to you, in which case the "right" way is:

    data = open("filename")
    try:
        for line in data:
            whatever
    finally:
        data.close()
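
In later Python versions (2.6 onward, and all of Python 3) the with statement expresses the same try/finally pattern more compactly. A minimal sketch, assuming Python 3 and a throwaway temporary file; sample_path, total and the file contents are made up for illustration. The file is closed when the block exits, even though an exception interrupts the loop.

```python
import os
import tempfile

# Build a small throwaway file whose last line cannot be parsed as a number.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("1\n2\nnot a number\n")

total = 0
error_seen = False
try:
    # The with block closes the file on exit, whether the loop finishes
    # normally or an exception propagates out of it.
    with open(sample_path) as data:
        for line in data:
            total += int(line)  # raises ValueError on the third line
except ValueError:
    error_seen = True

closed_after_error = data.closed
os.remove(sample_path)
```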

Guido has made a pronouncement on open vs. file. I think he prefers
open for opening files, and file for type testing, but I may well be
wrong. I don't think it's critical.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Jul 21 '05 #3

Michael Hoffman

Mike Meyer wrote:

> Roy Smith <ro*@panix.com> writes:
>> The "right" way to do this is:
>>
>>     for line in file("filename"):
>>         whatever
>>
>> The file object returned by file() acts as an iterator. Each time through
>> the loop, another line is read and returned (I'm sure there is some
>> block-level buffering going on at a low level).
>
> I disagree. That's the *convenient* way to do it, and perfectly
> acceptable in many situations. But not all Python interpreters will
> close the file when the for loop ends. Likewise, if you get an exception
> during the processing, the file may not get closed properly. Those
> things may matter to you, in which case the "right" way is:
>
>     data = open("filename")
>     try:
>         for line in data:
>             whatever
>     finally:
>         data.close()
>
> Guido has made a pronouncement on open vs. file. I think he prefers
> open for opening files, and file for type testing, but I may well be
> wrong. I don't think it's critical.

He has said that open() may be used for things other than files in the
future. So if you want to be sure you're opening a file, use file().

<wink>
--
Michael Hoffman

Jul 21 '05 #4

Peter Hansen

Michael Hoffman wrote:

> Mike Meyer wrote:
>> Guido has made a pronouncement on open vs. file. I think he prefers
>> open for opening files, and file for type testing, but may well be
>> wrong. I don't think it's critical.
>
> He has said that open() may be used for things other than files in the
> future. So if you want to be sure you're opening a file, use file().

Probably this is the same sort of thing as "if you want to be sure your
function is working with an integer, you have to test whether it is an
integer" (or use a statically typed language).

Which is advice that is generally rebutted around here with comments
about "duck typing" (as in, if it acts like an integer, then stop
worrying about what it actually is).

If open() can ever return things other than files, it seems likely it
will do so only under conditions that make it pretty much safe to assume
that existing code will continue to operate "as expected" (note: not
"always with a file").

I'm not going to try to picture just how this might happen, but I could
imagine, for example, some kind of support for protocol prefixes (à la
"http:" or "ftp:"), or perhaps some sort of support for encrypted or
compressed data. Or maybe it would require a prior call to some
function to enable the magic that lets open() return non-files.

If any of that is reasonable, then using open() is actually the better
approach to ensuring your code "does the right thing" in the future, and
"file" should still be used in the rare case where you actually want to
test whether something is a particular type of thing.

-Peter

Jul 21 '05 #5

Terry Hancock

On Sunday 03 July 2005 08:28 pm, Peter Hansen wrote:

> If open() can ever return things other than files, it seems likely it
> will do so only under conditions that make it pretty much safe to assume
> that existing code will continue to operate "as expected" (note: not
> "always with a file").

WHEN it returns things other than files. Like a StringIO object,
which can be quite handy. True, it won't be a "big file", but it'd
be nice if the same code would tolerate it. I've used this with
e.g. PIL quite a bit when working with Zope, because it isn't
really desirable to have to write the file out to disk and read
it back when you've already got it in memory.
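
To illustrate the duck-typing point, here is a small sketch, assuming Python 3 and its io.StringIO. The function longest_line is a hypothetical example; it only requires something that yields lines when iterated, so an open file, a StringIO object, or a plain list of strings all work.

```python
import io


def longest_line(lines):
    """Return the longest line from any iterable of strings, minus its newline."""
    longest = ""
    for line in lines:
        stripped = line.rstrip("\n")
        if len(stripped) > len(longest):
            longest = stripped
    return longest


# An in-memory "file": no disk access needed, yet the same loop handles it.
in_memory = io.StringIO("short\na much longer line\nmid-size\n")
result = longest_line(in_memory)
```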

Quack! ;-)
Terry
--
Terry Hancock ( hancock at anansispaceworks.com )
Anansi Spaceworks http://www.anansispaceworks.com

Jul 21 '05 #6

Asun Friere

Jp Calderone wrote:

> fileIter = iter(big_file)
> for line in fileIter:
>     line_after = fileIter.next()
>
> Don't mix iterating with any other file methods, since it will confuse the buffering scheme.

Isn't a file an iterable already?

[GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> foo = open('sample.txt')
>>> bar = iter(foo)
>>> bar is foo
True
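
Because the file object is its own iterator, the built-in next() can be called on it directly to advance inside the loop. In Python 2 the loop kept a hidden read-ahead buffer that readline() did not share, which is why the two got out of sync. A minimal sketch, assuming Python 3 and using io.StringIO as a stand-in for big_file; the sample data and the pairs list are made up for illustration.

```python
import io

# An in-memory stand-in for big_file (an assumption so the sketch is runnable).
big_file = io.StringIO("header A\nvalue A\nheader B\nvalue B\n")

pairs = []
for line in big_file:
    # next() pulls from the same iterator the for loop is using, so the loop
    # resumes after the line consumed here. The default value avoids a
    # StopIteration if the input ends on an unpaired line.
    line_after = next(big_file, "")
    pairs.append((line.strip(), line_after.strip()))
```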

Jul 21 '05 #7

Asun Friere

Sorry, lost the first line in pasting:
Python 2.4.1 (#1, Jun 21 2005, 12:38:55)
:/

Jul 21 '05 #8
