
Re: "Faster" I/O in a script

On Jun 2, 2:08 am, "kalakouentin" <kalakouen...@yahoo.com> wrote:
> Do you know a way to actually load my data in a more
> "batch-like" way so I will avoid the constant line by line reading?

If your files will fit in memory, you can just do

text = file.readlines()

and Python will read the entire file into a list of strings named
'text,' where each item in the list corresponds to one 'line' of the
file.
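
For instance, a minimal sketch of that batch-style load (the file name and
the split() step are placeholders, not from the original post):

with open("data.txt") as fh:
    lines = fh.readlines()   # one pass over the file; every line now in memory

for line in lines:
    fields = line.split()    # work on the in-memory copy from here on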
Jun 27 '08 #1


mi***********@gmail.com wrote:
> On Jun 2, 2:08 am, "kalakouentin" <kalakouen...@yahoo.com> wrote:
>> Do you know a way to actually load my data in a more
>> "batch-like" way so I will avoid the constant line by line reading?
>
> If your files will fit in memory, you can just do
>
> text = file.readlines()
>
> and Python will read the entire file into a list of strings named
> 'text,' where each item in the list corresponds to one 'line' of the
> file.
No, that won't help. That has to do *all* the same work (reading blocks
and finding line endings) as the iterator, PLUS allocate and build a list.

Better to just use the iterator:

for line in file:
    ...
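
(A concrete version of that loop, as a sketch: the file name and the
"ERROR" predicate are placeholders, not something from this thread.)

count = 0
with open("data.txt") as fh:
    for line in fh:          # the file object yields one line at a time
        if "ERROR" in line:  # memory use stays constant, one line at a time
            count += 1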

Gary Herron
--
http://mail.python.org/mailman/listinfo/python-list
Jun 27 '08 #2

Gary Herron wrote:
> mi***********@gmail.com wrote:
>> On Jun 2, 2:08 am, "kalakouentin" <kalakouen...@yahoo.com> wrote:
>>> Do you know a way to actually load my data in a more
>>> "batch-like" way so I will avoid the constant line by line reading?
>>
>> If your files will fit in memory, you can just do
>>
>> text = file.readlines()
>>
>> and Python will read the entire file into a list of strings named
>> 'text,' where each item in the list corresponds to one 'line' of the
>> file.
>
> No, that won't help. That has to do *all* the same work (reading blocks
> and finding line endings) as the iterator, PLUS allocate and build a list.
>
> Better to just use the iterator:
>
> for line in file:
>     ...
Actually this *can* be much slower. Suppose I want to search a file to
see if a substring is present.

st = "some substring that is not actually in the file"
f = <50 MB log file>

Method 1:

for i in file(f):
    if st in i:
        break

--0.472416 seconds

Method 2:

Read whole file:

fh = file(f)
rl = fh.read()
fh.close()

--0.098834 seconds

"st in rl" test --0.037251 (total: .136 seconds)

Method 3:

mmap the file:

import mmap

fh = file(f)   # reopen; fh was closed at the end of Method 2
mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
"st in mm" test --3.589938 (<-- see my post the other day)

mm.find(st) --0.186895
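
For reference, one way to reproduce these three measurements end to end (a
sketch, not Kris's original script: the file name, the perf_counter timing,
and the Python 3 spelling with open() instead of file() are all assumptions;
Method 3 uses mm.find(), the faster variant above):

import mmap
import time

f = "big.log"   # placeholder path; any large file works
st = b"some substring that is not actually in the file"

def timed(label, fn):
    start = time.perf_counter()
    result = fn()
    print(label, time.perf_counter() - start, "seconds")
    return result

def iterate_lines():
    # Method 1: line-by-line iteration
    with open(f, "rb") as fh:
        for line in fh:
            if st in line:
                return True
    return False

def read_whole():
    # Method 2: slurp the whole file, then search the one big string
    with open(f, "rb") as fh:
        data = fh.read()
    return st in data

def mmap_find():
    # Method 3: mmap the file and search without an explicit read()
    with open(f, "rb") as fh:
        mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return mm.find(st) != -1
        finally:
            mm.close()

timed("iterate:", iterate_lines)
timed("read():", read_whole)
timed("mmap:", mmap_find)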

Summary:

If you can afford the memory, it can be considerably more efficient (more
than 3 times faster in this example) to read the whole file into memory and
process it at once.

Mmapping the file and processing it at once is roughly as fast (I didn't
measure the difference carefully), but has the advantage that any parts of
the file you never touch are not faulted into memory. You could also play
more games and mmap chunks at a time to limit memory use, though you would
have to be careful with windows that don't line up with record boundaries;
see the sketch below.
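
A sketch of that chunked variant (the helper name, window size, and overlap
scheme are assumptions, not from the thread). Adjacent windows overlap by
len(needle) - 1 bytes so a match straddling a boundary is not missed, and
each window's offset stays aligned to mmap.ALLOCATIONGRANULARITY, as the
offset argument requires:

import mmap
import os

def find_in_file(path, needle, window=16 * 1024 * 1024):
    # Search for `needle` (bytes) by mmapping one window at a time
    # instead of the whole file, to cap memory use.
    gran = mmap.ALLOCATIONGRANULARITY   # offsets must be multiples of this
    window = max(window - window % gran, gran)
    overlap = len(needle) - 1
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        offset = 0
        while offset < size:
            length = min(window + overlap, size - offset)
            mm = mmap.mmap(fh.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset)
            try:
                pos = mm.find(needle)
                if pos != -1:
                    return offset + pos   # absolute position in the file
            finally:
                mm.close()
            offset += window
    return -1

# e.g. find_in_file("big.log", b"some substring")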

Kris
Jun 27 '08 #3
