
use fileinput to read a specific line

P: n/a
Hi everybody,
I'm a newbie in Python.
I need to read line 4 from a header file.
Using linecache will crash my computer due to memory loading, because
I am working on 2000 files, each 8 MB.

fileinput doesn't load the file into memory first.
How do I use the fileinput module to read a specific line from a file?

for line in fileinput.FileInput('sample.txt'):
    ????

Jan 8 '08 #1
11 Replies


P: n/a
On Jan 7, 7:15 pm, jo3c <JO3chi...@gmail.com> wrote:
> Hi everybody,
> I'm a newbie in Python.
> I need to read line 4 from a header file.
> Using linecache will crash my computer due to memory loading, because
> I am working on 2000 files, each 8 MB.
>
> fileinput doesn't load the file into memory first.
> How do I use the fileinput module to read a specific line from a file?
>
> for line in fileinput.FileInput('sample.txt'):
>     ????

Assuming it's a text file, you could use something like this:

lnum = 0 # line number

for line in file("sample.txt"):
    lnum += 1
    if lnum >= 4: break

The variable "line" should end up with the contents of line 4 if I am
not mistaken. To handle multiple files, just wrap that code like this:

for file0 in files:

    lnum = 0 # line number

    for line in file(file0):
        lnum += 1
        if lnum >= 4: break

    # do something with "line"

where "files" is a list of the files to be read.

That's not tested.
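
For readers on Python 3, where the file() builtin no longer exists, the same early-exit idea might be sketched with open() and itertools.islice. The helper name read_line and its None return for short files are illustrative assumptions, not something from the thread.

```python
from itertools import islice


def read_line(path, lineno):
    """Return line `lineno` (1-based) of `path`, or None if the file is shorter.

    Hypothetical helper: the name and the None convention are assumptions.
    """
    with open(path) as handle:
        # islice lazily skips the first lineno - 1 lines and stops after the
        # requested one, so the rest of the file is never iterated.
        return next(islice(handle, lineno - 1, lineno), None)
```

Calling read_line(name, 4) for each name in a list of files closes every file as soon as its fourth line has been read.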
Jan 8 '08 #2

P: n/a
> Given that the OP is talking 2000 files to be processed, I think I'd
> recommend explicit open() and close() calls to avoid having lots of I/O
> structures floating around...

Good point. I didn't think of that. It could also be done as follows:

for fileN in files:

    lnum = 0 # line number
    input = file(fileN)

    for line in input:
        lnum += 1
        if lnum >= 4: break

    input.close()

    # do something with "line"

Six of one or half a dozen of the other, I suppose.
Jan 8 '08 #3

P: n/a
On Jan 7, 9:41 pm, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> On Mon, 7 Jan 2008 20:10:58 -0800 (PST), "Russ P."
> <Russ.Paie...@gmail.com> declaimed the following in comp.lang.python:
> > for file0 in files:
> >     lnum = 0 # line number
> >     for line in file(file0):
> >         lnum += 1
> >         if lnum >= 4: break
> >     # do something with "line"
> > where "files" is a list of the files to be read.
>
> Given that the OP is talking 2000 files to be processed, I think I'd
> recommend explicit open() and close() calls to avoid having lots of I/O
> structures floating around...
>
> for fid in file_list:
>     fin = open(fid)
>     jnk = fin.readline()
>     jnk = fin.readline()
>     jnk = fin.readline()
>     ln = fin.readline()
>     fin.close()
>
> Yes, coding three junk reads does mean maintenance will be a pain
> (we now need the 5th line, not the fourth -- and would need to add
> another jnk = line)... I'd maybe consider replacing all four readline()
> with:
>
>     for cnt in xrange(4):
>         ln = fin.readline()
>
> since it doesn't need the overhead of a separate line counter/test and
> will leave the fourth input line in "ln" on exit.
> --
> Wulfraed Dennis Lee Bieber KD6MOG
> wlfr...@ix.netcom.com wulfr...@bestiaria.com
> HTTP://wlfraed.home.netcom.com/
> (Bestiaria Support Staff: web-a...@bestiaria.com)
> HTTP://www.bestiaria.com/

On second thought, I wonder if the reference counting mechanism would
be "smart" enough to automatically close the previous file on each
iteration of the outer loop. If so, the files don't need to be
explicitly closed.
Jan 8 '08 #4

P: n/a
On Jan 8, 2:08 pm, "Russ P." <Russ.Paie...@gmail.com> wrote:
> > Given that the OP is talking 2000 files to be processed, I think I'd
> > recommend explicit open() and close() calls to avoid having lots of I/O
> > structures floating around...
>
> Good point. I didn't think of that. It could also be done as follows:
>
> for fileN in files:
>
>     lnum = 0 # line number
>     input = file(fileN)
>
>     for line in input:
>         lnum += 1
>         if lnum >= 4: break
>
>     input.close()
>
>     # do something with "line"
>
> Six of one or half a dozen of the other, I suppose.

This is what I did using glob:

import glob
for files in glob.glob('/*.txt'):
    x = open(files)
    x.readline()
    x.readline()
    x.readline()
    y = x.readline()
    # do something with y
    x.close()

Jan 8 '08 #5

P: n/a
jo3c wrote:
> i need to read line 4 from a header file

http://docs.python.org/lib/module-linecache.html

~/2delete $ cat data.txt
L1
L2
L3
L4

~/2delete $ python
Python 2.5.1 (r251:54863, May 2 2007, 16:56:35)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import linecache
>>> linecache.getline("data.txt", 2)
'L2\n'
>>> linecache.getline("data.txt", 5)
''
>>> linecache.getline("data.txt", 1)
'L1\n'
>>>

--
http://noneisyours.marcher.name
http://feeds.feedburner.com/NoneIsYours

You are not free to read this message,
by doing so, you have violated my licence
and are required to urinate publicly. Thank you.

Jan 8 '08 #6

P: n/a
Martin Marcher wrote:
> > i need to read line 4 from a header file
>
> http://docs.python.org/lib/module-linecache.html

I guess you missed the "using linecache will crash my computer due to
memory loading, because i am working on 2000 files each is 8mb" part.

</F>

Jan 8 '08 #7

P: n/a
jo3c wrote:
> Hi everybody,
> I'm a newbie in Python.
> I need to read line 4 from a header file.
> Using linecache will crash my computer due to memory loading, because
> I am working on 2000 files, each 8 MB.
>
> fileinput doesn't load the file into memory first.
> How do I use the fileinput module to read a specific line from a file?
>
> for line in fileinput.FileInput('sample.txt'):
>     ????

I could have sworn that I posted working code (including an explanation
why linecache wouldn't work) the last time you asked about this... yes,
here it is again:

> i have 2000 files with header and data
> i need to get the date information from the header
> then insert it into my database
> i am doing it in batch so i use glob.glob('/mydata/*/*/*.txt')
> to get the date on line 4 in the txt file i use
> linecache.getline('/mydata/myfile.txt', 4)
>
> but if i use
> linecache.getline('glob.glob('/mydata/*/*/*.txt', 4) won't work

glob.glob returns a list of filenames, so you need to call getline once
for each file in the list.

but using linecache is absolutely the wrong tool for this; it's designed
for *repeated* access to arbitrary lines in a file, so it keeps all the
data in memory. that is, all the lines, for all 2000 files.
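
To illustrate the memory point: linecache.getline() reads and caches every line of the file it is given, so looping over 2000 files accumulates all of them unless the cache is emptied. Below is a minimal sketch, assuming Python 3 and a hypothetical helper name line_from_each, that bounds memory by calling linecache.clearcache() after each file. Each file is still read in full, so this saves no I/O and the plain readline approach remains cheaper.

```python
import linecache


def line_from_each(paths, lineno=4):
    """Yield (path, line) pairs; line is '' when the file is too short.

    Hypothetical helper for illustration only.
    """
    for path in paths:
        # getline() loads and caches the whole file behind the scenes.
        line = linecache.getline(path, lineno)
        # Drop the cached lines before moving on to the next file.
        linecache.clearcache()
        yield path, line
```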

if the files are small, and you want to keep the code short, it's easier
to just grab the file's content and use indexing on the resulting list:

for filename in glob.glob('/mydata/*/*/*.txt'):
    line = list(open(filename))[4-1]
    ... do something with line ...

(note that line numbers usually start with 1, but Python's list indexing
starts at 0).

if the files might be large, use something like this instead:

for filename in glob.glob('/mydata/*/*/*.txt'):
    f = open(filename)
    # skip first three lines
    f.readline(); f.readline(); f.readline()
    # grab the line we want
    line = f.readline()
    ... do something with line ...
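
The original question asked about fileinput specifically. A possible sketch of that route, assuming Python 3.2 or later (where fileinput.input() can be used as a context manager); the helper name nth_lines is hypothetical. It relies on filelineno() for the position inside the current file and on nextfile() to skip the rest of that file once the wanted line has been seen:

```python
import fileinput


def nth_lines(paths, lineno=4):
    """Map each file name to its line `lineno` (1-based).

    Files with fewer lines are left out. Hypothetical helper for illustration.
    """
    paths = list(paths)
    if not paths:
        # fileinput falls back to stdin for an empty list, so bail out early.
        return {}
    results = {}
    with fileinput.input(files=paths) as stream:
        for line in stream:
            if stream.filelineno() == lineno:
                results[stream.filename()] = line
                # Close the current file and continue with the next one.
                stream.nextfile()
    return results
```

Files are opened one at a time and only read up to the requested line, so memory use stays small regardless of file size.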

</F>

Jan 8 '08 #8

P: n/a
Russ P. wrote:
> On Jan 7, 9:41 pm, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> > Given that the OP is talking 2000 files to be processed, I think I'd
> > recommend explicit open() and close() calls to avoid having lots of I/O
> > structures floating around...
> [effectively]
> > for fid in file_list:
> >     fin = open(fid)
> >     for cnt in xrange(4):
> >         ln = fin.readline()
> >     fin.close()
> On second thought, I wonder if the reference counting mechanism would
> be "smart" enough to automatically close the previous file on each
> iteration of the outer loop. If so, the files don't need to be
> explicitly closed.

I _hate_ relying on that, but context managers mean you don't have to.
There are good reasons to close as early as you can. For example,
readers of files from zip files will eventually either be slower or
not work until the other readers close.

Here is what I imagine you want (2.5 or better):

from __future__ import with_statement

def pairing(names, position):
    for filename in names:
        with open(filename) as f:
            for n, line in enumerate(f):
                if n == position:
                    break
            else:
                line = None # indicate a short file
        yield filename, line
...
for name, line in pairing(glob.glob('*.txt'), 3):
    do_something(name, line)

--Scott David Daniels
Sc***********@Acm.Org
Jan 8 '08 #9

P: n/a
Steven D'Aprano wrote:
> Python guarantees[1] that files will be closed, but doesn't specify when
> they will be closed. I understand that Jython doesn't automatically close
> files until the program terminates, so even if you could rely on the ref
> counter to close the files in CPython, it won't be safe to do so in
> Jython.

From what I can tell, Java's GC automatically closes file streams, so
Jython will behave pretty much like CPython in most cases. I sure
haven't been able to make Jython run out of file handles by opening tons
of files and discarding the file objects without closing them. Has anyone?

</F>

Jan 8 '08 #10

P: n/a
Fredrik Lundh wrote:
> Martin Marcher wrote:
> > > i need to read line 4 from a header file
> >
> > http://docs.python.org/lib/module-linecache.html
>
> I guess you missed the "using linecache will crash my computer due to
> memory loading, because i am working on 2000 files each is 8mb" part.

oops sorry indeed

still the enumerate version seems fine:

>>> for no, line in enumerate(file("data.txt", "r")):
...     print no, line
...

someone posted this already i think (or was it another thread?)


Jan 8 '08 #11

P: n/a
Fredrik Lundh <fr*****@pythonware.com> writes:
> From what I can tell, Java's GC automatically closes file streams,
> so Jython will behave pretty much like CPython in most cases.

The finalizer does close the reclaimed streams, but since it is
triggered by GC, you have to wait for GC to occur for the stream to
get closed. That means that something like:

open('foo', 'w').write(some_contents)

may leave 'foo' empty until the next GC. Fortunately this pattern is
much rarer than open('foo').read(), but both work equally well in
CPython, and will continue to work, despite many people's dislike for
them. (For the record, I don't use them in production code, but
open(...).read() is great for throwaway scripts and one-liners.)

> I sure haven't been able to make Jython run out of file handles by
> opening tons of files and discarding the file objects without
> closing them.

Java's generational GC is supposed to be quick to reclaim recently
discarded objects. That might lead to swift finalization of open
files similar to what CPython's reference counting does in practice.

It could also be that Jython internally allocates so many Java objects
that the GC is triggered frequently, again resulting in swift
reclamation of file objects. It would be interesting to monitor (at
the OS level) the number of open files maintained by the process at
any given time during the execution of such a loop.
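
One rough way to do that monitoring on Linux is to count the entries in /proc/self/fd after each discarded open(). This is only a sketch: the /proc path is Linux-specific, the helper names are made up for illustration, and it is written for Python 3 rather than the Jython setup discussed above.

```python
import os


def open_fd_count(fd_dir="/proc/self/fd"):
    """Return the number of descriptors this process holds, or None
    when the (Linux-specific) directory is not available."""
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))


def sample_fd_counts(paths):
    """Open each path without closing it and record the descriptor
    count after every step. Hypothetical helper for illustration."""
    counts = []
    for path in paths:
        open(path).read(1)  # discard the file object without close()
        counts.append(open_fd_count())
    return counts
```

A count that stays flat across the loop suggests the runtime reclaims the discarded file objects promptly; a steadily growing count suggests they linger until a later collection.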
Jan 8 '08 #12
