use fileinput to read a specific line

hi everybody
im a newbie in python
i need to read line 4 from a header file
using linecache will crash my computer due to memory loading, because
i am working on 2000 files each is 8mb

fileinput don't load the file into memory first
how do i use fileinput module to read a specific line from a file?

for line in fileinput.Fileinput('sample.txt')
????

Jan 8 '08 #1
On Jan 7, 7:15 pm, jo3c <JO3chi...@gmail.com> wrote:
> hi everybody
> im a newbie in python
> i need to read line 4 from a header file
> using linecache will crash my computer due to memory loading, because
> i am working on 2000 files each is 8mb
>
> fileinput don't load the file into memory first
> how do i use fileinput module to read a specific line from a file?
>
> for line in fileinput.Fileinput('sample.txt')
> ????

Assuming it's a text file, you could use something like this:

lnum = 0 # line number

for line in file("sample.txt"):
    lnum += 1
    if lnum >= 4: break

The variable "line" should end up with the contents of line 4 if I am
not mistaken. To handle multiple files, just wrap that code like this:

for file0 in files:

    lnum = 0 # line number

    for line in file(file0):
        lnum += 1
        if lnum >= 4: break

    # do something with "line"

where "files" is a list of the files to be read.

That's not tested.
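
One thing to watch in the loops above: if a file has fewer than four lines, "line" is left holding the last line that was read (or is never assigned for an empty file). Here is a minimal Python 3 sketch that handles short files. It swaps the manual counter for itertools.islice, and read_line is a hypothetical helper name, not something from this thread:

```python
from itertools import islice

def read_line(path, lineno):
    """Return line `lineno` (1-based) of `path`, or None if the file is shorter.

    `read_line` is a hypothetical helper name used only for illustration.
    """
    with open(path) as f:
        # islice lazily skips the first lineno-1 lines, so the file is
        # never loaded into memory as a whole.
        return next(islice(f, lineno - 1, lineno), None)
```

Calling read_line(name, 4) for each name in "files" would then give either the fourth line or None.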
Jan 8 '08 #2
> Given that the OP is talking 2000 files to be processed, I think I'd
> recommend explicit open() and close() calls to avoid having lots of I/O
> structures floating around...

Good point. I didn't think of that. It could also be done as follows:

for fileN in files:

    lnum = 0 # line number
    input = file(fileN)

    for line in input:
        lnum += 1
        if lnum >= 4: break

    input.close()

    # do something with "line"

Six of one or half a dozen of the other, I suppose.
Jan 8 '08 #3
On Jan 7, 9:41 pm, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> On Mon, 7 Jan 2008 20:10:58 -0800 (PST), "Russ P."
> <Russ.Paie...@gmail.com> declaimed the following in comp.lang.python:
>> for file0 in files:
>>     lnum = 0 # line number
>>     for line in file(file0):
>>         lnum += 1
>>         if lnum >= 4: break
>>     # do something with "line"
>> where "files" is a list of the files to be read.
>
> Given that the OP is talking 2000 files to be processed, I think I'd
> recommend explicit open() and close() calls to avoid having lots of I/O
> structures floating around...
>
> for fid in file_list:
>     fin = open(fid)
>     jnk = fin.readline()
>     jnk = fin.readline()
>     jnk = fin.readline()
>     ln = fin.readline()
>     fin.close()
>
> Yes, coding three junk reads does mean maintenance will be a pain
> (we now need the 5th line, not the fourth -- and would need to add
> another jnk = line)... I'd maybe consider replacing all four readline()
> with:
>
> for cnt in xrange(4):
>     ln = fin.readline()
>
> since it doesn't need the overhead of a separate line counter/test and
> will leave the fourth input line in "ln" on exit.
> --
> Wulfraed Dennis Lee Bieber KD6MOG
> wlfr...@ix.netcom.com wulfr...@bestiaria.com
> HTTP://wlfraed.home.netcom.com/
> (Bestiaria Support Staff: web-a...@bestiaria.com)
> HTTP://www.bestiaria.com/

On second thought, I wonder if the reference counting mechanism would
be "smart" enough to automatically close the previous file on each
iteration of the outer loop. If so, the files don't need to be
explicitly closed.
Jan 8 '08 #4
On Jan 8, 2:08 pm, "Russ P." <Russ.Paie...@gmail.com> wrote:
>> Given that the OP is talking 2000 files to be processed, I think I'd
>> recommend explicit open() and close() calls to avoid having lots of I/O
>> structures floating around...
>
> Good point. I didn't think of that. It could also be done as follows:
>
> for fileN in files:
>
>     lnum = 0 # line number
>     input = file(fileN)
>
>     for line in input:
>         lnum += 1
>         if lnum >= 4: break
>
>     input.close()
>
>     # do something with "line"
>
> Six of one or half a dozen of the other, I suppose.

this is what i did using glob

import glob
for files in glob.glob('/*.txt'):
    x = open(files)
    x.readline()
    x.readline()
    x.readline()
    y = x.readline()
    # do something with y
    x.close()
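
For completeness, the original question asked about fileinput. A rough Python 3 sketch of that route follows, assuming the class spelling fileinput.FileInput (capital I) and a hypothetical generator name nth_lines. It reads line by line and jumps to the next file as soon as the wanted line has been seen:

```python
import fileinput

def nth_lines(paths, wanted=4):
    """Yield (filename, line) for line number `wanted` of each file in `paths`.

    `nth_lines` is a hypothetical name. Files with fewer than `wanted`
    lines are skipped silently.
    """
    paths = list(paths)
    if not paths:
        # With no files, fileinput would fall back to sys.argv / stdin,
        # which is not wanted here.
        return
    with fileinput.FileInput(files=paths) as stream:
        for line in stream:
            if stream.filelineno() == wanted:
                yield stream.filename(), line
                # Skip the rest of this file and move on to the next one.
                stream.nextfile()
```

It could be driven by something like "for name, y in nth_lines(glob.glob('/*.txt')): ...", with the same glob pattern as above.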

Jan 8 '08 #5
jo3c wrote:
> i need to read line 4 from a header file

http://docs.python.org/lib/module-linecache.html

~/2delete $ cat data.txt
L1
L2
L3
L4

~/2delete $ python
Python 2.5.1 (r251:54863, May 2 2007, 16:56:35)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import linecache
>>> linecache.getline("data.txt", 2)
'L2\n'
>>> linecache.getline("data.txt", 5)
''
>>> linecache.getline("data.txt", 1)
'L1\n'
>>>

--
http://noneisyours.marcher.name
http://feeds.feedburner.com/NoneIsYours

You are not free to read this message,
by doing so, you have violated my licence
and are required to urinate publicly. Thank you.

Jan 8 '08 #6
Martin Marcher wrote:
>> i need to read line 4 from a header file
>
> http://docs.python.org/lib/module-linecache.html

I guess you missed the "using linecache will crash my computer due to
memory loading, because i am working on 2000 files each is 8mb" part.

</F>

Jan 8 '08 #7
jo3c wrote:
> hi everybody
> im a newbie in python
> i need to read line 4 from a header file
> using linecache will crash my computer due to memory loading, because
> i am working on 2000 files each is 8mb
>
> fileinput don't load the file into memory first
> how do i use fileinput module to read a specific line from a file?
>
> for line in fileinput.Fileinput('sample.txt')
> ????

I could have sworn that I posted working code (including an explanation
why linecache wouldn't work) the last time you asked about this... yes,
here it is again:
> i have a 2000 files with header and data
> i need to get the date information from the header
> then insert it into my database
> i am doing it in batch so i use glob.glob('/mydata/*/*/*.txt')
> to get the date on line 4 in the txt file i use
> linecache.getline('/mydata/myfile.txt', 4)
>
> but if i use
> linecache.getline(glob.glob('/mydata/*/*/*.txt'), 4) won't work

glob.glob returns a list of filenames, so you need to call getline once
for each file in the list.

but using linecache is absolutely the wrong tool for this; it's designed
for *repeated* access to arbitrary lines in a file, so it keeps all the
data in memory. that is, all the lines, for all 2000 files.
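
To make the caching point concrete, here is a small Python 3 sketch (line_via_linecache is a hypothetical name). It empties the cache after every lookup so memory does not grow across 2000 files, but getline() still reads each file in full, so this only limits the damage:

```python
import linecache

def line_via_linecache(path, lineno):
    """Fetch one line with linecache, then drop everything it cached."""
    line = linecache.getline(path, lineno)
    # getline() reads the whole file and keeps all of its lines cached;
    # without clearcache() the lines of every file visited so far would
    # stay in memory.
    linecache.clearcache()
    return line
```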

if the files are small, and you want to keep the code short, it's easier
to just grab the file's content and use indexing on the resulting list:

for filename in glob.glob('/mydata/*/*/*.txt'):
    line = list(open(filename))[4-1]
    ... do something with line ...

(note that line numbers usually start with 1, but Python's list indexing
starts at 0).

if the files might be large, use something like this instead:

for filename in glob.glob('/mydata/*/*/*.txt'):
    f = open(filename)
    # skip first three lines
    f.readline(); f.readline(); f.readline()
    # grab the line we want
    line = f.readline()
    ... do something with line ...

</F>

Jan 8 '08 #8
Russ P. wrote:
> On Jan 7, 9:41 pm, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
>> Given that the OP is talking 2000 files to be processed, I think I'd
>> recommend explicit open() and close() calls to avoid having lots of I/O
>> structures floating around...
> [effectively]
>> for fid in file_list:
>>     fin = open(fid)
>>     for cnt in xrange(4):
>>         ln = fin.readline()
>>     fin.close()
>
> On second thought, I wonder if the reference counting mechanism would
> be "smart" enough to automatically close the previous file on each
> iteration of the outer loop. If so, the files don't need to be
> explicitly closed.

I _hate_ relying on that, but context managers mean you don't have to.
There are good reasons to close as early as you can. For example,
readers of files from zip files will eventually either be slower or
not work until the other readers close.

Here is what I imagine you want (2.5 or better):

from __future__ import with_statement
import glob

def pairing(names, position):
    for filename in names:
        with open(filename) as f:
            for n, line in enumerate(f):
                if n == position:
                    break
            else:
                line = None # indicate a short file
        yield filename, line
...
for name, line in pairing(glob.glob('*.txt'), 3):
    do_something(name, line)

--Scott David Daniels
Sc***********@Acm.Org
Jan 8 '08 #9
Steven D'Aprano wrote:
> Python guarantees[1] that files will be closed, but doesn't specify when
> they will be closed. I understand that Jython doesn't automatically close
> files until the program terminates, so even if you could rely on the ref
> counter to close the files in CPython, it won't be safe to do so in
> Jython.

From what I can tell, Java's GC automatically closes file streams, so
Jython will behave pretty much like CPython in most cases. I sure
haven't been able to make Jython run out of file handles by opening tons
of files and discarding the file objects without closing them. Has anyone?

</F>

Jan 8 '08 #10
Fredrik Lundh wrote:
> Martin Marcher wrote:
>>> i need to read line 4 from a header file
>>
>> http://docs.python.org/lib/module-linecache.html
>
> I guess you missed the "using linecache will crash my computer due to
> memory loading, because i am working on 2000 files each is 8mb" part.

oops sorry indeed

still the enumerate version seems fine:
>>> for no, line in enumerate(file("data.txt", "r")):
...     print no, line
...

someone posted this already i think (or was it another thread?)


Jan 8 '08 #11
Fredrik Lundh <fr*****@pythonware.com> writes:
> From what I can tell, Java's GC automatically closes file streams,
> so Jython will behave pretty much like CPython in most cases.

The finalizer does close the reclaimed streams, but since it is
triggered by GC, you have to wait for GC to occur for the stream to
get closed. That means that something like:

open('foo', 'w').write(some_contents)

may leave 'foo' empty until the next GC. Fortunately this pattern is
much rarer than open('foo').read(), but both work equally well in
CPython, and will continue to work, despite many people's dislike for
them. (For the record, I don't use them in production code, but
open(...).read() is great for throwaway scripts and one-liners.)
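
A portable alternative to the write one-liner, as a minimal Python 3 sketch (write_text is a hypothetical helper name): the with block closes the file on exit, so the result does not depend on when the interpreter reclaims the object.

```python
def write_text(path, contents):
    """Write `contents` to `path`, closing the file before returning."""
    with open(path, "w") as f:
        f.write(contents)
    # Leaving the with block closes (and therefore flushes) the file,
    # independent of reference counting or garbage collection timing.
```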
> I sure haven't been able to make Jython run out of file handles by
> opening tons of files and discarding the file objects without
> closing them.

Java's generational GC is supposed to be quick to reclaim recently
discarded objects. That might lead to swift finalization of open
files similar to what CPython's reference counting does in practice.

It could also be that Jython internally allocates so many Java objects
that the GC is triggered frequently, again resulting in swift
reclamation of file objects. It would be interesting to monitor (at
the OS level) the number of open files maintained by the process at
any given time during the execution of such a loop.
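
One rough way to do that monitoring from inside the process is sketched below in Python 3. It assumes a Linux-style /proc filesystem (an assumption, not something stated in this thread), and count_open_fds is a hypothetical name. On other platforms it simply returns None.

```python
import os

def count_open_fds():
    """Return the number of open file descriptors, or None if unknown.

    Assumes a Linux-style /proc filesystem; elsewhere it returns None.
    The listing itself briefly uses one descriptor, so compare counts
    rather than reading them as absolute values.
    """
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))
```

Printing count_open_fds() every few hundred iterations of an open-and-discard loop would show whether descriptors pile up between collections.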
Jan 8 '08 #12
