use fileinput to read a specific line

hi everybody
im a newbie in python
i need to read line 4 from a header file
using linecache will crash my computer due to memory loading, because
i am working on 2000 files each is 8mb

fileinput don't load the file into memory first
how do i use fileinput module to read a specific line from a file?

for line in fileinput.FileInput('sample.txt')
????

Jan 8 '08 #1
On Jan 7, 7:15 pm, jo3c <JO3chi...@gmail.com> wrote:
> hi everybody
> im a newbie in python
> i need to read line 4 from a header file
> using linecache will crash my computer due to memory loading, because
> i am working on 2000 files each is 8mb
>
> fileinput don't load the file into memory first
> how do i use fileinput module to read a specific line from a file?
>
> for line in fileinput.FileInput('sample.txt')
> ????
Assuming it's a text file, you could use something like this:

lnum = 0 # line number

for line in open("sample.txt"):
    lnum += 1
    if lnum >= 4: break

The variable "line" should end up with the contents of line 4 if I am
not mistaken. To handle multiple files, just wrap that code like this:

for file0 in files:

    lnum = 0 # line number

    for line in open(file0):
        lnum += 1
        if lnum >= 4: break

    # do something with "line"

where "files" is a list of the files to be read.

That's not tested.
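
A tested variant of the same idea, as a minimal sketch: `itertools.islice` skips ahead to the wanted line and stops reading there, so only the first few lines of each file are touched. The helper name `read_line` and the choice to return `None` for a short file are assumptions for illustration, not something from the posts in this thread.

```python
from itertools import islice

def read_line(path, lineno):
    """Return line `lineno` (1-based) of `path`, or None if the file is shorter."""
    with open(path) as f:
        # islice consumes lines lazily and stops right after the requested one
        return next(islice(f, lineno - 1, lineno), None)
```

Unlike the counter loop, a file with fewer than four lines gives `None` here rather than its last line.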
Jan 8 '08 #2
> Given that the OP is talking 2000 files to be processed, I think I'd
> recommend explicit open() and close() calls to avoid having lots of I/O
> structures floating around...
Good point. I didn't think of that. It could also be done as follows:

for fileN in files:

    lnum = 0 # line number
    infile = open(fileN)

    for line in infile:
        lnum += 1
        if lnum >= 4: break

    infile.close()

    # do something with "line"

Six of one or half a dozen of the other, I suppose.
Jan 8 '08 #3
On Jan 7, 9:41 pm, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> On Mon, 7 Jan 2008 20:10:58 -0800 (PST), "Russ P."
> <Russ.Paie...@gmail.com> declaimed the following in comp.lang.python:
> > for file0 in files:
> >     lnum = 0 # line number
> >     for line in open(file0):
> >         lnum += 1
> >         if lnum >= 4: break
> >     # do something with "line"
> > where "files" is a list of the files to be read.
>
> Given that the OP is talking 2000 files to be processed, I think I'd
> recommend explicit open() and close() calls to avoid having lots of I/O
> structures floating around...
>
> for fid in file_list:
>     fin = open(fid)
>     jnk = fin.readline()
>     jnk = fin.readline()
>     jnk = fin.readline()
>     ln = fin.readline()
>     fin.close()
>
> Yes, coding three junk reads does mean maintenance will be a pain
> (we now need the 5th line, not the fourth -- and would need to add
> another jnk = line)... I'd maybe consider replacing all four readline()
> with:
>
> for cnt in range(4):
>     ln = fin.readline()
>
> since it doesn't need the overhead of a separate line counter/test and
> will leave the fourth input line in "ln" on exit.
> --
> Wulfraed Dennis Lee Bieber KD6MOG
> wlfr...@ix.netcom.com wulfr...@bestiaria.com
> HTTP://wlfraed.home.netcom.com/
> (Bestiaria Support Staff: web-a...@bestiaria.com)
> HTTP://www.bestiaria.com/
On second thought, I wonder if the reference counting mechanism would
be "smart" enough to automatically close the previous file on each
iteration of the outer loop. If so, the files don't need to be
explicitly closed.
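
Whether reference counting closes the file is an implementation detail, and a `with` block avoids having to find out. A small sketch, assuming Python 2.6 or later (or 2.5 with the `with_statement` future import); the function name `fourth_line` is made up for illustration, and it returns the file's `closed` flag only so the effect can be inspected.

```python
def fourth_line(path):
    """Read the fourth line of `path`; the file is closed before returning."""
    with open(path) as f:
        for _ in range(4):
            line = f.readline()  # readline() returns '' once past end of file
    # f is closed here regardless of how the block was left
    return line, f.closed
```

This reuses the four-readline loop quoted above, so a file with fewer than four lines yields an empty string rather than an error.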
Jan 8 '08 #4
On Jan 8, 2:08 pm, "Russ P." <Russ.Paie...@gmail.com> wrote:
> > Given that the OP is talking 2000 files to be processed, I think I'd
> > recommend explicit open() and close() calls to avoid having lots of I/O
> > structures floating around...
>
> Good point. I didn't think of that. It could also be done as follows:
>
> for fileN in files:
>
>     lnum = 0 # line number
>     infile = open(fileN)
>
>     for line in infile:
>         lnum += 1
>         if lnum >= 4: break
>
>     infile.close()
>
>     # do something with "line"
>
> Six of one or half a dozen of the other, I suppose.
this is what i did using glob

import glob
for files in glob.glob('/*.txt'):
    x = open(files)
    x.readline()
    x.readline()
    x.readline()
    y = x.readline()
    # do something with y
    x.close()
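
The original question asked about fileinput, which none of the replies use directly. A minimal sketch of that route, assuming Python 3.2 or later (where `FileInput` works as a context manager); the function name `nth_lines` is made up for illustration. Files shorter than the requested line are simply skipped.

```python
import fileinput

def nth_lines(paths, lineno=4):
    """Yield (filename, line) for line `lineno` (1-based) of each file."""
    paths = list(paths)
    if not paths:
        return  # an empty list would make FileInput fall back to sys.argv/stdin
    with fileinput.FileInput(files=paths) as stream:
        for line in stream:
            # filelineno() counts lines within the current file only
            if stream.filelineno() == lineno:
                yield stream.filename(), line
                stream.nextfile()  # close this file and move on to the next
```

It could be driven by the same glob call as above, for example `for name, line in nth_lines(glob.glob('/*.txt')):`. Only one file is open at a time, and reading stops at the wanted line.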

Jan 8 '08 #5
jo3c wrote:
> i need to read line 4 from a header file
http://docs.python.org/lib/module-linecache.html

~/2delete $ cat data.txt
L1
L2
L3
L4

~/2delete $ python
Python 2.5.1 (r251:54863, May 2 2007, 16:56:35)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import linecache
>>> linecache.getline("data.txt", 2)
'L2\n'
>>> linecache.getline("data.txt", 5)
''
>>> linecache.getline("data.txt", 1)
'L1\n'
>>>

--
http://noneisyours.marcher.name
http://feeds.feedburner.com/NoneIsYours

You are not free to read this message,
by doing so, you have violated my licence
and are required to urinate publicly. Thank you.

Jan 8 '08 #6
Martin Marcher wrote:
> > i need to read line 4 from a header file
>
> http://docs.python.org/lib/module-linecache.html
I guess you missed the "using linecache will crash my computer due to
memory loading, because i am working on 2000 files each is 8mb" part.

</F>

Jan 8 '08 #7
jo3c wrote:
> hi everybody
> im a newbie in python
> i need to read line 4 from a header file
> using linecache will crash my computer due to memory loading, because
> i am working on 2000 files each is 8mb
>
> fileinput don't load the file into memory first
> how do i use fileinput module to read a specific line from a file?
>
> for line in fileinput.FileInput('sample.txt')
> ????
I could have sworn that I posted working code (including an explanation
why linecache wouldn't work) the last time you asked about this... yes,
here it is again:
> i have a 2000 files with header and data
> i need to get the date information from the header
> then insert it into my database
> i am doing it in batch so i use glob.glob('/mydata/*/*/*.txt')
> to get the date on line 4 in the txt file i use
> linecache.getline('/mydata/myfile.txt', 4)
>
> but if i use
> linecache.getline('glob.glob('/mydata/*/*/*.txt', 4) won't work
glob.glob returns a list of filenames, so you need to call getline once
for each file in the list.

but using linecache is absolutely the wrong tool for this; it's designed
for *repeated* access to arbitrary lines in a file, so it keeps all the
data in memory. that is, all the lines, for all 2000 files.
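
If linecache has to be used anyway, the cache growth described above can be bounded by emptying the cache after each lookup. This is only a sketch, and the helper name `cached_line` is made up for illustration; linecache still reads each whole file to fetch one line, so the approaches below remain the better fit.

```python
import linecache

def cached_line(path, lineno):
    """Fetch one line via linecache, then drop the cache so memory stays bounded."""
    line = linecache.getline(path, lineno)  # '' if the line does not exist
    linecache.clearcache()                  # forget every file cached so far
    return line
```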

if the files are small, and you want to keep the code short, it's easier
to just grab the file's content and use indexing on the resulting list:

for filename in glob.glob('/mydata/*/*/*.txt'):
    line = list(open(filename))[4-1]
    ... do something with line ...

(note that line numbers usually start with 1, but Python's list indexing
starts at 0).

if the files might be large, use something like this instead:

for filename in glob.glob('/mydata/*/*/*.txt'):
    f = open(filename)
    # skip first three lines
    f.readline(); f.readline(); f.readline()
    # grab the line we want
    line = f.readline()
    ... do something with line ...

</F>

Jan 8 '08 #8
Russ P. wrote:
> On Jan 7, 9:41 pm, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> > Given that the OP is talking 2000 files to be processed, I think I'd
> > recommend explicit open() and close() calls to avoid having lots of I/O
> > structures floating around...
[effectively]
> > for fid in file_list:
> >     fin = open(fid)
> >     for cnt in range(4):
> >         ln = fin.readline()
> >     fin.close()
> On second thought, I wonder if the reference counting mechanism would
> be "smart" enough to automatically close the previous file on each
> iteration of the outer loop. If so, the files don't need to be
> explicitly closed.
I _hate_ relying on that, but context managers mean you don't have to.
There are good reasons to close as early as you can. For example,
readers of files from zip files will eventually either be slower or
not work until the other readers close.

Here is what I imagine you want (2.5 or better):

from __future__ import with_statement
import glob

def pairing(names, position):
    for filename in names:
        with open(filename) as f:
            for n, line in enumerate(f):
                if n == position:
                    break
            else:
                line = None # indicate a short file
        yield filename, line
...
for name, line in pairing(glob.glob('*.txt'), 3):
    do_something(name, line)
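
The `else` attached to the inner `for` is what marks a short file: it runs only when the loop finishes without hitting `break`. The same pattern in isolation, as a small sketch (the function name `find_at` is made up for illustration), using a plain list in place of an open file:

```python
def find_at(lines, position):
    """Return the item at zero-based index `position`, or None if too short."""
    for n, line in enumerate(lines):
        if n == position:
            break
    else:
        # reached only when the loop ran to completion without `break`
        line = None
    return line
```

Because `enumerate` counts from 0, position 3 corresponds to the fourth line, which is why the call above passes 3.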

--Scott David Daniels
Sc***********@Acm.Org
Jan 8 '08 #9
Steven D'Aprano wrote:
> Python guarantees[1] that files will be closed, but doesn't specify when
> they will be closed. I understand that Jython doesn't automatically close
> files until the program terminates, so even if you could rely on the ref
> counter to close the files in CPython, it won't be safe to do so in
> Jython.
From what I can tell, Java's GC automatically closes file streams, so
Jython will behave pretty much like CPython in most cases. I sure
haven't been able to make Jython run out of file handles by opening tons
of files and discarding the file objects without closing them. Has anyone?

</F>

Jan 8 '08 #10
