reading specific lines of a file

Hi All,

I want to read specific lines of a huge txt file (I know the line #).
Each line might have different sizes. Is there a convenient and fast
way of doing this in Python? Thanks.

Yi Xing

Jul 15 '06 #1


If the line number of the first line is 0 :

source=open('afile.txt')
for i,line in enumerate(source):
    if i == line_num:
        break
print line
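
The same sequential scan can be sketched for current Python 3 with `itertools.islice`. The helper name `read_line` and the choice to return `None` when the file has too few lines are assumptions made for illustration; they are not part of the post above.

```python
from itertools import islice

def read_line(path, line_num):
    """Return line `line_num` (0-based) of `path`, or None if the file is shorter."""
    with open(path) as source:
        # islice skips the first line_num lines lazily. The file is still read
        # sequentially, but only one line is held in memory at a time.
        return next(islice(source, line_num, line_num + 1), None)
```

Unlike the loop above, this does not silently hand back the last line when `line_num` is past the end of the file.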

Pierre

Jul 15 '06 #2

>>>>> Yi Xing <yx***@stanford.edu> (YX) wrote:

> YX> Hi All,
> YX> I want to read specific lines of a huge txt file (I know the line #). Each
> YX> line might have different sizes. Is there a convenient and fast way of
> YX> doing this in Python? Thanks.

Not fast. You have to read all preceding lines.
If you have to do this many times while the file does not change, you could
build an index into the file.
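
One way such an index might look, as a minimal in-memory sketch: a single pass records the byte offset at which each line starts, and later lookups seek directly to that offset. The function names `build_line_index` and `get_line` are hypothetical, and the index is only valid while the file is unchanged.

```python
def build_line_index(path):
    """Return a list whose entry i is the byte offset where line i (0-based) starts."""
    offsets = []
    position = 0
    with open(path, "rb") as f:  # binary mode so offsets are exact byte counts
        for line in f:
            offsets.append(position)
            position += len(line)
    return offsets

def get_line(path, offsets, line_num):
    """Seek straight to line `line_num` using a previously built index."""
    with open(path, "rb") as f:
        f.seek(offsets[line_num])
        return f.readline()  # bytes, including the trailing newline if present
```

Building the index costs one full read of the file; every lookup after that is a single seek plus one `readline()`.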
--
Piet van Oostrum <pi**@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: pi**@vanoostrum.org
Jul 15 '06 #3

Yi Xing wrote:
> I want to read specific lines of a huge txt file (I know the line #).
> Each line might have different sizes. Is there a convenient and fast
> way of doing this in Python? Thanks.

#!/usr/bin/env python

import os,sys
line = int(sys.argv[1])
path = sys.argv[2]
os.system("sed -n %dp %s"%(line,path))
Some might argue that this is not really doing
it in Python. In fact, I would argue that! But if
you're at a command prompt and you want to
see line 7358, it's much easier to type
% sed -n 7358p
than it is to write the python one-liner.

Jul 15 '06 #4

Yi Xing wrote:
> Hi All,
>
> I want to read specific lines of a huge txt file (I know the line #).
> Each line might have different sizes. Is there a convenient and fast
> way of doing this in Python? Thanks.
>
> Yi Xing

I once had to do a lot of random access of lines in a multi gigabyte
log file. I found that a very fast way to do this was to build an
index file containing the int offset in bytes of each line in the log
file.

I could post the code if you're interested.
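
Simon's code does not appear in this thread. The following is only a sketch of the idea he describes, assuming one fixed-width 8-byte offset per line in the index file; the names `write_index` and `read_indexed_line` and the little-endian layout are illustrative choices, not his implementation.

```python
import struct

OFFSET = struct.Struct("<Q")  # one unsigned 64-bit little-endian offset per line

def write_index(log_path, index_path):
    """Scan log_path once and store the starting byte offset of every line."""
    position = 0
    with open(log_path, "rb") as log, open(index_path, "wb") as index:
        for line in log:
            index.write(OFFSET.pack(position))
            position += len(line)

def read_indexed_line(log_path, index_path, line_num):
    """Fetch line `line_num` (0-based) with two seeks and no scanning."""
    with open(index_path, "rb") as index:
        index.seek(line_num * OFFSET.size)
        raw = index.read(OFFSET.size)
    if len(raw) < OFFSET.size:
        raise IndexError("line %d is not in the index" % line_num)
    (offset,) = OFFSET.unpack(raw)
    with open(log_path, "rb") as log:
        log.seek(offset)
        return log.readline()
```

Because every index entry has the same width, the position of entry `n` is simply `n * 8`, so the index itself never has to be loaded into memory.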

Peace,
~Simon

Jul 15 '06 #5

In <ma***************************************@python.org>, Yi Xing wrote:
> I want to read specific lines of a huge txt file (I know the line #).
> Each line might have different sizes. Is there a convenient and fast
> way of doing this in Python? Thanks.

I don't know how efficiently the `linecache` module in the standard library
is implemented, but you might have a look at it.

Ciao,
Marc 'BlackJack' Rintsch
Jul 15 '06 #6

Yi,
Use the linecache module. The documentation states that :
"""
The linecache module allows one to get any line from any file, while
attempting to optimize internally, using a cache, the common case where
many lines are read from a single file.
>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\012'
"""

Please note that you cannot really skip over lines unless each has
a fixed, known size. (If all lines have a fixed, known size, they
can be treated as 'records' and you can use seek() and other
random-access magic. That is why it is sometimes a lot faster to use
fixed-length rows in a database => faster searches, at the expense
of wasted space! But that is another topic for another
discussion...)
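
A minimal sketch of that fixed-size case, assuming every record (including its newline) is exactly `RECORD_SIZE` bytes. The constant and the function name `read_record` are made up for illustration.

```python
RECORD_SIZE = 16  # assumed width of every record in bytes, newline included

def read_record(path, record_num, record_size=RECORD_SIZE):
    """Return record `record_num` (0-based) from a file of fixed-size records."""
    with open(path, "rb") as f:
        # The offset is pure arithmetic, so nothing before the record is read.
        f.seek(record_num * record_size)
        return f.read(record_size)
```

This only works when the size really is constant; a single over-long line shifts every record after it.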

So the point is that you won't be able to jump to line 15000 without
reading lines 0-14999. You can either iterate over the lines yourself
or simply use the 'linecache' module as shown above. If I were you I
would use linecache, but of course you don't mention anything about
the context of your project, so it is hard to say.

Hope this helps,
Nick Vatamaniuc

Yi Xing wrote:
> Hi All,
>
> I want to read specific lines of a huge txt file (I know the line #).
> Each line might have different sizes. Is there a convenient and fast
> way of doing this in Python? Thanks.
>
> Yi Xing
Jul 16 '06 #7

On 16/07/2006 2:54 PM, Nick Vatamaniuc top-posted:
> Yi,
> Use the linecache module.

Yi, *don't* use the linecache module without carefully comparing the
documentation and the implementation with your requirements.

You will find that you have the source code on your computer -- mine
(Windows box) is at c:\Python24\Lib\linecache.py. When you read right
down to the end (it's not a large file, only 108 lines), you'll find this:

    try:
        fp = open(fullname, 'rU')
        lines = fp.readlines()
        fp.close()
    except IOError, msg:
        ## print '*** Cannot open', fullname, ':', msg
        return []
    size, mtime = stat.st_size, stat.st_mtime
    cache[filename] = size, mtime, lines, fullname

Looks like it's caching the *whole* of *each* file. Not unreasonable
given it appears to have been written to get source lines to include in
tracebacks.

It might just not be what you want if as you say you have "a huge txt
file". How many megabytes is "huge"?
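
The whole-file caching can be observed from the outside. The sketch below is written for current Python 3 and relies on `linecache.cache`, the module-level dict seen in the source excerpt above; it is an implementation detail rather than a documented interface, and the entry layout is assumed to match that excerpt. The helper name `cached_line_count` is hypothetical.

```python
import linecache

def cached_line_count(path, line_num):
    """Fetch one line through linecache, then report how many lines it cached."""
    linecache.getline(path, line_num)  # linecache line numbers are 1-based
    entry = linecache.cache.get(path)
    # For a regular file the entry is assumed to be (size, mtime, lines, fullname).
    return len(entry[2]) if entry else 0
```

If the returned count equals the total number of lines in the file after a single `getline()` call, the whole file is being held in memory.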

Cheers,
John

Jul 16 '06 #8

Bill Pursell wrote:
> Some might argue that this is not really doing
> it in Python. In fact, I would argue that! But if
> you're at a command prompt and you want to
> see line 7358, it's much easier to type
> % sed -n 7358p
> than it is to write the python one-liner.

'sed' is not recognized as an internal or external command,
operable program or batch file.

</F>

Jul 16 '06 #9

In message <ma***************************************@python.org>, Fredrik
Lundh wrote:
> Bill Pursell wrote:
>> Some might argue that this is not really doing
>> it in Python. In fact, I would argue that! But if
>> you're at a command prompt and you want to
>> see line 7358, it's much easier to type
>> % sed -n 7358p
>> than it is to write the python one-liner.
>
> 'sed' is not recognized as an internal or external command,
> operable program or batch file.

You're not using Windows, are you?
Jul 16 '06 #10

In message <ma***************************************@python.org>, Yi Xing
wrote:
> I want to read specific lines of a huge txt file (I know the line #).
> Each line might have different sizes. Is there a convenient and fast
> way of doing this in Python? Thanks.

file("myfile.txt").readlines()[LineNr]

Convenient, yes. Fast, no. :)
Jul 16 '06 #11

On 16/07/2006 5:16 PM, Fredrik Lundh wrote:
> Bill Pursell wrote:
>> Some might argue that this is not really doing
>> it in Python. In fact, I would argue that! But if
>> you're at a command prompt and you want to
>> see line 7358, it's much easier to type
>> % sed -n 7358p

aarrbejaysus #1: You *don't* type the '%', you *do* need to specify an
input file somehow.

>> than it is to write the python one-liner.
>
> 'sed' is not recognized as an internal or external command,
> operable program or batch file.

aarrbejaysus #2: Download the installer from

http://gnuwin32.sourceforge.net/packages/sed.htm

Jul 16 '06 #12

John Machin wrote:
>> 'sed' is not recognized as an internal or external command,
>> operable program or batch file.
>
> aarrbejaysus #2: Download the installer from
>
> http://gnuwin32.sourceforge.net/packages/sed.htm

in a way, this kind of advice reminds me of

http://thedailywtf.com/forums/thread/80949.aspx

</F>

Jul 16 '06 #13
