reading specific lines of a file

Hi All,

I want to read specific lines of a huge txt file (I know the line #).
Each line might have different sizes. Is there a convenient and fast
way of doing this in Python? Thanks.

Yi Xing

Jul 15 '06 #1
If the line number of the first line is 0:

source = open('afile.txt')
for i, line in enumerate(source):
    if i == line_num:
        break
print(line)

Pierre

Jul 15 '06 #2
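Pierre's loop above can also be written with itertools.islice, which stops reading as soon as the wanted line is reached. This is only a sketch: the function name read_line and the choice to return None when the file is too short are assumptions made for illustration, not part of the original post.

```python
from itertools import islice


def read_line(path, line_num):
    """Return line `line_num` (0-based) of `path`, or None if the file is shorter."""
    with open(path) as source:
        # islice lazily skips the first line_num lines, then yields at most one line.
        return next(islice(source, line_num, line_num + 1), None)
```

Like the enumerate loop, this still has to read every preceding line, but unlike that loop it does not silently return the last line when line_num is past the end of the file.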
>>>>> Yi Xing <yx***@stanford.edu> (YX) wrote:
>YX> Hi All,
>YX> I want to read specific lines of a huge txt file (I know the line #). Each
>YX> line might have different sizes. Is there a convenient and fast way of
>YX> doing this in Python? Thanks.

Not fast. You have to read all preceding lines.
If you have to do this many times while the file does not change, you could
build an index into the file.
--
Piet van Oostrum <pi**@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: pi**@vanoostrum.org
Jul 15 '06 #3

Yi Xing wrote:
> I want to read specific lines of a huge txt file (I know the line #).
> Each line might have different sizes. Is there a convenient and fast
> way of doing this in Python? Thanks.

#!/usr/bin/env python

import os,sys
line = int(sys.argv[1])
path = sys.argv[2]
os.system("sed -n %dp %s"%(line,path))
Some might argue that this is not really doing
it in Python. In fact, I would argue that! But if
you're at a command prompt and you want to
see line 7358, it's much easier to type
% sed -n 7358p
than it is to write the python one-liner.

Jul 15 '06 #4
Yi Xing wrote:
> Hi All,
>
> I want to read specific lines of a huge txt file (I know the line #).
> Each line might have different sizes. Is there a convenient and fast
> way of doing this in Python? Thanks.
>
> Yi Xing

I once had to do a lot of random access to lines in a multi-gigabyte
log file. I found that a very fast way to do this was to build an
index file containing the integer byte offset of each line in the log
file.

I could post the code if you're interested.

Peace,
~Simon

Jul 15 '06 #5
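The index idea that Piet and Simon describe could look roughly like the sketch below. It is not Simon's code (he did not post it); the names build_line_index and read_indexed_line are hypothetical, and the index is kept in an in-memory list rather than in a separate index file.

```python
def build_line_index(path):
    """Return a list whose entry i is the byte offset where line i (0-based) starts."""
    offsets = []
    position = 0
    # Binary mode, so len(line) counts bytes and the offsets are valid for seek().
    with open(path, "rb") as f:
        for line in f:
            offsets.append(position)
            position += len(line)
    return offsets


def read_indexed_line(path, offsets, line_num):
    """Jump straight to line `line_num` using a prebuilt offset index."""
    with open(path, "rb") as f:
        f.seek(offsets[line_num])
        return f.readline()  # bytes; decode as needed
```

Building the index costs one full pass over the file, after which each lookup is a single seek. The index is only valid as long as the file does not change.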
In <ma***************************************@python.org>, Yi Xing wrote:
> I want to read specific lines of a huge txt file (I know the line #).
> Each line might have different sizes. Is there a convenient and fast
> way of doing this in Python? Thanks.

I don't know how efficiently the `linecache` module in the standard library is
implemented, but you might have a look at it.

Ciao,
Marc 'BlackJack' Rintsch
Jul 15 '06 #6
Yi,
Use the linecache module. The documentation states that:
"""
The linecache module allows one to get any line from any file, while
attempting to optimize internally, using a cache, the common case where
many lines are read from a single file.
>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\012'
"""

Please note that you cannot really skip over the lines unless each has
a fixed, known size. (If all lines have a fixed, known size, then
they can be considered 'records' and you can use seek() and other
random-access magic. That is why it is sometimes a lot faster to use
fixed-length rows in a database => faster searches, but at
the expense of wasted space! But this is another topic for another
discussion...)

So the point is that you won't be able to jump to line 15000 without
reading lines 0-14999. You can either iterate over the lines yourself
or simply use the 'linecache' module as shown above. If I were you I
would use linecache, but of course you don't mention anything about
the context of your project, so it is hard to say.

Hope this helps,
Nick Vatamaniuc
Yi Xing wrote:
> Hi All,
>
> I want to read specific lines of a huge txt file (I know the line #).
> Each line might have different sizes. Is there a convenient and fast
> way of doing this in Python? Thanks.
>
> Yi Xing
Jul 16 '06 #7
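Nick's aside about fixed-size records can be illustrated with a short sketch. The 16-byte record size and the name read_record are assumptions chosen for the example; the original poster's file has variable-length lines, so this only applies if the data can be stored with every line padded to the same width.

```python
RECORD_SIZE = 16  # hypothetical: every record is exactly 16 bytes, newline included


def read_record(path, record_num, record_size=RECORD_SIZE):
    """Return record `record_num` (0-based) from a file of fixed-size records."""
    with open(path, "rb") as f:
        # The start of record n is simply n * record_size, so no scanning is needed.
        f.seek(record_num * record_size)
        return f.read(record_size)
```

Because the offset is computed arithmetically, the lookup cost does not depend on how far into the file the record is.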
On 16/07/2006 2:54 PM, Nick Vatamaniuc top-posted:
> Yi,
> Use the linecache module.

Yi, *don't* use the linecache module without carefully comparing the
documentation and the implementation with your requirements.

You will find that you have the source code on your computer -- mine
(Windows box) is at c:\Python24\Lib\linecache.py. When you read right
down to the end (it's not a large file, only 108 lines), you'll find this:

    try:
        fp = open(fullname, 'rU')
        lines = fp.readlines()
        fp.close()
    except IOError, msg:
##      print '*** Cannot open', fullname, ':', msg
        return []
    size, mtime = stat.st_size, stat.st_mtime
    cache[filename] = size, mtime, lines, fullname

Looks like it's caching the *whole* of *each* file. Not unreasonable
given it appears to have been written to get source lines to include in
tracebacks.

It might just not be what you want if as you say you have "a huge txt
file". How many megabytes is "huge"?

Cheers,
John

Jul 16 '06 #8
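Given John's caveat that linecache reads the whole file into memory, one way to use it for files that comfortably fit in RAM is sketched below. The helper name get_lines is hypothetical. linecache.getline takes 1-based line numbers and returns an empty string for a line that does not exist, and linecache.clearcache() discards the cached copy afterwards.

```python
import linecache


def get_lines(path, line_numbers):
    """Fetch several 1-based line numbers from one file via linecache."""
    # The first getline() call reads and caches every line of the file;
    # later calls for the same file are served from that cache.
    lines = [linecache.getline(path, n) for n in line_numbers]
    # Release the cached copy so a large file does not stay in memory.
    linecache.clearcache()
    return lines
```

For a file that really is too large to hold in memory, this approach is not suitable and a sequential scan or an offset index is the safer choice.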
Bill Pursell wrote:
> Some might argue that this is not really doing
> it in Python. In fact, I would argue that! But if
> you're at a command prompt and you want to
> see line 7358, it's much easier to type
> % sed -n 7358p
> than it is to write the python one-liner.

'sed' is not recognized as an internal or external command,
operable program or batch file.

</F>

Jul 16 '06 #9
In message <ma***************************************@python.org>, Fredrik
Lundh wrote:
> Bill Pursell wrote:
>> Some might argue that this is not really doing
>> it in Python. In fact, I would argue that! But if
>> you're at a command prompt and you want to
>> see line 7358, it's much easier to type
>> % sed -n 7358p
>> than it is to write the python one-liner.
>
> 'sed' is not recognized as an internal or external command,
> operable program or batch file.

You're not using Windows, are you?
Jul 16 '06 #10
In message <ma***************************************@python.org>, Yi Xing
wrote:
> I want to read specific lines of a huge txt file (I know the line #).
> Each line might have different sizes. Is there a convenient and fast
> way of doing this in Python? Thanks.

open("myfile.txt").readlines()[LineNr]

Convenient, yes. Fast, no. :)
Jul 16 '06 #11
On 16/07/2006 5:16 PM, Fredrik Lundh wrote:
> Bill Pursell wrote:
>> Some might argue that this is not really doing
>> it in Python. In fact, I would argue that! But if
>> you're at a command prompt and you want to
>> see line 7358, it's much easier to type
>> % sed -n 7358p

aarrbejaysus #1: You *don't* type the '%', you *do* need to specify an
input file somehow.

>> than it is to write the python one-liner.
>
> 'sed' is not recognized as an internal or external command,
> operable program or batch file.

aarrbejaysus #2: Download the installer from

http://gnuwin32.sourceforge.net/packages/sed.htm

Jul 16 '06 #12
John Machin wrote:
>> 'sed' is not recognized as an internal or external command,
>> operable program or batch file.
>
> aarrbejaysus #2: Download the installer from
>
> http://gnuwin32.sourceforge.net/packages/sed.htm

in a way, this kind of advice reminds me of

http://thedailywtf.com/forums/thread/80949.aspx

</F>

Jul 16 '06 #13
