reading specific lines of a file

Hi All,

I want to read specific lines of a huge txt file (I know the line #).
Each line might have different sizes. Is there a convenient and fast
way of doing this in Python? Thanks.

Yi Xing

Jul 15 '06 #1
If the line number of the first line is 0 :

# line_num is the 0-based number of the line you want
source = open('afile.txt')
for i, line in enumerate(source):
    if i == line_num:
        break
print(line)
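The same scan can be written with itertools.islice, which also avoids printing the wrong line when the file is shorter than expected. This is only a sketch: the function name read_line and the choice to return None for a missing line are assumptions, not part of the post above.

```python
from itertools import islice

def read_line(path, line_num):
    """Return line `line_num` (0-based) of `path`, or None if the file is shorter."""
    with open(path) as source:
        # islice skips the first line_num lines without storing them,
        # then next() takes one line (or the default at end of file).
        return next(islice(source, line_num, line_num + 1), None)
```

All preceding lines are still read from disk, so this is convenient rather than fast.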

Pierre

Jul 15 '06 #2
>>>>> Yi Xing <yx***@stanford.edu> (YX) wrote:

>YX> Hi All,
>YX> I want to read specific lines of a huge txt file (I know the line #). Each
>YX> line might have different sizes. Is there a convenient and fast way of
>YX> doing this in Python? Thanks.

Not fast. You have to read all preceding lines.
If you have to do this many times while the file does not change, you could
build an index into the file.
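
One way such an index might look is a list of byte offsets held in memory, built with a single pass over the file. The names build_line_index and read_indexed_line are hypothetical, and the sketch assumes the file does not change after the index is built.

```python
def build_line_index(path):
    """Return a list where entry i is the byte offset at which line i (0-based) starts."""
    offsets = []
    position = 0
    with open(path, "rb") as f:  # binary mode so offsets are exact byte counts
        for line in f:
            offsets.append(position)
            position += len(line)
    return offsets

def read_indexed_line(path, offsets, line_num):
    """Seek straight to line `line_num` using a previously built index."""
    with open(path, "rb") as f:
        f.seek(offsets[line_num])
        return f.readline()  # returned as bytes; decode if text is needed
```

Building the index costs one full read; every lookup after that is a single seek.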
--
Piet van Oostrum <pi**@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: pi**@vanoostrum.org
Jul 15 '06 #3

Yi Xing wrote:
I want to read specific lines of a huge txt file (I know the line #).
Each line might have different sizes. Is there a convenient and fast
way of doing this in Python? Thanks.
#!/usr/bin/env python

import subprocess, sys
line = int(sys.argv[1])
path = sys.argv[2]
# Pass the arguments as a list so paths with spaces need no shell quoting.
subprocess.call(["sed", "-n", "%dp" % line, path])
Some might argue that this is not really doing
it in Python. In fact, I would argue that! But if
you're at a command prompt and you want to
see line 7358, it's much easier to type
% sed -n 7358p
than it is to write the python one-liner.

Jul 15 '06 #4
Yi Xing wrote:
Hi All,

I want to read specific lines of a huge txt file (I know the line #).
Each line might have different sizes. Is there a convenient and fast
way of doing this in Python? Thanks.

Yi Xing
I once had to do a lot of random access of lines in a multi gigabyte
log file. I found that a very fast way to do this was to build an
index file containing the int offset in bytes of each line in the log
file.

I could post the code if you're interested.
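
The code itself was never posted in this thread, so the following is only a guess at the general shape of such an index file, not Simon's implementation. It assumes one 8-byte little-endian offset per line; the names OFFSET, write_index_file and get_line are hypothetical.

```python
import struct

OFFSET = struct.Struct("<Q")  # assumed format: one 8-byte unsigned offset per line

def write_index_file(log_path, index_path):
    """Write the starting byte offset of every line in log_path to index_path."""
    position = 0
    with open(log_path, "rb") as log, open(index_path, "wb") as index:
        for line in log:
            index.write(OFFSET.pack(position))
            position += len(line)

def get_line(log_path, index_path, line_num):
    """Fetch line `line_num` (0-based) with two seeks: one in the index, one in the log."""
    with open(index_path, "rb") as index:
        index.seek(line_num * OFFSET.size)
        record = index.read(OFFSET.size)
    if len(record) < OFFSET.size:
        raise IndexError("line number out of range")
    (offset,) = OFFSET.unpack(record)
    with open(log_path, "rb") as log:
        log.seek(offset)
        return log.readline()
```

Because every index entry has the same size, the index itself can be searched by arithmetic, and it never has to be loaded into memory in full.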

Peace,
~Simon

Jul 15 '06 #5
In <ma***************************************@python.org>, Yi Xing wrote:
I want to read specific lines of a huge txt file (I know the line #).
Each line might have different sizes. Is there a convenient and fast
way of doing this in Python? Thanks.
I don't know how efficiently the `linecache` module in the standard library
is implemented, but you might have a look at it.

Ciao,
Marc 'BlackJack' Rintsch
Jul 15 '06 #6
Yi,
Use the linecache module. The documentation states that:
"""
The linecache module allows one to get any line from any file, while
attempting to optimize internally, using a cache, the common case where
many lines are read from a single file.
>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\012'
"""

Please note that you cannot really skip over lines unless each has a
fixed, known size. (If all lines have a fixed, known size, they can be
treated as 'records' and you can use seek() and other random-access
magic. That is why it is sometimes a lot faster to use fixed-length rows
in a database: searching gets faster at the expense of wasted space.
But that is another topic for another discussion...)
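
For the fixed-size case, a lookup might be sketched as follows. RECORD_SIZE and read_record are hypothetical names, and the sketch assumes every record occupies exactly the same number of bytes, line terminator included.

```python
RECORD_SIZE = 16  # hypothetical: every record is exactly 16 bytes, newline included

def read_record(path, record_num, record_size=RECORD_SIZE):
    """Return record `record_num` (0-based) from a file of fixed-size records."""
    with open(path, "rb") as f:
        f.seek(record_num * record_size)  # jump directly; nothing before it is read
        data = f.read(record_size)
    if len(data) < record_size:
        raise IndexError("record number out of range")
    return data
```

The offset is pure arithmetic, which is exactly what variable-length lines do not allow.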

So the point is that you won't be able to jump to line 15000 without
reading lines 0-14999. You can either iterate over the lines yourself
or simply use the 'linecache' module as shown above. If I were you I
would use linecache, but of course you don't mention anything about
the context of your project, so it is hard to say.

Hope this helps,
Nick Vatamaniuc
Yi Xing wrote:
Hi All,

I want to read specific lines of a huge txt file (I know the line #).
Each line might have different sizes. Is there a convenient and fast
way of doing this in Python? Thanks.

Yi Xing
Jul 16 '06 #7
On 16/07/2006 2:54 PM, Nick Vatamaniuc top-posted:
Yi,
Use the linecache module.
Yi, *don't* use the linecache module without carefully comparing the
documentation and the implementation with your requirements.

You will find that you have the source code on your computer -- mine
(Windows box) is at c:\Python24\Lib\linecache.py. When you read right
down to the end (it's not a large file, only 108 lines), you'll find this:

    try:
        fp = open(fullname, 'rU')
        lines = fp.readlines()
        fp.close()
    except IOError, msg:
##      print '*** Cannot open', fullname, ':', msg
        return []
    size, mtime = stat.st_size, stat.st_mtime
    cache[filename] = size, mtime, lines, fullname

Looks like it's caching the *whole* of *each* file. Not unreasonable
given it appears to have been written to get source lines to include in
tracebacks.

It might just not be what you want if as you say you have "a huge txt
file". How many megabytes is "huge"?
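
The caching behaviour can be checked without reading the source, by asking for one line and then looking at what the module kept. This sketch relies on linecache.cache, an undocumented module-level dict whose layout (size, mtime, lines, fullname) is taken from the excerpt above and may differ between Python versions; cached_line_count is a hypothetical helper name.

```python
import linecache

def cached_line_count(path, line_num):
    """Fetch one line via linecache, then report how many lines it kept in memory."""
    linecache.getline(path, line_num)  # line numbers are 1-based here
    # Undocumented implementation detail: entries are (size, mtime, lines, fullname).
    entry = linecache.cache.get(path)
    if entry is None or len(entry) != 4:
        return 0
    return len(entry[2])
```

If the result equals the total number of lines in the file, the whole file is sitting in memory after a single getline() call.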

Cheers,
John

Jul 16 '06 #8
Bill Pursell wrote:
Some might argue that this is not really doing
it in Python. In fact, I would argue that! But if
you're at a command prompt and you want to
see line 7358, it's much easier to type
% sed -n 7358p
than it is to write the python one-liner.
'sed' is not recognized as an internal or external command,
operable program or batch file.

</F>

Jul 16 '06 #9
In message <ma***************************************@python.org>, Fredrik
Lundh wrote:
Bill Pursell wrote:
Some might argue that this is not really doing
it in Python. In fact, I would argue that! But if
you're at a command prompt and you want to
see line 7358, it's much easier to type
% sed -n 7358p
than it is to write the python one-liner.

'sed' is not recognized as an internal or external command,
operable program or batch file.
You're not using Windows, are you?
Jul 16 '06 #10
