efficient text file search.

Is there a more efficient method to find a string in a text file than:

f = file('somefile')
for line in f:
    if 'string' in line:
        print 'FOUND'

?

BTW:
does "for line in f:" read a block of lines into memory, or does it
simply call f.readline() many times?

thanks
amit

Sep 11 '06 #1

"noro" <am******@gmail.comschreef in bericht
news:11**********************@h48g2000cwc.googlegr oups.com...
Is there a more efficient method to find a string in a text file than:

f = file('somefile')
for line in f:
    if 'string' in line:
        print 'FOUND'

yes, more efficient would be:
grep (http://www.gnu.org/software/grep/)
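(For reference, a minimal sketch of driving grep from a Python script with
the subprocess module; it assumes GNU grep is installed and on the PATH, and
it reuses the file name and pattern from the question:)

import subprocess

# grep -q prints nothing and exits with status 0 if the pattern was found, 1 if not
ret = subprocess.call(['grep', '-q', 'string', 'somefile'])
if ret == 0:
    print('FOUND')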

Sep 11 '06 #2
:)

via python...

Luuk wrote:
"noro" <am******@gmail.comschreef in bericht
news:11**********************@h48g2000cwc.googlegr oups.com...
Is there a more efficient method to find a string in a text file than:

f = file('somefile')
for line in f:
    if 'string' in line:
        print 'FOUND'


yes, more efficient would be:
grep (http://www.gnu.org/software/grep/)
Sep 11 '06 #3

"noro" <am******@gmail.comschreef in bericht
news:11********************@h48g2000cwc.googlegrou ps.com...
:)

via python...

Luuk wrote:
>"noro" <am******@gmail.comschreef in bericht
news:11**********************@h48g2000cwc.googleg roups.com...
Is there a more efficient method to find a string in a text file than:

f = file('somefile')
for line in f:
    if 'string' in line:
        print 'FOUND'


yes, more efficient would be:
grep (http://www.gnu.org/software/grep/)
OK, a more serious answer:

some googling turned up the following.
Second paragraph of chapter 14 of http://www.amk.ca/python/2.1/

The speed of line-oriented file I/O has been improved because people
often complain about its lack of speed, and because it's often been used as
a naïve benchmark. The readline() method of file objects has therefore been
rewritten to be much faster. The exact amount of the speedup will vary from
platform to platform depending on how slow the C library's getc() was, but
is around 66%, and potentially much faster on some particular operating
systems. Tim Peters did much of the benchmarking and coding for this change,
motivated by a discussion in comp.lang.python.

A new module and method for file objects was also added, contributed by Jeff
Epler. The new method, xreadlines(), is similar to the existing xrange()
built-in. xreadlines() returns an opaque sequence object that only supports
being iterated over, reading a line on every iteration but not reading the
entire file into memory as the existing readlines() method does. You'd use
it like this:

for line in sys.stdin.xreadlines():
    # ... do something for each line ...
    ...
For a fuller discussion of the line I/O changes, see the python-dev summary
for January 1-15, 2001 at http://www.amk.ca/python/dev/2001-01-1.html.

Sep 11 '06 #4
noro wrote:
Is there a more efficient method to find a string in a text file than:

f = file('somefile')
for line in f:
    if 'string' in line:
        print 'FOUND'
Probably better to read the whole file at once if it isn't too big:
f = file('somefile')
data = f.read()
if 'string' in data:
    print 'FOUND'
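(If the file really is too big to read in one go, one possible compromise, not
suggested in this thread, is to scan it in fixed-size chunks while keeping a
small overlap so a match that straddles a chunk boundary is not missed. A rough
sketch with a made-up helper name and an arbitrary 1 MB chunk size:)

def contains(path, needle, chunk_size=1024 * 1024):
    # keep the last len(needle) - 1 characters of the previous chunk as overlap
    overlap = len(needle) - 1
    tail = ''
    f = open(path)
    try:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            if needle in tail + chunk:
                return True
            tail = (tail + chunk)[-overlap:] if overlap else ''
    finally:
        f.close()

if contains('somefile', 'string'):
    print('FOUND')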
Sep 11 '06 #5
Ant

noro wrote:
Is there a more efficient method to find a string in a text file than:

f = file('somefile')
for line in f:
    if 'string' in line:
        print 'FOUND'
        break
        ^^^
Add a 'break' after the print statement - that way you won't have to
read the entire file unless the string isn't there. That's probably not
the sort of advice you're after though :-)

Can't see why reading the entire file in, as the other poster suggested,
would help, and seeing as "for line in f:" is now regarded as the
Pythonic way of working with lines of text in a file, I'd assume
that the implementation would be at least as fast as "for line in
f.xreadlines():".
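(The early-exit advice above, packaged as a small reusable helper; just a
sketch, and the function name is made up:)

def found_in_file(path, needle):
    for line in open(path):
        if needle in line:
            return True  # stop reading as soon as the string is seen
    return False

if found_in_file('somefile', 'string'):
    print('FOUND')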

Sep 11 '06 #6

Luuk wrote:
[snip]
some googling turned up the following.
Second paragraph of chapter 14 of http://www.amk.ca/python/2.1/
[snip]
For a fuller discussion of the line I/O changes, see the python-dev summary
for January 1-15, 2001 at http://www.amk.ca/python/dev/2001-01-1.html.
That is *HISTORY*. That is Python 2.1. That is the year 2001.
xreadlines is as dead as a dodo.

Sep 11 '06 #7

"John Machin" <sj******@lexicon.netschreef in bericht
news:11**********************@d34g2000cwd.googlegr oups.com...
>
Luuk wrote:
[snip]
>some googling turned up the following.
Second paragraph of chapter 14 of http://www.amk.ca/python/2.1/
[snip]
>For a fuller discussion of the line I/O changes, see the python-dev summary
for January 1-15, 2001 at http://www.amk.ca/python/dev/2001-01-1.html.

That is *HISTORY*. That is Python 2.1. That is the year 2001.
xreadlines is as dead as a dodo.
That's why I started my reply with:
"some googling turned up the following."
I did not state that further googling was unneeded ;-)
Sep 11 '06 #8
noro wrote:
Is there a more efficient method to find a string in a text file than:

f = file('somefile')
for line in f:
    if 'string' in line:
        print 'FOUND'

?
Is this something you want to do only once for a given file? The
replies so far seem to imply so, and in this case I doubt that you can
do anything more efficient. OTOH, if the same file is to be searched
repeatedly for different strings, an appropriate indexing scheme can
speed things up considerably on average.
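(One very rough illustration of such an indexing scheme, not anything George
specified: build a word -> line-numbers mapping once, then answer repeated
whole-word lookups from the dictionary instead of rescanning the file:)

def build_index(path):
    # map each whitespace-separated word to the line numbers it appears on
    index = {}
    f = open(path)
    try:
        for lineno, line in enumerate(f):
            for word in line.split():
                index.setdefault(word, []).append(lineno)
    finally:
        f.close()
    return index

index = build_index('somefile')
if 'string' in index:  # only matches whole words, unlike the substring search above
    print('FOUND')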

George

Sep 11 '06 #9
OK, I am not sure why, but

fList = file('somefile').read()
if fList.find('string') != -1:
    print 'FOUND'

works much, much faster.

It is strange, since I thought 'for line in file('somefile')' was
optimized and read pages into memory.
I guess not...

George Sakkis wrote:
noro wrote:
Is there a more efficient method to find a string in a text file than:

f = file('somefile')
for line in f:
    if 'string' in line:
        print 'FOUND'

?

Is this something you want to do only once for a given file? The
replies so far seem to imply so, and in this case I doubt that you can
do anything more efficient. OTOH, if the same file is to be searched
repeatedly for different strings, an appropriate indexing scheme can
speed things up considerably on average.

George
Sep 11 '06 #10
noro <am******@gmail.com> wrote:
>OK, I am not sure why, but

fList = file('somefile').read()
if fList.find('string') != -1:
    print 'FOUND'

works much, much faster.

It is strange, since I thought 'for line in file('somefile')' was
optimized and read pages into memory.
Step back and think about what each is doing at a high level of
description: file.read reads the contents of the file into memory
in one go, end of story. file.[x]readlines reads (some or all of)
the contents of the file into memory, does a linear search on it
for end-of-line characters, and copies out the line(s) into some
new bits of memory. Line-by-line processing has a *lot* more work
to do (unless you're read()ing a really big file which is going to
make heavy demands on memory allocation) and it should be no
surprise that it's slower.
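(A throwaway way to check this for yourself; the helper functions, the file
name, and the repeat count below are all made up for the example:)

import time

def search_whole(path, needle):
    return needle in open(path).read()

def search_lines(path, needle):
    for line in open(path):
        if needle in line:
            return True
    return False

for fn in (search_whole, search_lines):
    start = time.time()
    for _ in range(10):
        fn('somefile', 'string')
    print('%s: %.3f s' % (fn.__name__, time.time() - start))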

--
\S -- si***@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
___ | "Frankly I have no feelings towards penguins one way or the other"
\X/ | -- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
Sep 12 '06 #11
