Bytes IT Community

Fast File Input

Hi, everyone,

I'm a relative novice to Python and am trying to reduce the processing time
for a very large text file that I am reading into my Python script. I'm
currently reading each line one at a time (readline()), stripping the
leading and trailing whitespace (strip()) and splitting its delimited data
(split()). For my large input files, this text processing is taking many
hours.
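For concreteness, the loop described above might look roughly like this (the helper name, tab delimiter, and in-memory sample data are my own assumptions, not from Scott's post):

```python
import io

def slow_parse(f, sep="\t"):
    # The pattern described above: readline(), strip(), then split()
    # on every single line.
    rows = []
    while True:
        line = f.readline()
        if not line:          # readline() returns '' at end of file
            break
        rows.append(line.strip().split(sep))
    return rows

# Tiny in-memory stand-in for the large input file:
rows = slow_parse(io.StringIO("a\tb\n c\td \n"))
```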

If I were working in C, I'd consider using a lower level I/O library,
minimizing text processing, and reducing memory redundancy. However, I have
no idea at all what to do to optimize this process in Python.

Can anyone offer some suggestions?

Thanks,
Scott

--
Remove ".nospam" from the user ID in my e-mail to reply via e-mail.
Jul 18 '05 #1
5 Replies


Scott Brady Drummonds wrote:
> I'm a relative novice to Python and am trying to reduce the processing
> time for a very large text file [...] this text processing is taking
> many hours.
>
> Can anyone offer some suggestions?


This actually improved a lot with python version 2
but is still quite slow as you can see here:
http://www.pixelbeat.org/readline/
There are a few notes within the python script there.

Pádraig.

Jul 18 '05 #2


"Scott Brady Drummonds" <sc**********************@intel.com> wrote in
message news:c1**********@news01.intel.com...
> I'm a relative novice to Python and am trying to reduce the processing
> time for a very large text file [...] this text processing is taking
> many hours.


for line in file('somefile.txt'): ...
will be faster because the file iterator reads a much larger block with
each disk access.

Do you really need strip()? Clipping \n off the last item after split()
*might* be faster.
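A sketch combining both of these suggestions (the helper name, tab delimiter, and sample input are assumptions on my part): iterate the file object directly, and clip the newline off the last field after split() instead of strip()ing the whole line first.

```python
import io

def fast_parse(f, sep="\t"):
    rows = []
    for line in f:                # file iterator: buffered, large block reads
        fields = line.split(sep)
        # Clip the trailing newline off the last field only, rather
        # than scanning both ends of the whole line with strip().
        fields[-1] = fields[-1].rstrip("\n")
        rows.append(fields)
    return rows

rows = fast_parse(io.StringIO("a\tb\nc\td\n"))
```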

Terry J. Reedy


Jul 18 '05 #3

Scott Brady Drummonds wrote on Wed, 25 Feb 2004 08:35:43 -0800:
> I'm a relative novice to Python and am trying to reduce the processing
> time for a very large text file [...] this text processing is taking
> many hours.


An easy improvement is using "for line in sometextfile:" instead of
repetitive readline(). Not sure how much time this will save you (depends
on what you're doing after reading), but it can make a difference at
virtually no cost. You might also want to try rstrip() instead of strip()
(not sure if it's faster, but perhaps it is).
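As a quick sanity check of the rstrip() idea (the sample line is invented): when fields carry no leading whitespace, rstrip() gives the same result as strip() while only scanning the right-hand end of the string.

```python
line = "alpha\tbeta\tgamma\n"
# rstrip() removes only trailing whitespace; with nothing to trim on
# the left, the result matches strip().
same = line.rstrip() == line.strip() == "alpha\tbeta\tgamma"
# rstrip("\n") is narrower still: it removes only trailing newlines.
clipped = line.rstrip("\n")
```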

--
Yours,

Andrei

=====
Real contact info (decode with rot13):
ce******@jnanqbb.ay. Fcnz-serr! Cyrnfr qb abg hfr va choyvp cbfgf. V ernq
gur yvfg, fb gurer'f ab arrq gb PP.
Jul 18 '05 #4

"Scott Brady Drummonds" <sc**********************@intel.com> wrote in
message news:c1**********@news01.intel.com...

> I'm a relative novice to Python [...] splitting its delimited data
> (split()). For my large input files, this text processing is taking
> many hours.


If you mean delimited in the CSV sense, then I believe the csv module is
optimised for this. Included in 2.3, IIRC.
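A minimal sketch of the csv-module approach (the sample data here is invented); note that csv.reader handles both the field splitting and the line terminator itself:

```python
import csv
import io

# In-memory stand-in for a comma-delimited input file:
data = io.StringIO("name,score\nalice,10\nbob,7\n")
rows = list(csv.reader(data))   # each row is already a list of fields
```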

Eddie

Jul 18 '05 #5


Pádraig> This actually improved a lot with python version 2
Pádraig> but is still quite slow as you can see here:
Pádraig> http://www.pixelbeat.org/readline/
Pádraig> There are a few notes within the python script there.

Your page doesn't mention precisely which version of Python 2 you used. I
suspect a rather old one (2.0? 2.1?) because of the style of loop you used
to read from sys.stdin. Eliminating comments, your python2 script was:

import sys

while 1:
    line = sys.stdin.readline()
    if line == '':
        break
    try:
        print line,
    except:
        pass

Running that using the CVS version of Python feeding it my machine's
dictionary as input I got this time(1) output (fastest real time of four runs):

% time python readltst.py < /usr/share/dict/words > /dev/null

real 0m1.384s
user 0m1.290s
sys 0m0.060s

Rewriting it to eliminate the try/except statement (why did you have that
there?) got it to:

% time python readltst.py < /usr/share/dict/words > /dev/null

real 0m1.373s
user 0m1.270s
sys 0m0.040s

Further rewriting it as the more modern:

import sys

for line in sys.stdin:
    print line,

yielded:

% time python readltst2.py < /usr/share/dict/words > /dev/null

real 0m0.660s
user 0m0.600s
sys 0m0.060s

My guess is that your python2 times are probably at least a factor of 2 too
large if you accept that people will use a recent version of Python in which
file objects are iterators.

Skip
Jul 18 '05 #6
