Bytes IT Community

Python slow for filter scripts

Hello,

Until now I have mostly written simple filter scripts in Perl, e.g.

while(<>) {
    # do something with $_, regexp matching, replacements etc.
    print;
}
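
For comparison, here is a rough Python 3 counterpart of that Perl loop, offered as a sketch rather than the poster's code. The colour/color substitution is a hypothetical placeholder for the real matching and replacement. The fileinput module is used because, like Perl's <>, it reads the files named on the command line, or stdin if there are none.

```python
import fileinput
import re
import sys

# Hypothetical placeholder rule: replace the whole word "colour" with "color".
PATTERN = re.compile(r"\bcolour\b")


def filter_lines(lines):
    """Yield each line with the substitution applied, like s/.../.../ on $_."""
    for line in lines:
        yield PATTERN.sub("color", line)


def main():
    # fileinput.input() reads the files named in sys.argv[1:], or stdin if
    # there are none, much like Perl's while(<>) loop.
    for line in filter_lines(fileinput.input()):
        sys.stdout.write(line)
```

In a script, call main() under an if __name__ == "__main__": guard and run it as, e.g., python filter.py input.txt > output.txt. Keeping the per-line logic in filter_lines also makes it easy to test on a plain list of strings.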

Now I have learned Python and like it much better as a language.

However, I tried the most simple while(<>) {print;} in Perl versus
Python, just a copy from stdin to stdout, to see how fast the basic
filter can be.

I found that on my (linux) PC, the Python version was 4 times slower.

Is that normal? Does it disqualify Python for simple filter scripts?
Jul 18 '05 #1
13 Replies



"Peter Mutsaers" <pl*@gmx.li> wrote in message
news:97**************************@posting.google.com...
> However, I tried the most simple while(<>) {print;} in Perl versus
> Python, just a copy from stdin to stdout, to see how fast the basic
> filter can be.
>
> I found that on my (linux) PC, the Python version was 4 times slower.

There are several ways to copy a file in Python. Some are much faster
than others. I believe 'for line in file: print line' might be the
fastest in pure Python code. There is also shutil.copyfile(src,dst) or
one of the other variants. Which did you use?

> Is that normal? Does it disqualify Python for simple filter scripts?

I have read that Perl is optimized for file read/write in a way that
Python is not, so this may not be the most representative comparison
for your actual app. In any case, the relevance of relative speed
depends on absolute speed (think about milliseconds versus hours).

Terry J. Reedy
Jul 18 '05 #2

Peter Mutsaers wrote:
> I found that on my (linux) PC, the Python version was 4 times slower.
>
> Is that normal? Does it disqualify Python for simple filter scripts?


You don't show the Python script you use, so there's no way for us to
tell whether it is possible to do it more efficiently.

Also, what size file did you use? Unless you tried it with a large
enough file, so that the time was proportional to the file size, you may
just have measured the difference in the startup time for perl vs. python.
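
One way to follow up on the startup-time point is to time an interpreter that does nothing at all; if that is a sizeable share of a filter's total run time, the test file is too small to say much about throughput. A minimal Python 3 sketch (the number of runs is an arbitrary choice; the same idea applies to timing perl -e 1):

```python
import subprocess
import sys
import time


def startup_seconds(runs=5):
    """Best-of-N wall-clock time to launch and exit a Python interpreter
    that runs no code, i.e. the fixed cost paid by every filter run."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


print("interpreter startup: %.3f s" % startup_seconds())
```

The minimum of several runs is reported because it is the figure least disturbed by other activity on the machine.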

Finally, the relative performance of two languages on Task X is not a
very good predictor of their relative performance on Task Y, so you are
probably better off doing a comparison of the actual task you are
interested in.

David

Jul 18 '05 #3

In article <riCnb.36947$mZ5.185175@attbi_s54>,
David C. Fox <da*******@post.harvard.edu> wrote:
Peter Mutsaers wrote:

Jul 18 '05 #4

Peter Mutsaers wrote:
> I found that on my (linux) PC, the Python version was 4 times slower.
>
> Is that normal? Does it disqualify Python for simple filter scripts?


It really depends on what you're doing. I tried the following:

cio.pl:
while(<>) {
    print;
}

cio.py:
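import sys
import fileinput
import shutil

# Look up the bound write method once; passing it as a default
# argument makes it a fast local name inside the loops below.
emit = sys.stdout.write

def io_1(emit=emit):
    # plain line-by-line copy from stdin
    for line in sys.stdin: emit(line)

def io_2(emit=emit):
    # fileinput reads the files named in sys.argv[1:], or stdin if none
    for line in fileinput.input(): emit(line)

def io_3():
    # bulk copy in fixed-size chunks, no per-line work
    shutil.copyfileobj(sys.stdin, sys.stdout)

if __name__=='__main__':
    import __main__

    def usage():
        # send the help text to stderr
        sys.stdout = sys.stderr
        print "Usage: %s N" % sys.argv[0]
        print "N indicates what stdin->stdout copy function to run"
        ns = [x[3:] for x in dir(__main__) if x[:3]=='io_']
        ns.sort()
        print "valid values for N:", ns
        print "invalid args:", sys.argv[1:]
        sys.exit()
    if len(sys.argv) != 2: usage()
    func = getattr(__main__, 'io_'+sys.argv[1], None)
    if func is None: usage()
    # drop N so that fileinput (io_2) reads stdin, not a file named N
    sys.argv.pop()
    func()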
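
The script above is Python 2 (note the print statements). A rough Python 3 sketch of the same three strategies follows. It is not Alex's code: the functions take the source and destination streams as parameters, an assumption made so they can be exercised on in-memory data. For a pure copy, passing sys.stdin.buffer and sys.stdout.buffer avoids decoding and re-encoding the text.

```python
import fileinput
import shutil


def io_1(src, dst):
    """Line-by-line copy; dst.write is looked up once, outside the loop."""
    write = dst.write
    for line in src:
        write(line)


def io_2(dst, files=None):
    """Copy through fileinput, which reads the named files, or stdin when
    files is None and there are no command-line arguments (like Perl's <>)."""
    write = dst.write
    with fileinput.FileInput(files) as lines:
        for line in lines:
            write(line)


def io_3(src, dst):
    """Bulk copy in fixed-size chunks, with no per-line work at all."""
    shutil.copyfileobj(src, dst)
```

For example, io_3(sys.stdin.buffer, sys.stdout.buffer) does the bulk copy, and io_2(sys.stdout) reads the files named on the command line. Timings on a modern interpreter will likely differ from the 2003 figures that follow.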

and I'm specifically reading the King James' Bible (an easily
available text, so you can reproduce my results!) and writing to
either /dev/null or a tempfile on my own Linux box. I see...:

[alex@lancelot bo]$ ls -l /x/kjv.txt
-rw-rw-r-- 1 alex alex 4404445 Mar 29 2003 /x/kjv.txt

[alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (330major+61minor)pagefaults 0swaps

[alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/tmp/kjv
0.04user 0.06system 0:00.19elapsed 51%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (330major+61minor)pagefaults 0swaps

So, Perl is taking 80 to 100 milliseconds of CPU time (elapsed time
mostly depends on what else is going on in the machine, and thus on
the %CPU available, of course). Let's see Python now:

[alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (448major+278minor)pagefaults 0swaps

[alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/tmp/kjv
0.30user 0.01system 0:00.62elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (448major+278minor)pagefaults 0swaps

Python with fileinput IS slower -- 270 to 300 msecs CPU, about a
factor of 3. However, that IS mostly fileinput's issue. See:

[alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+276minor)pagefaults 0swaps

[alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/tmp/kjv
0.06user 0.07system 0:00.29elapsed 44%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+276minor)pagefaults 0swaps

A plain line-by-line copy takes 100-130 msec -- a bit slower than Perl,
but nothing major. Can we do better yet...?

[alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+275minor)pagefaults 0swaps

[alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/tmp/kjv
0.02user 0.06system 0:00.16elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+275minor)pagefaults 0swaps

...sure! Bulk copy, 50-80 msec, FASTER than Perl. Of course, I'm sure
you can program it faster in Perl, too. After all, cat takes 20-60
msec CPU, so there's clearly room to do better.
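
Part of what separates the line-by-line and bulk copies is per-line interpreter work. A small Python 3 sketch can time just the copy loops in-process, with no interpreter startup and no real I/O. The synthetic 4.4 MB input (one repeated line) is an assumption chosen to match the size of the KJV file above; absolute numbers will depend on the machine and Python version.

```python
import io
import shutil
import timeit

# Synthetic input (an assumption): 80,000 copies of a 55-byte line, 4.4 MB.
DATA = b"In the beginning God created the heaven and the earth.\n" * 80000


def per_line(data):
    """Copy line by line between in-memory binary streams."""
    src, dst = io.BytesIO(data), io.BytesIO()
    write = dst.write
    for line in src:
        write(line)
    return dst.getvalue()


def bulk(data):
    """Copy in fixed-size chunks between in-memory binary streams."""
    src, dst = io.BytesIO(data), io.BytesIO()
    shutil.copyfileobj(src, dst)
    return dst.getvalue()


for func in (per_line, bulk):
    # best of 5 single runs, in seconds; no interpreter startup included
    best = min(timeit.repeat(lambda: func(DATA), number=1, repeat=5))
    print("%-8s %.4f s" % (func.__name__, best))
```

Taking the best of several runs filters out noise from other processes, in the same spirit as looking at CPU time rather than elapsed time above.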
What kind of files do your scripts most often process? For me, a
textfile of 4.4 MB is larger than typical. How much do those few
tens of milliseconds' difference matter? You know your apps, I
don't, but I _would_ find it rather strange if they "disqualified"
either language. Anything below about a second is typically fine
with me, so even the slowest of these programs could still handle
files of about 6 MB, assuming the 50% CPU it got is pretty typical,
while still taking no more than about 1 second's elapsed time.
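
As a rough check of that estimate, take the slowest run above (fileinput writing to /tmp/kjv: 0.30 s user + 0.01 s system for the 4,404,445-byte file) and assume, as Alex does, that the process gets about 50% of the CPU:

```latex
\begin{aligned}
t_{\mathrm{CPU}} &\approx 0.30\,\mathrm{s} + 0.01\,\mathrm{s} = 0.31\,\mathrm{s}\\
t_{\mathrm{elapsed}} &\approx \frac{0.31\,\mathrm{s}}{0.5} = 0.62\,\mathrm{s}\\
\text{throughput} &\approx \frac{4.40\,\mathrm{MB}}{0.62\,\mathrm{s}} \approx 7.1\,\mathrm{MB/s}
\end{aligned}
```

So a one-second budget covers roughly 7 MB at that rate, and "about 6 MB" leaves some margin; the 0.62 s estimate also matches the elapsed time reported for that run.
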
Of course, you can easily edit my script and play with many other
I/O methods, until you find one that best suits you. Personally,
I tend to use fileinput just because it's so handy (like perl's <>),
not caring all that much about those "wasted" milliseconds...:-)
Alex

Jul 18 '05 #5

Alex Martelli <al***@aleax.it> wrote:
> and I'm specifically reading the King James' Bible (an easily
> available text so you can reproduce my results!) and writing

Can you post the URL for the Bible?

--
William Park, Open Geometry Consulting, <op**********@yahoo.ca>
Linux solution for data management and processing.
Jul 18 '05 #6

William Park wrote:
> Alex Martelli <al***@aleax.it> wrote:
>> and I'm specifically reading the King James' Bible (an easily
>> available text so you can reproduce my results!) and writing
>
> Can you post the URL for the Bible?


I originally got it from some MySQL stuff by Paul DuBois and decided
it would make a good general dataset for reproducible tests and
benchmarks. A little googling suggests that it comes from:

http://unbound.biola.edu/zips/index.cfm?lang=English

and specifically from the kjv.zip referenced there under the "King
James Version" anchor (unzipped, of course).
Alex

Jul 18 '05 #7

Alex Martelli <al***@aleax.it> wrote in message news:<VV***********************@news2.tin.it>...
> It really depends on what you're doing. I tried the following:


Me too!

I ran these on an HP-UX 11.00 box. The input was a 560K HTML file that
I had lying around.

Overall, Python was about 3 times slower than Perl... and remarkably
consistent across the three different methods.

stang@ettin$ ll ./print.html
-rw-r--r-- 1 stan users 567154 Oct 29 10:30 ./print.html
stang@ettin$ time cio.pl < ./print.html > /dev/null

real 0m0.10s
user 0m0.06s
sys 0m0.03s
stang@ettin$ time cio.pl < ./print.html > /tmp/test

real 0m0.18s
user 0m0.06s
sys 0m0.04s
stang@ettin$ time cio.py 1 < ./print.html > /dev/null

real 0m0.85s
user 0m0.30s
sys 0m0.11s
stang@ettin$ time cio.py 1 < ./print.html > /tmp/test

real 0m0.45s
user 0m0.29s
sys 0m0.11s
stang@ettin$ time cio.py 2 < ./print.html > /dev/null

real 0m0.76s
user 0m0.64s
sys 0m0.11s
stang@ettin$ time cio.py 2 < ./print.html > /tmp/test

real 0m0.81s
user 0m0.64s
sys 0m0.12s
stang@ettin$ time cio.py 3 < ./print.html > /dev/null

real 0m0.43s
user 0m0.16s
sys 0m0.10s
stang@ettin$ time cio.py 3 < ./print.html > /tmp/test

real 0m0.33s
user 0m0.17s
sys 0m0.12s
stang@ettin$

--Stan Graves
st**@SoundInMotionDJ.com
http://www.SoundInMotionDJ.com
Jul 18 '05 #8


Stan> Overall, python was about 3 times slower than perl...and
Stan> remarkably consistent across the three different methods.

What version of Python did you use? Note that 2.3 is significantly faster
than 2.2 in a number of ways.

Skip

Jul 18 '05 #9

Alex Martelli <al***@aleax.it> wrote:
> [alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
> 0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (330major+61minor)pagefaults 0swaps
>
> [alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
> 0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (448major+278minor)pagefaults 0swaps
>
> [alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
> 0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (447major+276minor)pagefaults 0swaps
>
> [alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
> 0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (447major+275minor)pagefaults 0swaps


But, nothing can beat
time cat < kjv.txt > /dev/null
:-)

--
William Park, Open Geometry Consulting, <op**********@yahoo.ca>
Linux solution for data management and processing.
Jul 18 '05 #10

On 29 Oct 2003 06:34:22 GMT, William Park <op**********@yahoo.ca> wrote:
> Alex Martelli <al***@aleax.it> wrote:
>> and I'm specifically reading the King James' Bible (an easily
>> available text so you can reproduce my results!) and writing
>
> Can you post the URL for the Bible?

Try Project Gutenberg, at

http://www.gutenberg.net/

or their new host at

http://www.ibiblio.org/gutenberg/

They have a number of bibles in various languages, and a ton (>10,000 e-texts) of other stuff,
also some audio texts, apparently. BTW I read somewhere that the BBC is going to make all their
archives, video and audio, freely available on the net, except where there is some legal reason
they can't. I guess they're a kind of FEF -- Free Entertainment Foundation (thank you British
telly owners ;-)

Apparently a new King James e-text is at the following long URL (or use their
search for "bible", without quotes, and go to entry #16):

http://www.ibiblio.org/gutenberg/cgi...org/gutenberg/

They also have the Koran, BTW. It's interesting to compare word frequencies, e.g., the 20 most frequent
(unless I goofed) in the texts I downloaded:

"C:\Info\Linguistics\Gutenberg\bible\bible11.txt"
6647: 'LORD'
6649: 'him'
6856: 'is'
6893: 'be'
6971: 'they'
7249: 'for'
7972: 'a'
8388: 'his'
8854: 'I'
8940: 'unto'
9666: 'he'
9760: 'shall'
12353: 'in'
12592: 'that'
12846: 'And'
13429: 'to'
34472: 'of'
38891: 'and'
62135: 'the'

"C:\Info\Linguistics\Gutenberg\koran\koran10.txt"
1739: 'ye'
1752: 'with'
1956: 'And'
1979: 'for'
1991: 'who'
2037: 'be'
2108: 'not'
2186: 'that'
2254: 'shall'
2366: 'them'
2575: 'a'
2644: 'they'
2799: 'is'
2900: 'in'
3320: 'God'
5144: 'to'
6855: 'of'
6896: 'and'
10982: 'the'

Both start with the-and-of-to ;-)
(I hope this does not offend anyone ;-)
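
Bengt doesn't show how he produced those counts. One plausible way is sketched below in Python 3 with collections.Counter. The tokenization (runs of ASCII letters, case-sensitive, so 'And' and 'and' are counted separately as in his listing) is an assumption, so counts from this sketch need not match his exactly.

```python
import re
from collections import Counter


def top_words(text, n=20):
    """Return the n most frequent words as (word, count) pairs,
    least frequent first, matching the order of the listings above."""
    # Assumed tokenization: runs of ASCII letters, case-sensitive.
    words = re.findall(r"[A-Za-z]+", text)
    return list(reversed(Counter(words).most_common(n)))


def print_listing(text, n=20):
    """Print one 'count: word' line per entry, like the listings above."""
    for word, count in top_words(text, n):
        print("%5d: %r" % (count, word))


print_listing("And God said, Let there be light: and there was light.", n=3)
```

For a downloaded e-text, print_listing(open("bible11.txt").read()) would produce a listing in the same order; that filename is illustrative only.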

Regards,
Bengt Richter
Jul 18 '05 #11

William Park <op**********@yahoo.ca> writes:
> But, nothing can beat
> time cat < kjv.txt > /dev/null
> :-)


time true ?

(thinking of

http://mail.python.org/pipermail/pyt...ry/033785.html

)

Cheers,
mwh

--
58. Fools ignore complexity. Pragmatists suffer it. Some can avoid
it. Geniuses remove it.
-- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html
Jul 18 '05 #12

Peter Mutsaers wrote:
> I found that on my (linux) PC, the Python version was 4 times slower.
>
> Is that normal? Does it disqualify Python for simple filter scripts?


Have a look at the win32 language shootout (http://dada.perl.it/shootout/)
with lots of benchmarks for lots of languages, among them python,
cygperl (perl with cygwin calls, I assume) and perl (perl with native
win32 calls). cygperl is usually faster than python, perl slower. You
can also look at Doug Bagley's Linux based shootout but it's older
(Perl 5.6, Python 2.1) and currently not maintained.

My impression is that, on average, perl is faster than python,
but by about 20%, not by an order of magnitude.

Your test is probably closest to the "reverse file" benchmark
where cygperl : python : perl = 0.68 : 1.68 : 12.72 on win32
and perl : python = 1.06 : 1.17 on Linux.

Best regards,

Peter Maas

--
-------------------------------------------------------------------
Peter Maas, M+R Infosysteme, D-52070 Aachen, Hubert-Wienen-Str. 24
Tel +49-241-93878-0 Fax +49-241-93878-20 eMail pe********@mplusr.de
-------------------------------------------------------------------

Jul 18 '05 #13

Peter Maas <fp********@netscape.net> writes:
[...]
> Your test is probably closest to the "reverse file" benchmark
> where cygperl : python : perl = 0.68 : 1.68 : 12.72 on win32
> and perl : python = 1.06 : 1.17 on Linux.

[...]

Why on earth is cygperl faster than perl? Is that really correct?
John
Jul 18 '05 #14
