Python slow for filter scripts

Hello,

Up to now I mostly wrote simple filter scripts in Perl, e.g.

while(<>) {
    # do something with $_, regexp matching, replacements etc.
    print;
}

Now I learned Python and like it much more as a language.

However, I tried the most simple while(<>) {print;} in Perl versus
Python, just a copy from stdin to stdout, to see how fast the basic
filter can be.

I found that on my (linux) PC, the Python version was 4 times slower.

Is that normal, does it disqualify Python for simple filter scripts?
Jul 18 '05 #1

"Peter Mutsaers" <pl*@gmx.li> wrote in message
news:97**************************@posting.google.com...
> However, I tried the most simple while(<>) {print;} in Perl versus
> Python, just a copy from stdin to stdout, to see how fast the basic
> filter can be.
>
> I found that on my (linux) PC, the Python version was 4 times slower.

There are several ways to copy a file in Python. Some are much faster
than others. I believe 'for line in file: print line,' (with the
trailing comma, since each line already ends in a newline) might be
the fastest in pure Python code. There is also shutil.copyfile(src, dst)
or one of the other variants. Which did you use?

> Is that normal, does it disqualify Python for simple filter scripts?


I have read that Perl is optimized for file read/write in a way that
Python is not, so this may not be the most representative comparison for
your actual app. In any case, the relevance of relative speed depends on
absolute speed (think about milliseconds versus hours).

Terry J. Reedy
Jul 18 '05 #2
Peter Mutsaers wrote:
> Hello,
>
> Up to now I mostly wrote simple filter scripts in Perl, e.g.
>
> while(<>) {
>     # do something with $_, regexp matching, replacements etc.
>     print;
> }
>
> Now I learned Python and like it much more as a language.
>
> However, I tried the most simple while(<>) {print;} in Perl versus
> Python, just a copy from stdin to stdout, to see how fast the basic
> filter can be.
>
> I found that on my (linux) PC, the Python version was 4 times slower.
>
> Is that normal, does it disqualify Python for simple filter scripts?


You don't show the Python script you use, so there's no way for us to
tell whether it is possible to do it more efficiently.

Also, what size file did you use? Unless you tried it with a large
enough file, so that the time was proportional to the file size, you may
just have measured the difference in the startup time for perl vs. python.

Finally, the relative performance of two languages on Task X is not a
very good predictor of their relative performance on Task Y, so you are
probably better off doing a comparison of the actual task you are
interested in.

David

Jul 18 '05 #3
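
One way to separate startup cost from per-byte cost, as suggested above, is to time the copy loop inside an already-running interpreter at several input sizes and check that the time grows with the size. The sketch below is only an illustration in present-day Python 3, not something anyone in the thread ran; `copy_lines` and `time_copy` are made-up helper names, and the input is synthetic in-memory data rather than a real file.

```python
import io
import time


def copy_lines(src, dst):
    """Copy src to dst one line at a time, like Perl's while(<>) {print;}."""
    for line in src:
        dst.write(line)


def time_copy(n_lines, line=b"In the beginning God created the heaven and the earth.\n"):
    """Return (bytes_copied, seconds) for an in-memory copy of n_lines lines."""
    data = line * n_lines
    src, dst = io.BytesIO(data), io.BytesIO()
    start = time.perf_counter()
    copy_lines(src, dst)
    elapsed = time.perf_counter() - start
    assert dst.getvalue() == data  # the copy must be exact
    return len(data), elapsed


if __name__ == "__main__":
    # If the time roughly doubles whenever the size doubles, the loop
    # dominates. A constant offset that only shows up when timing
    # "python script.py" from the shell points at interpreter startup.
    for n in (10_000, 20_000, 40_000, 80_000):
        size, secs = time_copy(n)
        print("%9d bytes  %.4f s" % (size, secs))
```

Because the interpreter is already running when the clock starts, these figures exclude startup entirely; comparing them with a shell-level `time` of the same copy gives a rough idea of the fixed overhead.
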
In article <riCnb.36947$mZ5.185175@attbi_s54>,
David C. Fox <da*******@post.harvard.edu> wrote:
Peter Mutsaers wrote:

Jul 18 '05 #4
Peter Mutsaers wrote:
> Hello,
>
> Up to now I mostly wrote simple filter scripts in Perl, e.g.
>
> while(<>) {
>     # do something with $_, regexp matching, replacements etc.
>     print;
> }
>
> Now I learned Python and like it much more as a language.
>
> However, I tried the most simple while(<>) {print;} in Perl versus
> Python, just a copy from stdin to stdout, to see how fast the basic
> filter can be.
>
> I found that on my (linux) PC, the Python version was 4 times slower.
>
> Is that normal, does it disqualify Python for simple filter scripts?


It really depends on what you're doing. I tried the following:

cio.pl:
while(<>) {
    print;
}

cio.py:
import sys
import fileinput
import shutil

emit = sys.stdout.write

def io_1(emit=emit):
    for line in sys.stdin: emit(line)

def io_2(emit=emit):
    for line in fileinput.input(): emit(line)

def io_3():
    shutil.copyfileobj(sys.stdin, sys.stdout)

if __name__=='__main__':
    import __main__

    def usage():
        sys.stdout = sys.stderr
        print "Usage: %s N" % sys.argv[0]
        print "N indicates what stdin->stdout copy function to run"
        ns = [x[3:] for x in dir(__main__) if x[:3]=='io_']
        ns.sort()
        print "valid values for N:", ns
        print "invalid args:", sys.argv[1:]
        sys.exit()

    if len(sys.argv) != 2: usage()
    func = getattr(__main__, 'io_'+sys.argv[1], None)
    if func is None: usage()
    sys.argv.pop()
    func()

and I'm specifically reading the King James' Bible (an easily
available text so you can reproduce my results!) and writing to
either /dev/null or a tempfile on my own Linux box. I see...:

[alex@lancelot bo]$ ls -l /x/kjv.txt
-rw-rw-r-- 1 alex alex 4404445 Mar 29 2003 /x/kjv.txt

[alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (330major+61minor)pagefaults 0swaps

[alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/tmp/kjv
0.04user 0.06system 0:00.19elapsed 51%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (330major+61minor)pagefaults 0swaps

So, Perl is taking 80 to 100 milliseconds of CPU time (elapsed time
depends mostly on what else is going on in the machine, and thus on
the %CPU available, of course). Let's see Python now:

[alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (448major+278minor)pagefaults 0swaps

[alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/tmp/kjv
0.30user 0.01system 0:00.62elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (448major+278minor)pagefaults 0swaps

Python with fileinput IS slower -- 270 to 310 msecs CPU, about a
factor of 3. However, that IS mostly fileinput's issue. See for yourself:

[alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+276minor)pagefaults 0swaps

[alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/tmp/kjv
0.06user 0.07system 0:00.29elapsed 44%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+276minor)pagefaults 0swaps

A plain line-by-line copy takes 100-130 msec -- a bit slower than Perl,
but nothing major. Can we do better yet...?

[alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+275minor)pagefaults 0swaps

[alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/tmp/kjv
0.02user 0.06system 0:00.16elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+275minor)pagefaults 0swaps

...sure! Bulk copy, 50-80 msec, FASTER than Perl. Of course, I'm sure
you can program it faster in Perl, too. After all, cat takes 20-60
msec CPU, so there's clearly room to do better.

What kind of files do your scripts most often process? For me, a
textfile of 4.4 MB is larger than typical. How much do those few
tens of milliseconds' difference matter? You know your apps, I
don't, but I _would_ find it rather strange if they "disqualified"
either language. Anything below about a second is typically fine
with me, so even the slowest of these programs could still handle
files of about 6 MB, assuming the 50% CPU it got is pretty typical,
while still taking no more than about 1 second's elapsed time.
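
As a rough check of that estimate, the arithmetic can be spelled out in a few lines. This sketch uses only numbers quoted above (the 4,404,445-byte file and the slowest run, 0.62 s elapsed for `cio.py 2` writing to /tmp/kjv) and assumes that time scales linearly with file size; the function name is made up for illustration.

```python
def max_bytes_in(budget_s, file_bytes, elapsed_s):
    """Scale one measured run linearly to estimate how much data fits in budget_s."""
    throughput = file_bytes / elapsed_s  # bytes per second of elapsed time
    return throughput * budget_s


# Slowest measurement quoted above: the fileinput copy to a temp file.
KJV_BYTES = 4404445
SLOWEST_ELAPSED_S = 0.62

estimate = max_bytes_in(1.0, KJV_BYTES, SLOWEST_ELAPSED_S)
print("about %.1f MB per elapsed second" % (estimate / 1e6))
```

The result comes out a little above 7 MB, so "about 6 MB" leaves some margin for a busier machine.
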

Of course, you can easily edit my script and play with many other
I/O methods, until you find one that best suits you. Personally,
I tend to use fileinput just because it's so handy (like perl's <>),
not caring all that much about those "wasted" milliseconds...:-)

Alex

Jul 18 '05 #5
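
For readers on Python 3, two of the strategies in cio.py (the line-by-line loop and the bulk copy) translate roughly as follows. This is a sketch rather than Alex's script: the functions take the streams as parameters so they can be tried without a shell, the `copy_*` and `run_as_filter` names are invented here, and the fileinput variant is left out. Timings on a current interpreter will not match the 2003 figures above.

```python
import io
import shutil
import sys


def copy_by_line(src, dst):
    """Like io_1: iterate over the input and write each line."""
    write = dst.write  # bind the method once, as the original does with emit
    for line in src:
        write(line)


def copy_bulk(src, dst):
    """Like io_3: let shutil move the data in large chunks."""
    shutil.copyfileobj(src, dst)


def run_as_filter(copy=copy_bulk):
    """Copy stdin to stdout in binary mode, skipping text decoding."""
    copy(sys.stdin.buffer, sys.stdout.buffer)
    sys.stdout.buffer.flush()


# Quick in-memory check that both strategies copy byte for byte.
sample = b"first line\nsecond line\nno trailing newline"
for copy in (copy_by_line, copy_bulk):
    out = io.BytesIO()
    copy(io.BytesIO(sample), out)
    assert out.getvalue() == sample
```

Calling `run_as_filter()` from a script turns either function into a stdin-to-stdout filter. Using the binary `.buffer` streams is a design choice for a pure copy; a filter that does regexp work on text would more likely iterate over `sys.stdin` directly.
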
Alex Martelli <al***@aleax.it> wrote:
> and I'm specifically reading the King James' Bible (an easily
> available text so you can reproduce my results!) and writing


Can you post the URL for the Bible?

--
William Park, Open Geometry Consulting, <op**********@yahoo.ca>
Linux solution for data management and processing.
Jul 18 '05 #6
William Park wrote:
> Alex Martelli <al***@aleax.it> wrote:
>> and I'm specifically reading the King James' Bible (an easily
>> available text so you can reproduce my results!) and writing
>
> Can you post the URL for the Bible?


I originally got it from some MySQL stuff by Paul DuBois and decided
it would make a good general dataset for reproducible tests and
benchmarks. A little googling suggests that it comes from:

http://unbound.biola.edu/zips/index.cfm?lang=English

and specifically from the kjv.zip referenced there under the "King
James Version" anchor (unzipped, of course).

Alex

Jul 18 '05 #7
Alex Martelli <al***@aleax.it> wrote in message news:<VV***********************@news2.tin.it>...
> It really depends on what you're doing. I tried the following:


Me too!

I ran these on an hp-ux 11.00 box. The input was a 560K html file that
I had lying around.

Overall, python was about 3 times slower than perl...and remarkably
consistent for the three different methods.

stang@ettin$ ll ./print.html
-rw-r--r-- 1 stan users 567154 Oct 29 10:30 ./print.html
stang@ettin$ time cio.pl < ./print.html > /dev/null

real 0m0.10s
user 0m0.06s
sys 0m0.03s
stang@ettin$ time cio.pl < ./print.html > /tmp/test

real 0m0.18s
user 0m0.06s
sys 0m0.04s
stang@ettin$ time cio.py 1 < ./print.html > /dev/null

real 0m0.85s
user 0m0.30s
sys 0m0.11s
stang@ettin$ time cio.py 1 < ./print.html > /tmp/test

real 0m0.45s
user 0m0.29s
sys 0m0.11s
stang@ettin$ time cio.py 2 < ./print.html > /dev/null

real 0m0.76s
user 0m0.64s
sys 0m0.11s
stang@ettin$ time cio.py 2 < ./print.html > /tmp/test

real 0m0.81s
user 0m0.64s
sys 0m0.12s
stang@ettin$ time cio.py 3 < ./print.html > /dev/null

real 0m0.43s
user 0m0.16s
sys 0m0.10s
stang@ettin$ time cio.py 3 < ./print.html > /tmp/test

real 0m0.33s
user 0m0.17s
sys 0m0.12s
stang@ettin$

--Stan Graves
st**@SoundInMotionDJ.com
http://www.SoundInMotionDJ.com
Jul 18 '05 #8

Stan> Overall, python was about 3 times slower than perl...and
Stan> remarkably consistent for the three different methods.

What version of Python did you use? Note that 2.3 is significantly faster
than 2.2 in a number of ways.

Skip

Jul 18 '05 #9
Alex Martelli <al***@aleax.it> wrote:
> [alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
> 0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (330major+61minor)pagefaults 0swaps
>
> [alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
> 0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (448major+278minor)pagefaults 0swaps
>
> [alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
> 0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (447major+276minor)pagefaults 0swaps
>
> [alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
> 0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (447major+275minor)pagefaults 0swaps


But, nothing can beat
time cat < kjv.txt > /dev/null
:-)

--
William Park, Open Geometry Consulting, <op**********@yahoo.ca>
Linux solution for data management and processing.
Jul 18 '05 #10
On 29 Oct 2003 06:34:22 GMT, William Park <op**********@yahoo.ca> wrote:
> Alex Martelli <al***@aleax.it> wrote:
>> and I'm specifically reading the King James' Bible (an easily
>> available text so you can reproduce my results!) and writing
>
> Can you post the URL for the Bible?

Try Project Gutenberg, at

http://www.gutenberg.net/

or their new host at

http://www.ibiblio.org/gutenberg/

They have a number of bibles in various languages, and a ton (>10,000 e-texts) of other stuff,
also some audio texts, apparently. BTW I read somewhere that the BBC is going to make all their
archives, video and audio, freely available on the net, except where there is some legal reason
they can't. I guess they're a kind of FEF -- Free Entertainment Foundation (thank you British
telly owners ;-)

Apparently a new King James e-text is at (long URL, or use their search for "bible" (w/o quotes)
and go to entry #16):

http://www.ibiblio.org/gutenberg/cgi...org/gutenberg/

They also have the Koran, BTW. It's interesting to compare word frequencies, e.g., the 20 most frequent
(unless I goofed) in the texts I downloaded:

"C:\Info\Linguistics\Gutenberg\bible\bible11.txt"
6647: 'LORD'
6649: 'him'
6856: 'is'
6893: 'be'
6971: 'they'
7249: 'for'
7972: 'a'
8388: 'his'
8854: 'I'
8940: 'unto'
9666: 'he'
9760: 'shall'
12353: 'in'
12592: 'that'
12846: 'And'
13429: 'to'
34472: 'of'
38891: 'and'
62135: 'the'

"C:\Info\Linguistics\Gutenberg\koran\koran10.txt"
1739: 'ye'
1752: 'with'
1956: 'And'
1979: 'for'
1991: 'who'
2037: 'be'
2108: 'not'
2186: 'that'
2254: 'shall'
2366: 'them'
2575: 'a'
2644: 'they'
2799: 'is'
2900: 'in'
3320: 'God'
5144: 'to'
6855: 'of'
6896: 'and'
10982: 'the'

Both start with the-and-of-to ;-)
(I hope this does not offend anyone ;-)

Regards,
Bengt Richter
Jul 18 '05 #11
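
Counts like the ones above could be produced with something along these lines. The post does not say how the words were split, so the tokenizer here (runs of letters and apostrophes, case preserved) is an assumption, and `top_words` is a made-up name. `Counter.most_common` returns the largest counts first, whereas the lists above are printed in ascending order.

```python
import re
from collections import Counter


def top_words(text, n=20):
    """Return the n most common words in text as (word, count) pairs."""
    # Words are runs of letters and apostrophes. Case is preserved, so
    # 'And' and 'and' are counted separately, as in the lists above.
    words = re.findall(r"[A-Za-z']+", text)
    return Counter(words).most_common(n)


sample = "And the LORD said unto him, the word of the LORD is the word."
for word, count in top_words(sample, 3):
    print("%6d: %r" % (count, word))
```

For a downloaded e-text, the same function can be applied to the file's contents, for example `top_words(open(path).read())` with `path` pointing at the text file.
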
William Park <op**********@yahoo.ca> writes:
> Alex Martelli <al***@aleax.it> wrote:
>> [alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
>> 0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
>> 0inputs+0outputs (330major+61minor)pagefaults 0swaps
>>
>> [alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
>> 0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
>> 0inputs+0outputs (448major+278minor)pagefaults 0swaps
>>
>> [alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
>> 0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
>> 0inputs+0outputs (447major+276minor)pagefaults 0swaps
>>
>> [alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
>> 0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
>> 0inputs+0outputs (447major+275minor)pagefaults 0swaps
>
> But, nothing can beat
> time cat < kjv.txt > /dev/null
> :-)


time true ?

(thinking of

http://mail.python.org/pipermail/pyt...ry/033785.html

)

Cheers,
mwh

--
58. Fools ignore complexity. Pragmatists suffer it. Some can avoid
it. Geniuses remove it.
-- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html
Jul 18 '05 #12
Peter Mutsaers wrote:
> I found that on my (linux) PC, the Python version was 4 times slower.
>
> Is that normal, does it disqualify Python for simple filter scripts?


Have a look at the win32 language shootout (http://dada.perl.it/shootout/)
with lots of benchmarks for lots of languages, among them python,
cygperl (perl with cygwin calls, I assume) and perl (perl with native
win32 calls). cygperl is usually faster than python, perl slower. You
can also look at Doug Bagley's Linux-based shootout, but it's older
(Perl 5.6, Python 2.1) and currently not maintained.

My impression is that on average perl is faster than python,
not by an order of magnitude but by about 20%.

Your test is probably closest to the "reverse file" benchmark
where cygperl : python : perl = 0.68 : 1.68 : 12.72 on win32
and perl : python = 1.06 : 1.17 on Linux.

Best regards,

Peter Maas

--
-------------------------------------------------------------------
Peter Maas, M+R Infosysteme, D-52070 Aachen, Hubert-Wienen-Str. 24
Tel +49-241-93878-0 Fax +49-241-93878-20 eMail pe********@mplusr.de
-------------------------------------------------------------------

Jul 18 '05 #13
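
For context, here is a minimal sketch of what a "reverse file" task looks like in Python, assuming it means reading every line of the input and writing the lines out last to first (the post does not spell this out, and this is not the shootout's own code). Unlike the stdin-to-stdout copy discussed earlier, it cannot stream: the whole input has to be held in memory before the first line is written.

```python
import io


def reverse_lines(src, dst):
    """Write the lines of src to dst in reverse order."""
    lines = src.readlines()  # the whole input is read before any output
    lines.reverse()
    dst.writelines(lines)


out = io.StringIO()
reverse_lines(io.StringIO("one\ntwo\nthree\n"), out)
print(out.getvalue(), end="")
```

Used as a filter, the call would be `reverse_lines(sys.stdin, sys.stdout)` after `import sys`.
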
Peter Maas <fp********@netscape.net> writes:
[...]
> Your test is probably closest to the "reverse file" benchmark
> where cygperl : python : perl = 0.68 : 1.68 : 12.72 on win32
> and perl : python = 1.06 : 1.17 on Linux.

[...]

Why on earth is cygperl faster than perl? Is that really correct?

John
Jul 18 '05 #14
