Python slow for filter scripts

Hello,

Up to now I mostly wrote simple filter scripts in Perl, e.g.

while(<>) {
    # do something with $_, regexp matching, replacements etc.
    print;
}

Now I have learned Python, and I like it much better as a language.

However, I tried the simplest possible filter, while(<>) {print;} in Perl
versus its Python equivalent: just a copy from stdin to stdout, to see how
fast the basic filter can be.

I found that on my Linux PC, the Python version was 4 times slower.

Is that normal? Does it disqualify Python for simple filter scripts?
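For comparison, here is a rough Python 3 counterpart of the Perl `while(<>)` filter. It is a sketch only. The substitution (the hypothetical `PATTERN`, which collapses runs of spaces and tabs) merely stands in for "regexp matching, replacements etc.". The standard `fileinput` module plays the role of Perl's `<>`: it reads the files named on the command line, or stdin when there are none.

```python
import fileinput
import re
import sys
from typing import Iterable, Iterator

# Hypothetical stand-in for the real per-line work: collapse runs of
# spaces/tabs into a single space (like s/[ \t]+/ /g on $_ in Perl).
PATTERN = re.compile(r"[ \t]+")


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Apply the substitution to every line; trailing newlines are kept."""
    for line in lines:
        yield PATTERN.sub(" ", line)


def main() -> None:
    # fileinput.input() iterates over the files named in sys.argv[1:],
    # or over stdin when no files are given, much like Perl's <>.
    with fileinput.input() as lines:
        sys.stdout.writelines(transform(lines))
```

To use it as a filter, end the script with the usual `if __name__ == "__main__": main()` guard and run it as `python filter.py file1 file2` or `... | python filter.py` (`filter.py` is a hypothetical name).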
Jul 18 '05 #1

"Peter Mutsaers" <pl*@gmx.li> wrote in message
news:97**************************@posting.google.com...
> However, I tried the simplest possible filter, while(<>) {print;} in Perl
> versus its Python equivalent: just a copy from stdin to stdout, to see how
> fast the basic filter can be.
>
> I found that on my Linux PC, the Python version was 4 times slower.

There are several ways to copy a file in Python, and some are much faster
than others. I believe 'for line in file: print line' might be the fastest
in pure Python code. There is also shutil.copyfile(src, dst), or one of its
variants. Which did you use?
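A minimal Python 3 sketch of the copying styles mentioned here (the helper names are hypothetical). It covers a line loop, `shutil.copyfileobj` on already-open file objects, and `shutil.copyfile` on paths. A line loop should write each line unchanged, because every line already ends with its own newline. All three should produce byte-identical output; only their speed differs.

```python
import shutil
import tempfile
from pathlib import Path


def copy_by_lines(src: Path, dst: Path) -> None:
    """Line-at-a-time copy; each line is written exactly as read."""
    with src.open("rb") as fin, dst.open("wb") as fout:
        for line in fin:
            fout.write(line)


def copy_by_blocks(src: Path, dst: Path) -> None:
    """Chunked copy between open file objects (what cio.py's io_3 uses)."""
    with src.open("rb") as fin, dst.open("wb") as fout:
        shutil.copyfileobj(fin, fout)


def copy_by_path(src: Path, dst: Path) -> None:
    """Path-based copy; shutil.copyfile opens and closes both files itself."""
    shutil.copyfile(src, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "in.txt"
        src.write_bytes(b"first line\nsecond line\nno trailing newline")
        for copier in (copy_by_lines, copy_by_blocks, copy_by_path):
            dst = Path(tmp) / f"{copier.__name__}.txt"
            copier(src, dst)
            assert dst.read_bytes() == src.read_bytes(), copier.__name__
        print("all three copies are byte-identical")
```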
> Is that normal? Does it disqualify Python for simple filter scripts?


I have read that Perl is optimized for file reading and writing in a way
that Python is not, so this may not be the most representative comparison
for your actual application. In any case, the relevance of relative speed
depends on absolute speed (think milliseconds versus hours).

Terry J. Reedy
Jul 18 '05 #2
Peter Mutsaers wrote:
> [...]
> I found that on my Linux PC, the Python version was 4 times slower.
>
> Is that normal? Does it disqualify Python for simple filter scripts?

You don't show the Python script you use, so there's no way for us to
tell whether it is possible to do it more efficiently.

Also, what size file did you use? Unless you tried it with a file large
enough that the time was proportional to the file size, you may have just
measured the difference in startup time between Perl and Python.
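One way to check this is sketched below in Python 3; the helper `min_wall_time` is hypothetical. Time an interpreter that does nothing at all, then compare that with the time for a full filter run on a big file. If the two are close, the benchmark mostly measured startup.

```python
import subprocess
import sys
import time
from typing import Sequence


def min_wall_time(cmd: Sequence[str], repeats: int = 5) -> float:
    """Best-of-`repeats` wall-clock seconds to run `cmd` to completion."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Empty stdin and discarded stdout, so terminal I/O is not timed.
        subprocess.run(list(cmd), check=True,
                       stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # An empty program: whatever time this takes is interpreter startup.
    startup = min_wall_time([sys.executable, "-c", "pass"])
    print(f"python startup: {startup * 1000:.1f} ms")
```

The same function can time `["perl", "-e", "1"]` to get Perl's startup. Taking the minimum over several runs reduces noise from whatever else the machine is doing.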

Finally, the relative performance of two languages on Task X is not a
very good predictor of their relative performance on Task Y, so you are
probably better off doing a comparison of the actual task you are
interested in.

David

Jul 18 '05 #3
In article <riCnb.36947$mZ5.185175@attbi_s54>,
David C. Fox <da*******@post.harvard.edu> wrote:
Peter Mutsaers wrote:

Jul 18 '05 #4
Peter Mutsaers wrote:
> [...]
> Is that normal? Does it disqualify Python for simple filter scripts?


It really depends on what you're doing. I tried the following:

cio.pl:
while(<>) {
    print;
}

cio.py:
import sys
import fileinput
import shutil

emit = sys.stdout.write

def io_1(emit=emit):
    # plain line-by-line loop over stdin
    for line in sys.stdin: emit(line)

def io_2(emit=emit):
    # fileinput: reads the files named in sys.argv[1:], or stdin if none
    for line in fileinput.input(): emit(line)

def io_3():
    # bulk copy in fixed-size chunks
    shutil.copyfileobj(sys.stdin, sys.stdout)

if __name__=='__main__':
    import __main__

    def usage():
        sys.stdout = sys.stderr
        print "Usage: %s N" % sys.argv[0]
        print "N indicates what stdin->stdout copy function to run"
        ns = [x[3:] for x in dir(__main__) if x[:3]=='io_']
        ns.sort()
        print "valid values for N:", ns
        print "invalid args:", sys.argv[1:]
        sys.exit()
    if len(sys.argv) != 2: usage()
    func = getattr(__main__, 'io_'+sys.argv[1], None)
    if func is None: usage()
    # drop N, so that fileinput (io_2) reads stdin rather than a file named N
    sys.argv.pop()
    func()

and I'm specifically reading the King James Bible (an easily
available text, so you can reproduce my results!) and writing
either to /dev/null or to a temporary file on my own Linux box. I see...:

[alex@lancelot bo]$ ls -l /x/kjv.txt
-rw-rw-r-- 1 alex alex 4404445 Mar 29 2003 /x/kjv.txt

[alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (330major+61minor)pagefaults 0swaps

[alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/tmp/kjv
0.04user 0.06system 0:00.19elapsed 51%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (330major+61minor)pagefaults 0swaps

So, Perl is taking 80 to 100 milliseconds of CPU time (elapsed time
depends mostly on what else is going on in the machine, and thus on
the %CPU available, of course). Let's see Python now:

[alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (448major+278minor)pagefaults 0swaps

[alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/tmp/kjv
0.30user 0.01system 0:00.62elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (448major+278minor)pagefaults 0swaps

Python with fileinput IS slower: 270 to 300 msec of CPU, about a
factor of 3. However, that IS mostly fileinput's issue. See:

[alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+276minor)pagefaults 0swaps

[alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/tmp/kjv
0.06user 0.07system 0:00.29elapsed 44%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+276minor)pagefaults 0swaps

A plain line-by-line copy takes 100-130 msec: a bit slower than Perl,
but nothing major. Can we do better yet...?

[alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+275minor)pagefaults 0swaps

[alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/tmp/kjv
0.02user 0.06system 0:00.16elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+275minor)pagefaults 0swaps

...Sure! Bulk copy takes 50-80 msec, FASTER than Perl. Of course, I'm
sure you can make it faster in Perl, too. After all, cat takes 20-60
msec of CPU, so there's clearly room to do better.
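The copying strategies can also be compared without counting interpreter startup at all, by timing them inside a single process. This Python 3 sketch follows the three approaches of cio.py (`io_1`, `io_2`, `io_3`), but reads from a named file and writes to `os.devnull`. The synthetic ~4 MB input is an assumption standing in for kjv.txt. Absolute numbers will vary with the machine and the Python version.

```python
import fileinput
import functools
import os
import shutil
import tempfile
import timeit
from typing import Dict, TextIO


def io_1(path: str, out: TextIO) -> None:
    """Plain line-by-line loop (like io_1 in cio.py, but over a named file)."""
    with open(path) as f:
        for line in f:
            out.write(line)


def io_2(path: str, out: TextIO) -> None:
    """fileinput, the handy analogue of Perl's <> (like io_2)."""
    with fileinput.input(files=(path,)) as f:
        for line in f:
            out.write(line)


def io_3(path: str, out: TextIO) -> None:
    """Bulk copy in fixed-size chunks (like io_3)."""
    with open(path) as f:
        shutil.copyfileobj(f, out)


def compare(path: str, repeat: int = 5) -> Dict[str, float]:
    """Best-of-`repeat` seconds per method, all measured in this one process."""
    results = {}
    with open(os.devnull, "w") as sink:
        for func in (io_1, io_2, io_3):
            times = timeit.repeat(functools.partial(func, path, sink),
                                  number=1, repeat=repeat)
            results[func.__name__] = min(times)
    return results


if __name__ == "__main__":
    # Synthetic stand-in for kjv.txt: roughly 4 MB of short text lines.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("In the beginning was the line.\n" * 130_000)
    try:
        for name, secs in compare(tmp.name).items():
            print(f"{name}: {secs * 1000:.1f} ms")
    finally:
        os.unlink(tmp.name)
```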

What kind of files do your scripts most often process? For me, a
text file of 4.4 MB is larger than typical. How much do those few
tens of milliseconds of difference matter? You know your apps, I
don't, but I _would_ find it rather strange if they "disqualified"
either language. Anything below about a second is typically fine
with me, so even the slowest of these programs could still handle
files of about 6 MB, assuming the 50% CPU it got is pretty typical,
while still taking no more than about 1 second of elapsed time.

Of course, you can easily edit my script and play with many other
I/O methods until you find the one that best suits you. Personally,
I tend to use fileinput just because it's so handy (like Perl's <>),
not caring all that much about those "wasted" milliseconds...:-)
Alex

Jul 18 '05 #5
Alex Martelli <al***@aleax.it> wrote:
> and I'm specifically reading the King James Bible (an easily
> available text, so you can reproduce my results!) and writing

Can you post a URL for the Bible?

--
William Park, Open Geometry Consulting, <op**********@yahoo.ca>
Linux solution for data management and processing.
Jul 18 '05 #6
William Park wrote:
> Alex Martelli <al***@aleax.it> wrote:
>> and I'm specifically reading the King James Bible (an easily
>> available text, so you can reproduce my results!) and writing
>
> Can you post a URL for the Bible?


I originally got it from some MySQL stuff by Paul DuBois and decided
it would make a good general dataset for reproducible tests and
benchmarks. A little googling suggests that it comes from:

http://unbound.biola.edu/zips/index.cfm?lang=English

and specifically from the kjv.zip referenced there under the "King
James Version" anchor (unzipped, of course).
Alex

Jul 18 '05 #7
Alex Martelli <al***@aleax.it> wrote in message news:<VV***********************@news2.tin.it>...
> It really depends on what you're doing. I tried the following:


Me too!

I ran these on an HP-UX 11.00 box. The input was a 560K HTML file that
I had lying around.

Overall, Python was about 3 times slower than Perl... and remarkably
consistent across the three different methods.

stang@ettin$ ll ./print.html
-rw-r--r-- 1 stan users 567154 Oct 29 10:30 ./print.html
stang@ettin$ time cio.pl < ./print.html > /dev/null

real 0m0.10s
user 0m0.06s
sys 0m0.03s
stang@ettin$ time cio.pl < ./print.html > /tmp/test

real 0m0.18s
user 0m0.06s
sys 0m0.04s
stang@ettin$ time cio.py 1 < ./print.html > /dev/null

real 0m0.85s
user 0m0.30s
sys 0m0.11s
stang@ettin$ time cio.py 1 < ./print.html > /tmp/test

real 0m0.45s
user 0m0.29s
sys 0m0.11s
stang@ettin$ time cio.py 2 < ./print.html > /dev/null

real 0m0.76s
user 0m0.64s
sys 0m0.11s
stang@ettin$ time cio.py 2 < ./print.html > /tmp/test

real 0m0.81s
user 0m0.64s
sys 0m0.12s
stang@ettin$ time cio.py 3 < ./print.html > /dev/null

real 0m0.43s
user 0m0.16s
sys 0m0.10s
stang@ettin$ time cio.py 3 < ./print.html > /tmp/test

real 0m0.33s
user 0m0.17s
sys 0m0.12s
stang@ettin$

--Stan Graves
st**@SoundInMotionDJ.com
http://www.SoundInMotionDJ.com
Jul 18 '05 #8

Stan> Overall, Python was about 3 times slower than Perl... and
Stan> remarkably consistent across the three different methods.

What version of Python did you use? Note that 2.3 is significantly faster
than 2.2 in a number of ways.

Skip

Jul 18 '05 #9
Alex Martelli <al***@aleax.it> wrote:
> [alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
> 0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (330major+61minor)pagefaults 0swaps
>
> [alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
> 0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (448major+278minor)pagefaults 0swaps
>
> [alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
> 0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (447major+276minor)pagefaults 0swaps
>
> [alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
> 0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (447major+275minor)pagefaults 0swaps


But, nothing can beat
time cat < kjv.txt > /dev/null
:-)

--
William Park, Open Geometry Consulting, <op**********@yahoo.ca>
Linux solution for data management and processing.
Jul 18 '05 #10
On 29 Oct 2003 06:34:22 GMT, William Park <op**********@yahoo.ca> wrote:
> Alex Martelli <al***@aleax.it> wrote:
>> and I'm specifically reading the King James Bible (an easily
>> available text, so you can reproduce my results!) and writing
>
> Can you post a URL for the Bible?

Try Project Gutenberg, at

http://www.gutenberg.net/

or their new host at

http://www.ibiblio.org/gutenberg/

They have a number of bibles in various languages, and a ton (>10,000 e-texts) of other stuff,
also some audio texts, apparently. BTW I read somewhere that the BBC is going to make all their
archives, video and audio, freely available on the net, except where there is some legal reason
they can't. I guess they're a kind of FEF -- Free Entertainment Foundation (thank you British
telly owners ;-)

Apparently a new King James e-text is at the following long URL (or use
their search for "bible", without quotes, and go to entry #16):

http://www.ibiblio.org/gutenberg/cgi...org/gutenberg/

They also have the Koran, BTW. It's interesting to compare word frequencies, e.g., the 20 most frequent
(unless I goofed) in the texts I downloaded:

"C:\Info\Linguistics\Gutenberg\bible\bible11.txt"
6647: 'LORD'
6649: 'him'
6856: 'is'
6893: 'be'
6971: 'they'
7249: 'for'
7972: 'a'
8388: 'his'
8854: 'I'
8940: 'unto'
9666: 'he'
9760: 'shall'
12353: 'in'
12592: 'that'
12846: 'And'
13429: 'to'
34472: 'of'
38891: 'and'
62135: 'the'

"C:\Info\Linguistics\Gutenberg\koran\koran10.txt"
1739: 'ye'
1752: 'with'
1956: 'And'
1979: 'for'
1991: 'who'
2037: 'be'
2108: 'not'
2186: 'that'
2254: 'shall'
2366: 'them'
2575: 'a'
2644: 'they'
2799: 'is'
2900: 'in'
3320: 'God'
5144: 'to'
6855: 'of'
6896: 'and'
10982: 'the'

Both start with the-and-of-to ;-)
(I hope this does not offend anyone ;-)
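Counts like these can be produced with `collections.Counter`. The following is a minimal Python 3 sketch, assuming a simple letters-only, case-sensitive tokenizer. Bengt's actual method isn't shown, so its counts need not match his lists. `wordfreq.py` is a hypothetical file name.

```python
import re
import sys
from collections import Counter
from typing import List, Tuple

# Assumed tokenizer: maximal runs of ASCII letters, case preserved, so that
# "And" and "and" are counted separately, as in the lists above.
WORD = re.compile(r"[A-Za-z]+")


def top_words(text: str, n: int = 20) -> List[Tuple[str, int]]:
    """Return the n most frequent words as (word, count), most frequent first."""
    return Counter(WORD.findall(text)).most_common(n)


if __name__ == "__main__":
    # Usage: python wordfreq.py FILE...  (prints nothing without arguments)
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            print(f'"{path}"')
            # Ascending order of count, matching the layout of the lists above.
            for word, count in reversed(top_words(f.read())):
                print(f"{count}: {word!r}")
```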

Regards,
Bengt Richter
Jul 18 '05 #11
William Park <op**********@yahoo.ca> writes:
> [...]
> But, nothing can beat
> time cat < kjv.txt > /dev/null
> :-)


time true ?

(thinking of

http://mail.python.org/pipermail/pyt...ry/033785.html

)

Cheers,
mwh

--
58. Fools ignore complexity. Pragmatists suffer it. Some can avoid
it. Geniuses remove it.
-- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html
Jul 18 '05 #12
Peter Mutsaers wrote:
> I found that on my Linux PC, the Python version was 4 times slower.
>
> Is that normal? Does it disqualify Python for simple filter scripts?


Have a look at the win32 language shootout (http://dada.perl.it/shootout/)
with lots of benchmarks for lots of languages, among them python,
cygperl (Perl with Cygwin calls, I assume) and perl (Perl with native
win32 calls). cygperl is usually faster than python, perl slower. You
can also look at Doug Bagley's Linux-based shootout, but it's older
(Perl 5.6, Python 2.1) and currently not maintained.

My impression is that on average Perl is faster than Python,
but by about 20%, not by an order of magnitude.

Your test is probably closest to the "reverse file" benchmark
where cygperl : python : perl = 0.68 : 1.68 : 12.72 on win32
and perl : python = 1.06 : 1.17 on Linux.
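For reference, a Python 3 sketch of a "reverse file" filter. It assumes the benchmark reads all lines from stdin and prints them in reverse order; the shootout's exact rules and source are not reproduced here. Unlike a plain copy, this filter has to hold the whole input in memory before writing anything.

```python
import sys
from typing import Iterable, List


def reverse_lines(lines: Iterable[str]) -> List[str]:
    """Collect every line, then return them last-to-first."""
    collected = list(lines)
    collected.reverse()
    return collected


def main() -> None:
    # Read all of stdin before writing: the first output line is the
    # last input line, so nothing can be streamed out early.
    sys.stdout.writelines(reverse_lines(sys.stdin))
```

As with any filter, add the usual `if __name__ == "__main__": main()` guard to run it from the shell.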

Kind regards,

Peter Maas

--
-------------------------------------------------------------------
Peter Maas, M+R Infosysteme, D-52070 Aachen, Hubert-Wienen-Str. 24
Tel +49-241-93878-0 Fax +49-241-93878-20 eMail pe********@mplusr.de
-------------------------------------------------------------------

Jul 18 '05 #13
Peter Maas <fp********@netscape.net> writes:
> [...]
> Your test is probably closest to the "reverse file" benchmark
> where cygperl : python : perl = 0.68 : 1.68 : 12.72 on win32
> and perl : python = 1.06 : 1.17 on Linux.
>
> [...]

Why on earth is cygperl faster than perl? Is that really correct?
John
Jul 18 '05 #14
