Bytes | Software Development & Data Engineering Community

Python slow for filter scripts

Hello,

Up to now I mostly wrote simple filter scripts in Perl, e.g.

while(<>) {
# do something with $_, regexp matching, replacements etc.
print;
}
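For comparison, a rough Python 3 counterpart of that Perl loop is sketched below. The thread never shows the poster's actual Python script, so this is only an assumption about its shape; PATTERN and REPLACEMENT are hypothetical placeholders for "regexp matching, replacements etc.".

```python
import re
import sys

# Hypothetical substitution standing in for the filter's real regexp work.
PATTERN = re.compile(r"colour")
REPLACEMENT = "color"

def filter_lines(lines):
    """Yield each line with every PATTERN match replaced, like s/colour/color/g on $_."""
    for line in lines:
        yield PATTERN.sub(REPLACEMENT, line)

def run_filter(infile=sys.stdin, outfile=sys.stdout):
    # Iterating over a text stream yields one line at a time, like while(<>).
    write = outfile.write
    for line in filter_lines(infile):
        write(line)
```

In a script you would end with if __name__ == "__main__": run_filter() and run it as python filter.py < in.txt > out.txt. Unlike Perl's <>, iterating over sys.stdin ignores file names given on the command line; the standard fileinput module covers that case.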

Now I learned Python and like it much more as a language.

However, I tried the most simple while(<>) {print;} in Perl versus
Python, just a copy from stdin to stdout, to see how fast the basic
filter can be.

I found that on my (linux) PC, the Python version was 4 times slower.

Is that normal, does it disqualify Python for simple filter scripts?
Jul 18 '05 #1

"Peter Mutsaers" <pl*@gmx.li> wrote in message
news:97**************************@posting.google.com...
> However, I tried the most simple while(<>) {print;} in Perl versus
> Python, just a copy from stdin to stdout, to see how fast the basic
> filter can be.
>
> I found that on my (linux) PC, the Python version was 4 times slower.

There are several ways to copy a file in Python. Some are much faster
than others. I believe 'for line in file: print line' might be
fastest with python code. There is also shutil.copyfile(src, dst) or
one of the other variants. Which did you use?
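The options differ mainly in what they operate on: shutil.copyfile(src, dst) copies between two paths, while a line loop lets each line be inspected on the way through (for open streams such as stdin/stdout, shutil.copyfileobj is the counterpart). A Python 3 sketch of the two path-based variants; the function names and the temporary demo file are hypothetical:

```python
import os
import shutil
import tempfile

def copy_by_lines(src, dst):
    # Line-at-a-time copy: the shape a real filter has, since each line
    # could be inspected or rewritten before it is written out.
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(line)

def copy_in_bulk(src, dst):
    # Path-to-path copy; no per-line Python code runs at all.
    shutil.copyfile(src, dst)

def demo():
    """Copy a small temporary file both ways; return the two copies' contents."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        with open(src, "w") as f:
            f.write("line one\nline two\n")
        contents = []
        for copier in (copy_by_lines, copy_in_bulk):
            dst = os.path.join(tmp, copier.__name__ + ".txt")
            copier(src, dst)
            with open(dst) as f:
                contents.append(f.read())
        return contents
```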
> Is that normal, does it disqualify Python for simple filter scripts?


I have read that Perl is optimized for file read/write in a way that
Python is not, so this may not be the most representative comparison for
your actual app. In any case, the relevance of relative speed depends on
absolute speed (think about milliseconds versus hours).

Terry J. Reedy
Jul 18 '05 #2
You don't show the Python script you use, so there's no way for us to
tell whether it is possible to do it more efficiently.

Also, what size file did you use? Unless you tried it with a large
enough file, so that the time was proportional to the file size, you may
just have measured the difference in the startup time for perl vs. python.
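One way to check this is to time the interpreter running an empty program and compare that with a full run. A minimal Python 3 sketch, assuming the interpreter under test is the one running the sketch (sys.executable); the helper name is hypothetical:

```python
import subprocess
import sys
import time

def startup_seconds(runs=5):
    """Return the best wall-clock time to start and exit the interpreter."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # "-c pass" runs an empty program, so this is (almost) pure startup cost.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best
```

Taking the minimum over several runs reduces noise from whatever else the machine is doing. Subtracting that figure from a full filter run's elapsed time gives a rough idea of how much of the gap is per-file work rather than startup; perl -e 1 gives the corresponding baseline for Perl.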

Finally, the relative performance of two languages on Task X is not a
very good predictor of their relative performance on Task Y, so you are
probably better off doing a comparison of the actual task you are
interested in.

David

Jul 18 '05 #3
In article <riCnb.36947$mZ5.185175@attbi_s54>,
David C. Fox <da*******@post.harvard.edu> wrote:
> Peter Mutsaers wrote:

Jul 18 '05 #4
It really depends on what you're doing. I tried the following:

cio.pl:
while(<>) {
print;
}

cio.py:
import sys
import fileinput
import shutil

emit = sys.stdout.write

def io_1(emit=emit):
    # plain loop over the lines of stdin
    for line in sys.stdin: emit(line)

def io_2(emit=emit):
    # fileinput, the closest analogue of Perl's <>
    for line in fileinput.input(): emit(line)

def io_3():
    # bulk, chunked copy with no per-line work
    shutil.copyfileobj(sys.stdin, sys.stdout)

if __name__=='__main__':
    import __main__

    def usage():
        sys.stdout = sys.stderr
        print("Usage: %s N" % sys.argv[0])
        print("N indicates what stdin->stdout copy function to run")
        ns = [x[3:] for x in dir(__main__) if x[:3]=='io_']
        ns.sort()
        print("valid values for N: %s" % (ns,))
        print("invalid args: %s" % (sys.argv[1:],))
        sys.exit()
    if len(sys.argv) != 2: usage()
    func = getattr(__main__, 'io_'+sys.argv[1], None)
    if func is None: usage()
    # drop N, so that fileinput.input() reads stdin instead of a file named N
    sys.argv.pop()
    func()

and I'm specifically reading the King James' Bible (an easily
available text so you can reproduce my results!) and writing to
either /dev/null or a tempfile on my own Linux box. I see...:

[alex@lancelot bo]$ ls -l /x/kjv.txt
-rw-rw-r-- 1 alex alex 4404445 Mar 29 2003 /x/kjv.txt

[alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (330major+61minor)pagefaults 0swaps

[alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/tmp/kjv
0.04user 0.06system 0:00.19elapsed 51%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (330major+61minor)pagefaults 0swaps

So, Perl is taking 80 to 100 milliseconds of CPU time (elapsed time
mostly depends on what else is going on in the machine, and thus on
the %CPU available, of course). Let's see Python now:
[alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (448major+278minor)pagefaults 0swaps

[alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/tmp/kjv
0.30user 0.01system 0:00.62elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (448major+278minor)pagefaults 0swaps

Python with fileinput IS slower: 270 to 300 msec of CPU, about a
factor of 3. However, that IS mostly fileinput's issue. See for yourself:
[alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+276minor)pagefaults 0swaps

[alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/tmp/kjv
0.06user 0.07system 0:00.29elapsed 44%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+276minor)pagefaults 0swaps

A plain line-by-line copy takes 100-130 msec, a bit slower than Perl
but nothing major. Can we do better yet...?

[alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+275minor)pagefaults 0swaps

[alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/tmp/kjv
0.02user 0.06system 0:00.16elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+275minor)pagefaults 0swaps

...sure! Bulk copy takes 50-80 msec, FASTER than Perl. Of course, I'm
sure you can program it faster in Perl, too. After all, cat takes 20-60
msec CPU, so there's clearly room to do better.
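A complementary, in-process comparison times only the copy loops, which leaves interpreter startup out of the numbers. The sketch below is a Python 3 approximation of io_1, io_2 and io_3 that reads from a file path and writes to os.devnull; the function names and the generated sample file are assumptions, not part of the setup above.

```python
import fileinput
import os
import shutil
import tempfile
import time

def copy_lines(path, out):
    # Like io_1: iterate over the file object directly.
    with open(path) as f:
        for line in f:
            out.write(line)

def copy_fileinput(path, out):
    # Like io_2: fileinput adds per-line bookkeeping (file name, line number).
    with fileinput.input(files=[path]) as f:
        for line in f:
            out.write(line)

def copy_bulk(path, out):
    # Like io_3: shutil.copyfileobj copies in large chunks, not lines.
    with open(path) as f:
        shutil.copyfileobj(f, out)

def compare(path):
    """Return {method name: seconds} for copying path to the null device."""
    timings = {}
    for func in (copy_lines, copy_fileinput, copy_bulk):
        with open(os.devnull, "w") as out:
            start = time.perf_counter()
            func(path, out)
            timings[func.__name__] = time.perf_counter() - start
    return timings

def sample_file(n_lines=100_000):
    """Write a throwaway text file of hypothetical test data; return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(n_lines):
            f.write("line %d of some sample text\n" % i)
    return path
```

For instance, p = sample_file() followed by compare(p) returns seconds per method; remove the file afterwards with os.remove(p). Exact ratios will depend on the machine and the Python version, so no particular result is implied here.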
What kind of files do your scripts most often process? For me, a
textfile of 4.4 MB is larger than typical. How much do those few
tens of milliseconds' difference matter? You know your apps, I
don't, but I _would_ find it rather strange if they "disqualified"
either language. Anything below about a second is typically fine
with me, so even the slowest of these programs could still handle
files of about 6 MB, assuming the 50% CPU it got is pretty typical,
while still taking no more than about 1 second's elapsed time.
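The arithmetic behind that estimate can be checked against the slowest run above (fileinput writing to /tmp/kjv: 4404445 bytes, 0:00.62 elapsed, 0.30 user + 0.01 system). The helper below is purely illustrative and uses decimal megabytes:

```python
def mb_per_second(size_bytes, seconds):
    """Throughput in decimal megabytes (10**6 bytes) per second."""
    return size_bytes / 1e6 / seconds

KJV_BYTES = 4404445          # size reported by ls -l above
SLOWEST_ELAPSED = 0.62       # fileinput run writing to /tmp/kjv
SLOWEST_CPU = 0.30 + 0.01    # user + system seconds for that run

elapsed_rate = mb_per_second(KJV_BYTES, SLOWEST_ELAPSED)  # about 7.1 MB/s
cpu_rate = mb_per_second(KJV_BYTES, SLOWEST_CPU)          # about 14.2 MB/s
```

At roughly 7 MB per elapsed second, a 1-second budget covers a little more than the "about 6 MB" quoted above, so that figure errs on the safe side.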
Of course, you can easily edit my script and play with many other
I/O methods, until you find one that best suits you. Personally,
I tend to use fileinput just because it's so handy (like perl's <>),
not caring all that much about those "wasted" milliseconds... :-)
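As one example of such an extra method, a Python 3 variant could skip text decoding entirely and copy raw bytes through the binary buffers underneath sys.stdin and sys.stdout. This is a sketch, not something measured in this thread; the name io_4 just follows the script's naming scheme so that usage() would list it, and roundtrip_demo is a hypothetical helper.

```python
import io
import shutil
import sys

def io_4(src=None, dst=None):
    # sys.stdin.buffer / sys.stdout.buffer are the binary layers under the
    # text streams (Python 3 only), so no decoding or newline handling occurs.
    src = sys.stdin.buffer if src is None else src
    dst = sys.stdout.buffer if dst is None else dst
    shutil.copyfileobj(src, dst, 1 << 16)  # copy in 64 KiB chunks

def roundtrip_demo():
    """Copy a few bytes through io_4 using in-memory buffers instead of stdin/stdout."""
    src, dst = io.BytesIO(b"abc\r\ndef\n"), io.BytesIO()
    io_4(src, dst)
    return dst.getvalue()
```

Because nothing is decoded, line endings and non-UTF-8 bytes pass through unchanged, which is usually what a pure copy wants; a filter that edits lines would still need the text layer.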
Alex

Jul 18 '05 #5
Alex Martelli <al***@aleax.it> wrote:
> and I'm specifically reading the King James' Bible (an easily
> available text so you can reproduce my results!) and writing


Can you post a URL for the Bible?

--
William Park, Open Geometry Consulting, <op**********@yahoo.ca>
Linux solution for data management and processing.
Jul 18 '05 #6
William Park wrote:
> Alex Martelli <al***@aleax.it> wrote:
>> and I'm specifically reading the King James' Bible (an easily
>> available text so you can reproduce my results!) and writing
>
> Can you post a URL for the Bible?


I originally got it from some MySQL stuff by Paul DuBois and decided
it would make a good general dataset for reproducible tests and
benchmarks. A little googling suggests that it comes from:

http://unbound.biola.edu/zips/index.cfm?lang=English

and specifically from the kjv.zip referenced there under the "King
James Version" anchor (unzipped, of course).
Alex

Jul 18 '05 #7
Alex Martelli <al***@aleax.it> wrote in message news:<VV***********************@news2.tin.it>...
> It really depends on what you're doing. I tried the following:


Me too!

I ran these on an HP-UX 11.00 box. The input was a 560K HTML file that
I had lying around.

Overall, Python was about 3 times slower than Perl...and remarkably
consistent across the three different methods.

stang@ettin$ ll ./print.html
-rw-r--r-- 1 stan users 567154 Oct 29 10:30 ./print.html
stang@ettin$ time cio.pl < ./print.html > /dev/null

real 0m0.10s
user 0m0.06s
sys 0m0.03s
stang@ettin$ time cio.pl < ./print.html > /tmp/test

real 0m0.18s
user 0m0.06s
sys 0m0.04s
stang@ettin$ time cio.py 1 < ./print.html > /dev/null

real 0m0.85s
user 0m0.30s
sys 0m0.11s
stang@ettin$ time cio.py 1 < ./print.html > /tmp/test

real 0m0.45s
user 0m0.29s
sys 0m0.11s
stang@ettin$ time cio.py 2 < ./print.html > /dev/null

real 0m0.76s
user 0m0.64s
sys 0m0.11s
stang@ettin$ time cio.py 2 < ./print.html > /tmp/test

real 0m0.81s
user 0m0.64s
sys 0m0.12s
stang@ettin$ time cio.py 3 < ./print.html > /dev/null

real 0m0.43s
user 0m0.16s
sys 0m0.10s
stang@ettin$ time cio.py 3 < ./print.html > /tmp/test

real 0m0.33s
user 0m0.17s
sys 0m0.12s
stang@ettin$

--Stan Graves
st**@SoundInMotionDJ.com
http://www.SoundInMotionDJ.com
Jul 18 '05 #8

Stan> Overall, python was about 3 times slower than perl...and
Stan> remarkable consistant for the three different methods.

What version of Python did you use? Note that 2.3 is significantly faster
than 2.2 in a number of ways.
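Since the interpreter version can shift these numbers, a benchmark can record which interpreter produced them. A minimal sketch using the standard platform module (the helper name is hypothetical; from a shell, python -V prints the version too):

```python
import platform

def describe_interpreter():
    """Return a one-line description like '<implementation> <version> on <OS>'."""
    return "%s %s on %s" % (
        platform.python_implementation(),  # e.g. CPython, PyPy
        platform.python_version(),         # e.g. the X.Y.Z version string
        platform.system(),                 # e.g. Linux, HP-UX
    )
```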

Skip

Jul 18 '05 #9
Alex Martelli <al***@aleax.it> wrote:
> [alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
> 0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (330major+61minor)pagefaults 0swaps
> [alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
> 0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (448major+278minor)pagefaults 0swaps
> [alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
> 0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (447major+276minor)pagefaults 0swaps
> [alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
> 0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (447major+275minor)pagefaults 0swaps


But nothing can beat
time cat < kjv.txt > /dev/null
:-)
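For the curious, the closest a Python program gets to what cat does is moving raw bytes between file descriptors with os.read and os.write, bypassing Python's file objects. A Python 3 sketch; the helper name and the 64 KiB buffer size are assumptions, and nobody in the thread benchmarked this variant:

```python
import os

def fd_copy(in_fd=0, out_fd=1, bufsize=1 << 16):
    """Copy bytes from in_fd to out_fd until EOF; return the byte count.

    File descriptors 0 and 1 are stdin and stdout.
    """
    total = 0
    while True:
        chunk = os.read(in_fd, bufsize)
        if not chunk:  # empty bytes means end of file
            return total
        # os.write may write fewer bytes than requested, so loop until done.
        view = memoryview(chunk)
        while view:
            written = os.write(out_fd, view)
            view = view[written:]
        total += len(chunk)
```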

--
William Park, Open Geometry Consulting, <op**********@yahoo.ca>
Linux solution for data management and processing.
Jul 18 '05 #10
