How can I speed this function up?

This is just some dummy code to mimic what's being done in the real
code. The actual code is Python, used as a scripting language in
a third-party app. The data structure returned by the app is more or
less like the "data" list in the code below. The test for "ELEMENT" is
necessary ... it just evaluates to true every time in this test code. In
the real app perhaps 90% of tests will also be true.

So my question is: how can I speed up what's happening inside the
function write_data()? I'm only allowed to use vanilla Python (no Psyco or
other libraries outside of a vanilla Python install).

I have a vested interest in showing a colleague that a Python app can
yield results in a time comparable to his C app, which he feels is much
faster. I'd like to know what I can do within the constraints of the
Python language to get the best speed possible. Hope someone can help.

def write_data1(out, data):
    for i in data:
        if i[0] is 'ELEMENT':
            out.write("%s %06d " % (i[0], i[1]))
            for j in i[2]:
                out.write("%d " % (j))
            out.write("\n")

import time

# basic data mimicking data returned from 3rd party app
data = []
for i in range(500000):
    data.append(("ELEMENT", i, (1,2,3,4,5,6)))

# write data out to file and time it
fname = "test2.txt"
out = open(fname,'w')
start = time.clock()
write_data1(out, data)
out.close()
print time.clock()-start
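For readers on Python 3, note two differences. `print` is a function there, and `time.clock()` no longer exists (it was removed in Python 3.8). A rough equivalent of the benchmark above might look like the following. It is a sketch, not the OP's code. It writes to an in-memory `io.StringIO` so that disk speed does not dominate, times with `time.perf_counter()`, and uses `==` instead of `is` for the string test. The helper names `make_data` and `time_writer` are made up for illustration.

```python
import io
import time


def write_data1(out, data):
    # Direct Python 3 port of the original: one write() call per number.
    for i in data:
        if i[0] == 'ELEMENT':          # equality, not identity
            out.write("%s %06d " % (i[0], i[1]))
            for j in i[2]:
                out.write("%d " % j)
            out.write("\n")


def make_data(n):
    # Dummy records shaped like the ones returned by the third-party app.
    return [("ELEMENT", i, (1, 2, 3, 4, 5, 6)) for i in range(n)]


def time_writer(writer, data):
    # Run `writer` once against an in-memory text file and return
    # (elapsed seconds, text that was written).
    out = io.StringIO()
    start = time.perf_counter()
    writer(out, data)
    elapsed = time.perf_counter() - start
    return elapsed, out.getvalue()


if __name__ == "__main__":
    elapsed, text = time_writer(write_data1, make_data(500000))
    print("%.2f s, %d characters written" % (elapsed, len(text)))
```

Timing against a real file instead would only mean passing `open(fname, 'w')` in place of the `StringIO`.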
Nov 18 '06 #1

With this function I went from 8.04 s to 6.61 s. Now I'm running up against
my limited knowledge of Python. Any chance of getting it faster?

def write_data4(out, data):
    for i in data:
        if i[0] is 'ELEMENT':
            strx = "%s %06d " % (i[0], i[1])
            strx = "".join([strx] + ["%d " % (j) for j in i[2]] + ["\n"])
            out.write(strx)

Nov 18 '06 #2

"Chris" <cf*****@bigpond.net.au> wrote in message
news:ko*******************@news-server.bigpond.net.au...
> def write_data1(out, data):
>     for i in data:
>         if i[0] is 'ELEMENT':

Testing for equality with 'is' is a bit of a cheat, since whether two equal
strings are the same object is implementation dependent,
but since you have a somewhat unfair constraint ....

>             out.write("%s %06d " % (i[0], i[1]))

Since i[0] is tested to be 'ELEMENT', this should be the same as
out.write("ELEMENT %06d " % i[1])
which saves constructing a tuple as well as one string interpolation.

>             for j in i[2]:
>                 out.write("%d " % (j))
>             out.write("\n")

tjr
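Here is a minimal Python 3 sketch of Terry's suggestion, with the rest of the loop left as it was. The function name is made up for illustration, and `==` replaces `is`.

```python
import io


def write_data_const_prefix(out, data):
    # Terry's idea: i[0] has just been tested, so "ELEMENT" can live in the
    # format string; only the index is interpolated and no (i[0], i[1])
    # tuple has to be built for every record.
    for i in data:
        if i[0] == 'ELEMENT':
            out.write("ELEMENT %06d " % i[1])
            for j in i[2]:
                out.write("%d " % j)
            out.write("\n")


out = io.StringIO()
write_data_const_prefix(out, [("ELEMENT", 7, (1, 2)), ("OTHER", 8, (3,))])
print(repr(out.getvalue()))   # 'ELEMENT 000007 1 2 \n'
```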

Nov 18 '06 #3
Hi, Chris.
I made a trivial testing framework for this cute problem and tried a
couple of modifications. I also added the 10% of non-ELEMENT lines you
mentioned. First of all, your updated algorithm didn't give me much
faster results than the original; I guess my disk array sort of
hides the penalty of many small writes. But I experimented with various
algorithms. Here's the code in its entirety:
http://www.rafb.net/paste/results/ZuW4fK85.html My results (Python 2.4,
32-bit Fedora Core) were:

[ksh@lapoire tmp]# python test.py
Preparing data...
[write_data1] Preparing output file...
[write_data1] Writing...
[write_data1] Done in 10.73 seconds.
[write_data4] Preparing output file...
[write_data4] Writing...
[write_data4] Done in 10.46 seconds.
[write_data_flush] Preparing output file...
[write_data_flush] Writing...
[write_data_flush] Done in 9.09 seconds.
[write_data_per_line] Preparing output file...
[write_data_per_line] Writing...
[write_data_per_line] Done in 9.71 seconds.
[write_data_once] Preparing output file...
[write_data_once] Writing...
[write_data_once] Done in 7.82 seconds.

I'm pretty sure that your measurements will vary (judging by your results,
you seem to have a faster CPU but slower disk(s)). But you can just take
what works best for you. I'm also quite confident that you won't be able
to catch up with C since, as you can see, Python's data structures are far
more flexible and thus require more processing overhead.
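The pastebin link above may no longer resolve, so here is a hypothetical Python 3 harness in the same spirit. It is not Łukasz's code. The writer names only echo two of his labels. It times each writer against an in-memory file, which hides the disk effects he mentions, and it refuses to report a writer whose output differs from the first one's.

```python
import io
import time


def write_per_number(out, data):
    # Original style: one small write() call per number.
    for name, index, nums in data:
        if name == "ELEMENT":
            out.write("ELEMENT %06d " % index)
            for n in nums:
                out.write("%d " % n)
            out.write("\n")


def write_once(out, data):
    # Format every line first, then hand the whole text to a single write().
    lines = ["ELEMENT %06d %s\n" % (index, "".join("%d " % n for n in nums))
             for name, index, nums in data if name == "ELEMENT"]
    out.write("".join(lines))


def benchmark(writers, data):
    # Time each writer against an in-memory file and insist that every
    # writer produces exactly the same text as the first one.
    timings = {}
    reference = None
    for writer in writers:
        out = io.StringIO()
        start = time.perf_counter()
        writer(out, data)
        timings[writer.__name__] = time.perf_counter() - start
        text = out.getvalue()
        if reference is None:
            reference = text
        elif text != reference:
            raise ValueError("%s produced different output" % writer.__name__)
    return timings


if __name__ == "__main__":
    # Roughly 90% ELEMENT records, as in the original question.
    data = [("ELEMENT" if i % 10 else "OTHER", i, (1, 2, 3, 4, 5, 6))
            for i in range(500000)]
    for name, secs in benchmark([write_per_number, write_once], data).items():
        print("[%s] Done in %.2f seconds." % (name, secs))
```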

Regards,
Łukasz

Nov 18 '06 #4
At Friday 17/11/2006 23:40, Chris wrote:
> So my question is how can I speed up what's happening inside the
> function write_data()?

If you can assume that all items have 6 numbers, it appears best to
unroll the inner iteration. Below is my best attempt, with ideas from
other replies too, including some alternatives. The timings are only
approximate and varied widely (each is the median of three runs), but it's
clear that the main gain comes from calling out.write only once per line:

Notice that you can't, in general, use i[0] is 'ELEMENT' unless you
can guarantee that i[0] is an interned string (and if it comes from
another process, chances are it isn't). Using intern(i[0]) is
'ELEMENT' would work, but slows down your program.
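In Python 3 the builtin `intern()` moved to `sys.intern()`. The short sketch below shows the difference Gabriel describes. The string `tag` stands in, hypothetically, for a value that arrives at run time. Equality compares contents. Identity is only guaranteed once both sides have been interned.

```python
import sys

# A string built at run time, e.g. data arriving from another component.
tag = "".join(["ELE", "MENT"])

# Equality compares contents, so this is always True.
print(tag == "ELEMENT")                             # True

# sys.intern() returns the single canonical copy of an equal string,
# so after interning both sides the identity test is reliable.
print(sys.intern(tag) is sys.intern("ELEMENT"))     # True
```

Whether `tag is "ELEMENT"` holds without interning depends on the interpreter, which is exactly why the `==` test is the safe one. Recent CPython versions also emit a `SyntaxWarning` for `is` against a literal.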

# initial: 11.66s
def write_data1(out, data):
    write = out.write
    for i in data:
        if i[0] == 'ELEMENT': # sorry but can't guarantee identity

            # 6.21s
            write("ELEMENT %06d %s\n" % (i[1], "%d %d %d %d %d %d " % i[2]))

            # 6.92s
            # write("ELEMENT %06d %s \n" % (i[1], " ".join(map(str,i[2]))))

            # 8.30s [i]
            # i2 = i[2]
            # write("ELEMENT %06d %d %d %d %d %d %d \n" % (i[1],
            #     i2[0], i2[1], i2[2], i2[3], i2[4], i2[5]))

            # 7.04s __getitem__
            # i2 = i[2].__getitem__
            # write("ELEMENT %06d %d %d %d %d %d %d \n" % (i[1],
            #     i2(0), i2(1), i2(2), i2(3), i2(4), i2(5)))
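If you do rely on the six-number assumption, one option is to make it explicit rather than implicit. This also answers the "no comments and no assertions" objection raised later in the thread. The following is a Python 3 sketch under that assumption. The names `LINE_FMT` and `write_data_unrolled` are illustrative. It raises `ValueError` rather than using `assert`, because asserts are stripped when Python runs with `-O`.

```python
import io

# One record's line when its payload has exactly six integers.
LINE_FMT = "ELEMENT %06d %d %d %d %d %d %d \n"


def write_data_unrolled(out, data):
    write = out.write               # look up the bound method once
    for name, index, nums in data:
        if name == 'ELEMENT':
            # The unrolled format only fits six numbers, so check the
            # assumption explicitly rather than silently relying on it.
            if len(nums) != 6:
                raise ValueError("record %r: expected 6 numbers, got %d"
                                 % (index, len(nums)))
            write(LINE_FMT % ((index,) + tuple(nums)))


if __name__ == "__main__":
    out = io.StringIO()
    write_data_unrolled(out, [("ELEMENT", 3, (1, 2, 3, 4, 5, 6))])
    print(repr(out.getvalue()))     # 'ELEMENT 000003 1 2 3 4 5 6 \n'
```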

--
Gabriel Genellina
Softlab SRL

Nov 18 '06 #5
Chris wrote:
> So my question is how can I speed up what's happening inside the
> function write_data()? Only allowed to use vanilla python (no psycho or
> other libraries outside of a vanilla python install).

Try collecting your output into bigger chunks before writing it out. For
example, take a look at:

def write_data2(out, data):
    buffer = []
    append = buffer.append
    for i in data:
        if i[0] == 'ELEMENT':
            append("ELEMENT %06d " % i[1])
            append(' '.join(map(str, i[2])))   # space-separated numbers
            append('\n')
    out.write(''.join(buffer))

def write_data3(out, data):
    buffer = []
    append = buffer.append
    for i in data:
        if i[0] == 'ELEMENT':
            append("ELEMENT %06d %s" % (i[1], ' '.join(map(str, i[2]))))
    out.write('\n'.join(buffer) + '\n')   # join() adds no final newline
Both of these run almost twice as fast as the original (although
admittedly I didn't check that they were actually right). Using some of
the other suggestions in this thread may make things better still.
It's possible that some intermediate chunk size might be better
than collecting everything into one string; I dunno.
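One way to try the "intermediate chunk size" idea is sketched below in Python 3. The function name and the default `chunk_lines=10000` are arbitrary assumptions, not measured optima. Lines are buffered in a list and flushed with one `write()` every `chunk_lines` lines, so memory stays bounded. Like Peter's reference version later in the thread, it omits the trailing space before the newline.

```python
import io


def write_data_chunked(out, data, chunk_lines=10000):
    # Buffer formatted lines and flush them with a single write() every
    # `chunk_lines` lines: memory use stays bounded, and the number of
    # write() calls drops by roughly a factor of `chunk_lines`.
    buffer = []
    append = buffer.append          # bound once; see the clearing note below
    for name, index, nums in data:
        if name == 'ELEMENT':
            append("ELEMENT %06d %s\n" % (index, " ".join(map(str, nums))))
            if len(buffer) >= chunk_lines:
                out.write("".join(buffer))
                # Clear in place: rebinding `buffer = []` would leave
                # `append` pointing at the old list.
                del buffer[:]
    if buffer:                      # flush whatever is left over
        out.write("".join(buffer))


if __name__ == "__main__":
    out = io.StringIO()
    data = [("ELEMENT", i, (1, 2, 3, 4, 5, 6)) for i in range(25000)]
    write_data_chunked(out, data)
    print(len(out.getvalue().splitlines()))   # 25000
```

Trying a few values of `chunk_lines` against a real file is the only way to find out whether any intermediate size beats writing everything at once.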

cStringIO might be helpful here as a buffer instead of using lists, but
I don't have time to try it right now.
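Python 3 dropped `cStringIO`. Its role is taken by `io.StringIO` (text) and `io.BytesIO` (bytes). Here is a sketch of the buffer-then-write-once idea using it. The function name is illustrative only, and whether this beats a list plus `''.join()` is something to measure rather than assume.

```python
import io


def write_data_stringio(out, data):
    # Many cheap writes into an in-memory buffer, then one write to `out`.
    buf = io.StringIO()
    write = buf.write
    for name, index, nums in data:
        if name == 'ELEMENT':
            write("ELEMENT %06d %s\n" % (index, " ".join(map(str, nums))))
    out.write(buf.getvalue())


if __name__ == "__main__":
    out = io.StringIO()
    write_data_stringio(out, [("ELEMENT", 1, (1, 2, 3)), ("OTHER", 2, (4,))])
    print(repr(out.getvalue()))     # 'ELEMENT 000001 1 2 3\n'
```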

-tim


Nov 18 '06 #6
Chris wrote:
> I have a vested interest in showing a colleague that a python app can
> yield results in a time comparable to his C-app, which he feels is mch
> faster. I'd like to know what I can do within the constraints of the
> python language to get the best speed possible. Hope someone can help.

Fight smart!
How long did the C app take to write?
How robust are the C and the Python versions w.r.t. unforeseen inputs?
Mimic the software life cycle:
 * How long would it take to make each program work on Windows? On Mac?
 * How long would it take to 'fully' test each program?
How easy is it to explain each program to an audience that has
programmed, but never in C or Python?
How long would it take to add another feature?

"Best" and "best speed" can have many meanings. Good luck.

- Paddy.

Nov 18 '06 #7


On Nov 18, 2:05 pm, Chris <cfri...@bigpond.net.au> wrote:
> with this function I went from 8.04 s to 6.61 s.

And your code became less understandable.

> Now running up against
> my limited knowledge of python. Any chance of getting faster?

You have saved 1.4 *seconds*. What is the normal running time for this
app with 0.5M records? What is 1.4 seconds as a percentage of that?

Please consider that you are barking up the wrong gum tree. Competing
with a C app on speed is not something that experienced Python
programmers would take on lightly.

Talk to your colleague about some of these factors: time to write code,
robustness, clarity, ease of maintenance.

Cheers,
John

Nov 18 '06 #8


On Nov 18, 4:23 pm, Gabriel Genellina <gagsl...@yahoo.com.ar> wrote:
> If you can assume that all items have 6 numbers, it appears best to
> unroll the inner iteration.

Is this meant to be some kind of joke?
If so, you should have festooned it with smilies.
If not, please proceed straight to http://www.thedailyWTF.com and
nominate yourself.

Nov 18 '06 #9
At Saturday 18/11/2006 05:09, John Machin wrote:
>> If you can assume that all items have 6 numbers, it appears best to
>> unroll the inner iteration.
>
> Is this meant to be some kind of joke?
> If so, you should have festooned it with smilies.
> If not, please proceed straight to http://www.thedailyWTF.com and
> nominate yourself.

...?
--
Gabriel Genellina
Softlab SRL

Nov 18 '06 #10
Just to show how much the system setup affects these results, here are
the results from SUSE 10.1 (64-bit) and Python 2.4
on an AMD FX-55 CPU, with about 12 active apps
running in the background and 7200 rpm SATA drives.

Preparing data...
[write_data1] Preparing output file...
[write_data1] Writing...
[write_data1] Done in 5.43 seconds.
[write_data4] Preparing output file...
[write_data4] Writing...
[write_data4] Done in 4.41 seconds.
[write_data_flush] Preparing output file...
[write_data_flush] Writing...
[write_data_flush] Done in 5.41 seconds.
[write_data_per_line] Preparing output file...
[write_data_per_line] Writing...
[write_data_per_line] Done in 4.4 seconds.
[write_data_once] Preparing output file...
[write_data_once] Writing...
[write_data_once] Done in 4.28 seconds.

Nov 18 '06 #11

Gabriel Genellina wrote:
> At Saturday 18/11/2006 05:09, John Machin wrote:
>>> If you can assume that all items have 6 numbers, it appears best to
>>> unroll the inner iteration.
>> Is this meant to be some kind of joke?
>> If so, you should have festooned it with smilies.
>> If not, please proceed straight to http://www.thedailyWTF.com and
>> nominate yourself.
>
> ...?

We already have a case where the best response to the OP was like
Paddy's response: *not* to answer the question literally.

Then: "loop unrolling"? "assume" with no comments and no assertions?

Nov 18 '06 #12
Chris wrote:
> This is just some dummy code to mimic what's being done in the real
> code. The actual code is python which is used as a scripting language in
> a third party app. The data structure returned by the app is more or
> less like the "data" list in the code below. The test for "ELEMENT" is
> necessary ... it just evaluates to true every time in this test code. In
> the real app perhaps 90% of tests will also be true.

As others have said, without info about what's happening in C, there's
no way to know what's equivalent or fast enough.

> So my question is how can I speed up what's happening inside the
> function write_data()? Only allowed to use vanilla python (no psycho or
> other libraries outside of a vanilla python install).

Generally, don't create objects, don't perform repeated operations. In
this case, batch up I/O.

> def write_data1(out, data):
>     for i in data:
>         if i[0] is 'ELEMENT':
>             out.write("%s %06d " % (i[0], i[1]))
>             for j in i[2]:
>                 out.write("%d " % (j))
>             out.write("\n")

def write_data1(out, data, map=map, str=str):
    SPACE_JOIN = ' '.join
    lines = [("ELEMENT %06d " % i1) + SPACE_JOIN(map(str, i2))
             for i0, i1, i2 in data if i0 == 'ELEMENT']
    out.write('\n'.join(lines))

While perhaps a bit obfuscated, it's considerably faster than the original.
Part of what makes this hard to read is the crappy variable names; I
didn't know what to call them. This version assumes that data will
always be a sequence of 3-element items.

The original version took about 11.5 seconds; the version above takes
just over 5 seconds.
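The `map=map, str=str` parameters above use a classic CPython trick. Default values are evaluated once, when `def` runs, so inside the function those names are local variables. Local variables are looked up more cheaply than globals or builtins. Below is a small Python 3 sketch for comparing the two styles on your own machine. The function names are made up, and the size of the gain (if any) depends on the interpreter version, so treat the printed timings as something to measure, not a promise.

```python
import timeit


def join_globals(rows):
    # `map` and `str` are found by a globals-then-builtins lookup
    # every time they are used inside the loop.
    result = []
    for r in rows:
        result.append(" ".join(map(str, r)))
    return result


def join_locals(rows, map=map, str=str):
    # Default values are evaluated once, when `def` runs, so inside the
    # body `map` and `str` are plain local variables.
    result = []
    for r in rows:
        result.append(" ".join(map(str, r)))
    return result


if __name__ == "__main__":
    rows = [(1, 2, 3, 4, 5, 6)] * 200000
    for f in (join_globals, join_locals):
        best = min(timeit.repeat(lambda: f(rows), number=1, repeat=3))
        print("%s: %.3f s" % (f.__name__, best))
```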

YMMV,
n

Nov 18 '06 #13
nn******@gmail.com wrote:
> def write_data1(out, data, map=map, str=str):
>     SPACE_JOIN = ' '.join
>     lines = [("ELEMENT %06d " % i1) + SPACE_JOIN(map(str, i2))
>              for i0, i1, i2 in data if i0 == 'ELEMENT']
>     out.write('\n'.join(lines))
>
> The original version took about 11.5 seconds, the version above takes
> just over 5 seconds.

Footnote: your version doesn't write the final "\n". Here's a variant
that does, and leaves the batching to the I/O subsystem:

def write_data3(out, data, map=map, str=str):
    SPACE_JOIN = ' '.join
    out.writelines(
        "ELEMENT %06d %s\n" % (i1, SPACE_JOIN(map(str, i2)))
        for i0, i1, i2 in data if i0 == 'ELEMENT'
        )

This runs exactly as fast as your example on my machine, but uses less
memory. And if you, for benchmarking purposes, pass in a "sink" file
object that ignores the data you pass it, it runs in no time at all ;-)
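The joke works because a generator expression is lazy. If the sink's `writelines()` never iterates over its argument, none of the formatting ever happens. A sink meant for timing the formatting alone therefore has to consume the iterable. The Python 3 sketch below contrasts the two. Both sink classes are hypothetical helpers, and `write_data3` is Fredrik's variant from above, which already runs unchanged under Python 3.

```python
class LazySink:
    # Ignores its argument entirely: the generator is never iterated,
    # so no line is ever formatted ("runs in no time at all").
    def writelines(self, lines):
        pass


class ConsumingSink:
    # Discards the data but still pulls every line out of the iterable,
    # so the formatting work really happens and can be timed.
    def __init__(self):
        self.count = 0

    def writelines(self, lines):
        for _ in lines:
            self.count += 1


def write_data3(out, data, map=map, str=str):
    SPACE_JOIN = ' '.join
    out.writelines(
        "ELEMENT %06d %s\n" % (i1, SPACE_JOIN(map(str, i2)))
        for i0, i1, i2 in data if i0 == 'ELEMENT'
    )
```

A quick way to see the difference is to feed a record whose index is not a number. `LazySink` returns silently, while `ConsumingSink` raises `TypeError` from the `%06d` conversion.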

</F>

Nov 18 '06 #14
Chris wrote:
> So my question is how can I speed up what's happening inside the
> function write_data()? Only allowed to use vanilla python (no psycho or
> other libraries outside of a vanilla python install).
> def write_data1(out, data):
>     for i in data:
>         if i[0] is 'ELEMENT':
>             out.write("%s %06d " % (i[0], i[1]))
>             for j in i[2]:
>                 out.write("%d " % (j))
>             out.write("\n")

# reference, modified to avoid trailing ' '
def write_data(out, data):
    for i in data:
        if i[0] == 'ELEMENT':
            out.write("%s %06d" % (i[0], i[1]))
            for j in i[2]:
                out.write(" %d" % j)
            out.write("\n")

# Norwitz/Lundh
def writelines_data(out, data, map=map, str=str):
    SPACE_JOIN = ' '.join
    out.writelines(
        "ELEMENT %06d %s\n" % (i1, SPACE_JOIN(map(str, i2)))
        for i0, i1, i2 in data if i0 == 'ELEMENT'
        )

def print_data(out, data):
    for name, index, items in data:
        if name == "ELEMENT":
            print >>out, "ELEMENT %06d" % index,
            for item in items:
                print >>out, item,
            print >>out

import time

data = []
for i in range(500000):
    data.append(("ELEMENT", i, (1,2,3,4,5,6)))

for index, write in enumerate([write_data, writelines_data, print_data]):
    fname = "test%s.txt" % index
    out = open(fname,'w')
    start = time.time()
    write(out, data)
    out.close()
    print write.__name__, time.time()-start

# check that all three variants produced identical files
for fname in "test1.txt", "test2.txt":
    assert open(fname).read() == open("test0.txt").read(), fname

Output on my machine:

$ python2.5 writedata.py
write_data 10.3382301331
writelines_data 5.4960360527
print_data 3.50765490532

Moral: don't forget about good old print. It does have an opcode(*) of its
own, after all.

Peter

(*) or two
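Peter's argument is specific to Python 2. In Python 3, `print` is an ordinary function, so the dedicated PRINT_* opcodes no longer exist. Whether a `print()`-based writer is still competitive is something to measure, not assume. Here is a sketch of a Python 3 port that produces the same lines as his reference `write_data` (no trailing space):

```python
import io


def print_data(out, data):
    # Python 3 port of Peter's print_data: print() is now a function,
    # and its default sep=' ' and end='\n' reproduce the same line format.
    for name, index, items in data:
        if name == "ELEMENT":
            print("ELEMENT %06d" % index, *items, file=out)


out = io.StringIO()
print_data(out, [("ELEMENT", 5, (1, 2, 3)), ("X", 6, (4,))])
print(repr(out.getvalue()))   # 'ELEMENT 000005 1 2 3\n'
```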
Nov 18 '06 #15
Peter Otten wrote:
> Output on my machine:
>
> $ python2.5 writedata.py
> write_data 10.3382301331
> writelines_data 5.4960360527
> print_data 3.50765490532

Interesting. I timed with python2.4 and got this:

write_data 12.3158090115
writelines_data 5.02135300636
print_data 5.01881980896

A second run yielded:

write_data 11.5980260372
writelines_data 4.8575668335
print_data 4.84622001648

I'm a bit surprised by your numbers, because I would expect string ops
to be faster in 2.5 than in 2.4, thanks to /F (Fredrik Lundh). I don't
remember any other changes that would cause such an improvement for print
between 2.4 and 2.5. (2.3 shows print doing a bit better than the times above.)

It could be that the variability is high due to lots of I/O or even
different builds. I'm on Linux.

> Moral: don't forget about good old print. It does have an opcode(*) of its
> own, after all.

Using print really should be faster, as fewer objects are created.

> (*) or two

or 5 :-)

$ grep 'case PRINT_' Python/ceval.c
case PRINT_EXPR:
case PRINT_ITEM_TO:
case PRINT_ITEM:
case PRINT_NEWLINE_TO:
case PRINT_NEWLINE:

n

Nov 18 '06 #16
