Bytes | Software Development & Data Engineering Community

Anyone happen to have optimization hints for this loop?

I have some code that takes data from an Access database and processes
it into text files for another application. At the moment, I am using
a number of loops that are pretty slow. I am not a hugely experienced
Python user, so I would like to know if I am doing anything
particularly wrong or that can be hugely improved through the use of
another method.

Currently, all of the values that are to be written to file are pulled
from the database and into a list called "domainVa". These values
represent 3D data and need to be written to text files using line
breaks to separate 'layers'. I am currently looping through the list
and appending a string, which I then write to file. This list can
regularly contain upwards of half a million values...

count = 0
dmntString = ""
for z in range(0, Z):
    for y in range(0, Y):
        for x in range(0, X):
            fraction = domainVa[count]
            dmntString += " "
            dmntString += fraction
            count = count + 1
        dmntString += "\n"
    dmntString += "\n"
dmntString += "\n***\n"

dmntFile = open(dmntFilename, 'wt')
dmntFile.write(dmntString)
dmntFile.close()

I have found that it is currently taking ~3 seconds to build the
string but ~1 second to write the string to file, which seems wrong (I
would normally guess that CPU/memory would outperform disk write
speeds).

Can anyone see a way of speeding this loop up? Perhaps by changing the
data format? Is it wrong to build up a string and write it once, or
should I hold a file open and write at each step?

Thank you in advance for your time,

Dan
Jul 9 '08 #1
On Jul 9, 12:04 pm, dp_pearce <dp_pea...@hotmail.com> wrote:
[snip original post]
Hi Dan,

Looking at the code sample you sent, you could do some clever stuff by
making dmntString a list rather than a string and appending everywhere
you're doing a +=. Then at the end you build the string you write to
the file in one go with dmntFile.write(''.join(dmntList)). But I think
the more straightforward thing would be to replace all the dmntString
+= ... lines in the loops with dmntFile.write(...), so you're just
appending to the file as you go.

I think the slowdown you're seeing in your code as written comes from
Python strings being immutable. Every time you perform dmntString
+= ... in the loops, you create a new dmntString, copying in the
contents of the old one plus the appended content. And if your list
can reach half a million items, well, that's a TON of string-create and
string-copy operations.
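Something like this, for the list-and-join variant (a rough, untested sketch; I'm assuming domainVa already holds strings and that X * Y * Z matches its length):

```python
def build_string(domainVa, X, Y, Z):
    # Collect the pieces in a list; lists append in amortized
    # constant time, unlike repeated string concatenation.
    parts = []
    count = 0
    for z in range(Z):
        for y in range(Y):
            for x in range(X):
                parts.append(" ")
                parts.append(domainVa[count])
                count += 1
            parts.append("\n")
        parts.append("\n")
    parts.append("\n***\n")
    # One join at the end instead of half a million string copies.
    return "".join(parts)
```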

Hope you find this helpful,
Doug
Jul 9 '08 #2
On Jul 9, 12:04 pm, dp_pearce <dp_pea...@hotmail.com> wrote:
[snip original post]
Maybe try something like this ...

count = 0
dmntList = []
for z in xrange(Z):
    for y in xrange(Y):
        dmntList.extend([" " + domainVa[count + x] for x in xrange(X)])
        dmntList.append("\n")
        count += X
    dmntList.append("\n")
dmntList.append("\n***\n")
dmntString = ''.join(dmntList)
Jul 9 '08 #3
On Jul 9, 5:04 pm, dp_pearce <dp_pea...@hotmail.com> wrote:
[snip code]
Can anyone see a way of speeding this loop up?
I'd consider writing it like this:

def dmntGenerator():
    count = 0
    for z in xrange(Z):
        for y in xrange(Y):
            for x in xrange(X):
                yield ' '
                yield domainVa[count]
                count += 1
            yield '\n'
        yield '\n'
    yield '\n***\n'

You can make the string using ''.join:

dmntString = ''.join(dmntGenerator())

But if you don't need the string, just write straight to the file:

for part in dmntGenerator():
dmntFile.write(part)

This is likely to be a lot faster as no large string is produced.

--
Paul Hankin

Jul 9 '08 #4
On Jul 9, 18:04, dp_pearce <dp_pea...@hotmail.com> wrote:
[snip]
I have found that it is currently taking ~3 seconds to build the
string but ~1 second to write the string to file, which seems wrong (I
would normally guess that CPU/memory would outperform disk write
speeds).
Not necessarily - when the dataset becomes too big and your process
has eaten all available RAM, your OS starts swapping, and then things
get worse than disk IO. IOW, for large datasets (for a definition of
'large' that depends on the available resources), you're sometimes
better off doing direct disk access - which, BTW, is usually buffered
by the OS.
Can anyone see a way of speeding this loop up? Perhaps by changing the
data format?
Almost everyone told you to use a list and str.join()... which used to
be sound advice wrt/ both performance and readability, but nowadays
"only" makes your code more readable (and more pythonic...):

bruno@bibi ~ $ python
Python 2.5.1 (r251:54863, Apr 6 2008, 17:20:35)
[GCC 4.1.2 (Gentoo 4.1.2 p1.0.2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import Timer
>>> def dostr():
...     s = ''
...     for i in xrange(10000):
...         s += ' ' + str(i)
...
>>> def dolist():
...     s = []
...     for i in xrange(10000):
...         s.append(str(i))
...     s = ' '.join(s)
...
>>> tstr = Timer("dostr", "from __main__ import dostr")
>>> tlist = Timer("dolist", "from __main__ import dolist")
>>> tlist.timeit(10000000)
1.4280490875244141
>>> tstr.timeit(10000000)
1.4347598552703857
>>>
The list + str.join version is only marginally faster... But you should
consider this solution even if it doesn't change perfs much -
readability counts, too.
Is it wrong to build up a string and write it once, or
should I hold a file open and write at each step?
Is it really a matter of one XOR the other? Perhaps you should try a
midway solution, ie building not-too-big chunks as lists and writing
them to the (already opened) file? This would avoid possible swapping
and reduce disk IO. I suggest you try this approach with different
list-size / write ratios, using the timeit module (and possibly the
"top" program on unix, or its equivalent on another platform, to check
memory/CPU usage) to find out which ratio works best for a
representative sample of your input data. That's at least what I'd
do...
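For instance (a quick, untested sketch - the chunk size is just a starting point to tune, and I'm assuming domainVa holds strings):

```python
def write_chunked(dmntFilename, domainVa, X, Y, Z, chunk_rows=1000):
    # Midway solution: buffer rows in a list and flush them to the
    # already-open file every chunk_rows rows, so neither one giant
    # string nor one write() call per value.
    dmntFile = open(dmntFilename, 'wt')
    count = 0
    buf = []
    for z in range(Z):
        for y in range(Y):
            buf.append(''.join(' ' + domainVa[count + x] for x in range(X)))
            buf.append('\n')
            count += X
            if len(buf) >= chunk_rows:
                dmntFile.write(''.join(buf))
                buf = []
        buf.append('\n')
    buf.append('\n***\n')
    # Flush whatever is left in the buffer.
    dmntFile.write(''.join(buf))
    dmntFile.close()
```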

HTH
Jul 9 '08 #5
Thank you so much for all your advice. I have learnt a lot.

In the end, the solution was perhaps self-evident. Why try to build a
huge string AND THEN write it to file when you can just write it to
file? Writing this much data directly to file completed in ~1.5
seconds instead of the 3-4 seconds that any of the .join methods
produced.
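For reference, the direct-write version looks roughly like this (a sketch of what I ended up with, using the same names as my original snippet):

```python
def write_direct(dmntFilename, domainVa, X, Y, Z):
    count = 0
    dmntFile = open(dmntFilename, 'wt')
    for z in range(Z):
        for y in range(Y):
            for x in range(X):
                # Each write lands in the file object's buffer; no big
                # string is ever built in memory.
                dmntFile.write(" " + domainVa[count])
                count += 1
            dmntFile.write("\n")
        dmntFile.write("\n")
    dmntFile.write("\n***\n")
    dmntFile.close()
```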

Interestingly, my tests agreed with Bruno's: there wasn't a huge
difference in speed between the other methods, although they do look a
lot tidier.

Again, many thanks.

Dan
Jul 15 '08 #6
