Bytes IT Community

Possible memory leak?

I have a function in a program that works something like this.

def load_pic_data(width,heigth,inpdat, filt=TRUE):
    data=''
    total=0
    tnum=0
    size=100
    for y in range(0,heigth):
        row=''
        for x in range(0,width):
            index=2*(x+y*width)
            num=ord(inpdat[index+1])*256+ord(inpdat[index])
            if(vfilter.get() and d_filter and filt):
                num=round((num-(d_filter[index/2])))
            if(num<0):
                num=0
            if(num>255*64):
                num=255*64
            row=row+chr(num/64)
        data=data+row
The purpose of this part of a program is to take a 14 bit numerical
representation and convert it to an 8 bit representation. This will
later be displayed as an image. However, I've noticed the following
about this code. I was noticing when I took a small picture, it went
really fast, but a larger picture took forever to run through. I added
a print statement to the y portion of the code to see where it was
getting hung up. I noticed that it appears to be running slower as time
goes on. I did a time.time() timestamp to verify this, and had it
confirmed. Any ideas as to what I could do to make it run faster? Note
that if this process is repeated, it runs equally slowly. What can I do
to make it run faster? I suspect that if I somehow deallocate the row
variable after it's done, it will run faster, and likewise the data
variable when it's done, but I just don't know how to do so. Thanks!

Jan 24 '06 #1
29 Replies


On Tue, 24 Jan 2006 13:51:36 -0800, Tuvas wrote:
The purpose of this part of a program is to take a 14 bit numerical
representation and convert it to an 8 bit representation. This will
later be displayed as an image. However, I've noticed the following
about this code. I was noticing when I took a small picture, it went
really fast, but a larger picture took forever to run through.
for i in range(10):
    for j in range(10):
        # do stuff with i and j
        # this loops 100 times.

for i in range(1000):
    for j in range(1000):
        # do stuff with i and j
        # this loops 1,000,000 times.

Of course the code gets slower as the image size increases.

But the real killer is this one line:

row=row+chr(num/64)

Bad, bad BAD idea. Every time you add two strings together, Python has to
copy BOTH strings. As row gets huge, this takes longer and longer to do.

A rule of thumb I use is, never add more than two strings together. Maybe
three. Certainly not more than four. Or five.

But absolutely not millions of strings, which is what you are doing.

The best way to handle this is to turn row into a list, and then once, at
the very end, convert the list into a string.

Instead of this:

row = ""
# processing in a loop...
row = row + chr(num/64)
# finished processing
print row

do this instead:

row = []
# processing in a loop...
row.append(chr(num/64))
# finished processing
row = "".join(row) # convert to a string in one hit
print row

You should find that runs much faster.
I suspect that if I somehow deallocate the row
variable after it's done, it will run faster, and likewise the data
variable when it's done, but I just don't know how to do so.


Once row and data are no longer being used, Python automatically removes
them and reclaims the memory they used. You rarely need to worry about
that.
--
Steven.

Jan 24 '06 #2

Tuvas wrote:
I have a function in a program that works something like this.

def load_pic_data(width,heigth,inpdat, filt=TRUE):
    data=''
    total=0
    tnum=0
    size=100
    for y in range(0,heigth):
        row=''
        for x in range(0,width):
            index=2*(x+y*width)
            num=ord(inpdat[index+1])*256+ord(inpdat[index])
            if(vfilter.get() and d_filter and filt):
                num=round((num-(d_filter[index/2])))
            if(num<0):
                num=0
            if(num>255*64):
                num=255*64
            row=row+chr(num/64)
        data=data+row

The purpose of this part of a program is to take a 14 bit numerical
representation and convert it to an 8 bit representation. This will
later be displayed as an image. However, I've noticed the following
about this code. I was noticing when I took a small picture, it went
really fast, but a larger picture took forever to run through. I added
a print statement to the y portion of the code to see where it was
getting hung up. I noticed that it appears to be running slower as time
goes on.
hint: what does
row=row+chr(num/64)


do, and how often is it executed ?

</F>

Jan 24 '06 #3

Steven D'Aprano <st***@REMOVETHIScyber.com.au> writes:
row = []
# processing in a loop...
row.append(chr(num/64))
# finished processing
row = "".join(row) # convert to a string in one hit
print row


Probably better to use StringIO or the array module.
Jan 24 '06 #4

Paul Rubin <http://ph****@NOSPAM.invalid> writes:
Probably better to use StringIO or the array module.


That's cStringIO of course.
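A minimal sketch of the buffer-writing approach. (This uses io.StringIO, the modern equivalent of Python 2's cStringIO; the function name and sample values here are made up for illustration.)

```python
from io import StringIO  # Python 2's cStringIO served the same role


def build_row(values):
    # Write each converted character into a growable buffer instead of
    # concatenating strings, then pull the result out in one call.
    buf = StringIO()
    for num in values:
        buf.write(chr(num // 64))
    return buf.getvalue()


row = build_row([0, 64, 128])  # three sample pixel values
```

Like the list/join approach, each write is cheap, so building an n-character row stays O(n) instead of quadratic.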
Jan 24 '06 #5

In message <pa****************************@REMOVETHIScyber.co m.au>,
Steven D'Aprano <st***@REMOVETHIScyber.com.au> writes
But the real killer is this one line:

row=row+chr(num/64)

Bad, bad BAD idea. Every time you add two strings together, Python has to
copy BOTH strings. As row gets huge, this takes longer and longer to do.

A rule of thumb I use is, never add more than two strings together. Maybe
three. Certainly not more than four. Or five.

But absolutely not millions of strings, which is what you are doing.


You would be able to visualize this process very well using Python
Memory Validator and Python Performance Validator.

http://www.softwareverify.com

Stephen
--
Stephen Kellett
Object Media Limited http://www.objmedia.demon.co.uk/software.html
Computer Consultancy, Software Development
Windows C++, Java, Assembler, Performance Analysis, Troubleshooting
Jan 25 '06 #6

Tuvas wrote:
I have a function in a program that works something like this.

def load_pic_data(width,heigth,inpdat, filt=TRUE):
    data=''
    total=0
    tnum=0
    size=100
    for y in range(0,heigth):
        row=''
        for x in range(0,width):
            index=2*(x+y*width)
            num=ord(inpdat[index+1])*256+ord(inpdat[index])
            if(vfilter.get() and d_filter and filt):
                num=round((num-(d_filter[index/2])))
            if(num<0):
                num=0
            if(num>255*64):
                num=255*64
            row=row+chr(num/64)
        data=data+row
The purpose of this part of a program is to take a 14 bit numerical
representation and convert it to an 8 bit representation.


This is exactly the kind of thing NumPy (http://numeric.scipy.org) is
ideal for. If that's of interest to you, then I can offer particular
suggestions...
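A hypothetical sketch of what that would look like (written against the modern numpy API rather than the 2006-era Numeric package; the function name is made up, and it mirrors the thread's 14-bit-to-8-bit conversion):

```python
import numpy as np


def convert_to_8bit(inpdat):
    # Interpret the raw byte string as little-endian unsigned 16-bit
    # values -- the same thing the loop version computes with
    # ord(inpdat[index+1])*256 + ord(inpdat[index]).
    pixels = np.frombuffer(inpdat, dtype="<u2").astype(np.int32)
    # Clamp to the 14-bit range and drop the low 6 bits, mirroring
    # the num > 255*64 check and chr(num/64).
    pixels = np.clip(pixels, 0, 255 * 64) // 64
    return pixels.astype(np.uint8).tobytes()
```

The whole image is processed in a handful of vectorized operations, with no per-pixel Python bytecode at all.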

-Travis

Jan 25 '06 #7

Hmm. The problem is that the required form for the image is as a string
of characters to convert with the tkimage interface, at least, as I
understood it. Perhaps an array would work, I'll try that when I get
ahold of the computer in question (One thing required is a linux only
library, and I don't have access to a linux machine at the moment...)

However, I would like to make a few more statements. The max image size
is 1024x1024. The length of time it takes to do the row statement
increased fairly substantially every time it was run, on the order of
~0.5%. That seems small, but when you run it 1M times, the last
operation will take 5000 times longer than the original, or about
~1 sec. And that's just for one row; the entire process would take an
eternity...

And the real kicker: when doing this with smaller images, i.e. about
128x128, the second image starts off at the same point of slowness where
the first one ended. I.e., if the last operation of the first image took
.01 seconds to complete (it started, let's say, around .005), the next
image would start at that length of time and end at .02 or so, the third
picture would take that long for each row, and so on. It only does this
on one particular computer (I've only had the chance to run it on 2
machines to date, BTW).

There is a reason why the rows are pieced together as is, I must say,
but it's a tad bit complicated... I'll just defend it without giving a
real reason.

Thanks for the help!

Jan 25 '06 #8

Oh, I should also mention: I used a memory monitor and saw the amount
of memory being used go up with time, even when the function ended.
Meaning, when I did the 10 128x128 pictures, no memory was ever
deallocated until I exited the program.

Jan 25 '06 #9

Tuvas wrote:
Oh, I should also mention: I used a memory monitor and saw the amount
of memory being used go up with time, even when the function ended.
Meaning, when I did the 10 128x128 pictures, no memory was ever
deallocated until I exited the program.


Are you really trying to say that the amount of memory
being used is going up even when the function is not
being used?

If so, then at least part of your problem exists in
another part of your program.

If you are monitoring the amount of memory being
allocated to Python, then no, Python won't generally
return memory to the operating system while it is still
running. But the memory allocated to your data
structures will be reclaimed by Python when they are
not in use.
--
Steven.

Jan 25 '06 #10

Tuvas wrote:
However, I would like to make a few more statements. The max image size
is 1024x1024. The length of time it takes to do the row statement
increased fairly substantially every time it was run, on the order of
~0.5%. That seems small, but when you run it 1M times, the last
operation will take 5000 times longer than the original, or about
~1 sec. And that's just for one row; the entire process would take an
eternity...


I've just gone back and taken another look at your
original post. You haven't posted the actual code you
are using, have you? The code you posted doesn't
actually return a result, nor does it set a global
variable. As far as I can tell, it just churns for a
while, processing hard, and then throws the result away.

So I'm guessing that it isn't actually your production
code. Did you really think we would be able to debug
some code that you haven't shown us? I mean, some of
the folks on comp.lang.python are really good, but even
they aren't that good *wink*
--
Steven.

Jan 25 '06 #11

Steven D'Aprano wrote:
But the real killer is this one line:

row=row+chr(num/64)

Bad, bad BAD idea. Every time you add two strings together, Python
has to copy BOTH strings. As row gets huge, this takes longer and
longer to do.


This is no longer true as of CPython 2.4 though. I'm not sure which version the
OP was using because he didn't say.
--
Giovanni Bajo
Jan 25 '06 #12

Giovanni Bajo wrote:
Steven D'Aprano wrote:

But the real killer is this one line:

row=row+chr(num/64)

Bad, bad BAD idea. Every time you add two strings together, Python
has to copy BOTH strings. As row gets huge, this takes longer and
longer to do.

This is no longer true as of CPython 2.4 though. I'm not sure which version the
OP was using because he didn't say.


No, that's not correct. You are making a false
assumption. This is from the "What's New" for Python 2.4:

[quote]
String concatenations in statements of the form s = s +
"abc" and s += "abc" are now performed more efficiently
in certain circumstances. This optimization won't be
present in other Python implementations such as Jython,
so you shouldn't rely on it; using the join() method of
strings is still recommended when you want to
efficiently glue a large number of strings together.
[end quote]

Note the "more efficiently in CERTAIN CIRCUMSTANCES"
[emphasis added]? That certainly does not mean that
concatenating large numbers of strings is no longer
slow. It just means that *sometimes* (almost always?
often? occasionally?) it isn't as slow as it used to be.

We really don't know what the optimization recognises,
how it works, or how fast a difference it makes.
Writing poor code, and trusting an implementation-
specific optimization to make it run "faster" (how much
faster?) is always a dangerous strategy.

Better to avoid the temptation if you can, appreciate
the improved performance for those times when you do
string concatenation, but continue to use the join
method anytime you are adding large numbers of strings.
--
Steven.

Jan 25 '06 #13

Replying to myself... the first step towards perdition...

Steven D'Aprano wrote:
We really don't know what the optimization recognises, how it works, or
how fast a difference it makes.


Of course, by "we" I mean "those of us who haven't seen
and understood the CPython source code, or run detailed
benchmarks".
--
Steven.

Jan 25 '06 #14

Steven D'Aprano wrote:
Steven D'Aprano wrote:

But the real killer is this one line:

row=row+chr(num/64)

Bad, bad BAD idea. Every time you add two strings together, Python
has to copy BOTH strings. As row gets huge, this takes longer and
longer to do.

This is no longer true as of CPython 2.4 though. I'm not sure which version the
OP was using because he didn't say.


No, that's not correct. You are making a false
assumption. This is from the "What's New" for Python 2.4:

[quote]
String concatenations in statements of the form s = s +
"abc" and s += "abc" are now performed more efficiently
in certain circumstances. This optimization won't be
present in other Python implementations such as Jython,
so you shouldn't rely on it; using the join() method of
strings is still recommended when you want to
efficiently glue a large number of strings together.
[end quote]

Note the "more efficiently in CERTAIN CIRCUMSTANCES"
[emphasis added]? That certainly does not mean that
concatenating large numbers of strings is no longer
slow. It just means that *sometimes* (almost always?
often? occasionally?) it isn't as slow as it used to be.


it only means that if the interpreter can make sure that else is
using the target string, it's modified in place.

however, the string type doesn't use overallocation (beyond the
8-byte alignment provided by the memory allocator), so this fix
only avoids extra copying in a few specific cases. O(n*n/8) is
still quadratic...

</F>

Jan 25 '06 #15

FYI, to all who asked, I was indeed just simply monitoring the system
memory. I changed my approach to one that uses arrays and simply joins
the statements together at the end, and it seems to have improved the
speed. However, for some reason it still takes a small eternity to
process on one computer, and not the other. The oddest thing of all is
that the one taking longer is the better computer. I have been
using Linux, running Python 2.4.

The modified version of my code is now as follows. (Note: a few small
changes have been made to simplify things; however, these things don't
apply to a full-scale picture, so they shouldn't slow anything down in
the slightest.)

def load_pic_data(width,heigth,inpdat, filt=TRUE):
    ldata=[]
    total=0
    tnum=0
    size=100
    array=[]
    for y in range(0,heigth):
        row=[]
        ts=time.time()
        for x in range(0,width):
            index=2*(x+y*width)
            num=ord(inpdat[index+1])*256+ord(inpdat[index])
            if(vfilter.get() and d_filter and filt):
                num=round((num-(d_filter[index/2])))
            if(num<0):
                num=0
            if(num>255*64):
                num=255*64
            tba=chr(num/64)
            row.append(tba)
        srow=''.join(row)
        ldata.append(srow)
        print y,time.time()-ts
    data=''.join(ldata)

There is considerably more to the function; however, I've traced the
slowest part to this one.
Note the statement "print y, time.time()-ts". A few of the outputs are
as follows, with the 1024x1024 image:

1    0.0633
2    0.07005
3    0.06698
20   0.07925
30   0.08410
100  0.16255
200  0.270895
500  0.59182
900  1.06439

Note that at the time I wrote this, 900 was the highest row available.

Note that the value seems consistent when a few close rows are
observed, but over time it increases, until the program is running
very slowly. For some reason it does this on one computer, and not
another, and I believe the 2 computers have identical Python
configurations, i.e. same libraries, version, etc. Both run the same
type of Linux as well. I am no longer joining the strings one at a
time, but only at the end. What could be the source of such an odd
problem? I understand that the large image will take longer to process,
but I would think that the relationship should be more or less linear
with the size, not exponential. Thanks for all of the help till now!
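One stdlib way to locate a hotspot like this, rather than hand-inserted print/time.time() statements, is the profiler (cProfile in modern Python; Python 2.4's profile module has the same interface). A minimal sketch; build_image here is a stand-in workload, not the thread's actual code:

```python
import cProfile
import io
import pstats


def build_image(width, height):
    # Stand-in workload: per-pixel string building, like the original loop.
    rows = []
    for y in range(height):
        row = [chr((x + y) % 64) for x in range(width)]
        rows.append("".join(row))
    return "".join(rows)


profiler = cProfile.Profile()
profiler.enable()
build_image(128, 128)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()  # top functions by cumulative time
```

The per-function breakdown shows immediately whether the time is going into the string building, the filter lookup, or something outside the loop entirely.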

Jan 25 '06 #16

Steven D'Aprano wrote:
No, that's not correct. You are making a false
assumption.
Did you ever try to measure the code yourself?
This is from the "What's New" for Python 2.4:

[quote]
String concatenations in statements of the form s = s +
"abc" and s += "abc" are now performed more efficiently
in certain circumstances. This optimization won't be
present in other Python implementations such as Jython,
so you shouldn't rely on it; using the join() method of
strings is still recommended when you want to
efficiently glue a large number of strings together.
[end quote]

Note the "more efficiently in CERTAIN CIRCUMSTANCES"
[emphasis added]? That certainly does not mean that
concatenating large numbers of strings is no longer
slow. It just means that *sometimes* (almost always?
often? occasionally?) it isn't as slow as it used to be.

We really don't know what the optimization recognises,
how it works, or how fast a difference it makes.
Writing poor code, and trusting an implementation-
specific optimization to make it run "faster" (how much
faster?) is always a dangerous strategy.


The code is poor by your definition of poor. In my definition, it used to be
poor and it's not anymore. Since I was interested in exactly WHICH
circumstances the optimization was performed, I investigated and I know the
answer. I also know that the optimization *will* apply to the OP code. But
maybe you want some numbers:

------- foo.py -----
def iters(n):
    s = ''
    for i in xrange(n):
        s += chr(i%64)
    return s

def iters2(n):
    L = []
    for i in xrange(n):
        L.append(chr(i%64))
    return "".join(L)
------- foo.py -----

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters(10000)"
100 loops, best of 3: 10.2 msec per loop

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters(20000)"
100 loops, best of 3: 20.5 msec per loop

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters(40000)"
100 loops, best of 3: 40.9 msec per loop

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters(80000)"
100 loops, best of 3: 81.6 msec per loop

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters2(10000)"
100 loops, best of 3: 10.9 msec per loop

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters2(20000)"
100 loops, best of 3: 22.2 msec per loop

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters2(40000)"
100 loops, best of 3: 44 msec per loop

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters2(80000)"
100 loops, best of 3: 89.9 msec per loop

So, look, it's even faster than the solution you're proposing.
--
Giovanni Bajo
Jan 25 '06 #17

Very interesting results with the last test. I guess I go back to my
other code then, even if it is only a hair faster, it's still faster...

It's odd, I just ran another test. There's 2 ways I can call my
load_pic function, first of all, through taking a picture, secondly by
loading a picture. For some reason, the exact same function takes the
same amount of time when the load picture function is used, but not
when called from the take picture function. Odd, isn't it... I don't
know why the time would increase for one, but not for the other. The
data is passed in exactly the same format, it's just really odd... Oh
well, I'll get something figured out...

Jan 25 '06 #18

Giovanni Bajo wrote:
------- foo.py -----
def iters(n):
    s = ''
    for i in xrange(n):
        s += chr(i%64)
    return s

def iters2(n):
    L = []
    for i in xrange(n):
        L.append(chr(i%64))
    return "".join(L)
------- foo.py -----

So, look, it's even faster than the solution you're proposing.


since you know the length, you can preallocate the list

def iters3(n):
    L = [None]*n
    for i in xrange(n):
        L[i] = chr(i%64)
    return "".join(L)

or use a preallocated array

def iters4(n):
    L = array.array("B", [0])*n
    for i in xrange(n):
        L[i] = i%64
    return L.tostring()

on my machine, the last one is twice as fast as your "even faster"
solution under 2.4. in earlier versions, it's just under 5 times faster.

for the OP's problem, a PIL-based solution would probably be ~100
times faster than the array solution, but that's another story.

</F>

Jan 25 '06 #19


Fredrik Lundh wrote:
Giovanni Bajo wrote:
------- foo.py -----
def iters(n):
    s = ''
    for i in xrange(n):
        s += chr(i%64)
    return s

def iters2(n):
    L = []
    for i in xrange(n):
        L.append(chr(i%64))
    return "".join(L)
------- foo.py -----

So, look, it's even faster than the solution you're proposing.
since you know the length, you can preallocate the list

def iters3(n):
    L = [None]*n
    for i in xrange(n):
        L[i] = chr(i%64)
    return "".join(L)

or use a preallocated array

def iters4(n):
    L = array.array("B", [0])*n
    for i in xrange(n):
        L[i] = i%64
    return L.tostring()

on my machine, the last one is twice as fast as your "even faster"
solution under 2.4. in earlier versions, it's just under 5 times faster.

for the OP's problem, a PIL-based solution would probably be ~100
times faster than the array solution, but that's another story.


What do you mean by a PIL based solution? The reason I need to get the
data into the string list is so I can pump it into PIL to give me my
image... If PIL has a way to make it easier, I do not know it, but
would like to know it.


Jan 25 '06 #20

Fredrik Lundh wrote:
since you know the length, you can preallocate the list

def iters3(n):
    L = [None]*n
    for i in xrange(n):
        L[i] = chr(i%64)
    return "".join(L)

or use a preallocated array

def iters4(n):
    L = array.array("B", [0])*n
    for i in xrange(n):
        L[i] = i%64
    return L.tostring()


Of course. I was just trying to make a point about string accumulation being
O(n) and not O(n^2). You can get better factors with other solutions like the
ones you propose, but even the "simple" implementation (and most readable and
Pythonic, IMHO) behaves "well" under CPython 2.4.
--
Giovanni Bajo
Jan 25 '06 #21

On Wed, 25 Jan 2006 13:17:25 -0800, Tuvas wrote:
Very interesting results with the last test. I guess I go back to my
other code then, even if it is only a hair faster, it's still faster...
Can you say "premature optimization"?

Have you timed your code with realistic data? Not somebody else's code.
Not some cut back piece of sample code with only three lines. Unless you
have timed YOUR code, you don't know what's faster.

Anyway, do you really care about "optimizing" code that takes an hour to
run to 59 minutes, 59 seconds, 997 milliseconds? Or, at the other extreme,
from 80 milliseconds to 77? Do you think anyone will notice?

I hope you do speed up your code, whatever way it takes. But I'm guessing
that if you aren't using the array module, you aren't getting anywhere
*near* as fast as you could be. (And that is JUST a guess -- time it to be
sure.)
It's odd, I just ran another test. There's 2 ways I can call my
load_pic function, first of all, through taking a picture, secondly by
loading a picture. For some reason, the exact same function takes the
same amount of time when the load picture function is used, but not
when called from the take picture function. Odd, isn't it... I don't
know why the time would increase for one, but not for the other. The
data is passed in exactly the same format, it's just really odd... Oh
well, I'll get something figured out...


What are you timing? Are you timing the execution time of the code, or the
elapsed time? They are not the same thing. Hint: think about your
operating system and what it is doing. Can you say "preemptive
multitasking"? Are you measuring network events, memory paging,
screen savers, disk I/O? Are the files you are loading from disk badly
fragmented?

If your code is not optimized for memory, and hence pages a lot, you
should be including that time in your measurements. But you probably don't
want to count elapsed time when the screen saver kicks in.
--
Steven.

Jan 26 '06 #22

[Fredrik Lundh]
...
for the OP's problem, a PIL-based solution would probably be ~100
times faster than the array solution, but that's another story.

[Tuvas]
What do you mean by a PIL based solution? The reason I need to get the
data into the string list is so I can pump it into PIL to give me my
image... If PIL has a way to make it easier, I do not know it, but
would like to know it.


If your data is in an array.array, you can pass that directly to PIL's
(1.1.4) Image.frombuffer() constructor
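A hedged sketch of that route, using Pillow (the modern PIL fork). The 8-bit greyscale mode "L" and the raw decoder arguments are spelled out explicitly rather than relying on defaults; the pixel data here is made up for illustration:

```python
import array

from PIL import Image

width, height = 4, 2
# Hypothetical 8-bit pixel data, one byte per pixel, row-major.
pixels = array.array("B", range(width * height))

# Image.frombuffer can share the underlying buffer, so no extra
# string copy is needed between the conversion step and PIL.
# The trailing raw-decoder arguments are (rawmode, stride, orientation).
im = Image.frombuffer("L", (width, height), pixels, "raw", "L", 0, 1)
```

From there the image can be displayed or saved with the usual PIL calls.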
Jan 26 '06 #23

On Wed, 25 Jan 2006 23:20:08 +0000, Giovanni Bajo wrote:
Of course. I was just trying to make a point about string accumulation being
O(n) and not O(n^2).


But according to Fredrik, string accumulation is still quadratic, even
with the optimizations added to Python 2.4. Quoting:

"it only means that if the interpreter can make sure that else is
using the target string, it's modified in place.

however, the string type doesn't use overallocation (beyond the
8-byte alignment provided by the memory allocator), so this fix
only avoids extra copying in a few specific cases. O(n*n/8) is
still quadratic..."

I presume Fredrik meant to say "nothing else".

Or have I misunderstood something?

--
Steven.

Jan 26 '06 #24

On Wed, 25 Jan 2006 20:08:56 +0100, Giovanni Bajo wrote:
Steven D'Aprano wrote:
No, that's not correct. You are making a false
assumption.
Did you ever try to measure the code yourself?


I'm still running Python 2.3, it would give misleading results.

This is from the "What's New" for Python 2.4:

[quote]
String concatenations in statements of the form s = s +
"abc" and s += "abc" are now performed more efficiently
in certain circumstances. This optimization won't be
present in other Python implementations such as Jython,
so you shouldn't rely on it; using the join() method of
strings is still recommended when you want to
efficiently glue a large number of strings together.
[end quote]

Note the "more efficiently in CERTAIN CIRCUMSTANCES"
[emphasis added]? That certainly does not mean that
concatenating large numbers of strings is no longer
slow. It just means that *sometimes* (almost always?
often? occasionally?) it isn't as slow as it used to be.

We really don't know what the optimization recognises,
how it works, or how fast a difference it makes.
Writing poor code, and trusting an implementation-
specific optimization to make it run "faster" (how much
faster?) is always a dangerous strategy.


The code is poor by your definition of poor. In my definition, it used to be
poor and it's not anymore. Since I was interested in exactly WHICH
circumstances the optimization was performed, I investigated and I know the
answer. I also know that the optimization *will* apply to the OP code.


I think you are being overly optimistic about the OP's actual code.

It looks to me that the code posted can't possibly be his production code.
Both versions he has posted lack a return statement. He isn't posting his
actual code, only a modified version of it. He's even admitted it: "Note,
a few small changes have been made to simplify things, however, these
things don't apply to a full-scale picture, so the shouldn't slow anything
down in the slightest." Famous last words. If I had a dollar for every
time somebody said "I made some changes, but they shouldn't change
anything"...
But maybe you want some numbers: [...] So, look, it's even faster than the solution you're proposing.


But your test code isn't equivalent to the OP's code. I don't know if it
will make a difference, but his code looks more like this:

def iters3(n,m):
    data = ''
    for i in xrange(n):
        row = ''
        for j in xrange(m):
            row += chr(j%64)
        data += row
    return data

while yours is:

def iters(n):
    s = ''
    for i in xrange(n):
        s += chr(i%64)
    return s

I'm happy to agree that your code is optimized to the point it is
marginally faster than the list/join algorithm. But does iters3 optimize
as well? I don't know.

Even if it does, what generalisation do you learn from that? What do you
predict about this one?

def iters4(n):
    s = ''
    D = {}
    for i in xrange(n):
        s += chr(i%64)
        D[i] = s
    return s

At what level of complexity should the Python coder stop relying on
compiler optimizations to turn slow code into fast?

You have saved a handful of milliseconds for one particular version of
Python, at the cost of performance plummeting like a stone if the code
ever gets run under an older version, or on Jython, PyPy or IronPython, or
if the algorithm is used in a slightly more complicated way. I would much
rather have consistently good performance, even if not quite the fastest
possible, than a small speed increase on one platform and terrible
performance on everything else.

--
Steven.

Jan 26 '06 #25

Steven D'Aprano wrote:
Of course. I was just trying to make a point about string accumulation being
O(n) and not O(n^2).


But according to Fredrik, string accumulation is still quadratic, even
with the optimizations added to Python 2.4. Quoting:

"it only means that if the interpreter can make sure that else is
using the target string, it's modified in place.

however, the string type doesn't use overallocation (beyond the
8-byte alignment provided by the memory allocator), so this fix
only avoids extra copying in a few specific cases. O(n*n/8) is
still quadratic..."

I presume Fredrik meant to say "nothing else".


correct.

the only way to get amortized O(n) behaviour is by adding small
strings to the end of a larger string, and hope that the memory
allocator can satisfy all reallocations without too much copying.

the list implementation, in contrast, uses controlled overallocation
and can therefore guarantee amortized O(n) even if realloc always
fails to reallocate in place.

</F>
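The list overallocation described above can be observed directly with sys.getsizeof: the allocated size only grows occasionally, not on every append. A small illustrative sketch (the exact growth pattern is a CPython implementation detail, so only the rough shape is asserted here):

```python
import sys

L = []
sizes = set()
for i in range(1000):
    L.append(i)
    sizes.add(sys.getsizeof(L))  # allocated size, not logical length

# The list was resized far fewer times than it was appended to,
# which is what makes appends amortized O(1).
num_resizes = len(sizes)
```

On a typical CPython, the thousand appends trigger only a few dozen actual reallocations.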

Jan 26 '06 #26

The times that I posted were the time it took to perform ONE row
iteration. As you can see, the time was going up fairly dramatically.
Why on earth could it be doing this? I understand that the time will
fluctuate somewhat depending upon what else the CPU is doing, but why
is the base time increasing so much for one machine doing something one
way, and not for another machine apparently identically configured
doing the same operation? That is the great question I have. Still, it
seems as though it cannot be solved as to why it's doing this. The only
thing I can think of that might be different is that perhaps these
results came off of a slightly newer version of Python. One great
problem is that the data is read in streaming, i.e. the data enters
this function as a list (or tuple, I always mix those two up, but the
one that uses []'s, not ()'s). Oh well, the solution will come,
eventually. Thanks for the help!

Jan 26 '06 #27

I have made one confirmation. The only identifiable difference that I
have seen is that one runs Python 2.4.2, and the other 2.4.1. Oddly
enough, it's the first one that is having more problems than the
second... Why that is, I don't know. It still could be something else,
but...

Jan 26 '06 #28

In <11**********************@z14g2000cwz.googlegroups .com>, Tuvas wrote:
The modified version of my code is now as follows. (Note: a few small
changes have been made to simplify things; however, these things don't
apply to a full-scale picture, so they shouldn't slow anything down in
the slightest.)

def load_pic_data(width,heigth,inpdat, filt=TRUE):
    ldata=[]
    total=0
    tnum=0
    size=100
    array=[]
    for y in range(0,heigth):
        row=[]
        ts=time.time()
        for x in range(0,width):
            index=2*(x+y*width)
            num=ord(inpdat[index+1])*256+ord(inpdat[index])
            if(vfilter.get() and d_filter and filt):
                num=round((num-(d_filter[index/2])))
            if(num<0):
                num=0
            if(num>255*64):
                num=255*64
            tba=chr(num/64)
            row.append(tba)
        srow=''.join(row)
        ldata.append(srow)
        print y,time.time()-ts
    data=''.join(ldata)


Do you always work on the whole `width` and `height` of the picture? If
yes, do you really need the `x` and `y` coordinates? The snippet above
would work if you just process all values as one sequence.

And using `array.array` for `inpdat` and the result may speed things up
too::

    inpdat_array = array('H', inpdat)
    data = array('c')
    for index in xrange(width * heigth):
        num = inpdat_array[index]
        if vfilter.get() and d_filter and filt:
            num = round(num - d_filter[index])
        if num < 0:
            num = 0
        if num > 255 * 64:
            num = 255 * 64
        data.append(chr(num // 64))
    return data.tostring()

If you don't need the `index` somewhere in the loop, you can even get
rid of it by always subtracting the filter value and supplying a "zero
filter" if the condition of the first ``if`` is not fulfilled::

    from itertools import izip, repeat

    inpdat_array = array('H', inpdat)
    data = array('c')

    if not (vfilter.get() and d_filter and filt):
        filter_values = repeat(0)
    else:
        filter_values = d_filter

    for num, filter_value in izip(inpdat_array, filter_values):
        num = round(num - filter_value)
        if num < 0:
            num = 0
        if num > 255 * 64:
            num = 255 * 64
        data.append(chr(num // 64))
    return data.tostring()

Ciao,
Marc 'BlackJack' Rintsch
Jan 26 '06 #29

The reason it's done by width and height is that there are some other
unrelated processing functions going on in the meantime alongside the
problem. I found the source of the slowdown: when 2 unimportant lines
of code were commented out, the program worked perfectly.

if(vfilter.get() and d_filter and filt):
    num=round((num-(d_filter[index/2])))

I don't know why these lines of code were causing a slowdown, but they
were for some reason. What I will do is simply combine these 3
variables (which won't change from image to image) into one variable,
which will not have the problem that I've seen. But thanks for all of
your help!
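The fix described above, hoisting the loop-invariant test out of the per-pixel loop, looks something like this in outline. The names follow the thread's code, but this is a stand-in sketch in modern Python: inpdat is indexed as a bytes object (so ord() is not needed), and the per-pixel vfilter.get() call (presumably a GUI variable lookup) is replaced by a flag computed once up front:

```python
def load_pic_data(width, height, inpdat, use_filter, d_filter):
    # Evaluate the filter condition once, outside the per-pixel loop.
    apply_filter = use_filter and bool(d_filter)
    ldata = []
    for y in range(height):
        row = []
        for x in range(width):
            index = 2 * (x + y * width)
            # Little-endian 16-bit value, as in the original code.
            num = inpdat[index + 1] * 256 + inpdat[index]
            if apply_filter:
                num = round(num - d_filter[index // 2])
            # Clamp to the 14-bit range, then drop the low 6 bits.
            num = max(0, min(num, 255 * 64))
            row.append(chr(num // 64))
        ldata.append("".join(row))
    return "".join(ldata)
```

Moving the invariant check out of the loop means it costs one lookup per call instead of one per pixel.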

Jan 30 '06 #30
