slowdown with massive memory usage

I have a program which starts by reading a lot of data into various
dicts.

When I moved a function to create one such dict from near the beginning
of the program to a later time, that function slowed down by a factor
of 8-14: 38 sec at 15M memory usage, 570 sec at 144M, 330 sec at 200M.
Is there anything I can do to fix that?

When the program is running, the system has 18M free memory and is not
doing any swapping. `python -O' did not help.

Python: 2.2.3 configured with --enable-ipv6 --enable-unicode=ucs4
on i386-redhat-linux-gnu (Linux 2.4.21-15.0.2.ELsmp).
I'm not sure if upgrading to Python 2.3 is an option at the moment;
I'll check if necessary.

--
Hallvard
Jul 18 '05 #1
On Sat, 30 Jul 2004, Hallvard B Furuseth wrote:
> I have a program which starts by reading a lot of data into various
> dicts.
>
> When I moved a function to create one such dict from near the beginning
> of the program to a later time, that function slowed down by a factor
> of 8-14: 38 sec at 15M memory usage, 570 sec at 144M, 330 sec at 200M.
> Is there anything I can do to fix that?
>
> When the program is running, the system has 18M free memory and is not
> doing any swapping. `python -O' did not help.
>
> Python: 2.2.3 configured with --enable-ipv6 --enable-unicode=ucs4
> on i386-redhat-linux-gnu (Linux 2.4.21-15.0.2.ELsmp).
> I'm not sure if upgrading to Python 2.3 is an option at the moment;
> I'll check if necessary.

Python 2.2 didn't use PyMalloc by default. This leaves Python at the
mercy of the platform malloc()/realloc()/free(), and Python has found
rough spots with nearly every platform's implementation of these - which
is why PyMalloc was written.

While it isn't certain that this is your problem, if you can rebuild your
Python interpreter to include PyMalloc (--with-pymalloc I think), you can
find out.
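
On a current CPython, one way to see whether the running interpreter was built with PyMalloc, without rebuilding anything, is to look at the recorded build variables. This is only a minimal sketch, assuming Python 3 on a Unix-style build where `sysconfig` exposes `WITH_PYMALLOC` (on other platforms the variable may simply be missing); the helper name `pymalloc_status` is made up for illustration, and none of this applies to Python 2.2 itself.

```python
import sysconfig


def pymalloc_status():
    """Describe the WITH_PYMALLOC build setting of the running interpreter.

    Hypothetical helper: it only reports what the build recorded and
    does not measure allocator behaviour.
    """
    flag = sysconfig.get_config_var("WITH_PYMALLOC")
    if flag is None:
        # The build variable is not recorded on every platform.
        return "unknown"
    return "enabled" if flag else "disabled"


if __name__ == "__main__":
    print("pymalloc:", pymalloc_status())
```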

Be warned that there were some bugs in PyMalloc that were fixed before
Python 2.3 was released (when PyMalloc became a default option); as far as
I recall, these bugfixes were never backported to 2.2x. So I wouldn't
recommend running a 2.2.3 PyMalloc enabled interpreter in production
without seriously testing all your production code.

--
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: an*****@bullseye.apana.org.au (pref) | Snail: PO Box 370
an*****@pcug.org.au (alt) | Belconnen ACT 2616
Web: http://www.andymac.org/ | Australia
Jul 18 '05 #2
Istvan Albert wrote:
> Hallvard B Furuseth wrote:
>> When I moved a function to create one such dict from near the beginning
>> of the program to a later time, that function slowed down by a factor
>> of 8-14: 38 sec at 15M memory usage, 570 sec at 144M, 330 sec at 200M.
>
> I suspect there is more to it than just "moving". There must be a reason
> for the reorganization and...

I ran it by hand since it was so slow - and then it wasn't slow.
So I timed it at different places in the program, and also checked
if it gave different results. It didn't.

> check what other things are you doing and profile
> your program http://docs.python.org/lib/profile.html

Thanks. I profiled the function with <hotshot.Profile>.runcall() and
print_stats():

The same functions are called, and they are called the same number of
times. A total of 2872425 function calls. Only the run times differ.
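
The `hotshot` module used here no longer exists in Python 3. A roughly equivalent measurement can be sketched with `cProfile` and `pstats`, which offer the same `runcall()` / `print_stats()` pattern. In the sketch below, `build_table` is a hypothetical stand-in for the slow dict-building function and `profile_call` is an illustrative helper, not part of the original program.

```python
import cProfile
import io
import pstats


def build_table(n):
    """Hypothetical stand-in for the function that builds one large dict."""
    table = {}
    for i in range(n):
        table[i] = int(i // 1)
    return table


def profile_call(func, *args, limit=10):
    """Run func under cProfile; return (result, total call count, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stats.total_calls, buffer.getvalue()


if __name__ == "__main__":
    # Profile the same call at two points in a program and compare:
    # identical call counts with different times point away from the code path.
    _, calls, report = profile_call(build_table, 100000)
    print("function calls:", calls)
    print(report)
```

Running `profile_call` once early and once late in the program gives two reports that can be compared line by line, which is the comparison described above.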

For example, this simple method (called 90440 times) slows down by a
factor of 7 according to the profiles:

    class PgNumeric:
        ...
        def __int__(self):
            return int(self.__v / self.__sf)

__v and __sf are longs, so there is little room to mess up that one:-)
Debug output shows the same sequence of input values in each run:
    self.__class__.__name__ = 'PgNumeric',
    self.__dict__ = {
        '_PgNumeric__v': <long integer increasing from 7735L to 260167L>,
        '_PgNumeric__p': 12,
        '_PgNumeric__sf': 1L,
        '_PgNumeric__s': 0}.

It "only" slows down by 30% if I add
    class PgNumeric(object):
        __slots__ = ('_PgNumeric__p', '_PgNumeric__s',
                     '_PgNumeric__sf', '_PgNumeric__v')
but I don't know if such a change to pyPgSQL will be accepted, since
`object' disables the __coerce__() method. Well, I'll try.
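
For readers on Python 3, where every class is new-style and `__coerce__` no longer exists, the effect of `__slots__` can be sketched as below. The classes `PlainNumeric` and `SlottedNumeric` are simplified, hypothetical stand-ins and not the pyPgSQL code; `//` is used because `/` on Python 3 integers returns a float. The sketch only shows that the slotted instances carry no per-instance `__dict__`; it makes no claim about how much time that saves.

```python
class PlainNumeric:
    """Hypothetical stand-in: attributes live in a per-instance __dict__."""

    def __init__(self, value, scale_factor):
        self._v = value
        self._sf = scale_factor

    def __int__(self):
        return int(self._v // self._sf)


class SlottedNumeric:
    """Same behaviour, but attributes are stored in fixed slots."""

    __slots__ = ("_v", "_sf")

    def __init__(self, value, scale_factor):
        self._v = value
        self._sf = scale_factor

    def __int__(self):
        return int(self._v // self._sf)


if __name__ == "__main__":
    plain = PlainNumeric(7735, 1)
    slotted = SlottedNumeric(7735, 1)
    print(int(plain), hasattr(plain, "__dict__"))
    print(int(slotted), hasattr(slotted, "__dict__"))
```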

--
Hallvard
Jul 18 '05 #3
Andrew MacIntyre wrote:
> On Sat, 30 Jul 2004, Hallvard B Furuseth wrote:
>> I have a program which starts by reading a lot of data into various
>> dicts.
>>
>> When I moved a function to create one such dict from near the beginning
>> of the program to a later time, that function slowed down by a factor
>> of 8-14: (...)
>
> Python 2.2 didn't use PyMalloc by default. This leaves Python at the
> mercy of the platform malloc()/realloc()/free(), and Python has found
> rough spots with nearly every platform's implementation of these - which
> is why PyMalloc was written.
>
> While it isn't certain that this is your problem, if you can rebuild your
> Python interpreter to include PyMalloc (--with-pymalloc I think), you can
> find out.

Thanks. I'll check that when I get time. Until then, malloc gets the
blame until proven innocent, since profiling and test output turned out
nothing else that was different. (See my reply to Istvan.)

> Be warned that there were some bugs in PyMalloc that were fixed before
> Python 2.3 was released (when PyMalloc became a default option); as far as
> I recall, these bugfixes were never backported to 2.2x. So I wouldn't
> recommend running a 2.2.3 PyMalloc enabled interpreter in production
> without seriously testing all your production code.

If PyMalloc helps, I'll push for an upgrade to 2.3. Thanks again.

--
Hallvard
Jul 18 '05 #4
On 01 Aug 2004 22:08:14 +0200, Hallvard B Furuseth <h.**********@usit.uio.no> wrote:
> Andrew MacIntyre wrote:
>> On Sat, 30 Jul 2004, Hallvard B Furuseth wrote:
>>> I have a program which starts by reading a lot of data into various
>>> dicts.
>>>
>>> When I moved a function to create one such dict from near the beginning
>>> of the program to a later time, that function slowed down by a factor
>>> of 8-14:
>>>
>>> (...)
>>
>> Python 2.2 didn't use PyMalloc by default. This leaves Python at the
>> mercy of the platform malloc()/realloc()/free(), and Python has found
>> rough spots with nearly every platform's implementation of these - which
>> is why PyMalloc was written.
>>
>> While it isn't certain that this is your problem, if you can rebuild your
>> Python interpreter to include PyMalloc (--with-pymalloc I think), you can
>> find out.
>
> Thanks. I'll check that when I get time. Until then, malloc gets the
> blame until proven innocent, since profiling and test output turned out
> nothing else that was different. (See my reply to Istvan.)
>
>> Be warned that there were some bugs in PyMalloc that were fixed before
>> Python 2.3 was released (when PyMalloc became a default option); as far as
>> I recall, these bugfixes were never backported to 2.2x. So I wouldn't
>> recommend running a 2.2.3 PyMalloc enabled interpreter in production
>> without seriously testing all your production code.
>
> If PyMalloc helps, I'll push for an upgrade to 2.3. Thanks again.

Speculating broadly here, but have you considered possible cache effects?
I.e., instructions execute faster when they and their operands can be fetched
from the CPU cache, and similarly L2 cache is faster than RAM. What is in
the caches at any point depends on what has recently been executed, and
how different memory areas map into the cache, and that will probably depend
on where you have put things in your program and what order you call for its
execution (the OS kernel may also affect the cache via interrupt service routines
and/or multitasking etc, e.g., for downloading or playing music in the background
(which I doubt you did) ;-) ). If you have multiple CPUs they do better if they
don't work on each other's jobs too much, since a switch tends to mess up caching.
I think some older kernels don't take that into account, but maybe that's all
history by now.

In a loop, typically the first time through will show cache-loading overhead, and
the rest will benefit, with blips for interrupts or interpreter special effects
such as extending an allocation pool or garbage collecting. These get washed out
in big averages, or filtered out in best-of timings, but they can be seen if you
create a graphic that shows every timing (e.g. a raster of dots colored by time
if there are a lot of timings). (Of course you have to watch out that your data
capture doesn't cause overhead that invalidates your results. It can be tricky.)
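
Keeping every individual timing, rather than only an average or a best-of figure, could be sketched roughly like this. It assumes Python 3 and `time.perf_counter`; the helper names `time_each_call` and `summarize` are made up for illustration, and the measurement itself adds a small overhead to each sample.

```python
import time


def time_each_call(func, args_list):
    """Call func once per argument tuple and keep every single timing."""
    timings = []
    for args in args_list:
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Show the first run separately, since it often pays warm-up costs."""
    ordered = sorted(timings)
    return {
        "first": timings[0],
        "best": ordered[0],
        "median": ordered[len(ordered) // 2],
        "worst": ordered[-1],
    }


if __name__ == "__main__":
    data = list(range(100000))
    samples = time_each_call(sum, [(data,)] * 50)
    for name, seconds in summarize(samples).items():
        print(f"{name:>6}: {seconds * 1e6:.1f} us")
```

The raw `samples` list is what would feed a plot of every timing; the summary only hints at whether the first pass or occasional blips stand out.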

Another effect that has shown up as mystery culprit in the past is CPU heating and
consequent automatic slowing of the clock to prevent damage, but that doesn't
seem that likely in this case.

Another way you could lose time is if your code falls into a new timing relationship
with code other than your own. Just speculating in general here, but if you are
processing data coming from a disk or other i/o that has some natural clumping
to it in the OS, such as waiting for an interrupt that says the next cluster read
is ready to fill buffers from, and for one arrangement of your code that happened
just before you executed and the other way just after, then there would be a difference
in interfering cache effects due to OS activity. Also, if you do a succession of i/o
that can't physically happen back to back, then you should be able to gain by doing
some computing in between. OS buffering and disk caches mitigate this, but you can empty or stuff
them so they demand physical i/o, depending on your program. Moving such code would
presumably have an effect on overall timing.

Of course, if you are also doing multi-threaded stuff in your program, it's another ball game.
My USD.02

Regards,
Bengt Richter
Jul 18 '05 #5
