
Tracking down memory leaks?

I have an application with one function called "compute", which, given a
filename, goes through that file and performs various statistical
analyses. It uses arrays extensively and loops a lot. It prints the
results of its statistical significance tests to standard out. Since
the compute function returns, and I think no variables of global scope
are being used, I would expect that when it does, all memory returns
to the operating system.

Instead, what I see is that every iteration uses several megs more.
For example, python uses 52 megs when starting out, it goes through
several iterations and I'm suddenly using more than 500 megs of ram.

Does anyone have any pointers on how to figure out what I'm doing
wrong?

Thanks,
mohan

Feb 12 '06 #1
On Sun, 2006-02-12 at 05:11 -0800, MKoool wrote:
I have an application with one function called "compute", which, given a
filename, goes through that file and performs various statistical
analyses. It uses arrays extensively and loops a lot. It prints the
results of its statistical significance tests to standard out. Since
the compute function returns, and I think no variables of global scope
are being used, I would expect that when it does, all memory returns
to the operating system.

Instead, what I see is that every iteration uses several megs more.
For example, python uses 52 megs when starting out, it goes through
several iterations and I'm suddenly using more than 500 megs of ram.

Does anyone have any pointers on how to figure out what I'm doing
wrong?
Have you tried to force a garbage collection? Try, for example, running
gc.collect() every time the function returns. See
http://www.python.org/doc/current/lib/module-gc.html for more details.
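
For example, a minimal sketch (the driver loop and the filenames list
are my assumption, not something from your post):

import gc

for filename in filenames:
    compute(filename)
    # Force a full collection pass; gc.collect() returns the number
    # of unreachable objects it found.
    found = gc.collect()
    print "gc.collect() found %d unreachable objects" % found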
Thanks,
mohan


Cya,
Felipe.

--
"Quem excele em empregar a força militar subjulga os exércitos dos
outros povos sem travar batalha, toma cidades fortificadas dos outros
povos sem as atacar e destrói os estados dos outros povos sem lutas
prolongadas. Deve lutar sob o Céu com o propósito primordial da
'preservação'. Desse modo suas armas não se embotarão, e os ganhos
poderão ser preservados. Essa é a estratégia para planejar ofensivas."

-- Sun Tzu, em "A arte da guerra"

Feb 12 '06 #2
I *think* Python uses reference counting for garbage collection. I've
heard talk of people wanting to change this (to mark and sweep?).
Anyway, Python stores a counter with each object. Every time you make a
reference to an object this counter is increased. Every time a pointer
to the object is deleted or reassigned the counter is decreased.
When the counter reaches zero the object is freed from memory. A flaw
with this algorithm is that if you create a circular reference the
object will never be freed. A linked list where the tail points to the
head will have a reference count of 1 for each node, after the head
pointer is deleted. So the list is never freed. Make sure you are not
creating a circular reference. Something like this:

a = [1, 2, 3, 4, 5, 6]
b = ['a', 'b', 'c', 'd']
c = [10, 20, 30, 40]

a[3] = b
b[1] = c
c[0] = a

the last assignment creates a circular reference, and until it is
removed, none of these objects will be removed from memory.

I'm not an expert on Python internals, and it is possible that they
have a way of checking for cases like this. I think the deepcopy
method catches this, but I don't *think* basic garbage collection looks
for this sort of thing.
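
If you want to check what the collector actually does with a cycle like
the one above, something along these lines works (my sketch, not part
of the original post):

import gc

a = [1, 2, 3, 4, 5, 6]
b = ['a', 'b', 'c', 'd']
c = [10, 20, 30, 40]
a[3] = b
b[1] = c
c[0] = a        # close the cycle

del a, b, c     # drop every external reference
# gc.collect() returns the number of unreachable objects found;
# a nonzero result here means the cycle detector caught the loop.
print gc.collect()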

David

Feb 12 '06 #3
On Sun, 12 Feb 2006 05:11:02 -0800, MKoool wrote:
I have an application with one function called "compute", which, given a
filename, goes through that file and performs various statistical
analyses. It uses arrays extensively and loops a lot. It prints the
results of its statistical significance tests to standard out. Since
the compute function returns, and I think no variables of global scope
are being used, I would expect that when it does, all memory returns
to the operating system.
I may be mistaken, and if so I will welcome the correction, but Python
does not return memory to the operating system until it terminates.

Objects return memory to Python when they are garbage collected, but not
to the OS.

Instead, what I see is that every iteration uses several megs more.
For example, python uses 52 megs when starting out, it goes through
several iterations and I'm suddenly using more than 500 megs of ram.

Does anyone have any pointers on how to figure out what I'm doing
wrong?


How big is the file you are reading in? If it is (say) 400 MB, then it is
hardly surprising that you will be using 500MB of RAM. If the file is 25K,
that's another story.

How are you storing your data while you are processing it? I'd be looking
for hidden duplicates.

I suggest you re-factor your program. Instead of one giant function, break
it up into lots of smaller ones, and call them from compute. Yes, this
will use a little more memory, which might sound counter-productive at the
moment when you are trying to use less memory, but in the long term it
will allow your computer to use memory more efficiently (it is easier to
page small functions as they are needed than one giant function), and it
will be much easier for you to write and debug when you can isolate
individual pieces of the task in individual functions.

Re-factoring will have another advantage: you might just find the problem
on your own.
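
For instance, the split might look something like this (the helper
names and bodies are invented stubs, just to show the shape):

def load_matrix(filename):
    # stub: read the file into a list of rows (depends on your format)
    return [line.split() for line in open(filename)]

def run_significance_tests(data):
    # stub: whatever statistics compute() currently does inline
    return {'rows': len(data)}

def report(stats):
    print stats

def compute(filename):
    data = load_matrix(filename)
    stats = run_significance_tests(data)
    report(stats)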
--
Steven.

Feb 12 '06 #4
"dm**************@yahoo.com" wrote:
I'm not an expert on python internals, and it is possible that they
have a way of checking for cases like this. I think the deepcopy
method catches this, but I don't *think* basic garbage collection looks
for this sort of thing.


http://www.python.org/doc/faq/genera...-manage-memory

</F>

Feb 12 '06 #5

MKoool wrote:
I have an application with one function called "compute", which, given a
filename, goes through that file and performs various statistical
analyses. It uses arrays extensively and loops a lot. It prints the
results of its statistical significance tests to standard out. Since
the compute function returns, and I think no variables of global scope
are being used, I would expect that when it does, all memory returns
to the operating system.

Instead, what I see is that every iteration uses several megs more.
For example, python uses 52 megs when starting out, it goes through
several iterations and I'm suddenly using more than 500 megs of ram.

Does anyone have any pointers on how to figure out what I'm doing
wrong?
Are you importing any third-party modules? It's not unheard of for
someone else's code to have a memory leak.

Thanks,
mohan


Feb 12 '06 #6
On Sun, 12 Feb 2006 06:01:55 -0800, dm**************@yahoo.com wrote:
I *think* Python uses reference counting for garbage collection.
Yes it does, with special code for detecting and collecting circular
references.
I've
heard talk of people wanting to change this (to mark and sweep?).
Reference counting is too simple to be cool *wink*

[snip] Make sure you are not creating
a circular reference. Something like this:

a = [1, 2, 3, 4, 5, 6]
b = ['a', 'b', 'c', 'd']
c = [10, 20, 30, 40]

a[3] = b
b[1] = c
c[0] = a

the last assignment creates a circular reference, and until it is removed,
none of these objects will be removed from memory.
I believe Python handles this sort of situation very well now.

I'm not an expert on python internals, and it is possible that they have
a way of checking for cases like this. I think the deepcopy method
catches this, but I don't *think* basic garbage collection looks for this
sort of thing.


deepcopy has nothing to do with garbage collection.

This is where you use deepcopy:

py> a = [2, 4, [0, 1, 2], 8] # note the nested list
py> b = a # b and a both are bound to the same list
py> b is a # b is the same list as a, not just a copy
True
py> c = a[:] # make a shallow copy of a
py> c is a # c is a copy of a, not a itself
False
py> c[2] is a[2] # but both a and c include the same nested list
True

What if you want c to include a copy of the nested list? That's where you
use deepcopy:

py> import copy
py> d = copy.deepcopy(a)
py> d[2] is a[2]
False

--
Steven.

Feb 12 '06 #7

me********@aol.com wrote:
MKoool wrote:
I have an application with one function called "compute", which, given a
filename, goes through that file and performs various statistical
analyses. It uses arrays extensively and loops a lot. It prints the
results of its statistical significance tests to standard out. Since
the compute function returns, and I think no variables of global scope
are being used, I would expect that when it does, all memory returns
to the operating system.

Instead, what I see is that every iteration uses several megs more.
For example, python uses 52 megs when starting out, it goes through
several iterations and I'm suddenly using more than 500 megs of ram.

Does anyone have any pointers on how to figure out what I'm doing
wrong?


Are you importing any third-party modules? It's not unheard of for
someone else's code to have a memory leak.


- Sounds like you're working with very large, very sparse matrices,
running LSI/SVD or a PCA/covariance analysis, something like that. So
it's a specialized problem; you need to specify what libs you're using,
what your platform/OS is, the Python release, how you installed it,
details about C extensions, pyrex/psyco/swig. The more info you supply,
the more you get back.

- Be aware there are wrong ways to measure memory, e.g. this long thread:
http://mail.python.org/pipermail/pyt...er/310121.html

Feb 12 '06 #8
>> I'm not an expert on python internals, and it is possible that they have
a way of checking for cases like this. I think the deepcopy method
catches this, but I don't *think* basic garbage collection looks for this
sort of thing.


deepcopy has nothing to do with garbage collection.

This is where you use deepcopy:

py> a = [2, 4, [0, 1, 2], 8] # note the nested list
py> b = a # b and a both are bound to the same list
py> b is a # b is the same list as a, not just a copy
True
py> c = a[:] # make a shallow copy of a
py> c is a # c is a copy of a, not a itself
False
py> c[2] is a[2] # but both a and c include the same nested list
True

What if you want c to include a copy of the nested list? That's where you
use deepcopy:

py> import copy
py> d = copy.deepcopy(a)
py> d[2] is a[2]
False


What I meant is that deepcopy is recursive, and if you have a circular
reference in your data structure a recursive copy will become infinite.
I think deepcopy has the ability to detect this situation. So if it
could be detected for deepcopy, I don't see why it could not be
detected for garbage collection purposes.
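
For what it's worth, deepcopy does catch it, by keeping a memo of the
objects it has already copied; a quick check (my own example, not from
the thread):

py> import copy
py> a = [1, 2, 3]
py> a.append(a) # a now contains itself: a circular reference
py> b = copy.deepcopy(a) # terminates; the memo catches the cycle
py> b[3] is b # the copy reproduces the cycle within itself
True
py> b is a
False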

David

Feb 12 '06 #9
Hi Steven,

Is there any way for making Python return memory no longer needed to
the OS? Cases may arise where you indeed need a big memory block
temporarily without being able to split it up into smaller chunks.
Thank you.
malv

Steven D'Aprano wrote:
Objects return memory to Python when they are garbage collected, but not
to the OS.


Feb 12 '06 #10
malv:
Is there any way for making Python return memory no longer needed to
the OS? Cases may arise where you indeed need a big memory block
temporarily without being able to split it up into smaller chunks.


That's not really necessary. On any decent OS it's just unused address
space, that doesn't consume any physical memory.

And when your process runs out of address space, you should program more
carefully :-)

--
René Pijlman
Feb 12 '06 #11
MKoool wrote:
I have an application with one function called "compute", which, given a
filename, goes through that file and performs various statistical
analyses. It uses arrays extensively and loops a lot. It prints the
results of its statistical significance tests to standard out. Since
the compute function returns, and I think no variables of global scope
are being used, I would expect that when it does, all memory returns
to the operating system.

Instead, what I see is that every iteration uses several megs more.
For example, python uses 52 megs when starting out, it goes through
several iterations and I'm suddenly using more than 500 megs of ram.

Does anyone have any pointers on how to figure out what I'm doing
wrong?


If gc.collect() doesn't help:

Maybe objects of extension libs are not freed correctly.

And Python has a real skeleton in the cupboard: a known problem with
Python objects/libs when classes with __del__ are involved. I once
suffered from such a tremendous "unexplainable" memory blow-up myself,
until I found the "del gc.garbage[:]" remedy:

<http://www.python.org/doc/current/lib/module-gc.html>
garbage
A list of objects which the collector found to be unreachable but
could not be freed (uncollectable objects). By default, this list
contains only objects with __del__() methods. Objects that have
__del__() methods and are part of a reference cycle cause the entire
reference cycle to be uncollectable, including objects not necessarily
in the cycle but reachable only from it. Python doesn't collect such
cycles automatically because, in general, it isn't possible for Python
to guess a safe order in which to run the __del__() methods. If you know
a safe order, you can force the issue by examining the garbage list, and
explicitly breaking cycles due to your objects within the list. Note
that these objects are kept alive even so by virtue of being in the
garbage list, so they should be removed from garbage too. For example,
after breaking cycles, do del gc.garbage[:] to empty the list. It's
generally better to avoid the issue by not creating cycles containing
objects with __del__() methods, and garbage can be examined in that case
to verify that no such cycles are being created.
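
A minimal sketch of that situation (the class and attribute names are
made up for illustration):

import gc

class Leaker(object):
    def __del__(self):
        pass        # any __del__ method makes a cycle uncollectable

a = Leaker()
b = Leaker()
a.partner = b
b.partner = a       # reference cycle between two __del__ objects
del a, b            # drop the external references

gc.collect()
print gc.garbage    # both Leaker instances end up here

# Break the cycle by hand, then empty the list so the instances
# can finally be freed:
for obj in gc.garbage:
    obj.partner = None
del gc.garbage[:]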
Robert
Feb 12 '06 #12
> How big is the file you are reading in? If it is (say) 400 MB, then it is
hardly surprising that you will be using 500MB of RAM. If the file is 25K,
that's another story.
Actually, I am downloading the matrix data from a file on a server on
the net using urllib2, and then I am running several basic stats on it
using some functions that i get from matplotlib. Most are statistical
functions I run on standard vectors, such as standard deviation, mean,
median, etc. I do then loop through various matrix items, and then
based on a set of criteria, I attempt to perform a sort of linear
regression model using a few loops on the vectors.
How are you storing your data while you are processing it? I'd be looking
for hidden duplicates.


I am storing basically everything as a set of vectors. For example, I
would have one vector for my X-axis, time. The other variables are the
number of units sold and the total aggregate revenue from selling all
units.

I am wondering if it's actually urllib2 that is messing me up. It
could be matplotlib as well, although I doubt it since I do not use
matplotlib unless the statistical significance test I produce indicates
a high level of strength (very rare), indicating to me that the company
has a "winning" product.

Feb 13 '06 #13
Steven D'Aprano <st***@REMOVETHIScyber.com.au> writes:
On Sun, 12 Feb 2006 05:11:02 -0800, MKoool wrote: [...]
I may be mistaken, and if so I will welcome the correction, but Python
does not return memory to the operating system until it terminates.

Objects return memory to Python when they are garbage collected, but not
to the OS.

[...]

http://groups.google.com/group/comp....ea1c569a65e13e
John
Feb 13 '06 #14
On 12 Feb 2006 05:11:02 -0800, rumours say that "MKoool"
<mo**********@gmail.com> might have written:
I have an application with one function called "compute", which, given a
filename, goes through that file and performs various statistical
analyses. It uses arrays extensively and loops a lot. It prints the
results of its statistical significance tests to standard out. Since
the compute function returns, and I think no variables of global scope
are being used, I would expect that when it does, all memory returns
to the operating system.
Would your program work if you substituted collections.deque for the arrays
(did you mean array.arrays or lists?)? Please test.
Instead, what I see is that every iteration uses several megs more.
For example, python uses 52 megs when starting out, it goes through
several iterations and I'm suddenly using more than 500 megs of ram.
If your algorithms can work with the collections.deque container, can you
please check that the memory use pattern changes?
Does anyone have any pointers on how to figure out what I'm doing
wrong?


I suspect that you have more than one large array (list?) that
continuously grows.

It would be useful if you ran your program on a fairly idle machine and had
a way to see if the consumed memory seems to be swapped out without being
swapped in eventually.
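
One crude way to watch resident memory from inside the program,
assuming a Linux box (you haven't told us your platform) and a driver
loop like the hypothetical one below:

def resident_kb():
    # Read this process's resident set size, in kB, from /proc
    # (Linux only).
    for line in open('/proc/self/status'):
        if line.startswith('VmRSS:'):
            return int(line.split()[1])

for filename in filenames:      # hypothetical driver loop
    compute(filename)
    print "RSS after %s: %d kB" % (filename, resident_kb())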
--
TZOTZIOY, I speak England very best.
"Dear Paul,
please stop spamming us."
The Corinthians
Feb 14 '06 #15
