Looping-related Memory Leak

I am having a problem where a long-running function will cause a
memory leak / balloon for reasons I cannot figure out. Essentially, I
loop through a directory of pickled files, load them, and run some
other functions on them. In every case, each function uses only local
variables and I even made sure to use `del` on each variable at the
end of the loop. However, as the loop progresses the amount of memory
used steadily increases.

I had a related problem before where I would loop through a very large
data-set of files and cache objects that were used to parse or
otherwise operate on different files in the data-set. Once again,
only local variables were used in the cached object's methods. After
a while it got to the point where simply running these methods on the
data took so long that I had to terminate the process (think, first
iteration .01sec, 1000th iteration 10sec). The solution I found was
to cause the cached objects to become "stale" after a certain number
of uses and be deleted and re-instantiated.

However, in the current case, there is no caching being done at all.
Only local variables are involved. It would seem that over time
objects take up more memory even when there are no attributes being
added to them or altered. Has anyone experienced similar anomalies?
Is this behavior to be expected for some other reason? If not, is
there a common fix for it, i.e. manual GC or something?


Jun 27 '08 #1
On Jun 26, 5:19 am, Tom Davis <binju...@gmail.com> wrote:
> I am having a problem where a long-running function will cause a
> memory leak / balloon for reasons I cannot figure out. Essentially, I
> loop through a directory of pickled files, load them, and run some
> other functions on them. In every case, each function uses only local
> variables and I even made sure to use `del` on each variable at the
> end of the loop. However, as the loop progresses the amount of memory
> used steadily increases.

Do you happen to be using a single Unpickler instance? If so, change
it to use a different instance each time. (If you just use the module-
level load function you are already using a different instance each
time.)

An Unpickler holds a reference to everything it has seen, which prevents
the objects it unpickles from being garbage collected until the
Unpickler itself is collected.
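
A minimal sketch of the per-file loading pattern described above, written for current Python 3 rather than the interpreter in use when this thread was posted. The directory layout, the file names and the `load_all` and `demo` helpers are assumptions made up for illustration. The only library behaviour relied on is that the module-level `pickle.load()` creates a new unpickler on every call.

```python
import os
import pickle
import tempfile


def load_all(directory):
    """Yield (filename, object) for every pickle file in `directory`.

    pickle.load() builds a fresh unpickler on each call, so no unpickler
    state is carried over from one file to the next.
    """
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        with open(path, "rb") as handle:
            yield name, pickle.load(handle)


def demo():
    # Hypothetical data: three small pickles in a temporary directory.
    with tempfile.TemporaryDirectory() as directory:
        for i in range(3):
            with open(os.path.join(directory, "doc%d.pkl" % i), "wb") as handle:
                pickle.dump({"id": i}, handle)
        return [obj["id"] for _, obj in load_all(directory)]
```

Because `load_all` is a generator, only one unpickled object is held by the loop at a time, provided the caller does not store the results.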
Carl Banks
Jun 27 '08 #2
Tom Davis wrote:
> I am having a problem where a long-running function will cause a
> memory leak / balloon for reasons I cannot figure out. Essentially, I
> loop through a directory of pickled files, load them, and run some
> other functions on them. In every case, each function uses only local
> variables and I even made sure to use `del` on each variable at the
> end of the loop. However, as the loop progresses the amount of memory
> used steadily increases.
>
> I had a related problem before where I would loop through a very large
> data-set of files and cache objects that were used to parse or
> otherwise operate on different files in the data-set. Once again,
> only local variables were used in the cached object's methods. After
> a while it got to the point where simply running these methods on the
> data took so long that I had to terminate the process (think, first
> iteration .01sec, 1000th iteration 10sec). The solution I found was
> to cause the cached objects to become "stale" after a certain number
> of uses and be deleted and re-instantiated.

Here the alleged "memory leak" is clearly the cache, and the slowdown is
caused by the garbage collector. The solution is to turn it off with
gc.disable() during phases where your program allocates huge numbers of
objects with the intent of keeping them for a longer time.
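
A minimal sketch of the gc.disable() idea above, assuming current Python 3. The `gc_paused` context manager and the `build_cache` function are hypothetical names, not part of any library. Only `gc.isenabled()`, `gc.disable()` and `gc.enable()` come from the standard `gc` module. Reference counting keeps working while the collector is off; only the cycle detection is paused.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily switch off the cyclic garbage collector."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state even if the block raised.
        if was_enabled:
            gc.enable()


def build_cache(n):
    # Hypothetical allocation-heavy phase: many long-lived container objects.
    with gc_paused():
        return [{"key": i, "payload": [i]} for i in range(n)]
```

Restoring the earlier state in `finally` keeps the collector from staying off by accident when the allocation phase fails part way through.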

> However, in the current case, there is no caching being done at all.
> Only local variables are involved. It would seem that over time
> objects take up more memory even when there are no attributes being
> added to them or altered. Has anyone experienced similar anomalies?
> Is this behavior to be expected for some other reason? If not, is
> there a common fix for it, i.e. manual GC or something?

Unless you post a script demonstrating the leak I will assume you are
overlooking a reference that keeps your data alive -- whether it's a true
global or within a long-running function doesn't really matter.

Peter
Jun 27 '08 #3
On Jun 26, 5:38 am, Carl Banks <pavlovevide...@gmail.com> wrote:
> On Jun 26, 5:19 am, Tom Davis <binju...@gmail.com> wrote:
>> I am having a problem where a long-running function will cause a
>> memory leak / balloon for reasons I cannot figure out. Essentially, I
>> loop through a directory of pickled files, load them, and run some
>> other functions on them. In every case, each function uses only local
>> variables and I even made sure to use `del` on each variable at the
>> end of the loop. However, as the loop progresses the amount of memory
>> used steadily increases.
>
> Do you happen to be using a single Unpickler instance? If so, change
> it to use a different instance each time. (If you just use the module-
> level load function you are already using a different instance each
> time.)
>
> Unpicklers hold a reference to everything they've seen, which prevents
> objects it unpickles from being garbage collected until it is
> collected itself.
>
> Carl Banks

Carl,

Yes, I was using the module-level unpickler. I changed it with little
effect. I guess perhaps this is my misunderstanding of how GC works.
For instance, if I have `a = Obj()` and run `a.some_method()` which
generates a highly-nested local variable that cannot be easily garbage
collected, it was my assumption that either (1) completing the method
call or (2) deleting the object instance itself would automatically
destroy any variables used by said method. This does not appear to be
the case, however. Even when a variable/object's scope is destroyed,
it would seem that variables/objects created within that scope cannot
always be reclaimed, depending on their complexity.

To me, this seems illogical. I can understand that the GC is
reluctant to reclaim objects that have many connections to other
objects and so forth, but once those objects' scopes are gone, why
doesn't it force a reclaim? For instance, I can use timeit to create
an object instance, run a method of it, then `del` the variable used
to store the instance, but each loop thereafter continues to require
more memory and take more time. 1000 runs may take .27 usec/pass
whereas 100000 takes 2 usec/pass (Average).
Jun 30 '08 #4
On Mon, 30 Jun 2008 10:55:00 -0700, Tom Davis wrote:
> To me, this seems illogical. I can understand that the GC is
> reluctant to reclaim objects that have many connections to other
> objects and so forth, but once those objects' scopes are gone, why
> doesn't it force a reclaim? For instance, I can use timeit to create
> an object instance, run a method of it, then `del` the variable used
> to store the instance, but each loop thereafter continues to require
> more memory and take more time. 1000 runs may take .27 usec/pass
> whereas 100000 takes 2 usec/pass (Average).

`del` just removes the name and one reference to that object. Objects are
only deleted when there's no reference to them anymore. Your example
sounds like you keep references to objects somehow that are accumulating.
Maybe by accident. Any class level bound mutables or mutable default
values in functions in that source code? Would be my first guess.
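
A minimal sketch of both points above, assuming CPython 3, where an object is freed as soon as its last reference disappears. The `Document` class and the `remember` function are hypothetical. The mutable default argument is a deliberate bug that keeps every object it has seen alive.

```python
import weakref


class Document(object):
    """Hypothetical stand-in for one unpickled object."""


def remember(doc, seen=[]):
    # Bug on purpose: the default list is created once, when the function is
    # defined, so every call appends to the same list.
    seen.append(doc)
    return len(seen)


def demo():
    doc = Document()
    probe = weakref.ref(doc)            # lets us check whether the object is alive
    remember(doc)
    del doc                             # removes the name, not the object
    alive_after_del = probe() is not None
    remember.__defaults__[0].clear()    # drop the hidden reference
    alive_after_clear = probe() is not None
    return alive_after_del, alive_after_clear
```

The weak reference does not keep the object alive, so it reports whether some other reference still does: the object survives `del doc` and goes away only once the default list is emptied.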

Ciao,
Marc 'BlackJack' Rintsch
Jun 30 '08 #5
On Jun 30, 1:55 pm, Tom Davis <binju...@gmail.com> wrote:
> Carl,
>
> Yes, I was using the module-level unpickler. I changed it with little
> effect. I guess perhaps this is my misunderstanding of how GC works.
> For instance, if I have `a = Obj()` and run `a.some_method()` which
> generates a highly-nested local variable that cannot be easily garbage
> collected, it was my assumption that either (1) completing the method
> call or (2) deleting the object instance itself would automatically
> destroy any variables used by said method. This does not appear to be
> the case, however. Even when a variable/object's scope is destroyed,
> it would seem that variables/objects created within that scope cannot
> always be reclaimed, depending on their complexity.
>
> To me, this seems illogical. I can understand that the GC is
> reluctant to reclaim objects that have many connections to other
> objects and so forth, but once those objects' scopes are gone, why
> doesn't it force a reclaim?

Are your objects involved in circular references, and do you have any
objects with a __del__ method? Normally objects are reclaimed when
the reference count goes to zero, but if there are cycles then the
reference count never reaches zero, and they remain alive until the
generational garbage collector makes a pass to break the cycle.
However, the generational collector doesn't break cycles that involve
objects with a __del__ method.
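
A minimal sketch of the cycle behaviour described above, assuming CPython 3. The `Node` class is hypothetical. The automatic collector is switched off during the demo so that the only collection pass is the explicit `gc.collect()` call. Note that since Python 3.4 the collector can also reclaim cycles whose objects define `__del__`; the restriction described above applies to the interpreters that were current when this thread was written.

```python
import gc
import weakref


class Node(object):
    """Hypothetical object that can point back at its parent."""

    def __init__(self):
        self.parent = None
        self.children = []


def make_cycle():
    parent, child = Node(), Node()
    parent.children.append(child)
    child.parent = parent          # parent -> child -> parent: a cycle
    return weakref.ref(parent)


def demo():
    gc.collect()                   # start from a clean slate
    was_enabled = gc.isenabled()
    gc.disable()                   # keep the collector from running on its own
    try:
        probe = make_cycle()       # the local names are gone once this returns
        alive_before = probe() is not None
        gc.collect()               # explicit pass of the cycle collector
        alive_after = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after
```

Both local names are gone after `make_cycle()` returns, yet the pair stays alive until the collector runs, which is why reference counting alone never frees it.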

Are you calling any C extensions that might be failing to decref an
object? There could be a memory leak.

Are you keeping a reference around somewhere? For example, appending
results to a list, where each result keeps a reference to all of your
unpickled data for some reason.

You know, we can throw out all these scenarios, but these suggestions
are just common pitfalls. If it doesn't look like one of these
things, you're going to have to do your own legwork to help isolate
what's causing the behavior. Then if needed you can come back to us
with more detailed information.

Start with your original function, and slowly remove functionality
from it until the bad behavior goes away. That will give you a clue
what's causing it.
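
One way to support this kind of isolation on a current interpreter is the standard `tracemalloc` module, which was added in Python 3.4 and so was not available when this thread was written. The sketch below is an assumption-laden illustration: `process`, `_leak` and `top_growth` are hypothetical names, and the leak is planted on purpose so that the report has something to show.

```python
import tracemalloc

_leak = []   # hypothetical accidental reference, here to give the demo a leak


def process(document):
    # Stand-in for the real per-document work.
    _leak.append(list(document))


def top_growth(documents, limit=3):
    """Run process() over documents and report where memory grew the most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for document in documents:
        process(document)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Each entry compares allocations per source line between the snapshots.
    return after.compare_to(before, "lineno")[:limit]


def demo():
    stats = top_growth([range(1000)] * 200)
    return [(stat.traceback[0].lineno, stat.size_diff) for stat in stats]
```

Each returned entry names a source line and the number of bytes by which its live allocations changed, which narrows down which part of the function to remove or inspect first.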
Carl Banks
Jul 1 '08 #6
On Jun 30, 3:12 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> On Mon, 30 Jun 2008 10:55:00 -0700, Tom Davis wrote:
>> To me, this seems illogical. I can understand that the GC is
>> reluctant to reclaim objects that have many connections to other
>> objects and so forth, but once those objects' scopes are gone, why
>> doesn't it force a reclaim? For instance, I can use timeit to create
>> an object instance, run a method of it, then `del` the variable used
>> to store the instance, but each loop thereafter continues to require
>> more memory and take more time. 1000 runs may take .27 usec/pass
>> whereas 100000 takes 2 usec/pass (Average).
>
> `del` just removes the name and one reference to that object. Objects are
> only deleted when there's no reference to them anymore. Your example
> sounds like you keep references to objects somehow that are accumulating.
> Maybe by accident. Any class level bound mutables or mutable default
> values in functions in that source code? Would be my first guess.
>
> Ciao,
> Marc 'BlackJack' Rintsch

Marc,

Thanks for the tips. A quick confirmation:

I took "class level bound mutables" to mean something like:

    class A(object):
        SOME_MUTABLE = [1,2]
        ...

And "mutable default values" to mean:

    ...
    def a(self, arg=[1,2]):
        ...

If this is correct, I have none of these. I understand your point
about the references, but in my `timeit` example the statement is as
simple as this:

    from MyClass import MyClass  # assuming the module and the class share a name
    a = MyClass()
    del a

So, yes, it would seem that object references are piling up and not
being removed. This is entirely by accident. Is there some kind of
list somewhere that says "If your class has any of these attributes
(mutable defaults, class-level mutables, etc.) it may not be properly
dereferenced"? My obvious hack around this is to only do X loops at a
time and set up a cron job to run the script over and over until all the
files have been processed, but I'd much prefer to make the code run as
intended. I ran a test overnight last night and found that at first a
few documents were handled per second, but when I woke up it had
slowed down so much that it took over an hour to process a single
document! The RAM usage went from 20 MB at the start to over 300 MB when
it should actually never use more than about 20 MB because everything
is handled with local variables and new objects are instantiated for
each document. This is a serious problem.

Thanks,

Tom
Jul 1 '08 #7
On Jun 30, 8:24 pm, Carl Banks <pavlovevide...@gmail.com> wrote:

> Are your objects involved in circular references, and do you have any
> objects with a __del__ method? Normally objects are reclaimed when
> the reference count goes to zero, but if there are cycles then the
> reference count never reaches zero, and they remain alive until the
> generational garbage collector makes a pass to break the cycle.
> However, the generational collector doesn't break cycles that involve
> objects with a __del__ method.

There are some circular references, but these are produced by objects
created by BeautifulSoup. I try to decompose all of them, but if
there's one part of the code to blame it's almost certainly this. I
have no objects with __del__ methods, at least none that I wrote.

> Are you calling any C extensions that might be failing to decref an
> object? There could be a memory leak.

Perhaps. Yet another thing to look into.

> Are you keeping a reference around somewhere? For example, appending
> results to a list, and the result keeps a reference to all of your
> unpickled data for some reason.

No.

> You know, we can throw out all these scenarios, but these suggestions
> are just common pitfalls. If it doesn't look like one of these
> things, you're going to have to do your own legwork to help isolate
> what's causing the behavior. Then if needed you can come back to us
> with more detailed information.
>
> Start with your original function, and slowly remove functionality
> from it until the bad behavior goes away. That will give you a clue
> what's causing it.

I realize this and thank you folks for your patience. I thought
perhaps there was something simple I was overlooking, but in this case
it would seem that there are dozens of things outside of my direct
control that could be causing this, most likely from third-party
libraries I am using. I will continue to try to debug this on my own
and see if I can figure anything out. Memory leaks and failing GC and
so forth are all new concerns for me.
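
When the suspect is a third-party library, one possible starting point is to count the objects the collector tracks before and after a batch of work and see which types grow. The sketch below assumes CPython 3 and uses only the standard `gc` and `collections` modules. `type_counts`, `growth` and the `Tag` class are hypothetical names, and `Tag` merely stands in for whatever a parsing library allocates.

```python
import gc
from collections import Counter


def type_counts():
    """Count the objects the collector tracks, grouped by type name."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, limit=5):
    """Return the types whose instance count rose the most."""
    diff = after - before          # Counter subtraction keeps positive counts only
    return diff.most_common(limit)


class Tag(object):
    """Hypothetical stand-in for a parse-tree node from a third-party library."""


def demo():
    before = type_counts()
    kept = [Tag() for _ in range(500)]   # simulate objects that never go away
    after = type_counts()
    return dict(growth(before, after)), len(kept)
```

Taking one count before the loop and another every few hundred documents shows whether a particular type keeps climbing, which points at the code that creates it.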

Thanks Again,

Tom

Jul 1 '08 #8
