472,951 Members | 2,035 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,951 software developers and data experts.

Memory leak in Python

I have a python code which is running on a huge data set. After
starting the program the computer becomes unstable and gets very
diffucult to even open konsole to kill that process. What I am assuming
is that I am running out of memory.

What should I do to make sure that my code runs fine without becoming
unstable. How should I address the memory leak problem if any ? I have
a gig of RAM.

Every help is appreciated.

May 9 '06 #1
18 2874
Can you paste an example of the code you're using?

May 10 '06 #2
how big is the set? 100MB, more? what are you doing with the set? do
you have a small example that can prove the set is causing the freeze?
I am not the sharpest tool in the shed but it sounds like you might be
multiplying your set in/directly either permanently or temporarily on
purpose or accident.

May 10 '06 #3
Its kinda 65o lines of code...not the best idea to paste the code.
co********@gmail.com wrote:
Can you paste an example of the code you're using?


May 10 '06 #4
di********@gmail.com enlightened us with:
I have a python code which is running on a huge data set. After
starting the program the computer becomes unstable and gets very
diffucult to even open konsole to kill that process. What I am
assuming is that I am running out of memory.


Before acting on your assumptions, you need to verify them. Run 'top'
and hit 'M' to sort by memory usage. After that, use 'ulimit' to limit
the allowed memory usage, run your program again, and see if it stops
at some point due to memory problems.

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Frank Zappa
May 10 '06 #5
1) Review your design - You say you are processing a large data set,
just make sure you are not trying to store 3 versions. If you are
missing a design, create a flow chart or something that is true to the
code you have produced. You could probably even post the design if you
are brave enough.

2) Check your implementation - make sure you manage lists, arrays etc
correctly. You need to sever links (references) to objects for them to
get swept up. I know it is obvious but easy to do in a hasty implementation.

3) Verify and test problem characteristics, profilers, top etc. It is
hard for us to help you much without more info. Test your assumptions.

Problem solving and debugging is a process, not some mystic art. Though
sometime the Gremlins disappear after a pint or two :-)

p

di********@gmail.com wrote:
I have a python code which is running on a huge data set. After
starting the program the computer becomes unstable and gets very
diffucult to even open konsole to kill that process. What I am assuming
is that I am running out of memory.

What should I do to make sure that my code runs fine without becoming
unstable. How should I address the memory leak problem if any ? I have
a gig of RAM.

Every help is appreciated.

May 10 '06 #6
di********@gmail.com wrote:
I have a python code which is running on a huge data set. After
starting the program the computer becomes unstable and gets very
diffucult to even open konsole to kill that process. What I am assuming
is that I am running out of memory.

What should I do to make sure that my code runs fine without becoming
unstable. How should I address the memory leak problem if any ? I have
a gig of RAM.

Every help is appreciated.


Just a hint : if you're trying to load your whole "huge data set" in
memory, you're in for trouble whatever the language - for an example,
doing a 'buf = openedFile.read()' on a 100 gig file may not be a good
idea...

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
May 10 '06 #7
I am using Ubuntu Linux.

My program is a simulation program with four classes and it mimics bit
torrent file sharing systems on 2000 nodes. Now, each node has lot of
attributes and my program kinds of tries to keep tab of everything. As
I mentioned its a simulation program, it starts at time T=0 and goes on
untill all nodes have recieved all parts of the file(BitTorrent
concept). The ending time goes to thousands of seconds. In each sec I
process all the 2000 nodes.

Psuedo Code

Time = 0
while (True){
For all nodes in the system{
Process + computation
}
Time++
If (DownloadFinished == True) exit;
}
Dennis Lee Bieber wrote:
On 8 May 2006 18:15:02 -0700, di********@gmail.com declaimed the
following in comp.lang.python:
I have a python code which is running on a huge data set. After
starting the program the computer becomes unstable and gets very
diffucult to even open konsole to kill that process. What I am assuming
is that I am running out of memory.

What should I do to make sure that my code runs fine without becoming
unstable. How should I address the memory leak problem if any ? I have
a gig of RAM.

Does the memory come back after the process exits?

You don't show any sample of code or data... Nor do you mention what
OS/processor is involved.

Many systems do not return /allocated/ memory to the OS until the
top-level process exits, even if the memory is "freed" from the
viewpoint of the process.
--
Wulfraed Dennis Lee Bieber KD6MOG
wl*****@ix.netcom.com wu******@bestiaria.com
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: we******@bestiaria.com)
HTTP://www.bestiaria.com/


May 10 '06 #8
di********@gmail.com enlightened us with:
My program is a simulation program with four classes and it mimics
bittorrent file sharing systems on 2000 nodes.


Wouldn't it be better to use an existing simulator? That way, you
won't have to do the stuff you don't want to think about, and focus on
the more interesting parts. There are plenty of discrete-event and
discrete-time simulators to choose from.

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Frank Zappa
May 10 '06 #9
di********@gmail.com wrote:
I am using Ubuntu Linux.

My program is a simulation program with four classes and it mimics bit
torrent file sharing systems on 2000 nodes. Now, each node has lot of
attributes and my program kinds of tries to keep tab of everything. As
I mentioned its a simulation program, it starts at time T=0 and goes on
untill all nodes have recieved all parts of the file(BitTorrent
concept). The ending time goes to thousands of seconds. In each sec I
process all the 2000 nodes.


Most likely you keep references to objects you don't need, so python
garbage collector cannot remove those objects. If you cannot figure it
out looking at the source code, you can gather some statistics to help
you, for example use module gc to iterate over all objects in your
program (gc.get_objects()) and find out objects of which type are
growing with each iteration.

May 10 '06 #10
di********@gmail.com wrote:
(top-post corrected)

bruno at modulix wrote:
di********@gmail.com wrote:
I have a python code which is running on a huge data set. After
starting the program the computer becomes unstable and gets very
diffucult to even open konsole to kill that process. What I am assuming
is that I am running out of memory.

What should I do to make sure that my code runs fine without becoming
unstable. How should I address the memory leak problem if any ? I have
a gig of RAM.

Every help is appreciated.
Just a hint : if you're trying to load your whole "huge data set" in
memory, you're in for trouble whatever the language - for an example,
doing a 'buf = openedFile.read()' on a 100 gig file may not be a good
idea...


The amount of data I read in is actually small.


So the problem is probably elsewhere... Sorry, since you were talking
about huge dataset, the good old "read-whole-file-in-memory" antipattern
seemed an obvious guess.
If you see my algorithm above it deals with 2000 nodes and each node
has ot of attributes.

When I close the program my computer becomes stable and performs as
usual. I check the performance in Performance monitor and using "top"
and the total memory is being used and on top of that around half a gig
swap memory is also being used.

Please give some helpful pointers to overcome such memory errors.
A real memory leak would cause the memory usage to keep increasing as
long as your program is running. If this is not the case, it's not a
"memory error", but a design/program error. FWIW, apps like Zope can end
up using a whole lot of memory, but there's no known memory-leak problem
AFAIK. And believe me, a Zope app can end up managing a *really huge
lot* of objects (>= many thousands).
I revisited my code to find nothing so obvious which would let this
leak happen. How to kill cross references in the program.


Using weakref and/or gc might help.

FWIW, the default memory management in Python is based on
reference-counting. As long as anything keeps a reference to an object,
this object will stay alive. If you have lot of cross-references and
2000+ big objects, you may effectively end up eating all the ram and
more. The gc module can detect and manage some cyclic references (obj A
has a ref on obj B which has a ref on obj A). The weakref module uses
'proxy' references that let reference-counting do it's job (I guess the
doc will be much more explicit than me).

Another possible improvement could be to use the flyweight design
pattern to share memory for some attributes :

- a general (while somewhat Java-oriented) explanation:
http://www.exciton.cs.rice.edu/JavaR...ghtPattern.htm

- two Python exemples (the second being based on the first)
http://www.suttoncourtenay.org.uk/du...html#flyweight
http://push.cx/2006/python-flyweights

HTH
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
May 10 '06 #11
Sure, are there any available simulators...since i am modifying some
stuff i thought of creating one of my own. But if you know some
exisiting simlators , those can be of great help to me.

Thanks

May 10 '06 #12
With 1024 nodes it runs fine...but takes around4 hours to run on AMD
3100.

May 10 '06 #13
I ran simulation for 128 nodes and used the following

oo = gc.get_objects()
print len(oo)

on every time step the number of objects are increasing. For 128 nodes
I had 1058177 objects.

I think I need to revisit the code and remove the references....but how
to do that. I am still a newbie coder and every help will be greatly
appreciated.

thanks

May 10 '06 #14
di********@gmail.com enlightened us with:
Sure, are there any available simulators...since i am modifying some
stuff i thought of creating one of my own. But if you know some
exisiting simlators , those can be of great help to me.


Don't know any by name, but I'm sure you can find some on Google. Do
you need a discrete-event or a discrete-time simulator?

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Frank Zappa
May 11 '06 #15
di********@gmail.com wrote:
Sure, are there any available simulators...since i am modifying some
stuff i thought of creating one of my own. But if you know some
exisiting simlators , those can be of great help to me.


http://simpy.sourceforge.net/
May 11 '06 #16
di********@gmail.com wrote:
I ran simulation for 128 nodes and used the following

oo = gc.get_objects()
print len(oo)

on every time step the number of objects are increasing. For 128 nodes
I had 1058177 objects.

I think I need to revisit the code and remove the references....but how
to do that. I am still a newbie coder and every help will be greatly
appreciated.


The next step is to find out what type of objects contributes to the
growth most of all, after that print several object of that type that
didn't exist on iteration N-1 but exist on iteration N

May 11 '06 #17
Either of it would do, I am creating a discrete time simulator.

May 11 '06 #18
In message <11**********************@g10g2000cwb.googlegroups .com>,
Serge Orlov <Se*********@gmail.com> writes
The next step is to find out what type of objects contributes to the
growth most of all,
Shame you aren't on Windows, as Python Memory Validator does all of
this.
after that print several object of that type that
didn't exist on iteration N-1 but exist on iteration N


And this, but for garbage collection generations.

Stephen
--
Stephen Kellett
Object Media Limited http://www.objmedia.demon.co.uk/software.html
Computer Consultancy, Software Development
Windows C++, Java, Assembler, Performance Analysis, Troubleshooting
May 12 '06 #19

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: Adam Deutsch | last post by:
I would like to ask some advice about tracking down memory leaks in Python code. We have a python application running on Python 2.0.1 in an embedded Linux environment (kernel version 2.4.7). We...
10
by: Debian User | last post by:
Hi, I'm trying to discover a memory leak on a program of mine. I've taken several approaches, but the leak still resists to appear. First of all, I've tried to use the garbage collector to...
3
by: Guy | last post by:
Hi It might take me a little time to explain this but here goes. Firstly I'm not using the latest upto date python releases so my first plan is to try more upto date rels of python and win32...
2
by: Elbert Lev | last post by:
#When I'm running this script on my windows NT4.0 box, #every time dialog box is reopened there is memory growth 384K. #Bellow is the text I sent to Stephen Ferg (author of easygui) # I have...
10
by: Andrew Trevorrow | last post by:
No response to my last message, so I'll try a different tack... Does anyone know of, or even better, has anyone here written a C++ application for Mac/Windows that allows users to run Python...
0
by: nejucomo | last post by:
Hi folks, Quick Synopsis: A test script demonstrates a memory leak when I use pythonic extensions of my builtin types, but if I use the builtin types themselves there is no memory leak. ...
3
by: crazy420fingers | last post by:
I'm running a python program that simulates a wireless network protocol for a certain number of "frames" (measure of time). I've observed the following: 1. The memory consumption of the program...
0
by: Gabriel Genellina | last post by:
En Wed, 16 Apr 2008 13:10:42 -0300, rkmr.em@gmail.com <rkmr.em@gmail.comescribió: In principle pure Python code should have no memory leaks - if there are, they're on Python itself. Maybe...
4
by: Christian Heimes | last post by:
Gabriel Genellina schrieb: Pure Python code can cause memory leaks. No, that's not a bug in the interpreter but the fault of the developer. For example code that messes around with stack frames...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Sept 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM) The start time is equivalent to 19:00 (7PM) in Central...
0
by: lllomh | last post by:
Define the method first this.state = { buttonBackgroundColor: 'green', isBlinking: false, // A new status is added to identify whether the button is blinking or not } autoStart=()=>{
2
by: DJRhino | last post by:
Was curious if anyone else was having this same issue or not.... I was just Up/Down graded to windows 11 and now my access combo boxes are not acting right. With win 10 I could start typing...
0
by: Aliciasmith | last post by:
In an age dominated by smartphones, having a mobile app for your business is no longer an option; it's a necessity. Whether you're a startup or an established enterprise, finding the right mobile app...
2
by: giovanniandrean | last post by:
The energy model is structured as follows and uses excel sheets to give input data: 1-Utility.py contains all the functions needed to calculate the variables and other minor things (mentions...
4
NeoPa
by: NeoPa | last post by:
Hello everyone. I find myself stuck trying to find the VBA way to get Access to create a PDF of the currently-selected (and open) object (Form or Report). I know it can be done by selecting :...
1
by: Teri B | last post by:
Hi, I have created a sub-form Roles. In my course form the user selects the roles assigned to the course. 0ne-to-many. One course many roles. Then I created a report based on the Course form and...
3
by: nia12 | last post by:
Hi there, I am very new to Access so apologies if any of this is obvious/not clear. I am creating a data collection tool for health care employees to complete. It consists of a number of...
0
NeoPa
by: NeoPa | last post by:
Introduction For this article I'll be focusing on the Report (clsReport) class. This simply handles making the calling Form invisible until all of the Reports opened by it have been closed, when it...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.