
Memory leak in Python

P: n/a
I have a Python program that runs on a huge data set. After
starting the program the computer becomes unstable and it gets very
difficult even to open Konsole to kill the process. My assumption
is that I am running out of memory.

What should I do to make sure my code runs without making the machine
unstable? How should I address the memory leak problem, if there is one?
I have a gig of RAM.

Any help is appreciated.

May 9 '06 #1
18 Replies


P: n/a
Can you paste an example of the code you're using?

May 10 '06 #2

P: n/a
How big is the data set? 100 MB, more? What are you doing with it? Do
you have a small example that shows the data set is causing the freeze?
I am not the sharpest tool in the shed, but it sounds like you might be
duplicating your data set, directly or indirectly, permanently or
temporarily, on purpose or by accident.

May 10 '06 #3

P: n/a
It's around 650 lines of code... not the best idea to paste it all.
co********@gmail.com wrote:
Can you paste an example of the code you're using?


May 10 '06 #4

P: n/a
di********@gmail.com enlightened us with:
I have a Python program that runs on a huge data set. After
starting the program the computer becomes unstable and it gets very
difficult even to open Konsole to kill the process. My assumption
is that I am running out of memory.


Before acting on your assumptions, you need to verify them. Run 'top'
and hit 'M' to sort by memory usage. After that, use 'ulimit' to limit
the allowed memory usage, run your program again, and see if it stops
at some point due to memory problems.
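You can also set the limit from inside the script with the resource
module (Unix only). A rough sketch, where the 512 MB figure is just an
example:

import resource

# Cap the process's total address space at 512 MB (Unix only).
# Allocations beyond that raise MemoryError instead of dragging
# the whole machine into swap.
limit = 512 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))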

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Frank Zappa
May 10 '06 #5

P: n/a
1) Review your design - you say you are processing a large data set;
just make sure you are not trying to store three copies of it. If you
don't have a written design, create a flow chart or something that is
true to the code you have produced. You could probably even post the
design if you are brave enough.

2) Check your implementation - make sure you manage lists, arrays etc.
correctly. You need to sever links (references) to objects for them to
get swept up. I know it is obvious, but it is easy to slip up in a hasty
implementation.

3) Verify and test the problem's characteristics: profilers, top, etc.
It is hard for us to help you much without more info. Test your
assumptions.

Problem solving and debugging is a process, not some mystic art. Though
sometimes the Gremlins disappear after a pint or two :-)

p

di********@gmail.com wrote:
I have a Python program that runs on a huge data set. After
starting the program the computer becomes unstable and it gets very
difficult even to open Konsole to kill the process. My assumption
is that I am running out of memory.

What should I do to make sure my code runs without making the machine
unstable? How should I address the memory leak problem, if there is one?
I have a gig of RAM.

Any help is appreciated.

May 10 '06 #6

P: n/a
di********@gmail.com wrote:
I have a Python program that runs on a huge data set. After
starting the program the computer becomes unstable and it gets very
difficult even to open Konsole to kill the process. My assumption
is that I am running out of memory.

What should I do to make sure my code runs without making the machine
unstable? How should I address the memory leak problem, if there is one?
I have a gig of RAM.

Any help is appreciated.


Just a hint: if you're trying to load your whole "huge data set" into
memory, you're in for trouble whatever the language - for example,
doing a 'buf = openedFile.read()' on a 100 gig file may not be a good
idea...
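For instance, something like this keeps memory use roughly constant
(untested sketch; the file name and process_record() are made up):

# Read a big file one line at a time instead of slurping it whole.
f = open('huge_dataset.txt')
try:
    for line in f:               # only one line in memory at a time
        process_record(line)     # whatever your per-record work is
finally:
    f.close()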

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
May 10 '06 #7

P: n/a
I am using Ubuntu Linux.

My program is a simulation with four classes; it mimics a BitTorrent
file-sharing system on 2000 nodes. Each node has a lot of attributes,
and my program tries to keep tabs on everything. As I mentioned, it is a
simulation: it starts at time T=0 and runs until all nodes have received
all parts of the file (the BitTorrent concept). The end time runs to
thousands of seconds, and in each second I process all 2000 nodes.

Pseudo code:

time = 0
while True:
    for node in nodes:        # all 2000 nodes
        process(node)         # per-node computation
    time += 1
    if download_finished():
        break
Dennis Lee Bieber wrote:
On 8 May 2006 18:15:02 -0700, di********@gmail.com declaimed the
following in comp.lang.python:
I have a Python program that runs on a huge data set. After
starting the program the computer becomes unstable and it gets very
difficult even to open Konsole to kill the process. My assumption
is that I am running out of memory.

What should I do to make sure my code runs without making the machine
unstable? How should I address the memory leak problem, if there is one?
I have a gig of RAM.

Does the memory come back after the process exits?

You don't show any sample of code or data... Nor do you mention what
OS/processor is involved.

Many systems do not return /allocated/ memory to the OS until the
top-level process exits, even if the memory is "freed" from the
viewpoint of the process.
--
Wulfraed Dennis Lee Bieber KD6MOG
wl*****@ix.netcom.com wu******@bestiaria.com
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: we******@bestiaria.com)
HTTP://www.bestiaria.com/


May 10 '06 #8

P: n/a
di********@gmail.com enlightened us with:
My program is a simulation with four classes; it mimics a BitTorrent
file-sharing system on 2000 nodes.


Wouldn't it be better to use an existing simulator? That way, you
won't have to do the stuff you don't want to think about, and can focus
on the more interesting parts. There are plenty of discrete-event and
discrete-time simulators to choose from.

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Frank Zappa
May 10 '06 #9

P: n/a
di********@gmail.com wrote:
I am using Ubuntu Linux.

My program is a simulation with four classes; it mimics a BitTorrent
file-sharing system on 2000 nodes. Each node has a lot of attributes,
and my program tries to keep tabs on everything. As I mentioned, it is a
simulation: it starts at time T=0 and runs until all nodes have received
all parts of the file (the BitTorrent concept). The end time runs to
thousands of seconds, and in each second I process all 2000 nodes.


Most likely you are keeping references to objects you don't need, so
the Python garbage collector cannot remove those objects. If you cannot
figure it out by looking at the source code, you can gather some
statistics to help you: for example, use the gc module to iterate over
all objects in your program (gc.get_objects()) and find out which types
of objects are growing with each iteration.
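Something along these lines (untested sketch):

import gc

def count_types():
    # Count live objects per type name.
    counts = {}
    for obj in gc.get_objects():
        name = type(obj).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts

Call it once per simulation step and compare the results between steps
to see which type keeps growing.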

May 10 '06 #10

P: n/a
di********@gmail.com wrote:
(top-post corrected)

bruno at modulix wrote:
di********@gmail.com wrote:
I have a Python program that runs on a huge data set. After
starting the program the computer becomes unstable and it gets very
difficult even to open Konsole to kill the process. My assumption
is that I am running out of memory.

What should I do to make sure my code runs without making the machine
unstable? How should I address the memory leak problem, if there is one?
I have a gig of RAM.

Any help is appreciated.

Just a hint: if you're trying to load your whole "huge data set" into
memory, you're in for trouble whatever the language - for example,
doing a 'buf = openedFile.read()' on a 100 gig file may not be a good
idea...


The amount of data I read in is actually small.


So the problem is probably elsewhere... Sorry, but since you were
talking about a huge dataset, the good old "read-whole-file-in-memory"
antipattern seemed an obvious guess.
If you look at my algorithm above, it deals with 2000 nodes, and each
node has a lot of attributes.

When I kill the program my computer becomes stable again and performs
as usual. I checked in the performance monitor and with "top": all of
the physical memory is in use, and on top of that around half a gig of
swap is also being used.

Please give some helpful pointers to overcome such memory errors.
A real memory leak would cause the memory usage to keep increasing as
long as your program is running. If this is not the case, it's not a
"memory error", but a design/program error. FWIW, apps like Zope can end
up using a whole lot of memory, but there's no known memory-leak problem
AFAIK. And believe me, a Zope app can end up managing a *really huge*
number of objects (many thousands or more).
I revisited my code and found nothing obvious that would let this
leak happen. How do I kill cross-references in the program?


Using weakref and/or gc might help.

FWIW, the default memory management in Python is based on
reference counting. As long as anything keeps a reference to an object,
that object will stay alive. If you have a lot of cross-references and
2000+ big objects, you may effectively end up eating all the RAM and
more. The gc module can detect and collect some cyclic references (obj A
has a ref to obj B, which has a ref to obj A). The weakref module uses
'proxy' references that let reference counting do its job (I guess the
doc will be much more explicit than me).
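A trivial illustration of the proxy idea (not your code; the Node class
is made up):

import weakref

class Node(object):
    pass

a = Node()
b = Node()
a.peer = weakref.proxy(b)   # weak reference: does not keep b alive
b.peer = weakref.proxy(a)   # so there is no strong reference cycle

del b                       # b can be reclaimed immediately, even
                            # though a.peer still points at it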

Another possible improvement could be to use the flyweight design
pattern to share memory for some attributes:

- a general (while somewhat Java-oriented) explanation:
http://www.exciton.cs.rice.edu/JavaR...ghtPattern.htm

- two Python examples (the second being based on the first):
http://www.suttoncourtenay.org.uk/du...html#flyweight
http://push.cx/2006/python-flyweights
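A stripped-down sketch of the idea (class and attribute names are made
up; the shared value must be hashable):

class SharedAttr(object):
    # Flyweight: one shared instance per distinct value,
    # instead of one copy per node.
    _pool = {}

    def __new__(cls, value):
        obj = cls._pool.get(value)
        if obj is None:
            obj = object.__new__(cls)
            obj.value = value
            cls._pool[value] = obj
        return obj

# SharedAttr('foo') is SharedAttr('foo')  ->  True

Note that _pool keeps every instance alive for the life of the program,
so this only pays off for values that are heavily shared.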

HTH
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
May 10 '06 #11

P: n/a
Sure, are there any available simulators? Since I am modifying some
things I thought of creating one of my own. But if you know of some
existing simulators, those could be of great help to me.

Thanks

May 10 '06 #12

P: n/a
With 1024 nodes it runs fine... but takes around 4 hours to run on an
AMD 3100.

May 10 '06 #13

P: n/a
I ran the simulation for 128 nodes and used the following:

oo = gc.get_objects()
print len(oo)

On every time step the number of objects increases. For 128 nodes I
ended up with 1058177 objects.

I think I need to revisit the code and remove the references... but how
do I do that? I am still a newbie coder and any help will be greatly
appreciated.

thanks

May 10 '06 #14

P: n/a
di********@gmail.com enlightened us with:
Sure, are there any available simulators? Since I am modifying some
things I thought of creating one of my own. But if you know of some
existing simulators, those could be of great help to me.


Don't know any by name, but I'm sure you can find some on Google. Do
you need a discrete-event or a discrete-time simulator?

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Frank Zappa
May 11 '06 #15

P: n/a
di********@gmail.com wrote:
Sure, are there any available simulators? Since I am modifying some
things I thought of creating one of my own. But if you know of some
existing simulators, those could be of great help to me.


http://simpy.sourceforge.net/
May 11 '06 #16

P: n/a
di********@gmail.com wrote:
I ran the simulation for 128 nodes and used the following:

oo = gc.get_objects()
print len(oo)

On every time step the number of objects increases. For 128 nodes I
ended up with 1058177 objects.

I think I need to revisit the code and remove the references... but how
do I do that? I am still a newbie coder and any help will be greatly
appreciated.


The next step is to find out which type of object contributes most to
the growth; after that, print several objects of that type that didn't
exist at iteration N-1 but do exist at iteration N.
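Roughly (untested; 'dict' is just a placeholder for whatever type turns
out to be growing):

import gc

def objects_of(type_name):
    return [o for o in gc.get_objects()
            if type(o).__name__ == type_name]

before = set(id(o) for o in objects_of('dict'))   # at iteration N-1
# ... run one more simulation step here ...
new = [o for o in objects_of('dict') if id(o) not in before]
for o in new[:10]:
    print repr(o)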

May 11 '06 #17

P: n/a
Either would do; I am creating a discrete-time simulator.

May 11 '06 #18

P: n/a
In message <11**********************@g10g2000cwb.googlegroups.com>,
Serge Orlov <Se*********@gmail.com> writes
The next step is to find out which type of object contributes most to
the growth,
Shame you aren't on Windows, as Python Memory Validator does all of
this.
after that, print several objects of that type that didn't exist at
iteration N-1 but do exist at iteration N.


And this, but for garbage collection generations.

Stephen
--
Stephen Kellett
Object Media Limited http://www.objmedia.demon.co.uk/software.html
Computer Consultancy, Software Development
Windows C++, Java, Assembler, Performance Analysis, Troubleshooting
May 12 '06 #19
