Garbage collection

Hi all

I suspect I may be missing something vital here, but Python's garbage
collection doesn't seem to work as I expect it to. Here's a small test
program which shows the problem on python 2.4 and 2.5:

$ python2.5
Python 2.5 (release25-maint, Dec 9 2006, 15:33:01)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-20)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
(at this point, Python is using 15MB)
>>> a = range(int(1e7))
(at this point, Python is using 327MB)
>>> a = None
(at this point, Python is using 251MB)
>>> import gc
>>> gc.collect()
0
>>>
(at this point, Python is using 252MB)
Is there something I've forgotten to do? Why is Python still using such a
lot of memory?
Thanks!

--
I'm at CAMbridge, not SPAMbridge
Mar 21 '07 #1

Tom Wright wrote:
Hi all

I suspect I may be missing something vital here, but Python's garbage
collection doesn't seem to work as I expect it to. Here's a small test
program which shows the problem on python 2.4 and 2.5:
................ skip .....................
(at this point, Python is using 252MB)
Is there something I've forgotten to do? Why is Python still using such a
lot of memory?
Thanks!
How do you know the amount of memory used by Python?
ps, top, or something?

--
Thinker Li - th*****@branda.to th********@gmail.com
http://heaven.branda.to/~thinker/GinGin_CGI.py

Mar 21 '07 #2
Thinker wrote:
How do you know the amount of memory used by Python?
ps, top, or something?
$ ps up `pidof python2.5`
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
tew24 26275 0.0 11.9 257592 243988 pts/6 S+ 13:10 0:00 python2.5

"VSZ" is "Virtual Memory Size" (ie. total memory used by the application)
"RSS" is "Resident Set Size" (ie. non-swapped physical memory)
--
I'm at CAMbridge, not SPAMbridge
Mar 21 '07 #3
Tom> I suspect I may be missing something vital here, but Python's
Tom> garbage collection doesn't seem to work as I expect it to. Here's
Tom> a small test program which shows the problem on python 2.4 and 2.5:

Tom> (at this point, Python is using 15MB)
Tom> >>> a = range(int(1e7))
Tom> >>> a = None
Tom> >>> import gc
Tom> >>> gc.collect()
Tom> 0

Tom> (at this point, Python is using 252MB)

Tom> Is there something I've forgotten to do? Why is Python still using
Tom> such a lot of memory?

You haven't forgotten to do anything. Your attempts at freeing memory are
being thwarted (in part, at least) by Python's int free list. I believe the
int free list remains after the 10M individual ints' refcounts drop to zero.
The large storage for the list is grabbed in one gulp and thus mmap()d I
believe, so it is reclaimed by being munmap()d, hence the drop from 320+MB
to 250+MB.

I haven't looked at the int free list or obmalloc implementations in awhile,
but if the free list does return any of its memory to the system it probably
just calls the free() library function. Whether or not the system actually
reclaims any memory from your process depends on the details of the
malloc/free implementation. That is, the behavior is outside
Python's control.
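
As a rough illustration (a sketch only -- the actual behaviour depends on
the platform's malloc, not on Python): a single large allocation is
typically mmap()ed and handed straight back to the OS when freed, while
millions of small int objects stay parked on the interpreter's free list:

# rough interactive sketch; exact numbers depend on the C library
import gc

big = 'x' * (300 * 1024 * 1024)   # one 300MB block: usually mmap()ed
del big                           # ...and munmap()ed, so RSS drops back

ints = range(int(1e7))            # ~10 million separate int objects
del ints
gc.collect()                      # the list's pointer array is released,
                                  # but the int blocks stay cached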

Skip
Mar 21 '07 #4

Tom Wright wrote:
Thinker wrote:
>How do you know the amount of memory used by Python? ps, top, or
>something?

$ ps up `pidof python2.5`
USER       PID %CPU %MEM    VSZ    RSS TTY   STAT START TIME COMMAND
tew24    26275  0.0 11.9 257592 243988 pts/6 S+   13:10 0:00 python2.5

"VSZ" is "Virtual Memory Size" (i.e. total memory used by the application)
"RSS" is "Resident Set Size" (i.e. non-swapped physical memory)

This is the amount of memory allocated to the process, not the amount the
Python interpreter is actually using. It is managed by the C library's
malloc(). When you free a block of memory with free(), it is only returned
to the C library for later reuse; the C library does not always return the
memory to the kernel.

Since modern OSes use virtual memory, inactive memory will be paged out
when more physical memory is needed. It doesn't hurt much if you have
enough swap space.

What you get from the ps command is the memory allocated to the process; it
doesn't mean it is all in use by the Python interpreter.

--
Thinker Li - th*****@branda.to th********@gmail.com
http://heaven.branda.to/~thinker/GinGin_CGI.py

Mar 21 '07 #5
sk**@pobox.com wrote:
You haven't forgotten to do anything. Your attempts at freeing memory are
being thwarted (in part, at least) by Python's int free list. I believe
the int free list remains after the 10M individual ints' refcounts drop to
zero. The large storage for the list is grabbed in one gulp and thus
mmap()d I believe, so it is reclaimed by being munmap()d, hence the drop
from 320+MB to 250+MB.

I haven't looked at the int free list or obmalloc implementations in
awhile, but if the free list does return any of its memory to the system
it probably just calls the free() library function. Whether or not the
system actually reclaims any memory from your process depends on the
details of the malloc/free implementation. That is, the behavior
is outside Python's control.
Ah, thanks for explaining that. I'm a little wiser about memory allocation
now, but am still having problems reclaiming memory from unused objects
within Python. If I do the following:
>>>
(memory use: 15 MB)
>>> a = range(int(4e7))
(memory use: 1256 MB)
>>> a = None
(memory use: 953 MB)

...and then I allocate a lot of memory in another process (e.g. open a load
of files in the GIMP), then the computer swaps the Python process out to
disk to free up the necessary space. Python's memory use is still reported
as 953 MB, even though nothing like that amount of space is needed. From
what you said above, the problem is in the underlying C libraries, but is
there anything I can do to get that memory back without closing Python?

--
I'm at CAMbridge, not SPAMbridge
Mar 21 '07 #6

Tom> ...and then I allocate a lot of memory in another process (e.g. open
Tom> a load of files in the GIMP), then the computer swaps the Python
Tom> process out to disk to free up the necessary space. Python's
Tom> memory use is still reported as 953 MB, even though nothing like
Tom> that amount of space is needed. From what you said above, the
Tom> problem is in the underlying C libraries, but is there anything I
Tom> can do to get that memory back without closing Python?

Not really. I suspect the unused pages of your Python process are paged
out, but that Python has just what it needs to keep going. Memory
contention would be a problem if your Python process wanted to keep that
memory active at the same time as you were running GIMP. I think the
process's resident size is more important here than virtual memory size (as
long as you don't exhaust swap space).

Skip
Mar 21 '07 #7
sk**@pobox.com wrote:
Tom> ...and then I allocate a lot of memory in another process (e.g.
Tom> open a load of files in the GIMP), then the computer swaps the
Tom> Python process out to disk to free up the necessary space. Python's
Tom> memory use is still reported as 953 MB, even though nothing like
Tom> that amount of space is needed. From what you said above, the
Tom> problem is in the underlying C libraries, but is there anything I
Tom> can do to get that memory back without closing Python?

Not really. I suspect the unused pages of your Python process are paged
out, but that Python has just what it needs to keep going.
Yes, that's what's happening.
Memory contention would be a problem if your Python process wanted to keep
that memory active at the same time as you were running GIMP.
True, but why does Python hang on to the memory at all? As I understand it,
it's keeping a big lump of memory on the int free list in order to make
future allocations of large numbers of integers faster. If that memory is
about to be paged out, then surely future allocations of integers will be
*slower*, as the system will have to:

1) page out something to make room for the new integers
2) page in the relevant chunk of the int free list
3) zero all of this memory and do any other formatting required by Python

If Python freed (most of) the memory when it had finished with it, then all
the system would have to do is:

1) page out something to make room for the new integers
2) zero all of this memory and do any other formatting required by Python

Surely Python should free the memory if it's not been used for a certain
amount of time (say a few seconds), as allocation times are not going to be
the limiting factor if it's gone unused for that long. Alternatively, it
could mark the memory as some sort of cache, so that if it needed to be
paged out, it would instead be de-allocated (thus saving the time taken to
page it back in again when it's next needed)

I think the process's resident size is more important here than virtual
memory size (as long as you don't exhaust swap space).
True in theory, but the computer does tend to go rather sluggish when paging
large amounts out to disk and back. Surely the use of virtual memory
should be avoided where possible, as it is so slow? This is especially
true when the contents of the blocks paged out to disk will never be read
again.
I've also tested similar situations on Python under Windows XP, and it shows
the same behaviour, so I think this is a Python and/or GCC/libc issue,
rather than an OS issue (assuming Python for linux and Python for windows
are both compiled with GCC).

--
I'm at CAMbridge, not SPAMbridge
Mar 21 '07 #8
Tom Wright wrote:
sk**@pobox.com wrote:
> Tom> ...and then I allocate a lot of memory in another process (e.g.
> Tom> open a load of files in the GIMP), then the computer swaps the
> Tom> Python process out to disk to free up the necessary space. Python's
> Tom> memory use is still reported as 953 MB, even though nothing like
> Tom> that amount of space is needed. From what you said above, the
> Tom> problem is in the underlying C libraries, but is there anything I
> Tom> can do to get that memory back without closing Python?

Not really. I suspect the unused pages of your Python process are paged
out, but that Python has just what it needs to keep going.

Yes, that's what's happening.
>Memory contention would be a problem if your Python process wanted to keep
that memory active at the same time as you were running GIMP.

True, but why does Python hang on to the memory at all? As I understand it,
it's keeping a big lump of memory on the int free list in order to make
future allocations of large numbers of integers faster. If that memory is
about to be paged out, then surely future allocations of integers will be
*slower*, as the system will have to:

1) page out something to make room for the new integers
2) page in the relevant chunk of the int free list
3) zero all of this memory and do any other formatting required by Python

If Python freed (most of) the memory when it had finished with it, then all
the system would have to do is:

1) page out something to make room for the new integers
2) zero all of this memory and do any other formatting required by Python

Surely Python should free the memory if it's not been used for a certain
amount of time (say a few seconds), as allocation times are not going to be
the limiting factor if it's gone unused for that long. Alternatively, it
could mark the memory as some sort of cache, so that if it needed to be
paged out, it would instead be de-allocated (thus saving the time taken to
page it back in again when it's next needed)
Easy to say. How do you know the memory that's not in use is in a
contiguous block suitable for return to the operating system? I can
pretty much guarantee it won't be. CPython doesn't use a relocating
garbage collection scheme, so objects always stay at the same place in
the process's virtual memory unless they have to be grown to accommodate
additional data.
>
>I think the process's resident size is more important here than virtual
memory size (as long as you don't exhaust swap space).

True in theory, but the computer does tend to go rather sluggish when paging
large amounts out to disk and back. Surely the use of virtual memory
should be avoided where possible, as it is so slow? This is especially
true when the contents of the blocks paged out to disk will never be read
again.
Right. So all we have to do is identify those portions of memory that
will never be read again and return them to the OS. That should be easy.
Not.
>
I've also tested similar situations on Python under Windows XP, and it shows
the same behaviour, so I think this is a Python and/or GCC/libc issue,
rather than an OS issue (assuming Python for linux and Python for windows
are both compiled with GCC).
It's probably a dynamic memory issue. Of course if you'd like to provide
a patch to switch it over to a relocating garbage collection scheme
we'll all await it with bated breath :)

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com

Mar 21 '07 #9

Tom> True, but why does Python hang on to the memory at all? As I
Tom> understand it, it's keeping a big lump of memory on the int free
Tom> list in order to make future allocations of large numbers of
Tom> integers faster. If that memory is about to be paged out, then
Tom> surely future allocations of integers will be *slower*, as the
Tom> system will have to:

Tom> 1) page out something to make room for the new integers
Tom> 2) page in the relevant chunk of the int free list
Tom> 3) zero all of this memory and do any other formatting required by
Tom>    Python

If your program's behavior is:

* allocate a list of 1e7 ints
* delete that list

how does the Python interpreter know your next bit of execution won't be to
repeat the allocation? In addition, checking to see that an arena in the
free list can be freed is itself not a free operation. From the comments at
the top of intobject.c:

free_list is a singly-linked list of available PyIntObjects, linked
via abuse of their ob_type members.

Each time an int is allocated, the free list is checked to see if it's got a
spare object lying about, sloughing off. If so, it is plucked from the list
and reinitialized appropriately. If not, a new block of memory sufficient
to hold about 250 ints is grabbed via a call to malloc, which *might* have
to grab more memory from the OS. Once that block is allocated, it's strung
together into a free list via the above ob_type slot abuse. Then the 250 or
so items are handed out one-by-one as needed and stitched back into the free
list as they are freed.
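
Here's a toy model of that block-and-free-list scheme in Python, purely for
illustration (the real code is C, in Objects/intobject.c, and the names here
are made up):

class ToyIntFreeList(object):
    """Toy model: slots are handed out from ~250-object blocks and go
    back onto a free list when released, but the blocks themselves are
    never returned to the C library."""
    BLOCK_SIZE = 250

    def __init__(self):
        self.blocks = []   # every block ever "malloc()ed"
        self.free = []     # slot numbers currently unused

    def alloc(self, value):
        if not self.free:
            base = len(self.blocks) * self.BLOCK_SIZE
            self.blocks.append([None] * self.BLOCK_SIZE)
            self.free.extend(range(base, base + self.BLOCK_SIZE))
        slot = self.free.pop()
        block, offset = divmod(slot, self.BLOCK_SIZE)
        self.blocks[block][offset] = value
        return slot

    def release(self, slot):
        block, offset = divmod(slot, self.BLOCK_SIZE)
        self.blocks[block][offset] = None
        self.free.append(slot)   # slot is reusable, but the block stays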

Now consider how difficult it is to decide if that block of 250 or so
objects is all unused so that we can free() it. We have to walk through the
list and check to see if that chunk is in the free list. That's complicated
by the fact that the ref count fields aren't initialized to zero until a
particular chunk is first used as an allocated int object and would have to
be to support this block free operation (=more cost up front). Still,
assume we can semi-efficiently determine that a particular block is composed
of all freed int-object-sized chunks. We will then unstitch it from the
chain of blocks and call free() to free it. Still, we are left with the
behavior of the operating system's malloc/free implementation. It probably
won't sbrk() the block back to the OS, so after all that work your process
still holds the memory.

Okay, so malloc/free won't work. We could boost the block size up to the
size of a page and use mmap() to map a page into memory. I suspect that
would become still more complicated to implement, and the block size being
probably about eight times larger than the current block size would incur
even more cost to determine if it was full of nothing but freed objects.

Tom> If Python freed (most of) the memory when it had finished with it,
Tom> then all the system would have to do is:

That's the rub. Figuring out when it is truly "finished" with the memory.

Tom> Surely Python should free the memory if it's not been used for a
Tom> certain amount of time (say a few seconds), as allocation times are
Tom> not going to be the limiting factor if it's gone unused for that
Tom> long.

This is generally the point in such discussions where I respond with
something like, "patches cheerfully accepted". ;-) If you're interested in
digging into this, have a look at the free list implementation in
Objects/intobject.c. It might make for a good Google Summer of Code
project:

http://code.google.com/soc/psf/open.html
http://code.google.com/soc/psf/about.html

but I'm not the guy you want mentoring such a project. There are a lot of
people who understand the ins and outs of Python's memory allocation code
much better than I do.

Tom> I've also tested similar situations on Python under Windows XP, and
Tom> it shows the same behaviour, so I think this is a Python and/or
Tom> GCC/libc issue, rather than an OS issue (assuming Python for linux
Tom> and Python for windows are both compiled with GCC).

Sure, my apologies. The malloc/free implementation is strictly speaking not
part of the operating system. I tend to mentally lump them together because
it's uncommon for people to use a malloc/free implementation different than
the one delivered with their computer.

Skip
Mar 21 '07 #10
On Wed, 21 Mar 2007 15:03:17 +0000, Tom Wright wrote:

[snip]
Ah, thanks for explaining that. I'm a little wiser about memory allocation
now, but am still having problems reclaiming memory from unused objects
within Python. If I do the following:
>>>
(memory use: 15 MB)
>>> a = range(int(4e7))
(memory use: 1256 MB)
>>> a = None
(memory use: 953 MB)

...and then I allocate a lot of memory in another process (eg. open a load
of files in the GIMP), then the computer swaps the Python process out to
disk to free up the necessary space. Python's memory use is still reported
as 953 MB, even though nothing like that amount of space is needed.
Who says it isn't needed? Just because *you* have only one object
existing, doesn't mean the Python environment has only one object existing.

From what you said above, the problem is in the underlying C libraries,
What problem?

Nothing you've described seems like a problem to me. It sounds like a
modern, 21st century operating system and programming language working
like they should. Why do you think this is a problem?

You've described an extremely artificial set of circumstances: you create
40,000,000 distinct integers, then immediately destroy them. The obvious
solution to that "problem" of Python caching millions of integers you
don't need is not to create them in the first place.

In real code, the chances are that if you created 4e7 distinct integers
you'll probably need them again -- hence the cache. So what's your actual
problem that you are trying to solve?

but is there anything I can do to get that memory back without closing
Python?
Why do you want to manage memory yourself anyway? It seems like a
horrible, horrible waste to use a language designed to manage memory for
you, then insist on overriding its memory management.

I'm not saying that there is never any good reason for fine control of the
Python environment, but this doesn't look like one to me.
--
Steven.

Mar 21 '07 #11
On Wed, 21 Mar 2007 15:32:17 +0000, Tom Wright wrote:
>Memory contention would be a problem if your Python process wanted to keep
that memory active at the same time as you were running GIMP.

True, but why does Python hang on to the memory at all? As I understand it,
it's keeping a big lump of memory on the int free list in order to make
future allocations of large numbers of integers faster. If that memory is
about to be paged out, then surely future allocations of integers will be
*slower*, as the system will have to:

1) page out something to make room for the new integers
2) page in the relevant chunk of the int free list
3) zero all of this memory and do any other formatting required by Python

If Python freed (most of) the memory when it had finished with it, then all
the system would have to do is:

1) page out something to make room for the new integers
2) zero all of this memory and do any other formatting required by Python

Surely Python should free the memory if it's not been used for a certain
amount of time (say a few seconds), as allocation times are not going to be
the limiting factor if it's gone unused for that long. Alternatively, it
could mark the memory as some sort of cache, so that if it needed to be
paged out, it would instead be de-allocated (thus saving the time taken to
page it back in again when it's next needed)
And increasing the time it takes to re-create the objects in the cache
subsequently.

Maybe this extra effort is worthwhile when the free int list holds 10**7
ints, but is it worthwhile when it holds 10**6 ints? How about 10**5 ints?
10**3 ints?

How many free ints is "typical" or even "common" in practice?

The lesson I get from this is, instead of creating such an enormous list
of integers in the first place with range(), use xrange() instead.

Fresh running instance of Python 2.5:

$ ps up 9579
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
steve 9579 0.0 0.2 6500 2752 pts/7 S+ 03:42 0:00 python2.5
Run from within Python:
>>> n = 0
>>> for i in xrange(int(1e7)):
...     # create lots of ints, one at a time
...     # instead of all at once
...     n += i  # make sure the int is used
...
>>> n
49999995000000L
And the output of ps again:

$ ps up 9579
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
steve 9579 4.2 0.2 6500 2852 pts/7 S+ 03:42 0:11 python2.5

Barely moved a smidgen.

For comparison, here's what ps reports after I create a single list with
range(int(1e7)), and again after I delete the list:

$ ps up 9579 # after creating list with range(int(1e7))
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
steve 9579 1.9 15.4 163708 160056 pts/7 S+ 03:42 0:11 python2.5

$ ps up 9579 # after deleting list
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
steve 9579 1.7 11.6 124632 120992 pts/7 S+ 03:42 0:12 python2.5
So there is another clear advantage to using xrange instead of range,
unless you specifically need all ten million ints all at once.

--
Steven.

Mar 21 '07 #12
Steven D'Aprano wrote:
You've described an extremely artificial set of circumstances: you create
40,000,000 distinct integers, then immediately destroy them. The obvious
solution to that "problem" of Python caching millions of integers you
don't need is not to create them in the first place.
I know it's a very artificial setup - I was trying to make the situation
simple to demonstrate in a few lines. The point was that it's not caching
the values of those integers, as they can never be read again through the
Python interface. It's just holding onto the space they occupy in case
it's needed again.
So what's your actual problem that you are trying to solve?
I have a program which reads a few thousand text files, converts each to a
list (with readlines()), creates a short summary of the contents of each (a
few floating point numbers) and stores this summary in a master list. From
the amount of memory it's using, I think that the lists containing the
contents of each file are kept in memory, even after there are no
references to them. Also, if I tell it to discard the master list and
re-read all the files, the memory use nearly doubles so I presume it's
keeping the lot in memory.

The program may run through several collections of files, but it only keeps
a reference to the master list of the most recent collection it's looked
at. Obviously, it's not ideal if all the old collections hang around too,
taking up space and causing the machine to swap.
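
(For what it's worth, here's a rough sketch of the sort of streaming rewrite
I could try -- iterating over each file instead of calling readlines(), so
the full line list never exists; the per-line parsing is just a placeholder
for the real summary code:)

def summarise_file(path):
    # sketch: stream the file a line at a time instead of readlines(),
    # so the whole contents never sit in memory at once
    total, count = 0.0, 0
    f = open(path)
    try:
        for line in f:
            total += float(line.split()[0])   # placeholder "summary"
            count += 1
    finally:
        f.close()
    return total / count if count else 0.0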
>but is there anything I can do to get that memory back without closing
Python?

Why do you want to manage memory yourself anyway? It seems like a
horrible, horrible waste to use a language designed to manage memory for
you, then insist on overriding its memory management.
I agree. I don't want to manage it myself. I just want it to re-use memory
or hand it back to the OS if it's got an awful lot that it's not using.
Wouldn't you say it was wasteful if (say) an image editor kept an
uncompressed copy of an image around in memory after the image had been
closed?

--
I'm at CAMbridge, not SPAMbridge
Mar 21 '07 #13
Steve Holden wrote:
Easy to say. How do you know the memory that's not in use is in a
contiguous block suitable for return to the operating system? I can
pretty much guarantee it won't be. CPython doesn't use a relocating
garbage collection scheme
Fair point. That is difficult and I don't see a practical solution to it
(besides substituting a relocating garbage collector, which seems like a
major undertaking).
Right. So all we have to do is identify those portions of memory that
will never be read again and return them to the OS. That should be easy.
Not.
Well, you have this nice int free list which points to all the bits which
will never be read again (they might be written to, but if you're writing
without reading then it doesn't really matter where you do it). The point
about contiguous chunks still applies though.
--
I'm at CAMbridge, not SPAMbridge
Mar 21 '07 #14
sk**@pobox.com wrote:
If your program's behavior is:

* allocate a list of 1e7 ints
* delete that list

how does the Python interpreter know your next bit of execution won't be
to repeat the allocation?
It doesn't know, but if the program runs for a while without repeating it,
it's a fair bet that it won't mind waiting the next time it does a big
allocation. How long 'a while' is would obviously be open to debate.
In addition, checking to see that an arena in
the free list can be freed is itself not a free operation.
(snip thorough explanation)
Yes, that's a good point. It looks like the list is designed for speedy
re-use of the memory it points to, which seems like a good choice. I quite
agree that it should hang on to *some* memory, and perhaps my artificial
situation has shown this as a problem when it wouldn't cause any issues for
real programs. I can't help thinking that there are some situations where
you need a lot of memory for a short time though, and it would be nice to
be able to use it briefly and then hand most of it back. Still, I see the
practical difficulties with doing this.

--
I'm at CAMbridge, not SPAMbridge
Mar 21 '07 #15
Tom Wright wrote:
Steven D'Aprano wrote:
>You've described an extremely artificial set of circumstances: you create
40,000,000 distinct integers, then immediately destroy them. The obvious
solution to that "problem" of Python caching millions of integers you
don't need is not to create them in the first place.

I know it's a very artificial setup - I was trying to make the situation
simple to demonstrate in a few lines. The point was that it's not caching
the values of those integers, as they can never be read again through the
Python interface. It's just holding onto the space they occupy in case
it's needed again.
>So what's your actual problem that you are trying to solve?

I have a program which reads a few thousand text files, converts each to a
list (with readlines()), creates a short summary of the contents of each (a
few floating point numbers) and stores this summary in a master list. From
the amount of memory it's using, I think that the lists containing the
contents of each file are kept in memory, even after there are no
references to them. Also, if I tell it to discard the master list and
re-read all the files, the memory use nearly doubles so I presume it's
keeping the lot in memory.
I'd like to bet you are keeping references to them without realizing it.
The interpreter won't generally allocate memory that it can get by
garbage collection, and reference counting pretty much eliminates the
need for garbage collection anyway except when you create cyclic data
structures.
The program may run through several collections of files, but it only keeps
a reference to the master list of the most recent collection it's looked
at. Obviously, it's not ideal if all the old collections hang around too,
taking up space and causing the machine to swap.
We may need to see code here for you to convince us of the correctness
of your hypothesis. It sounds pretty screwy to me.
>>but is there anything I can do to get that memory back without closing
Python?
Why do you want to manage memory yourself anyway? It seems like a
horrible, horrible waste to use a language designed to manage memory for
you, then insist on overriding its memory management.

I agree. I don't want to manage it myself. I just want it to re-use memory
or hand it back to the OS if it's got an awful lot that it's not using.
Wouldn't you say it was wasteful if (say) an image editor kept an
uncompressed copy of an image around in memory after the image had been
closed?
Yes, but I'd say it was the programmer's fault if it turned out that the
interpreter wasn't doing anything wrong ;-) It could be something inside
an exception handler that is keeping a reference to a stack frame or
something silly like that.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com

Mar 21 '07 #16
On Wed, 21 Mar 2007 17:19:23 +0000, Tom Wright wrote:
>So what's your actual problem that you are trying to solve?

I have a program which reads a few thousand text files, converts each to a
list (with readlines()), creates a short summary of the contents of each (a
few floating point numbers) and stores this summary in a master list. From
the amount of memory it's using, I think that the lists containing the
contents of each file are kept in memory, even after there are no
references to them. Also, if I tell it to discard the master list and
re-read all the files, the memory use nearly doubles so I presume it's
keeping the lot in memory.
Ah, now we're getting somewhere!

Python's caching behaviour with strings is almost certainly going to be
different to its caching behaviour with ints. (For example, Python caches
short strings that look like identifiers, but I don't believe it caches
great blocks of text or short strings which include whitespace.)

But again, you haven't really described a problem, just a set of
circumstances. Yes, the memory usage doubles. *Is* that a problem in
practice? A few thousand 1KB files is one thing; a few thousand 1MB files
is an entirely different story.

Is the most cost-effective solution to the problem to buy another 512MB of
RAM? I don't say that it is. I just point out that you haven't given us
any reason to think it isn't.

The program may run through several collections of files, but it only keeps
a reference to the master list of the most recent collection it's looked
at. Obviously, it's not ideal if all the old collections hang around too,
taking up space and causing the machine to swap.
Without knowing exactly what you're doing with the data, it's hard to tell
where the memory is going. I suppose if you are storing huge lists of
millions of short strings (words?), they might all be cached. Is there a
way you can avoid storing the hypothetical word-lists in RAM, perhaps by
writing them straight out to a disk file? That *might* make a
difference to the caching algorithm used.

Or you could just have an "object leak" somewhere. Do you have any
complicated circular references that the garbage collector can't resolve?
Lists-of-lists? Trees? Anything where objects aren't being freed when you
think they are? Are you holding on to references to lists? It's more
likely that your code simply isn't freeing lists you think are being freed
than it is that Python is holding on to tens of megabytes of random text.

--
Steven.

Mar 21 '07 #17
Steven D'Aprano <st***@REMOVE.THIS.cybersource.com.au> wrote:
Or you could just have an "object leak" somewhere. Do you have any
complicated circular references that the garbage collector can't resolve?
Lists-of-lists? Trees? Anything where objects aren't being freed when you
think they are? Are you holding on to references to lists? It's more
likely that your code simply isn't freeing lists you think are being freed
than it is that Python is holding on to tens of megabytes of random
text.
This is surely just the fragmented heap problem.

Returning unused memory to the OS is a hard problem, since memory
usually comes in page-sized (4k) chunks and you can only return pages
at the end of your memory (the sbrk() interface).

The glibc allocator uses mmap() for large allocations which *can* be
returned to the OS without any fragmentation worries.

However if you have lots of small allocations then the heap will be
fragmented and you'll never be able to return the memory to the OS.

However that is why we have virtual memory systems.
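
(If you really must hand heap pages back on glibc, you can ask the allocator
directly -- a hedged, Linux/glibc-only sketch, and whether anything is
actually released still depends on how fragmented the heap is:)

import ctypes

libc = ctypes.CDLL("libc.so.6")
released = libc.malloc_trim(0)   # glibc call; returns 1 if free heap
                                 # pages were given back to the kernel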

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Mar 21 '07 #18
In article <sl*****************@irishsea.home.craig-wood.com>,
Nick Craig-Wood <ni**@craig-wood.com> wrote:
>Steven D'Aprano <st***@REMOVE.THIS.cybersource.com.au> wrote:
>>
Or you could just have an "object leak" somewhere. Do you have any
complicated circular references that the garbage collector can't resolve?
Lists-of-lists? Trees? Anything where objects aren't being freed when you
think they are? Are you holding on to references to lists? It's more
likely that your code simply isn't freeing lists you think are being freed
than it is that Python is holding on to tens of megabytes of random
text.

This is surely just the fragmented heap problem.
Possibly. I believe PyMalloc doesn't have as much of a problem in this
area, but off-hand I don't remember the extent to which strings use
PyMalloc. Nevertheless, my bet is on holding references as the problem
with doubled memory use.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"Typing is cheap. Thinking is expensive." --Roy Smith
Mar 21 '07 #19
In article <yP*****************@newsread4.news.pas.earthlink.net>,
Dennis Lee Bieber <wl*****@ix.netcom.com> wrote:
>On Wed, 21 Mar 2007 15:32:17 +0000, Tom Wright <te***@spam.ac.uk>
declaimed the following in comp.lang.python:
>>
True, but why does Python hang on to the memory at all? As I understand it,
it's keeping a big lump of memory on the int free list in order to make
future allocations of large numbers of integers faster. If that memory is
about to be paged out, then surely future allocations of integers will be
*slower*, as the system will have to:
It may not just be that free list -- which on a machine with lots of
RAM may never be paged out anyway [mine (XP) currently shows: physical
memory total/available/system: 2095196/1355296/156900K, commit charge
total/limit/peak: 514940/3509272/697996K (limit includes page/swap file
of 1.5GB)] -- it could easily be that the OS or runtime just doesn't
return memory to the OS until a process/executable image exits.
Mar 22 '07 #20
Tom Wright <te***@spam.ac.uk> wrote:
real programs. I can't help thinking that there are some situations where
you need a lot of memory for a short time though, and it would be nice to
be able to use it briefly and then hand most of it back. Still, I see the
practical difficulties with doing this.
What I do in those cases:
a. fork
b. do the memory-hogging work in the child process
c. meanwhile the parent just waits
d. the child sends back to the parent the small results
e. the child terminates
f. the parent proceeds merrily

I learned this architectural-pattern a long, long time ago, around the
time when fork first got implemented via copy-on-write pages...
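
A minimal sketch of that pattern with os.fork() and a pipe (the helper and
the work function are just illustrative):

import os, pickle

def run_in_child(work):
    # sketch: run `work` in a forked child so its big allocations die
    # with the child; only the (small) pickled result comes back.
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                         # child
        os.close(read_fd)
        os.write(write_fd, pickle.dumps(work()))
        os.close(write_fd)
        os._exit(0)                      # skip normal interpreter teardown
    os.close(write_fd)                   # parent
    chunks = []
    while True:
        chunk = os.read(read_fd, 65536)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return pickle.loads(''.join(chunks))

def hog():
    big = range(int(1e7))                # memory-hogging work
    return sum(big)                      # small result

print run_in_child(hog)                  # parent's memory stays small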
Alex
Mar 22 '07 #21
Alex Martelli wrote:
Tom Wright <te***@spam.ac.uk> wrote:
>real programs. I can't help thinking that there are some situations where
you need a lot of memory for a short time though, and it would be nice to
be able to use it briefly and then hand most of it back. Still, I see the
practical difficulties with doing this.

What I do in those cases:
a. fork
b. do the memory-hogging work in the child process
c. meanwhile the parent just waits
d. the child sends back to the parent the small results
e. the child terminates
f. the parent proceeds merrily

I learned this architectural-pattern a long, long time ago, around the
time when fork first got implemented via copy-on-write pages...
Yup, it's easier to be pragmatic and find the real solution to your
problem than it is to try and mould reality to your idea of what the
solution should be ...

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com

Mar 22 '07 #22
Steve Holden <st***@holdenweb.com> wrote:
...
a. fork
b. do the memory-hogging work in the child process
c. meanwhile the parent just waits
d. the child sends back to the parent the small results
e. the child terminates
f. the parent proceeds merrily

I learned this architectural-pattern a long, long time ago, around the
time when fork first got implemented via copy-on-write pages...
Yup, it's easier to be pragmatic and find the real solution to your
problem than it is to try and mould reality to your idea of what the
solution should be ...
"That's why all progress is due to the unreasonable man", hm?-)
Alex
Mar 22 '07 #23
