473,473 Members | 1,549 Online
Bytes | Software Development & Data Engineering Community
Create Post

Home Posts Topics Members FAQ

Python Memory Manager

I've been looking at the Python source code recently, more
specifically trying to figure out how it's garbage collector works.

I've gathered that it uses refcounting as well as some cycle-detection
algorithms, but I haven't been able to figure out some other things.

Does Python actually have a single 'heap' where all the data is
stored? Because PyObject_HEAD seemed to imply to me it was just a
linked list of objects, although perhaps I didnt understand this
correctly.

Also, if it does, how does it deal with memory segmentation? This
question bothers me because I've been trying to implement a moving
garbage collector, and am not sure how to deal with updating all
program pointers to objects on the heap, and thought perhaps an answer
to this question would give me some ideas. (Also, if you know any
resources for things like this, I'd be grateful for links/names)

Thanks in advance,

Pie Squared
Feb 17 '08 #1
15 1447
Pie Squared <Pi********@gmail.comwrites:
Also, if it does, how does it deal with memory segmentation? This
question bothers me because I've been trying to implement a moving
garbage collector, and am not sure how to deal with updating all
program pointers to objects on the heap, and thought perhaps an answer
to this question would give me some ideas.
As I understand it, Python primarily uses reference counting, with a
mark and sweep scheme for cycle breaking tacked on as an afterthought.
It doesn't move objects in memory during GC so you can get
fragmentation. It's probably not feasible to change this in CPython
without extensive rewriting of CPython and maybe a lot of external C
modules.
(Also, if you know any
resources for things like this, I'd be grateful for links/names)
If you mean about GC in general, the old book by Jones and Lins is
still standard, I think.
Feb 17 '08 #2
On Feb 17, 1:57*pm, Paul Rubin <http://phr...@NOSPAM.invalidwrote:
Pie Squared <PieSqua...@gmail.comwrites:
Also, if it does, how does it deal with memory segmentation? This
question bothers me because I've been trying to implement a moving
garbage collector, and am not sure how to deal with updating all
program pointers to objects on the heap, and thought perhaps an answer
to this question would give me some ideas.

As I understand it, Python primarily uses reference counting, with a
mark and sweep scheme for cycle breaking tacked on as an afterthought.
It doesn't move objects in memory during GC so you can get
fragmentation. It's probably not feasible to change this in CPython
without extensive rewriting of CPython and maybe a lot of external C
modules.
(Also, if you know any
resources for things like this, I'd be grateful for links/names)

If you mean about GC in general, the old book by Jones and Lins is
still standard, I think.
Thanks for the quick reply!

That answered my question, and I'll check out the book you're
referring to - it's exactly what I need, I think. Again, thanks!
Feb 17 '08 #3
Pie Squared wrote:
I've been looking at the Python source code recently, more
specifically trying to figure out how it's garbage collector works.

I've gathered that it uses refcounting as well as some cycle-detection
algorithms, but I haven't been able to figure out some other things.
Python uses ref counting for all objects and an additional GC for
container objects like lists and dicts. The cyclic GC depends on ref
counting.
Does Python actually have a single 'heap' where all the data is
stored? Because PyObject_HEAD seemed to imply to me it was just a
linked list of objects, although perhaps I didnt understand this
correctly.
In release builds PyObject_HEAD only contains the ref count and a link
to the object type. In Py_DEBUG builds it also contains a double linked
list of all allocated objects to debug reference counting bugs.

Python uses its own optimized memory allocator. Be sure you have read
Object/obmalloc.c!
Also, if it does, how does it deal with memory segmentation? This
question bothers me because I've been trying to implement a moving
garbage collector, and am not sure how to deal with updating all
program pointers to objects on the heap, and thought perhaps an answer
to this question would give me some ideas. (Also, if you know any
resources for things like this, I'd be grateful for links/names)
I don't think it's possible to implement a moving GC. You'd have to
chance some fundamental parts of Python.

Christian

Feb 17 '08 #4
On Feb 17, 10:01 pm, Pie Squared <PieSqua...@gmail.comwrote:
[...]
It seems to me that another, perhaps better strategy, would be to
allocate a large heap space, then store a pointer to the base of the
heap, the current heap size, and the beginning of the free memory.
When you need to 'allocate' more room, just return a pointer to some
location in the heap and increment the start-of-free-memory pointer.
That way, allocation really IS free, more or less. Wouldn't that be
more efficient? Perhaps I'm missing something.
Deallocation?
As a side note, I'm new to Usenet, so I'm not exactly sure... are
'tangents' like this - since this IS a Python newsgroup, after all -
okay?
It varies depending on the group, c.l.py is pretty tolerant as long as
it's interesting ;-)

To bring it back to Python, I was under the impression that the GC was
a generational collector and not a simple mark-sweep, but it's been a
while since I read about it...

-- bjorn

Feb 17 '08 #5
>Also, if it does, how does it deal with memory segmentation? This
>question bothers me because I've been trying to implement a moving
garbage collector, and am not sure how to deal with updating all
program pointers to objects on the heap, and thought perhaps an answer
to this question would give me some ideas.

As I understand it, Python primarily uses reference counting, with a
mark and sweep scheme for cycle breaking tacked on as an afterthought.
That's not exactly true, i.e. it isn't mark-and-sweep, but some similar
scheme that allows incremental collection without write barriers. This
particular scheme heavily relies on refcounting itself (specifically,
an object is garbage in a certain generation when all references to
it come from the same generation).

As for the consequences of the scheme (i.e. no compaction), you are
right.

Regards,
Martin
Feb 17 '08 #6
"Martin v. Löwis" <ma****@v.loewis.dewrites:
That's not exactly true, i.e. it isn't mark-and-sweep, but some similar
scheme that allows incremental collection without write barriers. This
particular scheme heavily relies on refcounting itself (specifically,
an object is garbage in a certain generation when all references to
it come from the same generation).
Ah, thanks. I made another post which I guess is also somewhat wrong.
Feb 17 '08 #7
Christian Heimes wrote:
In release builds PyObject_HEAD only contains the ref count and a link
to the object type. In Py_DEBUG builds it also contains a double linked
list of all allocated objects to debug reference counting bugs.
There's also a doubly-linked list used by the cycle detector,
but it doesn't hold all objects, only those that need to
participate in cyclic GC. Objects such as numbers and strings,
which don't contain references to other objects, don't need
to be on this list, since they can never be part of a cycle.

Some other immutable types such as tuples also don't need to
be on it, even though they contain references, because it's
impossible to create a cycle consisting entirely of such
objects. There has to be at least one mutable object in the
cycle, and the GC will be able to find the cycle via that
object.

--
Greg
Feb 18 '08 #8


Paul Rubin wrote:
The problem here is with a high allocation rate, you have to GC a lot
more often, which typically involves copying live data.
This is last century's issue. Copying data, RAM to RAM, is nearly free
using the Intel architecture.

This short article, http://www.martinrinehart.com/articles/repz.html
explains why.

I'd use one int per clock as a rule of thumb for the current copy rate
in a single-core CPU.
Feb 18 '08 #9
Quoting Steve Holden <st***@holdenweb.com>:
[...]
Not only that, but all pointers to an object have to be updated when it
is relocated.
"Any problem in computer science can be solved by another level of indirection"
-- David John Wheeler

;-)

RB
Feb 18 '08 #10
Ma************@gmail.com wrote:
>
Paul Rubin wrote:
>The problem here is with a high allocation rate, you have to GC a lot
more often, which typically involves copying live data.

This is last century's issue. Copying data, RAM to RAM, is nearly free
using the Intel architecture.
What's "the Intel architecture?" Do you mean the x86_64 architecture
that was actually developed by AMD, or x86 for x some number, or do
you actually mean IA64?
>
This short article, http://www.martinrinehart.com/articles/repz.html
explains why.

I'd use one int per clock as a rule of thumb for the current copy rate
in a single-core CPU.
Feb 18 '08 #11


Jeff Schwab wrote:
What's "the Intel architecture?" Do you mean the x86_64 architecture
that was actually developed by AMD, or x86 for x some number, or do
you actually mean IA64?
I mean chips of the family that goes back to the 8086 and 8088 chips,
chips that support the REPZ prefix to the MOVSW instruction.
Feb 20 '08 #12
Ma************@gmail.com writes:
I mean chips of the family that goes back to the 8086 and 8088 chips,
chips that support the REPZ prefix to the MOVSW instruction.
repz movsw is a pretty lame way to copy data on current x86's.
Use XMM instead.
Feb 20 '08 #13


Steve Holden wrote:
You have a strange idea of "nearly free" ...

Extending an integer array from 100 to 150 items is a pretty puny
operation when you compare it with the amount of data that might need to
be moved during a compactifying garbage collection of a 20MB Python
program image.
20 MBs = 5 M 32-bit words = 1.25 millis to move half of them on a
2GHz
machine. Don't know how much a milli costs where you live.
Feb 20 '08 #14


Paul Rubin wrote:
repz movsw is a pretty lame way to copy data on current x86's.
Use XMM instead.
Thank you, Paul. I'm pretty sure you meant MMX, Multi-Media
eXtensions.

Paul's just told me to upgrade my 32-bit thinking to use newer 64-bit
registers, even on a 32-bit cpu. Please divide my prior timings by
two.
Pentium or later required.

Feb 20 '08 #15
<Ma************@gmail.comwrote:
>20 MBs = 5 M 32-bit words = 1.25 millis to move half of them on a
2GHz machine. Don't know how much a milli costs where you live.
A 2GHz machine doesn't have 20Mb of 2GHz memory. You made the mistake
of measuring the speed of processor's cache, rather than the RAM.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rr****@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //
Feb 20 '08 #16

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: ulysses | last post by:
hi, I'm working in python 5 months. I think it's very cool language. I do a p2p python program GUI. First I make a software by wxpython. But I find wxpython use many many memory. Second I use...
15
by: Qopit | last post by:
I'm setting up a system that consists of several small python applications that all communicate amongst each other on the same pc. When running in Windows, launching each application generates a...
3
by: test.07 | last post by:
I am wondering what happens to a thread in python in relation to win32com extensions. If I create a new thread, that uses the Dispatch method from win32com, what happens to the memory allocated...
5
by: vishnu | last post by:
Hi there, I am embedding python 2.5 on embedded system running on RTOS where I had strict memory constraints. As python is a huge malloc intensive application, I observed huge memory...
206
by: WaterWalk | last post by:
I've just read an article "Building Robust System" by Gerald Jay Sussman. The article is here: http://swiss.csail.mit.edu/classes/symbolic/spring07/readings/robust-systems.pdf In it there is a...
0
by: Steve Holden | last post by:
Hank @ITGroup wrote: It doesn't really make that much sense to watch memory usage as you have been doing. Your first test case appears to trigger a specific pathology, where the memory allocator...
0
by: bieffe62 | last post by:
On 21 Ott, 17:19, Rolf Wester <rolf.wes...@ilt.fraunhofer.dewrote: To be sure that the deallocated memory is not cached at some level to be reused, you could try someting like this: while...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
1
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
0
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
0
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.