Garbage Collection in C

jacob navia

Abstract
--------
Garbage collection is a method of managing memory by using a "collector"
library. Periodically, or triggered by an allocation request, the
collector looks for unused memory chunks and recycles them.
This memory allocation strategy has been adapted to C (and C++) by the
library written by Hans J Boehm and Alan J Demers.

Why a Garbage Collector?
-----------------------
Standard C knows only the malloc/calloc/free functions. The programmer
must manage each block of memory it allocates, never forgetting to call
the standard function free() for each block. Any error is immediately
fatal, but helas, not with immediate consequences. Many errors like
freeing a block twice (or more) or forgetting to free an allocated
block will be discovered much later (if at all). This type of bugs are
very difficult to find and a whole industry of software packages
exists just to find this type of bugs.

The garbage collector presents a viable alternative to the traditional
malloc/free "manual" allocation strategies. The allocator of Boehm
tries to find unused memory when either an allocation request is
done, or when explicitely invoked by the programmer.

The main advantage of a garbage collector is that the programmer is
freed from the responsability of allocating/deallocating memory. The
programmer requests memory to the GC, and then the rest is *automatic*.
Limitations of the GC.
---------------------
The GC needs to see all pointers in a program. Since it scans
periodically memory, it will assume that any block in its block list is
free to reuse when it can't find any pointers to it. This means that the
programmer can't store pointers in the disk, or in the "windows extra
bytes", as it was customary to do under older windows versions, or
elsewhere.

This is actually not a limitation since most programs do not write
pointers to disk, and expect them to be valid later...
Obviously, there is an infinite way to hide pointers (by XORing them
with some constant for instance) to hide them from the collector.

This is of no practical significance. Pointers aren't XORed in normal
programs, and if you stay within the normal alignment requirements
of the processor, everything works without any problems.

Performance considerations
--------------------------
In modern workstations, the time needed to make a complete sweep in
mid-size projects is very small, measured in some milliseconds. In
programs that are not real time the GC time is completely undetectable.
I have used Boehm's GC in the IDE of lcc-win32, specially in the
debugger. Each string I show in the "automatic" window is allocated
using the GC. In slow machines you can sometimes see a pause of
less than a second, completely undetectable unless you know that is
there and try to find it.

It must be said too that the malloc/free system is slow too, since at
each allocation request malloc must go through the list of free blocks
trying to find a free one. Memory must be consolidated too, to avoid
fragmentation, and a malloc call can become very expensive, depending
on the implementation and the allocation pattern done by the program.
Portability
-----------
Boehm's GC runs under most standard PC and UNIX/Linux platforms. The
collector should work on Linux, *BSD, recent Windows versions, MacOS X,
HP/UX, Solaris, Tru64, Irix and a few other operating systems. Some
ports are more polished than others. There are instructions for porting
the collector to a new platform. Kenjiro Taura, Toshio Endo, and Akinori
Yonezawa have made available a parallel collector.

Conclusions
-----------
The GC is a good alternative to traditional allocation strategies for C
(and C++). The main weakness of the malloc/free system is that it
doesn't scale. It is impossible to be good at doing a mind numbing task
without any error 100% of the time. You can be good at it, you can be
bad at it, but you can NEVER be perfect. It is human nature.

The GC frees you from those problems, and allows you to conecntrate in
the problems that really matter, and where you can show your strength
as software designer. It frees you from the boring task of keeping track
of each memory block you allocate.

jacob

Oct 11 '06

Subscribe Post Reply

142

6664

Richard Bos

jacob navia <ja***@jacob.remcomp.frwrote:

Paul Connolly wrote:
and there are no relatively long (compared to
free()) pauses while garbage collection goes on?

There are pauses, but in modern workstations this is barely noticeable.

Famous last words. Never written a server program? Little bits add up.

wow that's phantastic and incredible.

Write in assembly. That's a *real* language :-)

Wrong. Real programmers _do_ eat quiche. But they bake their own,
because you can't trust fastfood cooks not to give you one of those
sissy quiches Lorraine, or a thing with all fluff and no real meat.

Richard

Oct 20 '06 #101

William Hughes

jacob navia wrote:

There are pauses, but in modern workstations this is barely noticeable.

I do not understand. I thought that GC had to scan the entire
amount of memory used by the program. I regularly have programs
that use over a gibabyte of memory. How is this
scanned without a noticable pause?

- William Hughes

Oct 20 '06 #102

jacob navia

William Hughes wrote:

jacob navia wrote:

>>There are pauses, but in modern workstations this is barely noticeable.

I do not understand. I thought that GC had to scan the entire
amount of memory used by the program. I regularly have programs
that use over a gibabyte of memory. How is this
scanned without a noticable pause?

- William Hughes

1) When you allocate data structures that have no pointers in it,
just data, you tell the GC that there are no pointers in it and that
it doesn't need to look for pointers in those areas. This is an enormous
improvement.

2)
You use 1GB, but if you allocate your data structures correctly,
(see above) the GC doesn't need to scan all of that. Scanning the
stack and following the roots in there, and scanning the data section
is not very time consuming. A large heap (1GB!)
can cause problems but you should be able to optimize away most of it.

Oct 20 '06 #103

William Hughes

jacob navia wrote:

William Hughes wrote:
jacob navia wrote:

>There are pauses, but in modern workstations this is barely noticeable.

I do not understand. I thought that GC had to scan the entire
amount of memory used by the program. I regularly have programs
that use over a gibabyte of memory. How is this
scanned without a noticable pause?

- William Hughes

1) When you allocate data structures that have no pointers in it,
just data, you tell the GC that there are no pointers in it and that
it doesn't need to look for pointers in those areas. This is an enormous
improvement.

But I am not allocating the memory. The memory is being allocated
by a library that I do not control. Even if I could determine where
the memory
was being allocated, I don't know that is contains only data
(knowing that it contains mostly data is not good enough).

- William Hughes

Oct 20 '06 #104

SM Ryan

Jean-Marc Bourguet <jm@bourguet.orgwrote:

# Destructors do more than releasing memory. And finalisation is not a
# good substitute for these tasks as finalisation is not synchronous.

If you always know exactly when your destructors are being
called, they're not really destructors--they're methods you
haven't named. Destructors were intended to solve the problem
if collecting anonymous temporaries and the bookkeeping of
collecting local variables and class variables. You know the
destructors will be called before the stack or heap is
overwritten, but the whole point of destructors was you
don't need to know the exact time or place they are called.

If I need to guarentee a program action synchronise to some
external concern, I would have a explicit method to do so
and call it explicitly.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
I ASSURE YOU WE'RE OPEN!

Oct 20 '06 #105

SM Ryan

# I do not understand. I thought that GC had to scan the entire
# amount of memory used by the program. I regularly have programs
# that use over a gibabyte of memory. How is this
# scanned without a noticable pause?

Then don't use it.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
You face forward, or you face the possibility of shock and damage.

Oct 20 '06 #106

jacob navia

William Hughes wrote:

jacob navia wrote:

>>William Hughes wrote:

>>>jacob navia wrote:

There are pauses, but in modern workstations this is barely noticeable.

I do not understand. I thought that GC had to scan the entire
amount of memory used by the program. I regularly have programs
that use over a gibabyte of memory. How is this
scanned without a noticable pause?

- William Hughes

1) When you allocate data structures that have no pointers in it,
just data, you tell the GC that there are no pointers in it and that
it doesn't need to look for pointers in those areas. This is an enormous
improvement.

But I am not allocating the memory. The memory is being allocated
by a library that I do not control. Even if I could determine where
the memory
was being allocated, I don't know that is contains only data
(knowing that it contains mostly data is not good enough).

- William Hughes

If you are not allocating memory, then you are not deallocating it
either. Then just use the library and be happy...

jacob

Oct 20 '06 #107

Laurent Deniau

SM Ryan wrote:

Jean-Marc Bourguet <jm@bourguet.orgwrote:

# Destructors do more than releasing memory. And finalisation is not a
# good substitute for these tasks as finalisation is not synchronous.

If you always know exactly when your destructors are being
called, they're not really destructors--they're methods you
haven't named. Destructors were intended to solve the problem
if collecting anonymous temporaries and the bookkeeping of
collecting local variables and class variables. You know the
destructors will be called before the stack or heap is
overwritten, but the whole point of destructors was you
don't need to know the exact time or place they are called.

If I need to guarentee a program action synchronise to some
external concern, I would have a explicit method to do so
and call it explicitly.

Then you are not familliar with RAII (Ressource Acquisition Is
Initialization):

{ // code to synchronize
Lock lock();

// user code

} // destructor is called and unlock lock.

This has nothing to do with memory leaks but with potential deadlocks if
an exception is thrown in the user code.

a+, ld.

Oct 20 '06 #108

William Hughes

jacob navia wrote:

William Hughes wrote:
jacob navia wrote:

>William Hughes wrote:

jacob navia wrote:

There are pauses, but in modern workstations this is barely noticeable.

I do not understand. I thought that GC had to scan the entire
amount of memory used by the program. I regularly have programs
that use over a gibabyte of memory. How is this
scanned without a noticable pause?

- William Hughes
1) When you allocate data structures that have no pointers in it,
just data, you tell the GC that there are no pointers in it and that
it doesn't need to look for pointers in those areas. This is an enormous
improvement.

But I am not allocating the memory. The memory is being allocated
by a library that I do not control. Even if I could determine where
the memory
was being allocated, I don't know that is contains only data
(knowing that it contains mostly data is not good enough).

- William Hughes

If you are not allocating memory, then you are not deallocating it
either. Then just use the library and be happy...

But the library requires the existence of a free() function at link
time. If I link to a null-op free() then nobody deallocates
memory in the library unless the GC does, in which case
the GC has to scan the whole heap.

O.K. Suppose we solve this problem somehow (we link
the library with another malloc/free and tell the GC to
ignore anything done by the library.) I still have the
problem that if I allocate a large piece of memory I cannot
(now or in the future) store a pointer anywhere in it.
Besides being a maintenance nightmare, this means that
I cannot change from storing a data structure that does
not contain pointers to one that does.

- William Hughes

Oct 20 '06 #109

Richard Tobin

In article <11**********************@f16g2000cwb.googlegroups .com>,
William Hughes <wp*******@hotmail.comwrote:

>I do not understand. I thought that GC had to scan the entire
amount of memory used by the program. I regularly have programs
that use over a gibabyte of memory. How is this
scanned without a noticable pause?

There are over 40 years of research on the subject, so it's not
surprising that this problem has been largely overcome.

In particular, many modern garbage collectors avoid scanning the whole
of memory, except perhaps very rarely.

-- Richard

Oct 20 '06 #110

Jean-Marc Bourguet

SM Ryan <wy*****@tango-sierra-oscar-foxtrot-tango.fake.orgwrites:

Jean-Marc Bourguet <jm@bourguet.orgwrote:

# Destructors do more than releasing memory. And finalisation is not a
# good substitute for these tasks as finalisation is not synchronous.

If you always know exactly when your destructors are being
called, they're not really destructors--they're methods you
haven't named. Destructors were intended to solve the problem
if collecting anonymous temporaries and the bookkeeping of
collecting local variables and class variables. You know the
destructors will be called before the stack or heap is
overwritten, but the whole point of destructors was you
don't need to know the exact time or place they are called.

If I need to guarentee a program action synchronise to some
external concern, I would have a explicit method to do so
and call it explicitly.

Destructor were invented to release *ressources* when they are no more
needed. Memory is probably the most common of the ressources freed by
destructors, but it's not the only one. In C++, we rely on destructor to
free other ressources which need to be freed whatever the way the scope is
left (so there is usually no need to put catch clauses to free the
ressources in case of exceptions). Some of these ressources need to be
freed synchronously, a mutex is probably the canonical example of such kind
of ressources. That's so common that there is a name for this idiom: RAII
(Ressource Acquisition Is Initialization).

Another use of destructors is to notify other objects of the
destruction... this use can sometimes be replaced by the use of weak
references in conjunction with a GC, sometimes not.

Yours,

--
Jean-Marc

Oct 20 '06 #111

William Hughes

Richard Tobin wrote:

In article <11**********************@f16g2000cwb.googlegroups .com>,
William Hughes <wp*******@hotmail.comwrote:

I do not understand. I thought that GC had to scan the entire
amount of memory used by the program. I regularly have programs
that use over a gibabyte of memory. How is this
scanned without a noticable pause?

There are over 40 years of research on the subject, so it's not
surprising that this problem has been largely overcome.

In particular, many modern garbage collectors avoid scanning the whole
of memory, except perhaps very rarely.

How can this be done in a C context without providing the GC with
extra information (e.g. I am not going to use this bit of
memory to store pointers)? True, a scan at any given
time could insure that certain areal of memory do
not contain pointers. But, given that a program may well
write frequently to a large area of memory, how does the GC
know that this memory does not "now" contain pointers. Scan
on write might aleviate the problem somewhat, but only somewhat,
given that most or all of the memory may be overwritten.
(And how could one would implement "scan on write" without slowing
the program to a crawl?)

- William Hughes

Oct 20 '06 #112

jacob navia

William Hughes wrote:

Richard Tobin wrote:

>>In article <11**********************@f16g2000cwb.googlegroups .com>,
William Hughes <wp*******@hotmail.comwrote:

>>>I do not understand. I thought that GC had to scan the entire
amount of memory used by the program. I regularly have programs
that use over a gibabyte of memory. How is this
scanned without a noticable pause?

There are over 40 years of research on the subject, so it's not
surprising that this problem has been largely overcome.

In particular, many modern garbage collectors avoid scanning the whole
of memory, except perhaps very rarely.

How can this be done in a C context without providing the GC with
extra information (e.g. I am not going to use this bit of
memory to store pointers)? True, a scan at any given
time could insure that certain areal of memory do
not contain pointers. But, given that a program may well
write frequently to a large area of memory, how does the GC
know that this memory does not "now" contain pointers. Scan
on write might aleviate the problem somewhat, but only somewhat,
given that most or all of the memory may be overwritten.
(And how could one would implement "scan on write" without slowing
the program to a crawl?)

- William Hughes

Boehm's GC is incremental. Each time you make a GC_malloc call, it will
do a bit of garbage collection, so the full pause is less than it
might be.

Yes, it implies that you design your data structures to avoid mixing
data and pointers. This is not very difficult to do.

Oct 20 '06 #113

Keith Thompson

jacob navia <ja***@jacob.remcomp.frwrites:
[...]

Boehm's GC is incremental. Each time you make a GC_malloc call, it will
do a bit of garbage collection, so the full pause is less than it
might be.

Yes, it implies that you design your data structures to avoid mixing
data and pointers. This is not very difficult to do.

I see, so there's *another* restriction (though not an absolute one in
this case) that you haven't mentioned until now.

So if I've declared a structure consisting of a large array of doubles
and a single pointer (say, for a linked list), then I need to re-write
my code to use it with GC?

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Oct 20 '06 #114

Richard Tobin

In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.orgwrote:

>Yes, it implies that you design your data structures to avoid mixing
data and pointers. This is not very difficult to do.

>So if I've declared a structure consisting of a large array of doubles
and a single pointer (say, for a linked list), then I need to re-write
my code to use it with GC?

You certainly don't have to do that to use the Boehm collector.

If you wish to improve the performance of the garbage collector, you
can use various functions instead of the usual malloc(). In particular,
you can use GC_MALLOC_ATOMIC() to allocate space that will definitely
not contain any pointers. Presumably that's what Jacob is referring to.

If you're interested, there's lots of information at

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

-- Richard

Oct 21 '06 #115

William Hughes

Richard Tobin wrote:

In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.orgwrote:

Yes, it implies that you design your data structures to avoid mixing
data and pointers. This is not very difficult to do.

So if I've declared a structure consisting of a large array of doubles
and a single pointer (say, for a linked list), then I need to re-write
my code to use it with GC?

You certainly don't have to do that to use the Boehm collector.

Well, it will work, but it might take a *lot* of time.

If you wish to improve the performance of the garbage collector, you
can use various functions instead of the usual malloc(). In particular,
you can use GC_MALLOC_ATOMIC() to allocate space that will definitely
not contain any pointers.

But the person who has to insure that the space will not contain
any pointers is me. To do this I have to allocate different areas
for pointers and data. To do this I have to avoid mixing data
and pointers in the first place. Then, I have to go through
my code and change the appropriate malloc calls
to GC_MALLOC_ATOMIC(). Sounds like rewriting code
to me.

- William Hughes

Oct 21 '06 #116

Richard Tobin

In article <11**********************@i3g2000cwc.googlegroups. com>,
William Hughes <wp*******@hotmail.comwrote:

>If you wish to improve the performance of the garbage collector, you
can use various functions instead of the usual malloc(). In particular,
you can use GC_MALLOC_ATOMIC() to allocate space that will definitely
not contain any pointers.

>But the person who has to insure that the space will not contain
any pointers is me. To do this I have to allocate different areas
for pointers and data. To do this I have to avoid mixing data
and pointers in the first place. Then, I have to go through
my code and change the appropriate malloc calls
to GC_MALLOC_ATOMIC(). Sounds like rewriting code
to me.

I explained that you don't have to change your code to use the garbage
collector, but you can to make it more efficient. Why do you restate
this as if you're disagreeing with me?

By the way, in most programs you can use GC_MALLOC_ATOMIC() without
changing your data structures. Just use it for the allocations that
would be pointer-free anyway, such as strings and arrays of numbers.

-- Richard

Oct 21 '06 #117

William Hughes

Richard Tobin wrote:

In article <11**********************@i3g2000cwc.googlegroups. com>,
William Hughes <wp*******@hotmail.comwrote:

If you wish to improve the performance of the garbage collector, you
can use various functions instead of the usual malloc(). In particular,
you can use GC_MALLOC_ATOMIC() to allocate space that will definitely
not contain any pointers.

But the person who has to insure that the space will not contain
any pointers is me. To do this I have to allocate different areas
for pointers and data. To do this I have to avoid mixing data
and pointers in the first place. Then, I have to go through
my code and change the appropriate malloc calls
to GC_MALLOC_ATOMIC(). Sounds like rewriting code
to me.

I explained that you don't have to change your code to use the garbage
collector, but you can to make it more efficient. Why do you restate
this as if you're disagreeing with me?

Because although it will "work" without being more efficient, it may
have very long pauses, and may be so slow as to be useless.
You do not have to rewrite code to get it to work, but you
may have to rewrite code to get it to work usefully. (And worse, you
may be unable to rewrite code to get it to work usefully)

>
By the way, in most programs you can use GC_MALLOC_ATOMIC() without
changing your data structures. Just use it for the allocations that
would be pointer-free anyway, such as strings and arrays of numbers.

And if you have large data structures that do contain pointers
you are SOL. Being told that "most programs" do not have this problem
is cold comfort.

And this does not address the problem of memory allocation in
libraries not under your control (no you cannot simply link them
with another malloc/free as you will then have two mallocs working
on the same address space and same physical memory).

- William Hughes

Oct 21 '06 #118

jacob navia

Richard Tobin wrote:

In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.orgwrote:

>>>Yes, it implies that you design your data structures to avoid mixing
data and pointers. This is not very difficult to do.

>>So if I've declared a structure consisting of a large array of doubles
and a single pointer (say, for a linked list), then I need to re-write
my code to use it with GC?

You certainly don't have to do that to use the Boehm collector.

If you wish to improve the performance of the garbage collector, you
can use various functions instead of the usual malloc(). In particular,
you can use GC_MALLOC_ATOMIC() to allocate space that will definitely
not contain any pointers. Presumably that's what Jacob is referring to.

If you're interested, there's lots of information at

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

-- Richard

Well, yes. Mr Hughes was speaking about an application using
1GB RAM. In those cases optimization could be necessary.
GC_MALLOC_ATOMIC() optimizes the GC scan.

Oct 21 '06 #119

jacob navia

William Hughes wrote:

Richard Tobin wrote:

>>In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.orgwrote:

>>>>Yes, it implies that you design your data structures to avoid mixing
data and pointers. This is not very difficult to do.

>>>So if I've declared a structure consisting of a large array of doubles
and a single pointer (say, for a linked list), then I need to re-write
my code to use it with GC?

You certainly don't have to do that to use the Boehm collector.

Well, it will work, but it might take a *lot* of time.

>>If you wish to improve the performance of the garbage collector, you
can use various functions instead of the usual malloc(). In particular,
you can use GC_MALLOC_ATOMIC() to allocate space that will definitely
not contain any pointers.

But the person who has to insure that the space will not contain
any pointers is me. To do this I have to allocate different areas
for pointers and data. To do this I have to avoid mixing data
and pointers in the first place. Then, I have to go through
my code and change the appropriate malloc calls
to GC_MALLOC_ATOMIC(). Sounds like rewriting code
to me.

- William Hughes

It is rewriting code. If you have an application that uses 1GB
of RAM, it is normal that you optimize RAM usage and the GC
if you use it.

Since you have said that the allocations do not happen
under your control, you shouldn't use the GC for that application
in my opinion. Your third party library is working, and since
you do not have the source code you are stuck with it.

The GC is not magic, and it will save you some work but there
is a minimal effort from your side to be done. If you do not
want to do ANY effort just go on using your system as it is.

We are not gaining any money with the GC, so if you do not use
it we do not lose any "sale" !!!

jacob

Oct 21 '06 #120

jacob navia

William Hughes wrote:

Richard Tobin wrote:

>>In article <11**********************@i3g2000cwc.googlegroups. com>,
William Hughes <wp*******@hotmail.comwrote:

>>>>If you wish to improve the performance of the garbage collector, you
can use various functions instead of the usual malloc(). In particular,
you can use GC_MALLOC_ATOMIC() to allocate space that will definitely
not contain any pointers.

>>>But the person who has to insure that the space will not contain
any pointers is me. To do this I have to allocate different areas
for pointers and data. To do this I have to avoid mixing data
and pointers in the first place. Then, I have to go through
my code and change the appropriate malloc calls
to GC_MALLOC_ATOMIC(). Sounds like rewriting code
to me.

I explained that you don't have to change your code to use the garbage
collector, but you can to make it more efficient. Why do you restate
this as if you're disagreeing with me?

Because although it will "work" without being more efficient, it may
have very long pauses, and may be so slow as to be useless.
You do not have to rewrite code to get it to work, but you
may have to rewrite code to get it to work usefully. (And worse, you
may be unable to rewrite code to get it to work usefully)

Excuse me but you tell us that your program uses 1GB of RAM.
This is not a normal application, and you may need to
optimize your usage.

Besides, you are having NOW a pause each time you do a free() since
free() will try to consolidate memory. If you use a bad malloc package
that doesn't do that, you have a lot of that GB of RAM in very small
chunks, unusable!

If your program is using 1GB of RAM you must pay attention to RAM usage
more so than in a normal application sorry.

>
>>By the way, in most programs you can use GC_MALLOC_ATOMIC() without
changing your data structures. Just use it for the allocations that
would be pointer-free anyway, such as strings and arrays of numbers.

And if you have large data structures that do contain pointers
you are SOL. Being told that "most programs" do not have this problem
is cold comfort.

What do you expect?

Most applications do not use 1GB RAM!

And this does not address the problem of memory allocation in
libraries not under your control (no you cannot simply link them
with another malloc/free as you will then have two mallocs working
on the same address space and same physical memory).

GC_malloc is compatible with malloc/free. You should not MIX both,
i.e. giving a GC_malloced chunk to free().

Oct 21 '06 #121

William Hughes

jacob navia wrote:

William Hughes wrote:
Richard Tobin wrote:

>In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.orgwrote:
Yes, it implies that you design your data structures to avoid mixing
data and pointers. This is not very difficult to do.

So if I've declared a structure consisting of a large array of doubles
and a single pointer (say, for a linked list), then I need to re-write
my code to use it with GC?

You certainly don't have to do that to use the Boehm collector.

Well, it will work, but it might take a *lot* of time.

>If you wish to improve the performance of the garbage collector, you
can use various functions instead of the usual malloc(). In particular,
you can use GC_MALLOC_ATOMIC() to allocate space that will definitely
not contain any pointers.

But the person who has to insure that the space will not contain
any pointers is me. To do this I have to allocate different areas
for pointers and data. To do this I have to avoid mixing data
and pointers in the first place. Then, I have to go through
my code and change the appropriate malloc calls
to GC_MALLOC_ATOMIC(). Sounds like rewriting code
to me.

- William Hughes

It is rewriting code. If you have an application that uses 1GB
of RAM, it is normal that you optimize RAM usage and the GC
if you use it.

I spend a lot of time optimizing RAM usage. Why would
I want to complicate my life to solve a non-problem?

Since you have said that the allocations do not happen
under your control, you shouldn't use the GC for that application
in my opinion. Your third party library is working, and since
you do not have the source code you are stuck with it.

Which was my point. This application is not suitable for
GC. Indeed most of the applications I work on are not suitable
for GC. Am I typical, no. Am I representative of a large enough
group to bring the statement "malloc/free should be replaced by GC"
into doubt. Yes. There is real code out there that is not
suitable for GC. There are many people for whom memory leaks
are a very small or nonexistent problem. GC has benefits for
some (and annecdotal evidence suggest these can be quite
large) but for others it has very little benefit.
It also has risks, and it is not suitable for all code. This does
not sound like a general purpose solution.

- William Hughes

Oct 21 '06 #122

William Hughes

jacob navia wrote:

William Hughes wrote:
Richard Tobin wrote:

>In article <11**********************@i3g2000cwc.googlegroups. com>,
William Hughes <wp*******@hotmail.comwrote:
If you wish to improve the performance of the garbage collector, you
can use various functions instead of the usual malloc(). In particular,
you can use GC_MALLOC_ATOMIC() to allocate space that will definitely
not contain any pointers.

But the person who has to insure that the space will not contain
any pointers is me. To do this I have to allocate different areas
for pointers and data. To do this I have to avoid mixing data
and pointers in the first place. Then, I have to go through
my code and change the appropriate malloc calls
to GC_MALLOC_ATOMIC(). Sounds like rewriting code
to me.

I explained that you don't have to change your code to use the garbage
collector, but you can to make it more efficient. Why do you restate
this as if you're disagreeing with me?

Because although it will "work" without being more efficient, it may
have very long pauses, and may be so slow as to be useless.
You do not have to rewrite code to get it to work, but you
may have to rewrite code to get it to work usefully. (And worse, you
may be unable to rewrite code to get it to work usefully)

Excuse me but you tell us that your program uses 1GB of RAM.
This is not a normal application, and you may need to
optimize your usage.

Besides, you are having NOW a pause each time you do a free() since
free() will try to consolidate memory. If you use a bad malloc package
that doesn't do that, you have a lot of that GB of RAM in very small
chunks, unusable!

If your program is using 1GB of RAM you must pay attention to RAM usage
more so than in a normal application sorry.

>By the way, in most programs you can use GC_MALLOC_ATOMIC() without
changing your data structures. Just use it for the allocations that
would be pointer-free anyway, such as strings and arrays of numbers.

And if you have large data structures that do contain pointers
you are SOL. Being told that "most programs" do not have this problem
is cold comfort.

What do you expect?

Most applications do not use 1GB RAM!

And this does not address the problem of memory allocation in
libraries not under your control (no you cannot simply link them
with another malloc/free as you will then have two mallocs working
on the same address space and same physical memory).

GC_malloc is compatible with malloc/free. You should not MIX both,
i.e. giving a GC_malloced chunk to free().

This I don't understand. The major problem I see is not
the mixing of calls to free. How can GC_malloc operate correctly, if
there
is another malloc operating independently of it? How does GC_malloc
know where it is safe to allocate memory? What happens if the other
malloc (which does not know about GC_malloc) allocates memory
that GC_malloc has already allocated?

- William Hughes

Oct 21 '06 #123

Richard Tobin

In article <11*********************@b28g2000cwb.googlegroups. com>,
William Hughes <wp*******@hotmail.comwrote:

>Because although it will "work" without being more efficient, it may
have very long pauses, and may be so slow as to be useless.

>[more in the same vein deleted]

Perhaps you're mistaking me for someone who wants to persuade
you to use garbage collection?

>And if you have large data structures that do contain pointers
you are SOL. Being told that "most programs" do not have this problem
is cold comfort.

Well if you find someone who tells you that, I suggest you enlighten
them.

-- Richard

Oct 21 '06 #124

Richard Tobin

In article <11**********************@k70g2000cwa.googlegroups .com>,
William Hughes <wp*******@hotmail.comwrote:

>This I don't understand. The major problem I see is not
the mixing of calls to free. How can GC_malloc operate correctly, if
there is another malloc operating independently of it?

It probably can't reclaim memory allocated by the other malloc.

>How does GC_malloc
know where it is safe to allocate memory?

Perhaps it uses malloc(). Or (more likely) it works in a system
dependent way. Almost all operating systems have other ways to
allocate memory, and to be useful they have to interoperate with
malloc().

-- Richard

Oct 21 '06 #125

William Hughes

Richard Tobin wrote:

In article <11*********************@b28g2000cwb.googlegroups. com>,
William Hughes <wp*******@hotmail.comwrote:

Because although it will "work" without being more efficient, it may
have very long pauses, and may be so slow as to be useless.

[more in the same vein deleted]

Perhaps you're mistaking me for someone who wants to persuade
you to use garbage collection?

No, but we are discussing this in a thread where the OP
makes the argument that GC should be the default.
The argument is not that GC can't be useful in some cases
(or indeed very useful in some cases), The argument is that GC
is not useful enough in enough cases.

Part of the OP's case is that the time needed for GC is
negligible, both in terms of the pauses and in terms
of overall run time. My point is that there is an important
class of programs for which this is not true.
(I argue that there are cases where the "negligible pauses"
are not negligibe by any reasonable use of the term. Others
have pointed out that the "negligible pauses" may not be
negligible in some cases.)

The general point, that GC can consume significant resources,
is important (note that CPU time is only one resource that GC
uses).

- William Hughes

Oct 21 '06 #126

CBFalconer

William Hughes wrote:

>

.... snip ...

>
And if you have large data structures that do contain pointers
you are SOL. Being told that "most programs" do not have this
problem is cold comfort.

And this does not address the problem of memory allocation in
libraries not under your control (no you cannot simply link
them with another malloc/free as you will then have two mallocs
working on the same address space and same physical memory).

My tendency is to have pointers in almost everything I malloc,
apart from strings. My code is usually peppered with
create/destroy pairs for headers to records.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

Oct 21 '06 #127

CBFalconer

jacob navia wrote:

William Hughes wrote:

.... snip ...

>>
Because although it will "work" without being more efficient, it
may have very long pauses, and may be so slow as to be useless.
You do not have to rewrite code to get it to work, but you may
have to rewrite code to get it to work usefully. (And worse,
you may be unable to rewrite code to get it to work usefully)

Excuse me but you tell us that your program uses 1GB of RAM.
This is not a normal application, and you may need to
optimize your usage.

Besides, you are having NOW a pause each time you do a free()
since free() will try to consolidate memory. If you use a bad
malloc package that doesn't do that, you have a lot of that GB
of RAM in very small chunks, unusable!

I pointed out earlier (in this thread, I believe) that delays for
recombination are not intrinsic in the design of a malloc/free
system. It is possible to have the free operation be O(1),
regardless of the number of freed and allocated blocks present.
The proof is available to all in my nmalloc, available at:

<http://cbfalconer.home.att.net/download>

This is a fully standards conforming system, without the gotchas of
a GC. It is even available to all without licence fees, publishing
restrictions, etc. The implementation is not purely standard, but
minimizes the non-standard requirements.

Please don't come up with this excuse again.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

Oct 21 '06 #128

CBFalconer

William Hughes wrote:

jacob navia wrote:

.... snip ...

>>
GC_malloc is compatible with malloc/free. You should not MIX both,
i.e. giving a GC_malloced chunk to free().

This I don't understand. The major problem I see is not the mixing
of calls to free. How can GC_malloc operate correctly, if there
is another malloc operating independently of it? How does GC_malloc
know where it is safe to allocate memory? What happens if the other
malloc (which does not know about GC_malloc) allocates memory
that GC_malloc has already allocated?

This simply means that GC_malloc has to call malloc to get memory.
When the GC system decides that something is freeable, it has to
call free to do it. That means that all the foibles of the malloc
system are still present, including the free delays (see my post
elsethread though), except that those delays can occur at
uncontrolled times. In exchange you get a license to be sloppy
about releasing memory, provided you adhere to other unspecified
restrictions.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

Oct 21 '06 #129

jacob navia

CBFalconer wrote:

William Hughes wrote:

>>jacob navia wrote:

... snip ...

>>>GC_malloc is compatible with malloc/free. You should not MIX both,
i.e. giving a GC_malloced chunk to free().

This I don't understand. The major problem I see is not the mixing
of calls to free. How can GC_malloc operate correctly, if there
is another malloc operating independently of it? How does GC_malloc
know where it is safe to allocate memory? What happens if the other
malloc (which does not know about GC_malloc) allocates memory
that GC_malloc has already allocated?

This simply means that GC_malloc has to call malloc to get memory.
When the GC system decides that something is freeable, it has to
call free to do it. That means that all the foibles of the malloc
system are still present, including the free delays (see my post
elsethread though), except that those delays can occur at
uncontrolled times. In exchange you get a license to be sloppy
about releasing memory, provided you adhere to other unspecified
restrictions.

The GC calls directly the OS. It knows where are the areas
of virtual memory that it has reserved memory, so it knows if a
word points into its memory areas.

Oct 21 '06 #130

William Hughes

jacob navia wrote:

CBFalconer wrote:
William Hughes wrote:

>jacob navia wrote:

... snip ...

>>GC_malloc is compatible with malloc/free. You should not MIX both,
i.e. giving a GC_malloced chunk to free().

This I don't understand. The major problem I see is not the mixing
of calls to free. How can GC_malloc operate correctly, if there
is another malloc operating independently of it? How does GC_malloc
know where it is safe to allocate memory? What happens if the other
malloc (which does not know about GC_malloc) allocates memory
that GC_malloc has already allocated?

This simply means that GC_malloc has to call malloc to get memory.
When the GC system decides that something is freeable, it has to
call free to do it. That means that all the foibles of the malloc
system are still present, including the free delays (see my post
elsethread though), except that those delays can occur at
uncontrolled times. In exchange you get a license to be sloppy
about releasing memory, provided you adhere to other unspecified
restrictions.

The GC calls directly the OS. It knows where are the areas
of virtual memory that it has reserved memory, so it knows if a
word points into its memory areas.

This will probably work if the other implementation of malloc
does not assume that it is the only thing modifying the heap
and act accordingly. I don't know what the present
situation is (the C standard cannot address this problem as
the only way C defines to access the heap is malloc), but
I would guess that assuming that a malloc implementation is
safe to use with other independent memory allocators is
similar to the assumption that an unknown function is
multi-thread safe. I am very short on warm fuzzies.

-William Hughes

Oct 21 '06 #131

jacob navia

William Hughes wrote:

This will probably work if the other implementation of malloc
does not assume that it is the only thing modifying the heap
and act accordingly. I don't know what the present
situation is (the C standard cannot address this problem as
the only way C defines to access the heap is malloc), but
I would guess that assuming that a malloc implementation is
safe to use with other independent memory allocators is
similar to the assumption that an unknown function is
multi-thread safe. I am very short on warm fuzzies.

Look, if you use a brain dead malloc it is not somebody else's
fault. In windows, for instance, you can allocate using malloc, but
also with GlobalAlloc, VirtualAlloc, and a dozen of other functions.
Under unix you have sbrk() and many others.

Any malloc that assumes that all heap is allocated by malloc
is completely brain dead. Besides, WHY would such an assumption
need to be done?

As we advance in the discussion, your arguments become more and more
contrieved.

Oct 21 '06 #132

Paul Connolly

"William Hughes" <wp*******@hotmail.comwrote in message
news:11**********************@f16g2000cwb.googlegr oups.com...

>
jacob navia wrote:

>There are pauses, but in modern workstations this is barely noticeable.

I do not understand. I thought that GC had to scan the entire
amount of memory used by the program. I regularly have programs
that use over a gibabyte of memory. How is this
scanned without a noticable pause?

Most modern GC uses "generational scavenging" (as opposed to the old full
stop-and-copy that some older pure functional programming languages had) - a
generation is a partition of the whole allocated memory - with the more
recently allocated memmory being in the first generation to be compacted -
generational scavenging is based on an assumption that the most recently
allocated memory is the most likely to require collection - this is often a
good bet and mitigates the problem of "pauses", but does not eliminate the
problem - well-written malloc/free code will be superior in terms of speed
and amount memory allocated.

Oct 21 '06 #133

Paul Connolly

"jacob navia" <ja***@jacob.remcomp.frwrote in message
news:45***********************@news.orange.fr...

Paul Connolly wrote:
>"SM Ryan" <wy*****@tango-sierra-oscar-foxtrot-tango.fake.orgwrote in
message news:12*************@corp.supernews.com...
everything? so this garbage collection doesn't use any more memory than
hand written malloc/free code,

Not really. Garbage is automatically collected. You can fine-tune this,
specifying the threshold or call a gc when you think it is a good moment
to do that.

Explicitly calling GC more often isn't free (excuse the pun) - I think it
will slow the execution of my program (I tried explicitly calling GC in an
ASP .NET application - it got slower - the manual warned me it might, but I
wouldn't be told)

>and the application doesn't run slower than hand written malloc/free
code,

No, it doesn't run slower. Allocation is a bit slower since each
allocation does a bit of a GC (incremental GC)

It doesn't run slower but allocation is slower - so does it make up the
ground on freeing? - I find that difficult to believe - the freeing takes
longer under automatic GC - one possibility is that locality of reference is
improved by compaction of the heap and processors with cache might benefit
from that - but is that enough to balance the other work that automatic GC
has to do? - my experience is that languages with automatic GC (ML,
Haskell, Prolog, C#) run slower (admittedly ML, Haskell, Prolog might also
suffer from lack of mutable data, Haskell from laziness, and Prolog from
backtracking)

>and there are no relatively long (compared to free()) pauses while
garbage collection goes on?

There are pauses, but in modern workstations this is barely noticeable.

>wow that's phantastic and incredible.

Write in assembly. That's a *real* language :-)

Actually I wrote a _little_ bit of assembler this morning, which turned into
this afternoon, as it often does with assembler :-)

Oct 21 '06 #134

Paul Connolly

"Paul Connolly" <pg********@blueyonder.co.ukwrote in message
news:0W****************@fe1.news.blueyonder.co.uk. ..

>
Most modern GC uses "generational scavenging" (as opposed to the old full
stop-and-copy that some older pure functional programming languages had) -
a

I thought about this again - generational scavenging works best with pure
functional languages (where every structure refers to earlier created
structures) - assignment means that older structures can be made to refer to
newer ones - making it much more complicated and inefficient to do GC in a
language with destructive assignment - so that makes GC in C even more
inefficient than I first imagined.

Oct 23 '06 #135

Friedrich Dominicus

"Paul Connolly" <pg********@blueyonder.co.ukwrites:

"Paul Connolly" <pg********@blueyonder.co.ukwrote in message
news:0W****************@fe1.news.blueyonder.co.uk. ..
>>
Most modern GC uses "generational scavenging" (as opposed to the old full
stop-and-copy that some older pure functional programming languages had) -
a

I thought about this again - generational scavenging works best with pure
functional languages (where every structure refers to earlier created
structures) - assignment means that older structures can be made to refer to
newer ones - making it much more complicated and inefficient to do GC in a
language with destructive assignment - so that makes GC in C even more
inefficient than I first imagined.

Strange that Smalltalk, Common Lisp etc have no troubles with GC
than...

Friedrich
--
Please remove just-for-news- to reply via e-mail.

Oct 24 '06 #136

Chris Thomasson

"SM Ryan" <wy*****@tango-sierra-oscar-foxtrot-tango.fake.orgwrote in
message news:12*************@corp.supernews.com...

ro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) wrote:
# In article <45**********************@news.orange.fr>,
# jacob navia <ja***@jacob.remcomp.frwrote:

[...]

Boehm-Demers garbage collection doesn't work for all programs on
all systems. The systems it has been ported to do work (ie they
have been verified with system libraries and compilers) if you
write your programs in the restricted language.

[...]

FWIW, here is an experimental protoyope of an atomically thread-safe
reference counted pointer:

http://appcore.home.comcast.net/vzoom/refcount/

It has strict C API and the ABI adheres to the C calling convention... You
can use it to do garbage collection like activities:

http://groups.google.com/group/comp....8308f5e7188bb4
Here is some more information:

http://softwareforums.intel.com/ISN/.../30225804.aspx
Enjoy!
;^)

Oct 24 '06 #137

Chris Thomasson

Here is some more information:

>
http://softwareforums.intel.com/ISN/.../30225804.aspx

Just to quick summarize, you can do stuff with this pointer that is
impossible to do with shared_ptr... Perhaps its more flexible... It already
outperforms shared_ptr wrt pointer swaps and weak count updates...

I am hoping that it will get accepted into the Boost library.

Oct 24 '06 #138

Richard Tobin

In article <87************@flarge.here>,
Friedrich Dominicus <ju*****************@q-software-solutions.dewrote:

>"Paul Connolly" <pg********@blueyonder.co.ukwrites:

>I thought about this again - generational scavenging works best with pure
functional languages (where every structure refers to earlier created
structures) - assignment means that older structures can be made to refer to
newer ones - making it much more complicated and inefficient to do GC in a
language with destructive assignment - so that makes GC in C even more
inefficient than I first imagined.

>Strange that Smalltalk, Common Lisp etc have no troubles with GC
than...

No, they just have more complicated garbage collectors than would be required
if all pointers pointed backwards.

Pure functional languages don't necessarily have that property - such
a language could perfectly well have a syntax for circular structures.

-- Richard

Oct 24 '06 #139

SM Ryan

"Chris Thomasson" <cr*****@comcast.netwrote:

# FWIW, here is an experimental protoyope of an atomically thread-safe
# reference counted pointer:

How well does it collect cyclic graphs?

--
SM Ryan http://www.rawbw.com/~wyrmwif/
What kind of convenience store do you run here?

Oct 24 '06 #140

Chris Thomasson

"SM Ryan" <wy*****@tango-sierra-oscar-foxtrot-tango.fake.orgwrote in
message news:12*************@corp.supernews.com...

"Chris Thomasson" <cr*****@comcast.netwrote:

# FWIW, here is an experimental protoyope of an atomically thread-safe
# reference counted pointer:

How well does it collect cyclic graphs?

Currently, you have to be careful about cycles... An object A that
explicitly references object B, which in turn explicitly references object A
== leak... So, you have to resort to using tricky reference counting
techniques, which are compatible with my current algorithm, to get around
the cycle problem.

However, there is a way to add weak pointers to my algorithm which would
render it immune to the cycle problem... Not sure if I should do it or
not...

Humm...

Oct 24 '06 #141

Paul Connolly

"Richard Tobin" <ri*****@cogsci.ed.ac.ukwrote in message
news:eh**********@pc-news.cogsci.ed.ac.uk...

In article <87************@flarge.here>,
Friedrich Dominicus <ju*****************@q-software-solutions.dewrote:
>>"Paul Connolly" <pg********@blueyonder.co.ukwrites:

Pure functional languages don't necessarily have that property - such
a language could perfectly well have a syntax for circular structures.

And also pure functional languages might have optimisations such as in cases
of tail recursion where a containing structure might be constructed before
the elements of that structure - so the garbage collection becomes more
complicated.

Oct 25 '06 #142

Hello,

Paul Connolly wrote:
[...]

Actually I wrote a _little_ bit of assembler this morning, which turned into
this afternoon, as it often does with assembler :-)

BTW, is anyone aware of any assembler/hardware optimizations that can
speed up garbage collection?

And does anyone from the respectful community use high performance
garbage collectors from Harmony project (http://harmony.apache.org)?

With best regards,
Alexei Fedotov

Dec 13 '06 #143

Similar topics