Bytes | Developer Community
Re: multithreading.

Jon Harrop <us****@jdh30.plus.com> writes:
Let's do a single-threaded benchmark first.
>What collection-intensive tasks?

Symbolic rewriting would make a good benchmark. Here is one:

http://www.lambdassociates.org/studies/study10.htm

Try implementing this using reference counting and we can see how it
compares (probably on a trickier rewrite). Replace the arbitrary-precision
ints with doubles.
The (old) machine I read news on is not nearly so powerful as Mark's
so for reference here's the time for a slightly modified version[1] of
the O'Caml code :-

$ ocamlopt -v
The Objective Caml native-code compiler, version 3.08.2
Standard library directory: /usr/lib/ocaml/3.08
$ ocamlopt simplify.ml -o simplify
$ time ./simplify
Took 5.020000s

real 0m5.137s
user 0m5.021s
sys 0m0.005s

Now for C code that uses naive reference counting :-

$ cc -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.5/specs
Configured with: ../src/configure -v --enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --enable-__cxa_atexit --enable-clocale=gnu --enable-debug --enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.5 (Debian 1:3.3.5-8ubuntu2.1)
$ cc -O3 -Wall simplify.c -o simplify
$ time ./simplify
real 0m16.757s
user 0m16.553s
sys 0m0.003s
$

That's over 3x slower than O'Caml so O'Caml/GC is the winner, right?

Well, if your only choice is O'Caml or (naive) reference counting for
this problem then yes. However, there are other approaches. For
example, the following is the time for C code that uses a hand-coded
version of what a sufficiently smart compiler that implemented
linear/unique types could produce :-

$ time ./simplify
real 0m2.812s
user 0m2.751s
sys 0m0.002s

So on my machine naive reference counting is 3x slower than O'Caml GC
but the linear version is 2x faster than O'Caml. Should we conclude
that (O'Caml) GC and naive reference counting both suck?

My conclusion is that while the benchmark clearly measures something,
it isn't clear to me that what it measures is interesting/useful. It
may be possible to make it useful in which case the benchmark should
also track peak/average heap size. At present that's pointless
because the test expression is so small that the working set for
each iteration should only be a few hundred bytes. This will fit in
the nursery of any generational collector or only require a copying
collector to traverse a few hundred bytes. That's very GC friendly.

------------------

[1] Compiling the code from the site gave the following :-

$ ocamlopt simplify.ml -o simplify
No implementations provided for the following modules:
Num referenced from simplify.cmx

Rather than work out if I'm suffering from using an out-of-date
O'Caml or it just isn't installed right, I just changed the code to
use a good old data type and that works fine :-

$ cat simplify.ml
type expr = Int of int | Var of string | Add of expr * expr | Mul of expr * expr;;

let int n = (Int n);;

let ( +: ) f g = Add(f, g) and ( *: ) f g = Mul(f, g);;

let test_expr =
Var "x" *: ((int 12 *: int 0 +: (int 23 +: int 8)) +: Var "y");;

let time f x =
let t = Sys.time() in
let f_x = f x in
Printf.printf "Took %fs\n" (Sys.time() -. t);
f_x;;

let rec loop n f x = if n=1 then f x else (ignore(f x); loop (n-1) f x);;

let rec ( +: ) f g = match f, g with
| Int n, Int m -> Int (n + m)
| (Int 0), e | e, (Int 0) -> e
| f, Add(g, h) -> f +: g +: h
| f, g -> Add(f, g);;

let rec ( *: ) f g = match f, g with
| Int n, Int m -> Int (n * m)
| (Int 0), e | e, (Int 0) -> (Int 0)
| (Int 1), e | e, (Int 1) -> e
| f, Mul(g, h) -> f *: g *: h
| f, g -> Mul(f, g);;

let rec simplify = function
| Int _ | Var _ as f -> f
| Add (f, g) -> simplify f +: simplify g
| Mul (f, g) -> simplify f *: simplify g;;

time (loop 10000000 simplify) test_expr;;
Jun 27 '08 #1
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
Jon Harrop <us****@jdh30.plus.com> writes:
>Let's do a single-threaded benchmark first.
>>What collection-intensive tasks?

Symbolic rewriting would make a good benchmark. Here is one:

http://www.lambdassociates.org/studies/study10.htm

Try implementing this using reference counting and we can see how it
compares (probably on a trickier rewrite). Replace the
arbitrary-precision
ints with doubles.
[...]

Try the test using the following allocator:

http://groups.google.com/group/comp....bdc76f9792de1f

What results do you get?

Jun 27 '08 #2
"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:yI******************************@comcast.com...
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
>Jon Harrop <us****@jdh30.plus.com> writes:
>>Let's do a single-threaded benchmark first.

What collection-intensive tasks?

Symbolic rewriting would make a good benchmark. Here is one:

http://www.lambdassociates.org/studies/study10.htm

Try implementing this using reference counting and we can see how it
compares (probably on a trickier rewrite). Replace the
arbitrary-precision
ints with doubles.
[...]

Try the test using the following allocator:

http://groups.google.com/group/comp....bdc76f9792de1f

What results do you get?
Ah, never mind. Who cares.

Jun 27 '08 #3
"Chris Thomasson" <cr*****@comcast.net> writes:
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
>Jon Harrop <us****@jdh30.plus.com> writes:
>>Let's do a single-threaded benchmark first.

What collection-intensive tasks?

Symbolic rewriting would make a good benchmark. Here is one:

http://www.lambdassociates.org/studies/study10.htm

Try implementing this using reference counting and we can see how it
compares (probably on a trickier rewrite). Replace the
arbitrary-precision
ints with doubles.
[...]

Try the test using the following allocator:

http://groups.google.com/group/comp....bdc76f9792de1f

What results do you get?
Using malloc(3) directly causes the time for the reference counting C
code to be 30 seconds i.e. 6x slower than O'Caml. So I used a simple
object pooling scheme. Although there are 50000011 allocations, only
14 of them result in a call to malloc(3). The same pool is used in
the linear code so while it may be possible to improve the pool, the
fact that the linear code is 8x quicker than the reference counting
version shows that (naive) reference counting is the place to look
for performance improvements.

The above and the fact that your code is C++ and I'm using C means
that there is little incentive to convert your C++ to C to try it.
Jun 27 '08 #4
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
"Chris Thomasson" <cr*****@comcast.net> writes:
>"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
>>Jon Harrop <us****@jdh30.plus.com> writes:
Let's do a single-threaded benchmark first.

What collection-intensive tasks?

Symbolic rewriting would make a good benchmark. Here is one:

http://www.lambdassociates.org/studies/study10.htm

Try implementing this using reference counting and we can see how it
compares (probably on a trickier rewrite). Replace the
arbitrary-precision
ints with doubles.
[...]

Try the test using the following allocator:

http://groups.google.com/group/comp....bdc76f9792de1f

What results do you get?

Using malloc(3) directly causes the time for the reference counting C
code to be 30 seconds i.e. 6x slower than O'Caml. So I used a simple
object pooling scheme. Although there are 50000011 allocations, only
14 of them result in a call to malloc(3). The same pool is used in
the linear code so while it may be possible to improve the pool, the
fact that the linear code is 8x quicker than the reference counting
version shows that (naive) reference counting is the place to look
for performance improvements.

The above and the fact that your code is C++ and I'm using C means
that there is little incentive to convert your C++ to C to try it.
Here is a crude C version of my single-threaded region allocator:

http://pastebin.com/m52ba914

http://groups.google.com/group/comp....f65273b09b4229

Jun 27 '08 #5
"Chris Thomasson" <cr*****@comcast.net> writes:
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
>Using malloc(3) directly causes the time for the reference counting C
code to be 30 seconds i.e. 6x slower than O'Caml. So I used a simple
object pooling scheme. Although there are 50000011 allocations, only
14 of them result in a call to malloc(3). The same pool is used in
the linear code so while it may be possible to improve the pool, the
fact that the linear code is 8x quicker than the reference counting
version shows that (naive) reference counting is the place to look
for performance improvements.

The above and the fact that your code is C++ and I'm using C means
that there is little incentive to convert your C++ to C to try it.

Here is a crude C version of my single-threaded region allocator:
There were two reasons for not using your C++ allocator, the fact that
it was in C++ was the second reason. Having a C version does nothing
to address the first reason: the linear version is 8x quicker than the
reference counting version using the _same_ allocator. Thus improving
the allocator will not make the reference counting version faster than
the linear version. It _may_ make the reference counting version
faster than O'Caml. However, looking at your C allocator code it does
observably more work per allocation than the pooling allocator I'm
using so it isn't going to be faster. Talk is cheap so I actually
tested it :-

$ time ./simplify speed
(0x804b148/0x804e008/1/8192)::allocator_sys_expand
(0x804b148/0x8052028/2/16384)::allocator_sys_expand
(0x804b148/0x804e008/2/8192)::allocator_sys_promote

real 0m17.530s
user 0m17.307s
sys 0m0.005s
$

This is one second slower than my original code. Not a huge slowdown,
so as code goes it is fine. However, it clearly doesn't make things
faster and so just reinforces my first reason for not using your
C++ code: allocation is not the bottleneck, (naive) reference counting is.
Jun 27 '08 #6

"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
"Chris Thomasson" <cr*****@comcast.net> writes:
>"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
>>Using malloc(3) directly causes the time for the reference counting C
code to be 30 seconds i.e. 6x slower than O'Caml. So I used a simple
object pooling scheme. Although there are 50000011 allocations, only
14 of them result in a call to malloc(3). The same pool is used in
the linear code so while it may be possible to improve the pool, the
fact that the linear code is 8x quicker than the reference counting
version shows that (naive) reference counting is the place to look
for performance improvements.

The above and the fact that your code is C++ and I'm using C means
that there is little incentive to convert your C++ to C to try it.

Here is a crude C version of my single-threaded region allocator:

There were two reasons for not using your C++ allocator, the fact that
it was in C++ was the second reason. Having a C version does nothing
to address the first reason: the linear version is 8x quicker than the
reference counting version using the _same_ allocator. Thus improving
the allocator will not make the reference counting version faster than
the linear version. It _may_ make the reference counting version
faster than O'Caml. However, looking at your C allocator code it does
observably more work per allocation than the pooling allocator I'm
using so it isn't going to be faster.
What type of pool allocator are you using? AFAICT, a pool allocator is
analogous to a region allocator in that allocation and reclamation can be
as simple as:
_____________________________________________________________
#define BUF_SIZE 8192

typedef struct region region;

struct region {
    unsigned char buf[BUF_SIZE];
    size_t offset;
};

void* region_malloc(
    region* const _this,
    size_t size
) {
    size_t const offset = _this->offset;
    /* adjust size for proper alignment */
    size = (size + sizeof(void*) - 1) & ~(sizeof(void*) - 1);
    if (size + offset <= BUF_SIZE) {
        _this->offset += size;
        return _this->buf + offset;
    }
    return NULL;
}

void region_reset(
    region* const _this
) {
    _this->offset = 0;
}
_____________________________________________________________


Now you can do stuff like:
void thread() {
    int o, i1, i2;
    region region[2] = { { { '\0' }, 0 }, { { '\0' }, 0 } };
    for (o = 1 ;; ++o) {
        region_malloc(region, (o % 128) + 1);
        for (i1 = 1; i1 < (o % 512) + 2; ++i1) {
            for (i2 = 1; i2 < (o % 32) + 2; ++i2) {
                region_malloc(region + 1, (i2 % 16) + 1);
            }
            region_reset(region + 1);
        }
        region_reset(region);
    }
}
The ratio of mallocs to frees does not need to be equal. Instead, it
can be N mallocs to 1 reset. The granularity can be improved by clever
partitioning of multiple regions.

Talk is cheap so I actually
tested it :-

$ time ./simplify speed
(0x804b148/0x804e008/1/8192)::allocator_sys_expand
(0x804b148/0x8052028/2/16384)::allocator_sys_expand
(0x804b148/0x804e008/2/8192)::allocator_sys_promote

real 0m17.530s
user 0m17.307s
sys 0m0.005s
$

This is one second slower than my original code.
Did you free each object individually or did you take advantage of the
allocator_reset procedure? The region allocator I posted does not perform
all that well when you call allocator_release. It's best to create multiple
allocators and partition them into zones, and only call reset on a zone when
it's in a persistent quiescent state.

Not a huge slowdown,
so as code goes it is fine. However, it clearly doesn't make things
faster and so just reinforces my first reason for not using your
C++ code: allocation is not the bottleneck, (naive) reference counting is.
I clearly misunderstood you; sorry about that nonsense. I was under the
impression that the allocator was the bottleneck. Hmm, can you post the
test code you're using? Perhaps it contains some reference counting patterns
that are compatible with fine/medium-grain partitioned reference counting.
Think of zone-based object reclamation. You can amortize a plurality of
reference increments/decrements at so-called checkpoints that only need to
be executed on a periodic/episodic basis.

Also, if the benchmark is single-threaded, well, you could apply more and
more elaborate amortizations for existing refcounting techniques.

Jun 27 '08 #7
"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:g4******************************@comcast.com...
>
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
>"Chris Thomasson" <cr*****@comcast.net> writes:
>>"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
Using malloc(3) directly causes the time for the reference counting C
code to be 30 seconds i.e. 6x slower than O'Caml. So I used a simple
object pooling scheme. Although there are 50000011 allocations, only
14 of them result in a call to malloc(3). The same pool is used in
the linear code so while it may be possible to improve the pool, the
fact that the linear code is 8x quicker than the reference counting
version shows that (naive) reference counting is the place to look
for performance improvements.

The above and the fact that your code is C++ and I'm using C means
that there is little incentive to convert your C++ to C to try it.

Here is a crude C version of my single-threaded region allocator:

There were two reasons for not using your C++ allocator, the fact that
it was in C++ was the second reason. Having a C version does nothing
to address the first reason: the linear version is 8x quicker than the
reference counting version using the _same_ allocator. Thus improving
the allocator will not make the reference counting version faster than
the linear version. It _may_ make the reference counting version
faster than O'Caml. However, looking at your C allocator code it does
observably more work per allocation than the pooling allocator I'm
using so it isn't going to be faster.

What type of pool allocator are you using? AFAICT, a pool allocator is
analogous to a region allocator in that allocation and reclamation can be
as simple as:
[...]
>Not a huge slowdown,
so as code goes it is fine. However, it clearly doesn't make things
faster and so just reinforces my first reason for not using your
C++ code: allocation is not the bottleneck, (naive) reference counting is.

I clearly misunderstood you; sorry about that nonsense. I was under the
impression that the allocator was the bottleneck. Hmm, can you post the
test code you're using? Perhaps it contains some reference counting patterns
that are compatible with fine/medium-grain partitioned reference counting.
Think of zone-based object reclamation. You can amortize a plurality of
reference increments/decrements at so-called checkpoints that only need
to be executed on a periodic/episodic basis.

Also, if the benchmark is single-threaded, well, you could apply more and
more elaborate amortizations for existing refcounting techniques.
If you apply optimizations to naive reference counting, well, then it's no
longer all that naive, is it? Conclusion: use the right tool for the job at
hand.

;^)

Jun 27 '08 #8
"Chris Thomasson" <cr*****@comcast.net> writes:
What type of pool allocator are you using? AFAICT, a pool allocator is
analogous to a region allocator in that allocation and reclamation can be
as simple as:
Given that allocator performance is not the bottleneck, the
method I used is irrelevant.
The ratio of mallocs to frees does not need to be equal. Instead, it
can be N mallocs to 1 reset. The granularity can be improved by clever
partitioning of multiple regions.
Agreed, but unless you have a clever partition for this problem then
that idea is moot.
Did you free each object individually or did you take advantage of
the allocator_reset procedure? The region allocator I posted does not
perform all that well when you call allocator_release. It's best to
create multiple allocators and partition them into zones, and only
call reset on a zone when it's in a persistent quiescent state.
I used it as a straight malloc&free replacement. There is no point in
trying to take advantage of anything because allocation is not the
bottleneck that is causing the reference counting version to be 8x
slower than a linear version.

I clearly misunderstood you; sorry about that nonsense. I was under the
impression that the allocator was the bottleneck. Hmm, can you post the
test code you're using?
I could but it is trivial. If you want to take up Jon's challenge
then you can easily write the code using whatever reference counting
pattern you like. Changing to one form of deferred reference counting
decreases the time to ~11 seconds. Still 2x slower than O'Caml and 5x
slower than a linear version.

Think of zone-based object reclamation.
How about you think about it? Jon was arguing for GC, you were arguing
for reference counting. All I did was measure the two on Jon's
problem on my machine and show that on this test both suck and
linearity wins, though GC sucks less than two forms of reference
counting.
Jun 27 '08 #9
"Chris Thomasson" <cr*****@comcast.net> writes:
If you apply optimizations to naive reference counting, well, then it's
no longer all that naive, is it? Conclusion: use the right tool for the
job at hand.
And what might that tool be? It doesn't appear to be GC or reference
counting.
Jun 27 '08 #10
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
"Chris Thomasson" <cr*****@comcast.net> writes:
>What type of pool allocator are you using? AFAICT, a pool allocator is
analogous to a region allocator in that allocation and reclamation can be
as simple as:

Given that allocator performance is not the bottleneck, the
method I used is irrelevant.
>The ratio of mallocs to frees does not need to be equal. Instead, it
can be N mallocs to 1 reset. The granularity can be improved by clever
partitioning of multiple regions.

Agreed, but unless you have a clever partition for this problem then
that idea is moot.
I don't think it's worth the effort. Oh well. Nobody is paying me here...

;^)

>Did you free each object individually or did you take advantage of
the allocator_reset procedure? The region allocator I posted does not
perform all that well when you call allocator_release. It's best to
create multiple allocators and partition them into zones, and only
call reset on a zone when it's in a persistent quiescent state.

I used it as a straight malloc&free replacement. There is no point in
trying to take advantage of anything because allocation is not the
bottleneck that is causing the reference counting version to be 8x
slower than a linear version.

>I clearly misunderstood you; sorry about that nonsense. I was under the
impression that the allocator was the bottleneck. Hmm, can you post the
test code you're using?

I could but it is trivial. If you want to take up Jon's challenge
then you can easily write the code using whatever reference counting
pattern you like. Changing to one form of deferred reference counting
decreases the time to ~11 seconds. Still 2x slower than O'Caml and 5x
slower than a linear version.

>Think of zone-based object reclamation.

How about you think about it? Jon was arguing for GC, you were arguing
for reference counting.
I was trying to address the various false blanket claims that Jon was
making about every form of reference counting.

All I did was measure the two on Jon's
problem on my machine and show that on this test both suck and
linearity wins, though GC sucks less than two forms of reference
counting.
Well, don't use refcounting for this problem then... Refcounting is not the
right answer to every problem. :^)

Jun 27 '08 #11
On 15 Jun, 09:27, step...@dino.dnsalias.com (Stephen J. Bevan) wrote:
"Chris Thomasson" <cris...@comcast.net> writes:
If you apply optimizations to naive reference counting, well, then its
no longer all that naive is it? Conclusion, use the right tool for the
job at hand.

And what might that tool be? It doesn't appear to be GC or reference
counting.

Maybe this:
http://groups.google.ru/group/comp.p...b22a17c32c6b13
http://groups.google.ru/group/comp.p...c2bc2366831d29
Btw, what do you mean by "reference counting"? Something like this:

void rc_acquire(rc_t* obj)
{
    atomic_inc(&obj->rc);
}

void rc_release(rc_t* obj)
{
    if (0 == atomic_dec(&obj->rc))
        delete obj;
}
Are you testing on SMP/multicore machine?
Dmitriy V'jukov
Jun 27 '08 #12
"Dmitriy V'jukov" <dv*****@gmail.com> writes:
On 15 Jun, 09:27, step...@dino.dnsalias.com (Stephen J. Bevan) wrote:
Maybe this:
http://groups.google.ru/group/comp.p...b22a17c32c6b13
http://groups.google.ru/group/comp.p...c2bc2366831d29
They may or may not help in an SMP situation but in a single-threaded
scenario all they do is add more overhead and thus make (naive)
reference counting even slower.

Btw, what you mean by "reference counting"? Something like this:
[snip]

Yes, but since there is no SMP the inc/dec don't have to be atomic.
If they are, the performance is even worse. Though to be fair then I'd
need to test against an SMP-capable GC too.

Are you testing on SMP/multicore machine?
No.
Jun 27 '08 #13
"Chris Thomasson" <cr*****@comcast.net> writes:
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
>Agreed, but unless you have a clever partition for this problem then
that idea is moot.

I don't think it's worth the effort. Oh well. Nobody is paying me here...
Nobody was paying you to argue with Jon that reference counting was
sufficient either.

>How about you think about it? Jon was arguing for GC you were arguing
for reference counting.

I was trying to address the various false blanket claims that Jon was
making about every form of reference counting.
Jon challenged you to back up your claim that his claims were false
by providing a reference counting version that is faster than GC for
a specific test. I saw no response from you, thus I don't see how you
addressed his claims at all.

>All I did was measure the two on Jon's
problem on my machine and show that on this test both suck and
linearity wins, though GC sucks less than two forms of reference
counting.

Well, don't use refcounting for this problem then... Refcounting is
not the right answer to every problem. :^)
I agree, but then you were the one arguing for reference counting and
against GC not me.
Jun 27 '08 #14
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
"Chris Thomasson" <cr*****@comcast.net> writes:
>"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
>>Agreed, but unless you have a clever partition for this problem then
that idea is moot.

I don't think its worth the effort. Oh well. Nobody is paying me here...

Nobody was paying you to argue with Jon that reference counting was
sufficient either.
That's not what I was arguing about. You need to re-read the thread. Jon
made misleading claims on reference counting, and I pointed them out.

>>How about you think about it? Jon was arguing for GC you were arguing
for reference counting.

I was trying to address the various false blanket claims that Jon was
making about every form of reference counting.

Jon challenged you to back up your claim that his claims were false
by providing a reference counting version that is faster than GC for
a specific test. I saw no response from you, thus I don't see how you
addressed his claims at all.
When did I say that reference counting in general is always faster than GC?
I said that some forms of reference counting are deterministic and GC is not.
Also, most reference counting algorithms do not require some of the
"heavy-handed" techniques that some GCs employ: thread suspension,
thread-stack introspection, stop-the-world, etc... Does this fact make
reference counting better? Na.

>>All I did was measure the two on Jon's
problem on my machine and show that on this test both suck and
linearity wins, though GC sucks less than two forms of reference
counting.

Well, don't use refcounting for this problem then... Refcounting is
not the right answer to every problem. :^)

I agree, but then you were the one arguing for reference counting and
against GC not me.
When did I argue against GC? I was responding to Jon's blanket false claims.

Jun 27 '08 #15
"Chris Thomasson" <cr*****@comcast.net> writes:
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
>"Chris Thomasson" <cr*****@comcast.net> writes:
>>"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
Agreed, but unless you have a clever partition for this problem then
that idea is moot.

I don't think it's worth the effort. Oh well. Nobody is paying me here...

Nobody was paying you to argue with Jon that reference counting was
sufficient either.

That's not what I was arguing about. You need to re-read the
thread. Jon made misleading claims on reference counting, and I pointed
them out.
Jon's arguments against RC were :-

. Fragility.
. Memory overhead.
. Performance degradation.
. Fragmentation.
. Not incremental, i.e. awful worst case performance.

Whether each one is misleading is a matter of debate. I'd argue that
the first isn't. The last isn't unless you consider at least some
kind of deferred reference counting scheme which just adds to the
first issue, somewhat adds to the second depending on just when the
deferred work is done. The penultimate issue is true when compared to
a copying GC but not, say, a mark-sweep GC. Regarding performance
degradation for "collection-intensive tasks" you wrote in
<m4******************************@comcast.com>:-

What collection-intensive tasks? I know I can compete, and
most likely beat, a traditional GC with handcrafted implementations
that C++ give me the freedom to work with.

This led Jon to suggest a specific benchmark which is clearly small
enough to do that without requiring an onerous amount of work to
prove the point either way. One can certainly question the benchmark
(I certainly do) but as far as I can see you didn't post any kind of
followup at all. That's fine, you don't have to, but then it is rather
bold to now claim you were pointing out misleading claims without
specifying which claims were misleading and which were not.

>Jon challenged you to back up your claims that this claims were false
by providing a reference counting version that is faster than GC for
a specific test. I saw no response from you thus I don't see how you
addressed his claims at all.

When did I say that reference counting in general is always faster
than GC?
I quoted what you claimed and so you were free to use reference
counting or not as you see fit. You didn't offer a solution.

When did I argue against GC? I was responding to Jon's blanket false
claims.
Claiming they are false is not the same as showing they are false.
Jun 27 '08 #16

"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
"Chris Thomasson" <cr*****@comcast.net> writes:
>"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
>>"Chris Thomasson" <cr*****@comcast.net> writes:
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
Agreed, but unless you have a clever partition for this problem then
that idea is moot.

I don't think it's worth the effort. Oh well. Nobody is paying me
here...

Nobody was paying you to argue with Jon that reference counting was
sufficient either.

That's not what I was arguing about. You need to re-read the
thread. Jon made misleading claims on reference counting, and I pointed
them out.

Jon's arguments against RC were :-

. Fragility.
. Memory overhead.
. Performance degradation.
. Fragmentation.
. Not incremental, i.e. awful worst case performance.
[...]

First question, I don't have time right now to make proper response...

;^(

Stephen, do you honestly think that Jon's blanket arguments apply to every
type of reference counting algorithm out there? Keep in mind that he did not
know that some state-of-the-art reference counting algorithms existed.
Please re-read the threads. He even brought up exception handling in the
context of RC to slam C++ and prop up another useful high-level lang. IMVHO,
he was doing a little trolling and I got hooked!

:^/


GC and RC are both great tools to have. Please try to understand that has
been my argument all along. When Jon writes about his false
claims/misunderstandings on "some" RC algorithms, I point them out. So be
it.
AFAICT, Jon's main mistake is that he totally confuses the eligibility of an
object to be collected with the fact that a GC does not collect anything
unless it actually runs a collection cycle. His claims are wrong because he
is posting in a C++ group which has no GC:
{
    refcount<Foo> f(new Foo);
    foo(f);
    bar();
}
Jon says that GC will collect f while bar is executing. Well, his claim is
totally misleading because he cannot possibly know when the GC is going to
decide to run a cycle. I have been over this with Jon. He refuses to
understand. Oh well.
He even slammed Linux, and insulted Linus Torvalds' intelligence because he
happened to approve a proxy GC over a traditional one...

Jun 27 '08 #17
"Chris Thomasson" <cr*****@comcast.net> writes:
Stephen, do you honestly think that Jon's blanket arguments apply to
every type of reference counting algorithm out there? Keep in mind
that he did not know that some state-of-the-art reference counting
algorithms existed.
The only citations I could find are to three techniques posted by
people in comp.programming.threads (one of which is your own). Any RC
mechanism which doesn't put an RC on every object but instead on some
kind of container/proxy for a group of objects is going to reduce the
number of inc/dec calls at the expense of something. There's nothing
particularly state-of-the-art about that idea. Whether the method is
actually applicable depends on the problem. It can be used for Jon's
challenge since there is nothing in the challenge which forces fine
grain sharing between terms. Change the problem slightly (e.g. graph
re-writing) and it is much harder to use a proxy or linearity.
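[To make the container/proxy trade-off concrete, here is a minimal sketch, mine rather than anything posted in the thread: a single reference count guards a whole group of nodes, so pointers to individual nodes are passed around with no per-object inc/dec, and the entire group is freed when the last group reference drops. The `Arena`/`Node` naming is hypothetical.]

```cpp
#include <cassert>
#include <vector>

// One count for the whole group: nodes inside it carry no per-object count.
struct Arena {
    std::vector<int*> nodes;
    int refs = 1;               // the creator holds the first reference

    int* alloc(int v) {         // nodes are owned by the arena, not counted
        nodes.push_back(new int(v));
        return nodes.back();
    }
    ~Arena() {
        for (int* n : nodes) delete n;
    }
};

inline void arena_acquire(Arena* a) { ++a->refs; }

// Returns true when this release destroyed the whole group.
inline bool arena_release(Arena* a) {
    if (--a->refs == 0) { delete a; return true; }
    return false;
}
```

[Node pointers obtained from `alloc` can be shared freely within the arena's lifetime; only the single arena count ever moves, which is where the inc/dec savings come from. The expense Stephen mentions is visible too: no individual node can be reclaimed before the whole group dies.]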

His claims are wrong because he is posting in a C++ group, and C++ has no GC:

{
  refcount<Foo> f(new Foo);
  foo(f);
  bar();
}

Jon says that GC will collect f while bar is executing. Well, his
claim is totally misleading because he cannot possibly know when the
GC is going to decide to run a cycle. I have been over this with
Jon. He refuses to understand. Oh well.
Since you didn't quote Jon I had a look back through the thread for
that example and I can find it in <13*************@corp.supernews.com>
where Jon writes "can collect" not "will collect". To me that makes a
big difference as to whether it is misleading or not.

He even slammed Linux, and insulted Linus Torvalds' intelligence
because he happened to approve a proxy GC over a traditional one...
Whether he slammed Linux or insulted Linus is irrelevant. Just
because I don't agree with him about one issue doesn't mean I
automatically dismiss what he says about another.
Jun 27 '08 #18
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
"Chris Thomasson" <cr*****@comcast.net> writes:
>Stephen, do you honestly think that Jon's blanket arguments apply to
every type of reference counting algorithm out there? Keep in mind
that he did not know that some state-of-the-art reference counting
algorithms existed.

The only citations I could find are to three techniques posted by
people in comp.programming.threads (one of which is your own). Any RC
mechanism which doesn't put an RC on every object but instead on some
kind of container/proxy for a group of objects is going to reduce the
number of inc/dec calls at the expense of something. There's nothing
particularly state-of-the-art about that idea.
The most important aspect of state-of-the-art reference counting is the fact
that no dependence on any interlocked RMW and/or memory-barrier instructions
is needed at all. Of course, this means that MAJOR benefits in the
realm of scalability, throughput and performance can indeed be realized.

Whether the method is
actually applicable depends on the problem. It can be used for Jon's
challenge since there is nothing in the challenge which forces fine
grain sharing between terms. Change the problem slightly (e.g. graph
re-writing) and it is much harder to use a proxy or linearity.
Proxy GC works very well within certain problem arenas. For instance, the
classic reader-writer problem is efficiently solved through clever use of
the technique.

>His claims are wrong because he is posting in a C++ group, and C++ has no GC:

{
  refcount<Foo> f(new Foo);
  foo(f);
  bar();
}

Jon says that GC will collect f while bar is executing. Well, his
claim is totally misleading because he cannot possibly know when the
GC is going to decide to run a cycle. I have been over this with
Jon. He refuses to understand. Oh well.

Since you didn't quote Jon I had a look back through the thread for
that example and I can find it in <13*************@corp.supernews.com>
where Jon writes "can collect" not "will collect". To me that makes a
big difference as to whether it is misleading or not.
Well, that rendered his argument completely moot. Of course a GC "can"
collect f while bar is executing. The misleading part is that he fails to
explain that "can" is predicated on the fact that a GC must run a cycle
while bar is executing. A small tweak to the program changes things:

{
  {
    refcount<Foo> f(new Foo);
    foo(f);
  }
  bar();
}
Now, a naive RC WILL collect f _before_ bar has executed. Of course Jon said
that this fact is totally irrelevant. I called him on that point because it
takes advantage of the fact that some types of RC are deterministic. Most GC
algorithms cannot rise to that level of determinism. We got into a spiral
argument that had no end in sight. Well, that was until he showed his true
colors and started implying that C++ is crap because it has no GC and that
OCaml exceptions are faster.

>He even slammed Linux, and insulted Linus Torvalds' intelligence
because he happened to approve a proxy GC over a traditional one...

Whether he slammed Linux or insulted Linus is irrelevant.
AFAICT, his insults against Linux and/or Linus are based on a form of
ignorance. Much like his attribution of several problems to _all_ forms of
RC. IMVHO, it's relevant indeed. Jon __seems__ to hate RC no matter what
color, shape or form it presents itself in. Hate is "usually" based on utter
ignorance. Oh well, life goes on.

Just
because I don't agree with him about one issue doesn't mean I
automatically dismiss what he says about another.
Well, it gave me an insight into why Jon casts his blanket assertions across
ALL forms of RC. This is why I use the term "BLANKET". Some person might be
misled into thinking that Jon's assertions are true across all forms of RC.
This is totally false and I think Jon knew better. He posted that nonsense
on purpose for some odd reason. Why? I don't know. Perhaps you can tell
me???

Jun 27 '08 #19
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
Jerry Coffin <jc*****@taeus.com> writes:
>In article <87************@dnsalias.com>, st*****@dino.dnsalias.com
>>The last isn't unless you consider at least some
kind of deferred reference counting scheme which just adds to the
first issue, somewhat adds to the second depending on just when the
deferred work is done.

More accurately, the last is really false (while still giving a fairly
accurate impression). Worse, your comments are incorrect as well --
deferring collection doesn't add to fragility at all.

So while you think that reference counting can be considered fragile
you do not consider that using a deferral mechanism makes it any more
fragile. Interesting. I disagree but I don't think that is
important.
[...]

RC is ONLY as fragile as the programmer who makes use of it.

Jun 27 '08 #20

"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:jv******************************@comcast.com...
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
>Jerry Coffin <jc*****@taeus.com> writes:
>>In article <87************@dnsalias.com>, st*****@dino.dnsalias.com
The last isn't unless you consider at least some
kind of deferred reference counting scheme which just adds to the
first issue, somewhat adds to the second depending on just when the
deferred work is done.

More accurately, the last is really false (while still giving a fairly
accurate impression). Worse, your comments are incorrect as well --
deferring collection doesn't add to fragility at all.

So while you think that reference counting can be considered fragile
you do not consider that using a deferral mechanism makes it any more
fragile. Interesting. I disagree but I don't think that is
important.

[...]

RC is ONLY as fragile as the programmer who makes use of it.
BTW, would you call GC fragile because a programmer can still leak memory?
In certain scenarios, forgetting to set a reference to NULL is analogous to
forgetting to call free.

Jun 27 '08 #21
"Chris Thomasson" <cr*****@comcast.net> writes:
>RC is ONLY as fragile as the programmer who makes use of it.
Indeed, but for the most part humans write programs and humans make
mistakes. The important thing is whether the mistakes can easily be
found or whether they can be missed and only show up in production.
I'm reasonably confident that anyone who has used RC in a non-trivial
program (i.e. at least measured in tens of thousands of lines of code,
written by multiple people and actively used by third parties) has
some war stories about spending time trying to track down bugs due to
RC errors. I certainly have, but I have no choice but to use RC in
one or more of its forms. That doesn't mean I'm oblivious to its
problems and wouldn't prefer to use GC if it were feasible (memory
constrained environment).

BTW, would you call GC fragile because a programmer can still leak
memory? In certain scenarios, forgetting to set a reference to NULL is
analogous to forgetting to call free.
GC and RC share one failure mode: the potential to retain memory that is
semantically free. RC (when used manually and not generated by a
compiler) has the other, which is to reclaim too early. Neither mode
is good. But two modes of failure put RC higher up the fragility
scale than GC.
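[The shared failure mode is easy to demonstrate: a live but forgotten reference keeps memory reachable, so neither a tracing GC nor RC can reclaim it. A minimal C++ sketch, with a hypothetical `cache` container and `shared_ptr` standing in for the RC:]

```cpp
#include <memory>
#include <vector>

struct Blob { char data[1024]; };

// A long-lived container the program forgets to prune.
static std::vector<std::shared_ptr<Blob>> cache;

std::shared_ptr<Blob> make_blob() {
    auto b = std::make_shared<Blob>();
    cache.push_back(b);     // "forgot to set the reference to NULL"
    return b;
}

// Even after every caller drops its copy, the cache's copy keeps the
// count at 1, so the Blob is never freed -- semantically a leak,
// and a tracing GC would keep it alive for the same reason: it is
// still reachable.
long cache_refs() { return cache.empty() ? 0 : cache[0].use_count(); }
```

[This is the leak that survives both schemes; the second failure mode, reclaiming too early, belongs to manual RC alone.]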
Jun 27 '08 #22
"Chris Thomasson" <cr*****@comcast.netwrites:
The most important aspect of state-of-the-art reference counting is
the fact that no dependence on any interlocked RMW and/or
memory-barrier instructions are needed at all. Of course, this means
that various MAJOR benefits in the realm of scalability, throughput
and performance can be realized indeed.
Despite the thread subject, Jon's challenge is single-threaded, thus
RMW, memory barriers, scalability ... etc. aren't an issue.

Proxy GC works very well within certain problem arenas. For instance,
classic reader-writer problem is efficiently solved through clever use
of the technique.
Proxy GC may be fantastic in its area, but unless you think that Jon's
challenge is in this area and are willing to write the code and time
it then Proxy GC is irrelevant.

Of course a GC "can" collect f while bar is executing. The
misleading part is that he fails to explain that "can" is predicated
on the fact that a GC must run a cycle while bar is executing.
He probably failed to explain it because he assumed it was obvious
that the GC would have to run. I wouldn't call it misleading to
assume that someone knows a basic fact about GC.

A small tweak to the program changes things:

{
  {
    refcount<Foo> f(new Foo);
    foo(f);
  }
  bar();
}
Now, a naive RC WILL collect f _before_ bar has executed. Of course
Jon said that this fact is totally irrelevant.
The reason being that most people don't write code like that from
scratch, they'd only do it once if/when they'd discovered a problem
that forced them to do it. Jon is selling the idea that if you use GC
then you'd never have to go through the problem discovery stage, the
GC would take care of it for you. Whether anyone is buying depends on
how often they see this kind of problem. It isn't high up on my list.

AFAICT, his insults against Linux and/or Linus are based on a form of
ignorance. Much like his attribution of several problems to _all_
forms of RC. IMVHO, it's relevant indeed.
To you, but not to me.
Jun 27 '08 #23
"Stephen J. Bevan" <st*****@dino.dnsalias.com> wrote in message
news:87************@dnsalias.com...
"Chris Thomasson" <cr*****@comcast.net> writes:
>The most important aspect of state-of-the-art reference counting is
the fact that no dependence on any interlocked RMW and/or
memory-barrier instructions is needed at all. Of course, this means
that MAJOR benefits in the realm of scalability, throughput
and performance can indeed be realized.

Despite the thread subject, Jon's challenge is single-threaded, thus
RMW, memory barriers, scalability ... etc. aren't an issue.

>Proxy GC works very well within certain problem arenas. For instance, the
classic reader-writer problem is efficiently solved through clever use
of the technique.

Proxy GC may be fantastic in its area, but unless you think that Jon's
challenge is in this area and are willing to write the code and time
it then Proxy GC is irrelevant.

>Of course a GC "can" collect f while bar is executing. The
misleading part is that he fails to explain that "can" is predicated
on the fact that a GC must run a cycle while bar is executing.

He probably failed to explain it because he assumed it was obvious
that the GC would have to run. I wouldn't call it misleading to
assume that someone knows a basic fact about GC.

>A small tweak to the program changes things:

{
  {
    refcount<Foo> f(new Foo);
    foo(f);
  }
  bar();
}
Now, a naive RC WILL collect f _before_ bar has executed. Of course
Jon said that this fact is totally irrelevant.

The reason being that most people don't write code like that from
scratch, they'd only do it once if/when they'd discovered a problem
that forced them to do it. Jon is selling the idea that if you use GC
then you'd never have to go through the problem discovery stage, the
GC would take care of it for you. Whether anyone is buying depends on
how often they see this kind of problem. It isn't high up on my list.

>AFAICT, his insults against Linux and/or Linus are based on a form of
ignorance. Much like his attribution of several problems to _all_
forms of RC. IMVHO, it's relevant indeed.

To you, but not to me.

Fair enough.

Jun 27 '08 #24
In article <87************@dnsalias.com>, st*****@dino.dnsalias.com
says...

[ ... ]
GC and RC share one failure mode: potential to retain memory that is
semantically free. RC (when used manually and not generated by a
compiler) has the other which is to reclaim too early. Neither mode
is good. But two modes of failure puts RC higher up the fragility
scale than GC.
How would RC free memory that's still in use, short of a bug in the RC
code itself?

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 27 '08 #25
Jerry Coffin <jc*****@taeus.com> writes:
In article <87************@dnsalias.com>, st*****@dino.dnsalias.com
says...

[ ... ]
>GC and RC share one failure mode: potential to retain memory that is
semantically free. RC (when used manually and not generated by a
compiler) has the other which is to reclaim too early. Neither mode
is good. But two modes of failure puts RC higher up the fragility
scale than GC.

How would RC free memory that's still in use, short of a bug in the RC
code itself?
This isn't an RC bug per se; rather, it is a bug inherent to all
manual memory (de)allocation methods: human error in writing the
program such that in some context an extra put/release is called, which
causes memory to be freed while another part of the program still
thinks it has a (final) reference to the object and so can access it
safely.
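[That error can be sketched in C++ with a hand-rolled count; the `obj_get`/`obj_put` names are hypothetical. An unmatched extra `obj_put` drives the count to zero while another part of the program still holds a pointer, leaving that pointer dangling. Here the premature free is merely detected via the return value rather than allowed to corrupt memory:]

```cpp
struct Obj {
    int refs;
    int payload;
};

Obj* obj_new(int v) { return new Obj{1, v}; }   // creator holds one reference
void obj_get(Obj* o) { ++o->refs; }

// Returns true when this put actually freed the object.
bool obj_put(Obj* o) {
    if (--o->refs == 0) { delete o; return true; }
    return false;
}

// Simulates the bug: a callee that pairs its obj_get with TWO puts.
// After it returns, the caller's "final" reference is dangling.
bool buggy_callee(Obj* o) {
    obj_get(o);           // refs: 2
    obj_put(o);           // refs: 1, balanced
    return obj_put(o);    // extra release: refs 0, frees the caller's object
}
```

[No tracing GC has this mode: with no explicit release call to get wrong, an object can only be reclaimed once it is unreachable.]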
Jun 27 '08 #26