Bytes | Developer Community
Do you use a garbage collector?

I followed a link to James Kanze's web site in another thread and was
surprised to read this comment by a link to a GC:

"I can't imagine writing C++ without it"

How many of you c.l.c++'ers use one, and in what percentage of your
projects is one used? I have never used one in personal or professional
C++ programming. Am I a holdover to days gone by?
Apr 10 '08
"Mirek Fidler" <cx*@ntllib.org> wrote in message
news:5e**********************************@w5g2000prd.googlegroups.com...
On Apr 13, 7:42 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

OFFTOPIC: Chris, I have tried to send you an email concerning AppCore.
Have you got it?
Nope. Fuc%ing Outlook and/or Comcast! I am very sorry about that.

;^(...

Jun 27 '08 #251
On Mon, 14 Apr 2008 09:20:37 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>Please check this one out:

http://pastebin.com/m3a18a8e1
Time: 2859 ms (g++)
Time: 1968 ms (c++)

Well, it certainly dramatically improved the time, especially on VC9
>WARNING!

The slab_allocator template is NOT built for general-purpose use. I very quickly
created it for this benchmark only! Also, the code compiles with G++ and
VC++, but not on Comeau. This is because of the dlist API.
that was a total of 200 lines :) still 200 ms behind though..
Jun 27 '08 #252
"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:t_******************************@comcast.com...
"Razii" <DO*************@hotmail.com> wrote in message
news:96********************************@4ax.com...
>On Mon, 14 Apr 2008 07:48:12 -0700 (PDT), gpderetta
<gp*******@gmail.com> wrote:
>>>No, you are not. Add -DNDEBUG. Also did you measure with -O3?

Yes I tried -O3. There was no difference.
>>>Did you
try to tune -march for your architecture (this can make a *lot* of
difference - or not, depending of the program)?

Why? At least commercial C++ software has to target the
least-common-denominator processor, so the flags we use must do the
same. In any case, I added

-march=athlon-xp

there was no change.

Time: 26656 ms

Java version (with the flags I suggested) is at

Time: 1789 ms

14 (or is that 15?) times faster.

Here are my results for the various tests posted here...


Compile flags:

G++: -O3 -fomit-frame-pointer -finline-functions -pedantic -Wall -DNDEBUG
Java: -server -Xms86m -Xmx86m -Xmn85m


The times I get on my old machine (P4 3.06 GHz HyperThread):


- My first try, cache_allocator <http://pastebin.com/m2493f289>:

Time: 47704 ms


- My second try, slab_allocator <http://pastebin.com/m3a18a8e1>:

Time: 6969 ms
It seems that VC++ 2005 Express compiles faster code in this narrow case...
I am getting an output of:

Time: 4250 ms
using the following build flags:
Compiler: /O2 /Ob2 /Oi /Ot /Oy /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D
"_UNICODE" /D
"UNICODE" /FD /EHsc /MT /Fo"Release\\" /Fd"Release\vc80.pdb" /W3 /nologo /c
/Wp64 /Zi /Gr /TP /errorReport:prompt
Linker: /OUT:"java_vs_cpp.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST
/MANIFESTFILE:"Release\java_vs_cpp.exe.intermediate.manifest" /DEBUG
/PDB:"java_vs_cpp.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /LTCG
/MACHINE:X86 /ERRORREPORT:PROMPT kernel32.lib
That's around a 2 second overall improvement over GCC.

>

- The Java version by Razii http://pastebin.com/f3f559ae2:

Time: 2895 ms

Can anybody else post their timings please?

Thanks.
Jun 27 '08 #253

"Razii" <DO*************@hotmail.com> wrote in message
news:ju********************************@4ax.com...
On Mon, 14 Apr 2008 09:20:37 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>>Please check this one out:

http://pastebin.com/m3a18a8e1

Time: 2859 ms (g++)
Time: 1968 ms (c++)

Well, it certainly dramatically improved the time, especially on VC9
>>WARNING!

The slab_allocator template is NOT built for general-purpose use. I very
quickly
created it for this benchmark only! Also, the code compiles with G++ and
VC++, but not on Comeau. This is because of the dlist API.

that was a total of 200 lines :) still 200 ms behind though..
;^)

Java's pretty damn hard to beat sometimes!

Jun 27 '08 #254
"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:9b******************************@comcast.com...
>
"Razii" <DO*************@hotmail.com> wrote in message
news:ju********************************@4ax.com...
>On Mon, 14 Apr 2008 09:20:37 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>>>Please check this one out:

http://pastebin.com/m3a18a8e1

Time: 2859 ms (g++)
Time: 1968 ms (c++)

Well, it certainly dramatically improved the time, especially on VC9
>>>WARNING!

The slab_allocator template is NOT built for general-purpose use. I very
quickly
created it for this benchmark only! Also, the code compiles with G++ and
VC++, but not on Comeau. This is because of the dlist API.

that was a total of 200 lines :) still 200 ms behind though..
Perhaps adding another 200 lines of code will allow me to "close in" on that
200ms gap...

lol!

:^)
>
;^)

Java's pretty damn hard to beat sometimes!
Jun 27 '08 #255
On Mon, 14 Apr 2008 17:47:35 +0200, "Bo Persson" <bo*@gmb.dk> wrote:
>Ok, so for Java you must optimize for the actual test machine? :-))
There is no Java flag that optimizes for the test machine. Java code
is compiled to bytecode, not machine code. The JIT compiles it at run
time to machine code. The JIT compiler knows what processor it is running
on, and can generate code specifically for that processor. It knows
whether the processor is a PIV or an Athlon, and how big the caches are.
A C++ compiler must target the least-common-denominator processor.
Jun 27 '08 #256
"Razii" <DO*************@hotmail.com> wrote in message
news:ju********************************@4ax.com...
On Mon, 14 Apr 2008 09:20:37 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>>Please check this one out:

http://pastebin.com/m3a18a8e1

Time: 2859 ms (g++)
Time: 1968 ms (c++)

Well, it certainly dramatically improved the time, especially on VC9
>>WARNING!

The slab_allocator template is NOT built for general-purpose use. I very
quickly
created it for this benchmark only! Also, the code compiles with G++ and
VC++, but not on Comeau. This is because of the dlist API.

that was a total of 200 lines :) still 200 ms behind though..
How did it do wrt memory consumption?

Jun 27 '08 #257
On Mon, 14 Apr 2008 09:58:22 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>- The Java version by Razii http://pastebin.com/f3f559ae2:

Time: 2895 ms
How did you run it? What were the flags?
Jun 27 '08 #258
On Mon, 14 Apr 2008 09:58:22 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>Java: -server -Xms86m -Xmx86m -Xmn85m
Oops -- never mind.
Jun 27 '08 #259
Razii wrote:
On Mon, 14 Apr 2008 17:47:35 +0200, "Bo Persson" <bo*@gmb.dk> wrote:
>Ok, so for Java you must optimize for the actual test machine? :-))

There is no Java flag that optimizes for the test machine. Java
code is compiled to bytecode, not machine code. The JIT compiles
it at run time to machine code. The JIT compiler knows what processor
it is running on, and can generate code specifically for that
processor. It knows whether the processor is a PIV or an Athlon, and
how big the caches are. A C++ compiler must target the
least-common-denominator processor.
No, the C++ compiler can also target the appropriate system. You optimize for
the minimum target that is fast enough. On anything bigger or faster,
it just runs even better. Targeting x86 does not mean 386!
JIT doesn't buy you anything here. We have been through this before!
Bo Persson
Jun 27 '08 #260
On Mon, 14 Apr 2008 10:26:16 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>How did it do wrt memory consumption?
45 MB
Jun 27 '08 #261
"gpderetta" <gp*******@gmail.com> wrote in message
news:1e**********************************@c65g2000hsa.googlegroups.com...
On Apr 14, 4:48 pm, gpderetta <gpdere...@gmail.com> wrote:
>On Apr 14, 4:23 am, Razii <DONTwhatever...@hotmail.com> wrote:
On Sun, 13 Apr 2008 20:15:44 -0700, "Chris Thomasson"
<cris...@comcast.net> wrote:
Are you sure that you have been compiling the C++ code in non-debug
mode?
Yes, I am sure..
g++ -O2 -fomit-frame-pointer -finline-functions "new.cpp" -o "new.exe"

No, you are not. Add -DNDEBUG. Also did you measure with -O3? Did you
try to tune -march for your architecture (this can make a *lot* of
difference - or not, depending of the program)?

BTW, could you benchmark this version:

http://pastebin.com/m16980424
I let it run for about 10 minutes, and it still did not complete. Yikes!

This is nothing a decent C++ programmer would ever write [1], but then
neither is the benchmark itself.
Be sure to use -O3 (on my machine, with gcc it is two times faster than
-O2).

[1] my version uses a very simple region allocator, which in some
extreme cases (HPC, embedded devices, benchmarks :) ) might actually
make sense.
Did you try and test this?

Jun 27 '08 #262
"gpderetta" <gp*******@gmail.com> wrote in message
news:1e**********************************@c65g2000hsa.googlegroups.com...
On Apr 14, 4:48 pm, gpderetta <gpdere...@gmail.com> wrote:
>On Apr 14, 4:23 am, Razii <DONTwhatever...@hotmail.com> wrote:
On Sun, 13 Apr 2008 20:15:44 -0700, "Chris Thomasson"
<cris...@comcast.net> wrote:
Are you sure that you have been compiling the C++ code in non-debug
mode?
Yes, I am sure..
g++ -O2 -fomit-frame-pointer -finline-functions "new.cpp" -o "new.exe"

No, you are not. Add -DNDEBUG. Also did you measure with -O3? Did you
try to tune -march for your architecture (this can make a *lot* of
difference - or not, depending of the program)?

BTW, could you benchmark this version:

http://pastebin.com/m16980424

You have a bug. I fixed it. Here is the code:
http://pastebin.com/m67c0ee86
The bug was in line 17:

static Tree * heap[HEAP_SIZE];
You need to define it as:
static Tree heap[HEAP_SIZE];
:^)

I am getting:

Time 1935 ms
IMHO, there is one major flaw... This is not an example of dynamic memory.
This is basically static memory. Any thoughts?

This is nothing a decent C++ programmer would ever write [1], but then
neither is the benchmark itself.
Be sure to use -O3 (on my machine, with gcc it is two times faster than
-O2).

[1] my version uses a very simple region allocator, which in some
extreme cases (HPC, embedded devices, benchmarks :) ) might actually
make sense.
Jun 27 '08 #263

"Razii" <DO*************@hotmail.com> wrote in message
news:s4********************************@4ax.com...
On Mon, 14 Apr 2008 10:26:16 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>>How did it do wrt memory consumption?

45 MB
Humm... That's a fairly nice improvement over the 60MB that my first try
yielded...

Jun 27 '08 #264
On Apr 14, 6:37 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
"gpderetta" <gpdere...@gmail.com> wrote in message

news:1e**********************************@c65g2000hsa.googlegroups.com...
On Apr 14, 4:48 pm, gpderetta <gpdere...@gmail.com> wrote:
On Apr 14, 4:23 am, Razii <DONTwhatever...@hotmail.com> wrote:
On Sun, 13 Apr 2008 20:15:44 -0700, "Chris Thomasson"
<cris...@comcast.net> wrote:
>Are you sure that you have been compiling the C++ code in non-debug
>mode?
Yes, I am sure..
g++ -O2 -fomit-frame-pointer -finline-functions "new.cpp" -o "new.exe"
No, you are not. Add -DNDEBUG. Also did you measure with -O3? Did you
try to tune -march for your architecture (this can make a *lot* of
difference - or not, depending of the program)?
BTW, could you benchmark this version:
http://pastebin.com/m16980424
This is nothing a decent C++ programmer would ever write [1], but then
neither is the benchmark itself.
Be sure to use -O3 (on my machine, with gcc it is two times faster than
-O2).
[1] my version uses a very simple region allocator, which in some
extreme cases (HPC, embedded devices, benchmarks :) ) might actually
make sense.

On my old machine (P4 3.06 HyperThread)

This version <http://pastebin.com/m16980424> outputs:

I literally waited for about two minutes, and finally hit Ctrl-C... There is
something wrong. I have not studied your code yet.
A typo: the 'heap' arrays should have type 'Tree', not 'Tree*' (I was
playing with dynamically allocating the heap and forgot the '*' when I
reverted the change back).

try:
http://pastebin.com/m3a18a8e1
And my newest version <http://pastebin.com/m3a18a8e1> outputs:

Time: 7015 ms

What times are you getting?
On my laptop (Pentium M, 2.0 GHz).

Time: 670 ms

I think that this version is memory bandwidth limited.

If you really want to see how fast a smart compiler could optimize
this program,
refactor the call to DestroyTree(CreateTree()) into another function and
mark it with __attribute__((pure)) (which is legal as the function has
no side effect). In my tests it prints Time: 0. This in practice shows
how useless this benchmark is.

--
gpd
Jun 27 '08 #265
On Apr 14, 8:17 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
"gpderetta" <gpdere...@gmail.com> wrote in message

news:1e**********************************@c65g2000hsa.googlegroups.com...
On Apr 14, 4:48 pm, gpderetta <gpdere...@gmail.com> wrote:
On Apr 14, 4:23 am, Razii <DONTwhatever...@hotmail.com> wrote:
On Sun, 13 Apr 2008 20:15:44 -0700, "Chris Thomasson"
<cris...@comcast.net> wrote:
>Are you sure that you have been compiling the C++ code in non-debug
>mode?
Yes, I am sure..
g++ -O2 -fomit-frame-pointer -finline-functions "new.cpp" -o "new.exe"
No, you are not. Add -DNDEBUG. Also did you measure with -O3? Did you
try to tune -march for your architecture (this can make a *lot* of
difference - or not, depending of the program)?
BTW, could you benchmark this version:
http://pastebin.com/m16980424

You have a bug. I fixed it. Here is the code:

http://pastebin.com/m67c0ee86

The bug was in line 17:

static Tree * heap[HEAP_SIZE];

You need to define it as:

static Tree heap[HEAP_SIZE];

Ah, ok, you found the bug :)
Sorry for posting a buggy program, by chance that version worked fine
here. :)

Fortunately I'm in good company here at posting buggy programs on the
first try ;)
>
:^)

I am getting:

Time 1935 ms

IMHO, there is one major flaw... This is not an example of dynamic memory.
This is basically static memory. Any thoughts?
So what? Whatever gets the job done is fine for me. :)
I wouldn't be surprised if the java compiler was transforming the code
into something similar to my program.
Usually C++ compilers are bad at optimizing away allocation: for
example gcc treats malloc as having side effects and won't optimize
calls to it.

Region allocation is still dynamic memory allocation (albeit a very
simple one).
It can only be used in limited circumstances (this test being a very
good one), but when it works, it works beautifully.
There are general-purpose allocators that use regions when they can.
You probably know about 'reaps'.

At most, what this test can show, is that C++ is bad at being Java
(which is what one would have expected).

--
gpd
Jun 27 '08 #266
On Mon, 14 Apr 2008 11:17:26 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>IMHO, there is one major flaw... This is not an example of dynamic memory.
This is basically static memory. Any thoughts?

Yes, this is not dynamic memory. Ian Collins played the same trick. If
you change CreateTree(22) to CreateTree(23) (needs well over 100 MB),
the program will break.

By the way, I can do the same in Java too by Object pooling. Basically
make a big ol' List of Test objects, and then pick one from the list

Jun 27 '08 #267
"gpderetta" <gp*******@gmail.com> wrote in message
news:02**********************************@b5g2000pri.googlegroups.com...
On Apr 14, 6:37 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>"gpderetta" <gpdere...@gmail.com> wrote in message

news:1e**********************************@c65g2000hsa.googlegroups.com...
On Apr 14, 4:48 pm, gpderetta <gpdere...@gmail.com> wrote:
On Apr 14, 4:23 am, Razii <DONTwhatever...@hotmail.com> wrote:
On Sun, 13 Apr 2008 20:15:44 -0700, "Chris Thomasson"
<cris...@comcast.net> wrote:
Are you sure that you have been compiling the C++ code in non-debug
mode?
Yes, I am sure..
g++ -O2 -fomit-frame-pointer -finline-functions "new.cpp" -o
"new.exe"
>No, you are not. Add -DNDEBUG. Also did you measure with -O3? Did you
try to tune -march for your architecture (this can make a *lot* of
difference - or not, depending of the program)?
BTW, could you benchmark this version:
>http://pastebin.com/m16980424
This is nothing a decent C++ programmer would ever write [1], but then
neither is the benchmark itself.
Be sure to use -O3 (on my machine, with gcc it is two times faster than
-O2).
[1] my version uses a very simple region allocator, which in some
extreme cases (HPC, embedded devices, benchmarks :) ) might actually
make sense.

On my old machine (P4 3.06 HyperThread)

This version <http://pastebin.com/m16980424> outputs:

I literally waited for about two minutes, and finally hit Ctrl-C... There
is
something wrong. I have not studied your code yet.

A typo: the 'heap' arrays should have type 'Tree', not 'Tree*' (I was
playing with dynamically allocating the heap and forgot the '*' when I
reverted the change back).

try:
http://pastebin.com/m3a18a8e1
>And my newest version <http://pastebin.com/m3a18a8e1> outputs:

Time: 7015 ms

What times are you getting?

On my laptop (Pentium M, 2.0 GHz).

Time: 670 ms

I think that this version is memory bandwidth limited.
Are you referring to my version here:

http://pastebin.com/m3a18a8e1

If so, why do you think that it's "memory bandwidth limited"? The code is
ad-hoc, I typed it out and tested it all in about 10 minutes, and it only
really works correctly within this benchmark. It can be improved so much.

Jun 27 '08 #268
On Mon, 14 Apr 2008 10:48:39 -0700 (PDT), gpderetta
<gp*******@gmail.com> wrote:
>So what? Whatever gets the job done is fine for me. :)

No, it doesn't. It breaks if the CreateTree number is changed. That's not
dynamic memory. What you did can be done in Java too with object
pooling.

Jun 27 '08 #269
On Apr 14, 7:48 pm, Razii <DONTwhatever...@hotmail.com> wrote:
On Mon, 14 Apr 2008 11:17:26 -0700, "Chris Thomasson"

<cris...@comcast.net> wrote:
IMHO, there is one major flaw... This is not an example of dynamic memory.
This is basically static memory. Any thoughts?

Yes, this is not dynamic memory.
Yes it is. Google for region allocator.
Ian Collins played the same trick. If
you change CreateTree(22) to CreateTree(23) (needs well over 100 MB),
the program will break.
Add the appropriate -DHEAP_SIZE=n on the command line.
>
By the way, I can do the same in Java too by Object pooling. Basically
make a big ol' List of Test objects, and then pick one from the list
Of course. The point being if you really are allocation limited (which
is *very* rarely the case, especially in C++),
you are better off doing hand optimizations, instead of relying on the
compiler to do it.

BTW, could you benchmark a Java version using the same trick?

--
gpd
Jun 27 '08 #270
On Apr 14, 8:52 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
"gpderetta" <gpdere...@gmail.com> wrote in message

news:02**********************************@b5g2000pri.googlegroups.com...
On Apr 14, 6:37 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
"gpderetta" <gpdere...@gmail.com> wrote in message
>news:1e**********************************@c65g2000hsa.googlegroups.com...
On Apr 14, 4:48 pm, gpderetta <gpdere...@gmail.com> wrote:
On Apr 14, 4:23 am, Razii <DONTwhatever...@hotmail.com> wrote:
On Sun, 13 Apr 2008 20:15:44 -0700, "Chris Thomasson"
<cris...@comcast.net> wrote:
>Are you sure that you have been compiling the C++ code in non-debug
>mode?
Yes, I am sure..
g++ -O2 -fomit-frame-pointer -finline-functions "new.cpp" -o
"new.exe"
No, you are not. Add -DNDEBUG. Also did you measure with -O3? Did you
try to tune -march for your architecture (this can make a *lot* of
difference - or not, depending of the program)?
BTW, could you benchmark this version:
http://pastebin.com/m16980424
This is nothing a decent C++ programmer would ever write [1], but then
neither is the benchmark itself.
Be sure to use -O3 (on my machine, with gcc it is two times faster than
-O2).
[1] my version uses a very simple region allocator, which in some
extreme cases (HPC, embedded devices, benchmarks :) ) might actually
make sense.
On my old machine (P4 3.06 HyperThread)
This version <http://pastebin.com/m16980424> outputs:
I literally waited for about two minutes, and finally hit Ctrl-C... There
is
something wrong. I have not studied your code yet.
A typo: the 'heap' arrays should have type 'Tree', not 'Tree*' (I was
playing with dynamically allocating the heap and forgot the '*' when I
reverted the change back).
try:
http://pastebin.com/m3a18a8e1
And my newest version <http://pastebin.com/m3a18a8e1> outputs:
Time: 7015 ms
What times are you getting?
On my laptop (Pentium M, 2.0 GHz).
Time: 670 ms
I think that this version is memory bandwidth limited.

Are you referring to my version here:

http://pastebin.com/m3a18a8e1

If so, why do you think that it's "memory bandwidth limited"? The code is
ad-hoc, I typed it out and tested it all in about 10 minutes, and it only
really works correctly within this benchmark. It can be improved so much.
No, to my version. It is basically just filling memory with random
bytes and incrementing a variable.
It *has* to be memory bandwidth limited :)

--
gpd
Jun 27 '08 #271
"gpderetta" <gp*******@gmail.com> wrote in message
news:ee**********************************@v26g2000prm.googlegroups.com...
On Apr 14, 8:17 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>"gpderetta" <gpdere...@gmail.com> wrote in message

news:1e**********************************@c65g2000hsa.googlegroups.com...
On Apr 14, 4:48 pm, gpderetta <gpdere...@gmail.com> wrote:
On Apr 14, 4:23 am, Razii <DONTwhatever...@hotmail.com> wrote:
On Sun, 13 Apr 2008 20:15:44 -0700, "Chris Thomasson"
<cris...@comcast.net> wrote:
Are you sure that you have been compiling the C++ code in non-debug
mode?
Yes, I am sure..
g++ -O2 -fomit-frame-pointer -finline-functions "new.cpp" -o
"new.exe"
>No, you are not. Add -DNDEBUG. Also did you measure with -O3? Did you
try to tune -march for your architecture (this can make a *lot* of
difference - or not, depending of the program)?
BTW, could you benchmark this version:
>http://pastebin.com/m16980424

You have a bug. I fixed it. Here is the code:

http://pastebin.com/m67c0ee86

The bug was in line 17:

static Tree * heap[HEAP_SIZE];

You need to define it as:

static Tree heap[HEAP_SIZE];


Ah, ok, you found the bug :)
Sorry for posting a buggy program, by chance that version worked fine
here. :)

Fortunately I'm in good company here at posting buggy programs on the
first try ;)
:^D

>:^)

I am getting:

Time 1935 ms

IMHO, there is one major flaw... This is not an example of dynamic
memory.
This is basically static memory. Any thoughts?

So what? Whatever gets the job done is fine for me. :)
Well, IMVHO, I would expect a dynamic memory test to at least free some
memory here and there... You can't really free any memory when you define it
as:

static char buf[5000000];

Some might say that this could possibly be a form of "cheating" within the
narrow scope of a benchmark that deals with "dynamic" memory. That's just my
personal opinion of course.

I wouldn't be surprised if the java compiler was transforming the code
into something similar to my program.
Who knows. ;^)

Usually C++ compilers are bad at optimizing away allocation: for
example gcc treats malloc as having side effects and won't optimize
calls to it.
There is the __attribute__ ((malloc))... Anyway, I do agree that C++ does
not have that many opportunities to optimize calls into malloc.

Region allocation is still dynamic memory allocation (albeit a very
simple one).
You can certainly implement a dynamic region allocator. However, IMVHO, I
would classify your benchmark code as a static region allocator. You're
basically simulating dynamic memory. I would expect that a dynamic version
could allocate and free multiple regions to/from the underlying allocator
(e.g., malloc/free) or OS (e.g., mmap/munmap, VirtualAlloc/Free).

It can only be used in limited circumstances (this test being a very
good one), but when it works, it works beautifully.
The only problem I have with region allocators is that you need to find a
good enough granularity. The code you posted is extremely coarse.

There are general-purpose allocators that use regions when they can.
You probably know about 'reaps'.
Yeah. There is certainly nothing wrong with hybrid approaches. I have done
several. In my commercial vZOOM library I make use of every trick I can
think of. Here is some VERY brief info:

http://groups.google.com/group/comp....c825ec9999d3a8

I have implemented several plug-ins for this that scale extremely well.

At most, what this test can show, is that C++ is bad at being Java
(which is what one would have expected).
Indeed.

Jun 27 '08 #272
"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:WJ******************************@comcast.com...
"gpderetta" <gp*******@gmail.com> wrote in message
news:c0**********************************@m36g2000hse.googlegroups.com...
>On Apr 14, 7:48 pm, Razii <DONTwhatever...@hotmail.com> wrote:
>>On Mon, 14 Apr 2008 11:17:26 -0700, "Chris Thomasson"

<cris...@comcast.net> wrote:
IMHO, there is one major flaw... This is not an example of dynamic
memory.
This is basically static memory. Any thoughts?

Yes, this not dynamic memory.

Yes it is. Google for region allocator.
>>Ian Collins played the same trick. If
you change CreateTree(22)to CreateTree(23) (needs well over 100 MB),
the program will break.

Add the appropriate -DHEAP_SIZE=n on the command line.
[...]
>
>>By the way, I can do the same in Java too by Object pooling. Basically
make a big ol' List of Test objects, and then pick one from the list

Of course. The point being if you really are allocation limited (which
is *very* rarely the case, especially in C++),
you are better off doing hand optimizations, instead of relying on the
compiler to do it.

BTW, could you benchmark a Java version using the same trick?
[...]

Whoops! Sorry for quoting your sig in the last post!

;^(

Jun 27 '08 #273

"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:Vv******************************@comcast.com...
"Mirek Fidler" <cx*@ntllib.org> wrote in message
news:5e**********************************@w5g2000prd.googlegroups.com...
>On Apr 13, 7:42 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

OFFTOPIC: Chris, I have tried to send you an email concerning AppCore.
Have you got it?

Nope. Fuc%ing Outlook and/or Comcast! I am very sorry about that.

;^(...
I just received it, and sent a response. Thanks.

Jun 27 '08 #274
On Mon, 14 Apr 2008 10:36:04 -0700 (PDT), gpderetta
<gp*******@gmail.com> wrote:
>This in practice shows
how useless this benchmark is.
The benchmark is not useless. You failed to meet the requirement that
objects must be created dynamically. Your version doesn't. If
CreateTree(5) .. your version is wasting memory. If CreateTree(23),
your version breaks down.

That's not dynamic memory.
Jun 27 '08 #275
On Mon, 14 Apr 2008 19:14:04 -0500, Razii
<DO*************@hotmail.com> wrote:
>java -server -Xmx1024m -Xms1024m -XX:NewRatio=1 Test 24
Time: 7515 ms
Arggh, with n = 25 the C++ version was a zillion times faster. The
problem of course is GC minor collection runs, which in this case are
not needed (since all objects in this app are temporary). There are
many GC flags and I don't know all of them. However, ignore the above. To
get the best result in this benchmark, the flags must be

Java -server -Xmx1024m -Xms1024m -Xmn1023m Test n

where...

-Xmx = max memory available on the comp
-Xms = max memory available on the comp
-Xmn = max memory available on the comp, minus 1

Jun 27 '08 #276
On Mon, 14 Apr 2008 19:42:35 -0500, Razii
<DO*************@hotmail.com> wrote:
>Arggh, with n = 25 the C++ version was a zillion times faster. The
problem of course is GC minor collection runs, which in this case are
not needed (since all objects in this app are temporary). There are
many GC flags and I don't know all of them. However, ignore the above. To
get the best result in this benchmark, the flags must be

Java -server -Xmx1024m -Xms1024m -Xmn1023m Test n

With these flags..

Java -server -Xmx1024m -Xms1024m -Xmn1023m Test 25
Time: 13593 ms

new 25
Time: 20671 ms

My comp doesn't have enough ram to try n = 26 without HDD activity.

Jun 27 '08 #277
On Apr 15, 1:24 am, Razii <DONTwhatever...@hotmail.com> wrote:
On Mon, 14 Apr 2008 10:36:04 -0700 (PDT), gpderetta

<gpdere...@gmail.com> wrote:
This in practice shows
how useless this benchmark is.

The benchmark is not useless. You failed to meet the requirement that
objects must be created dynamically. Your version doesn't. If
CreateTree(5) .. your version is wasting memory. If CreateTree(23),
your version breaks down.

That's not dynamic memory.
Well, now you see, this is nearly the same problem as with your Java
flags...

Mirek
Jun 27 '08 #278
On Apr 14, 9:11 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
"gpderetta" <gpdere...@gmail.com> wrote in message
I am getting:
Time 1935 ms
IMHO, there is one major flaw... This is not an example of dynamic
memory.
This is basically static memory. Any thoughts?
So what? Whatever gets the job done is fine for me. :)

Well, IMVHO, I would expect a dynamic memory test to at least free some
memory here and there...
In a dynamic memory test yes, maybe. But this test is not it. This
test is just showing how fast a language is at doing nothing.
A real test would do something that must be computed at runtime (for
example by reading an input file containing the specific create/delete
operations).
You can't really free any memory when you define it
as:

static char buf[5000000];

Some might say that this could possibly be a form of "cheating" within the
narrow scope of a benchmark that deals with "dynamic" memory.
As James Kanze said, never trust a benchmark you didn't rig
yourself :).
I'll have to try this benchmark with LLVM to see if it can figure out
that it does nothing even without __attribute__((pure)).
>
I wouldn't be surprised if the java compiler was transforming the code
in something similar to my program.

Who knows. ;^)
Usually C++ compilers are bad at optimizing away allocation: for
example gcc treats malloc as having side effects and won't optimize
calls to it.

There is the __attribute__ ((malloc))
Unfortunately this simply tells gcc that the result doesn't alias with
any other pointer.
GCC is, by design, incapable of removing calls to malloc. The
maintainers believe, as you can legally replace the standard malloc,
that every malloc invocation is a visible side effect and cannot be
safely removed. If gcc could do link-time optimizations, it could of
course detect if the standard malloc was being overridden.
... Anyway, I do agree that C++ does
not have that many opportunities to optimize calls into malloc.
Actually it is just that current implementations refrain from doing
that. The standard certainly allows those optimizations, if they
cannot be detected (as-if rule).
>
Region allocation is still dynamic memory allocation (albeit a very
simple one).

You can certainly implement a dynamic region allocator. However, IMVHO, I
would classify your benchmark code as a static region allocator.
Yes that's true. I could do dynamic region allocation. It would
probably run a little slower than the current program, but it could
make Razii happy. Unfortunately nobody is paying me to do that :)
You're
basically simulating dynamic memory. I would expect that a dynamic version
could allocate and free multiple regions to/from the underlying allocator
(e.g., malloc/free) or OS (e.g., mmap/munmap, VirtualAlloc/Free).
I do not think that doing any kind of syscall has anything to do with
memory allocation.
Using a syscall instead of preallocating a static buffer might mean
that your program is a good OS citizen,
but for specific high performance applications, you do not care about
that.
>
It can be only used on limited circumstances (this test being a very
good one), but when it works, it works beautifully.

The only problem I have with region allocators is that you need to find a
good enough granularity. The code you posted is extremely coarse.
Sure, I spent 10 minutes to write it :) What would you expect? As I
said, it is nothing that any sane C++ programmer would do in practice.

--
gpd
Jun 27 '08 #279
On Mon, 14 Apr 2008 21:57:00 -0700 (PDT), Mirek Fidler
<cx*@ntllib.org> wrote:
>Well, now you see, this is nearly the same problem as with your Java
flags...
No, it's not. The following flags would always produce the best
result with *this* benchmark (only restricted by total RAM on the
computer), regardless of what the user enters as "n"

(for my comp with 1024m total memory).

Java -server -Xmx1024m -Xms1024m -Xmn1023m Test n

-Xmx = max memory available
-Xms = max memory available
-Xmn = Xms minus 1

In fact, if the application creates mostly temporary, short-lived
objects, it will help if Xmx minus Xmn is a small number. If the
application creates longer-lived objects, then Xms minus Xmn
should be increased. Also, for most applications, Xmx should be lower
than total memory available (default Xmx is 64m).

There are many more flags, but all this proves is that the developer
has a great amount of control on GC. They can tweak it to fit the
needs of a particular application. They can change how GC behaves.

-XX:-UseParallelGC
-XX:-UseParallelOldGC
-XX:-UseConcMarkSweepGC
-XX:+ScavengeBeforeFullGC
-XX:+DisableExplicitGC (ignores System.gc() in the application).
-XX:+UseGCOverheadLimit
-XX:-UseSerialGC

and many more...
Jun 27 '08 #280
On Tue, 15 Apr 2008 00:24:26 -0700 (PDT), gpderetta
<gp*******@gmail.com> wrote:
>Exactly my point :).

And btw, you can make N up to the max virtual memory size on your
machine. No need to fine tune for the exact N of your program.
Also, no memory wasted, any modern OS (read: one written in the last 30
years) will fault in memory on demand, so you waste 0 bytes (modulo
the page granularity).
Well, that's not true for this version of yours since you overloaded
delete and the memory is never released.

Let's say you make N max memory. The user enters CreateTree(25). The
memory goes all the way up to, whatever, 400 MB in your case. After the for
loop ends, the application must perform some different kind of
calculation. However, in your version the 400 MB memory will never go
down. In a sense, you are leaking memory. Isn't that right for the
version that you have right now?

How about if I change the benchmark and have two Tree classes? Tree1
and Tree2. Once the first for loops ends with Tree1, the second for
loop with Tree2 starts. However, your version is not releasing memory
from the first for loop. It's leaking memory.

How about you fix that problem first?

Jun 27 '08 #281
On Mon, 14 Apr 2008 12:11:12 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>
Well, IMVHO, I would expect a dynamic memory test to at least free some
memory here and there... You can't really free any memory when you define it
as:
Yes. Let's say if there are two classes Tree1 and Tree2, and two for
loops (one loop for Tree1 and the other with Tree2), then his version
is leaking memory because the memory is never freed from loop1.

His version is basically leaking memory but he gets away because the
"benchmark" ends after the first for loop.

Jun 27 '08 #282
On Apr 15, 10:53 am, Razii <DONTwhatever...@hotmail.com> wrote:
On Tue, 15 Apr 2008 00:24:26 -0700 (PDT), gpderetta

<gpdere...@gmail.com> wrote:
Exactly my point :).
And btw, you can make N up to the max virtual memory size on your
machine. No need to fine tune for the exact N of your program.
Also, no memory wasted, any modern OS (read: one written in the last 30
years) will fault in memory on demand, so you waste 0 bytes (modulo
the page granularity).

Well, that's not true for this version of yours since you overloaded
delete and the memory is never released.
If it is never acquired in the first place (i.e. faulted in), it is not
a problem.
>
Let's say you make N max memory. The user enters CreateTree(25). The
memory goes all the way up to, whatever, 400 MB in your case. After the for
loop ends, the application must perform some different kind of
calculation. However, in your version the 400 MB memory will never go
down.
If it is faulted in once, and then never reused, it is not a problem
either; it will move to swap and the OS just forgets about it.
In a sense, you are leaking memory.
The leak is 'bounded'. The test program will never run out of memory.
Isn't that right for the
version that you have right now?

How about if I change the benchmark and have two Tree classes? Tree1
and Tree2.
Once the first for loops ends with Tree1, the second for
loop with Tree2 starts.
You would use a typeless region allocator: the two tree types can
reuse the same heap region. Anyway, this is another test; there is no
way to win a contest if the rules keep changing.
However, your version is not releasing memory
from the first for loop.
Yes, it is: first_free = 0 does exactly that.
It's leaking memory.
How could it, if it isn't calling malloc in the first place?
>
How about you fix that problem first?
In real life I wouldn't have this problem in the first place.

--
gpd
Jun 27 '08 #283
On Tue, 15 Apr 2008 02:04:20 -0700 (PDT), gpderetta
<gp*******@gmail.com> wrote:
>If it is faulted in once, and then never reused, it is not a problem
either; it will move to swap and the OS just forgets about it.
>The leak is 'bounded'. The test program will never run out of memory.
Would the OS swap it even if your program needs more memory? How would you
do tree classes using the same method?

Jun 27 '08 #284
On Tue, 15 Apr 2008 04:45:12 -0500, Razii
<DO*************@hotmail.com> wrote:
>Would the OS swap it even if your program needs more memory? How would you
do tree classes using the same method?
I meant two tree classes
Jun 27 '08 #285
On Apr 15, 12:07 pm, Razii <DONTwhatever...@hotmail.com> wrote:
On Tue, 15 Apr 2008 04:45:12 -0500, Razii

<DONTwhatever...@hotmail.com> wrote:
Would the OS swap it even if your program needs more memory? How would you
do tree classes using the same method?

I meant two tree classes
Use an untyped allocator.

--
gpd
Jun 27 '08 #286
Jerry Coffin wrote:
Offhand, I can't
think of any other language I've used that has nearly as good of support
for domain-specific languages as C++.
Well, I've been using Lisp in the past, maybe you haven't. C++ templates
vs. Lisp macros? Not really a comparison.
Also, correct me if I'm wrong, but the Boost MPL (and similar libraries)
seem to be strictly compile-time extensions. What if I need it at
runtime? That's maybe the biggest problem with languages like C++: Once
the stuff is compiled, the language is gone.
Jun 27 '08 #287
"gpderetta" <gp*******@gmail.com> wrote in message
news:e7**********************************@m3g2000hsc.googlegroups.com...
On Apr 14, 9:11 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>"gpderetta" <gpdere...@gmail.com> wrote in message
>I am getting:
>Time 1935 ms
>IMHO, there is one major flaw... This is not an example of dynamic
memory.
This is basically static memory. Any thoughts?
So what? What's get the job done is fine for me. :)

Well, IMVHO, I would expect a dynamic memory test to at least free some
memory here and there...

In a dynamic memory test, yes, maybe. But this test is not it. This
test is just showing how fast a language is at doing nothing.
A real test would do something that must be computed at runtime (for
example by reading an input file containing the specific create/delete
operations).
Fair enough.

>You can't really free any memory when you define it
as:

static char buf[5000000];

Some might say that this could possibly be a form of "cheating" within
the
narrow scope of a benchmark that deals with "dynamic" memory.

As James Kanze said, never trust a benchmark you didn't rig
yourself :).
I'll have to try this benchmark with LLVM to see if it can figure out
that it does nothing even without __attribute__((pure)).
;^)

I wouldn't be surprised if the java compiler was transforming the code
in something similar to my program.

Who knows. ;^)
Usually C++ compilers are bad at optimizing away allocation: for
example, gcc treats malloc as having side effects and won't optimize
calls to it.

There is the __attribute__ ((malloc))

Unfortunately this simply tells gcc that the result doesn't alias with
any other pointer.
GCC is, by design, incapable of removing calls to malloc. The
maintainers believe that, since you can legally replace the standard
malloc, every malloc invocation is a visible side effect and cannot be
safely removed. If gcc could do link-time optimizations, it could of
course detect whether the standard malloc was being overridden.
>... Anyway, I do agree that C++ does
not have that many opportunities to optimize calls into malloc.

Actually it is just that current implementations refrain from doing
that. The standard certainly allows those optimizations, if they
cannot be detected (as-if rule).
Agreed.

Region allocation is still dynamic memory allocation (albeit a very
simple one).

You can certainly implement a dynamic region allocator. However, IMVHO, I
would classify your benchmark code as a static region allocator.

Yes that's true. I could do dynamic region allocation. It would
probably run a little slower than the current program, but it could
make Razii happy.
You could keep the existing simplistic setup, and add a dynamic model to
take care of overflows... In other words:
__________________________________________________ __________
void* Tree::operator new(size_t sz) {
    if (first_free < HEAP_SIZE) {
        // Fast path: bump-allocate the next slot from the static region.
        return heap + first_free++;
    }
    return std::malloc(sz); // Slow path: region exhausted, fall back to malloc.
}

void Tree::operator delete(void* ptr) {
    if (! PtrInStaticHeap(ptr)) { // Only malloc'ed pointers need freeing.
        std::free(ptr);
    }
}
__________________________________________________ __________
That would be very simple to implement indeed. You could also use something
like the naive slab_allocator I quickly whipped up instead of malloc/free.
Humm, I might just do that. It would simply have to satisfy Razii.

Unfortunately nobody is paying me to do that :)
No shi%! ;^)

>You're
basically simulating dynamic memory. I would expect that a dynamic
version
could allocate and free multiple regions to/from the underlying allocator
(e.g., malloc/free) or OS (e.g., mmap/unmap, VirtualAlloc/Free).

I do not think that doing any kind of syscall has anything to do with
memory allocation.
True. However, they can be of service to so-called general-purpose
allocators which need to deal with "undefined" usage-patterns.

Using a syscall instead of preallocating a static buffer might mean
that your program is a good OS citizen,
but for specific high-performance applications, you do not care about
that.
When you have a good idea on what the allocator will be used for, yes I
agree. On the other hand, if you are writing something that will be used by
the masses, well, that's another story.

It can be only used on limited circumstances (this test being a very
good one), but when it works, it works beautifully.

The only problem I have with region allocators is that you need to find a
good enough granularity. The code you posted is extremely coarse.

Sure, I spent 10 minutes to write it :) What would you expect?
Yeah. I spent about the same amount of time creating that little
slab_allocator thing. It's pretty crappy, but it seems to do fairly well
against Java; within this narrow "benchmark" of course. I think I will just
go ahead and augment it with a static heap and banish its usage to the
slow-path. Just like in the quick example I showed in this post. It should
work fine.

As I said, it is nothing that any sane C++ programmer would do in
practice.
lol. ;^)

Jun 27 '08 #288
"Razii" <DO*************@hotmail.com> wrote in message
news:h4********************************@4ax.com...
On Mon, 14 Apr 2008 19:14:04 -0500, Razii
<DO*************@hotmail.com> wrote:
>>java -server -Xmx1024m -Xms1024m -XX:NewRatio=1 Test 24
Time: 7515 ms

Arggh, with n = 25 the C++ version was a zillion times faster. The
problem of course is GC minor collection runs, which in this case are
not needed (since all objects in this app are temporary). There are
many GC flags; I don't know all of them. However, ignore the above. To
get the best result in this benchmark, the flags must be

Java -server -Xmx1024m -Xms1024m -Xmn1023m Test n

where...

-Xmx = max memory available on the comp
-Xms = max memory available on the comp
-Xmn = max memory available on the comp, minus 1
I am going to give one more quick try... The third time the charm right?
lol. ;^)

Jun 27 '08 #289
"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:HY******************************@comcast.com...
"Razii" <DO*************@hotmail.com> wrote in message
news:h4********************************@4ax.com...
>On Mon, 14 Apr 2008 19:14:04 -0500, Razii
<DO*************@hotmail.com> wrote:
>>>java -server -Xmx1024m -Xms1024m -XX:NewRatio=1 Test 24
Time: 7515 ms

Arggh, with n = 25 the C++ version was a zillion times faster. The
problem of course is GC minor collection runs, which in this case are
not needed (since all objects in this app are temporary). There are
many GC flags; I don't know all of them. However, ignore the above. To
get the best result in this benchmark, the flags must be

Java -server -Xmx1024m -Xms1024m -Xmn1023m Test n

where...

-Xmx = max memory available on the comp
-Xms = max memory available on the comp
-Xmn = max memory available on the comp, minus 1

I am going to give one more quick try... The third time the charm right?
lol. ;^)
Here is my third version:

http://pastebin.com/m303e8cab
Here are results on a P4 3.06ghz HyperThread w/ 512mb of RAM
Build Flags
___________________________________________
java: -server -Xmx512m -Xms512m -XX:NewRatio=1

vcpp: /O2 /Ob2 /Oi /Ot /Oy /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D
"_UNICODE" /D "UNICODE" /FD /EHsc /MT /Fo"Release\\" /Fd"Release\vc80.pdb"
/W3 /nologo /c /Wp64 /Zi /Gr /TP /errorReport:prompt


Results
___________________________________________
java: Test 22 / Time: 2375 ms
vcpp: Test 22 / Time: 1468 ms
-------------------------------------------
java: Test 23 / Time: 3594 ms
vcpp: Test 23 / Time: 2362 ms
-------------------------------------------
java: Test 24 / Time: 33141 ms
vcpp: Test 24 / Time: 9390 ms
-------------------------------------------
java: Test 25 / CTRL-C <took longer than 1 min>
vcpp: Test 25 / Time: 26390 ms
-------------------------------------------
java: Test 26 / CTRL-C <took longer than 1 min>
vcpp: Test 26 / CTRL-C <took longer than 1 min>

Can everybody please post the times you get?
Thanks a lot.

Jun 27 '08 #290
"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:i6******************************@comcast.com...
"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:HY******************************@comcast.com...
>"Razii" <DO*************@hotmail.com> wrote in message
news:h4********************************@4ax.com...
>>On Mon, 14 Apr 2008 19:14:04 -0500, Razii
<DO*************@hotmail.com> wrote:

java -server -Xmx1024m -Xms1024m -XX:NewRatio=1 Test 24
Time: 7515 ms

Arggh, with n = 25 the C++ version was a zillion times faster. The
problem of course is GC minor collection runs, which in this case are
not needed (since all objects in this app are temporary). There are
many GC flags; I don't know all of them. However, ignore the above. To
get the best result in this benchmark, the flags must be

Java -server -Xmx1024m -Xms1024m -Xmn1023m Test n

where...

-Xmx = max memory available on the comp
-Xms = max memory available on the comp
-Xmn = max memory available on the comp, minus 1

I am going to give one more quick try... The third time the charm right?
lol. ;^)

Here is my third version:

http://pastebin.com/m303e8cab
[...]

You can mess around with the settings by setting the following pre-processor
definitions at compile time:

#if ! defined(REGION_DEPTH)
# define REGION_DEPTH() 5000000
#endif
#if ! defined(SLAB_DEPTH)
# define SLAB_DEPTH() 16384
#endif
#if ! defined(SLAB_PRIME_DEPTH)
# define SLAB_PRIME_DEPTH() 1
#endif
#if ! defined(SLAB_MAX_DEPTH)
# define SLAB_MAX_DEPTH() 6
#endif
Let me briefly explain what they are:

REGION_DEPTH() = The total size of the region_allocator internal heap.

SLAB_DEPTH() = The number of objects the slab_allocator assigns to a slab.

SLAB_PRIME_DEPTH() = The number of slabs the slab_allocator primes itself
with.

SLAB_MAX_DEPTH() = The total number of slabs the slab_allocator will reduce
to.


I have now spent a total of 25 minutes creating these very simplistic
allocator designs for this benchmark Razii. I could spend some more and
really make some marked improvement in the overall performance, but I don't
think I want to do that... Like "gpderetta" says, nobody is paying me...

;^)

Jun 27 '08 #291
"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:x-******************************@comcast.com...
"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:i6******************************@comcast.com...
>"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:HY******************************@comcast.com...
>>"Razii" <DO*************@hotmail.com> wrote in message
news:h4********************************@4ax.com...
On Mon, 14 Apr 2008 19:14:04 -0500, Razii
<DO*************@hotmail.com> wrote:

>java -server -Xmx1024m -Xms1024m -XX:NewRatio=1 Test 24
>Time: 7515 ms

Arggh, with n = 25 the C++ version was a zillion times faster. The
problem of course is GC minor collection runs, which in this case are
not needed (since all objects in this app are temporary). There are
many GC flags; I don't know all of them. However, ignore the above. To
get the best result in this benchmark, the flags must be

Java -server -Xmx1024m -Xms1024m -Xmn1023m Test n

where...

-Xmx = max memory available on the comp
-Xms = max memory available on the comp
-Xmn = max memory available on the comp, minus 1

I am going to give one more quick try... The third time the charm right?
lol. ;^)

Here is my third version:

http://pastebin.com/m303e8cab
[...]

You can mess around with the settings by setting the following
pre-processor definitions at compile time:

#if ! defined(REGION_DEPTH)
# define REGION_DEPTH() 5000000
#endif
#if ! defined(SLAB_DEPTH)
# define SLAB_DEPTH() 16384
#endif
#if ! defined(SLAB_PRIME_DEPTH)
# define SLAB_PRIME_DEPTH() 1
#endif
#if ! defined(SLAB_MAX_DEPTH)
# define SLAB_MAX_DEPTH() 6
#endif
Let me briefly explain what they are:

REGION_DEPTH() = The total size of the region_allocator internal heap.

SLAB_DEPTH() = The number of objects the slab_allocator assigns to a slab.

SLAB_PRIME_DEPTH() = The number of slabs the slab_allocator primes itself
with.

SLAB_MAX_DEPTH() = The total number of slabs the slab_allocator will
reduce to.
[...]
Crap! I don't think you can define macro functions on the command line. Here
is a version of my third test that does not use them:

http://pastebin.com/m45f642a5
Now, they are defined as plain old macros:
#if ! defined(REGION_DEPTH)
# define REGION_DEPTH 5000000
#endif
#if ! defined(SLAB_DEPTH)
# define SLAB_DEPTH 16384
#endif
#if ! defined(SLAB_PRIME_DEPTH)
# define SLAB_PRIME_DEPTH 1
#endif
#if ! defined(SLAB_MAX_DEPTH)
# define SLAB_MAX_DEPTH 6
#endif

Sorry about that nonsense!

Jun 27 '08 #292

I haven't tried your version yet but...

On Tue, 15 Apr 2008 15:56:40 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>java: Test 25 / CTRL-C <took longer than 1 min>
>java: -server -Xmx512m -Xms512m -XX:NewRatio=1
Well, I asked you to use this command instead...

java -server -Xmx512m -Xms512m -Xmn511m

It will be much faster..
Jun 27 '08 #293
"Razii" <DO*************@hotmail.com> wrote in message
news:mf********************************@4ax.com...
>
I haven't tried your version yet but...

On Tue, 15 Apr 2008 15:56:40 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>>java: Test 25 / CTRL-C <took longer than 1 min>
>>java: -server -Xmx512m -Xms512m -XX:NewRatio=1

Well, I asked you to use this command instead...

java -server -Xmx512m -Xms512m -Xmn511m

It will be much faster..
Whoops! Sorry about that. I have run all the tests again using the following
build options:

java -server -Xmx512m -Xms512m -Xmn511m

vcpp: /O2 /Ob2 /Oi /Ot /Oy /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D
"_UNICODE" /D "UNICODE" /FD /EHsc /MT /Fo"Release\\" /Fd"Release\vc80.pdb"
/W3 /nologo /c /Wp64 /Zi /Gr /TP /errorReport:prompt


Results
___________________________________________
java: Test 22 / Time: 2204 ms
vcpp: Test 22 / Time: 1501 ms
-------------------------------------------
java: Test 23 / Time: 3452 ms
vcpp: Test 23 / Time: 2198 ms
-------------------------------------------
java: Test 24 / Time: 12534 ms
vcpp: Test 24 / Time: 9254 ms
-------------------------------------------
java: Test 25 / CTRL-C <took longer than 1 min>
vcpp: Test 25 / Time: 26124 ms
-------------------------------------------
java: Test 26 / CTRL-C <took longer than 1 min>
vcpp: Test 26 / CTRL-C <took longer than 1 min>
The new build args made the Test 24 java version go from 33142 ms to 12534
ms. Definitely faster!

Jun 27 '08 #294
"Chris Thomasson" <cr*****@comcast.net> wrote in message
news:zb******************************@comcast.com...
"Razii" <DO*************@hotmail.com> wrote in message
news:mf********************************@4ax.com...
>>
I haven't tried your version yet but...

On Tue, 15 Apr 2008 15:56:40 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>>>java: Test 25 / CTRL-C <took longer than 1 min>
>>>java: -server -Xmx512m -Xms512m -XX:NewRatio=1

Well, I asked you to use this command instead...

java -server -Xmx512m -Xms512m -Xmn511m

It will be much faster..

Whoops! Sorry about that. I have run all the tests again using the
following build options:

java -server -Xmx512m -Xms512m -Xmn511m

vcpp: /O2 /Ob2 /Oi /Ot /Oy /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D
"_UNICODE" /D "UNICODE" /FD /EHsc /MT /Fo"Release\\" /Fd"Release\vc80.pdb"
/W3 /nologo /c /Wp64 /Zi /Gr /TP /errorReport:prompt
>
Results
___________________________________________
[...]
-------------------------------------------
java: Test 25 / CTRL-C <took longer than 1 min>
vcpp: Test 25 / Time: 26124 ms
-------------------------------------------
[...]
>

The new build args made the Test 24 java version go from 33142 ms to 12534
ms. Definitely faster!
Razii, are there any build flags I can set to help the Java: Test 25? I
don't know why it takes longer than a minute.

Jun 27 '08 #295
On Tue, 15 Apr 2008 15:56:40 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>___________________________________________
java: Test 22 / Time: 2375 ms
vcpp: Test 22 / Time: 1468 ms
-------------------------------------------
java: Test 23 / Time: 3594 ms
vcpp: Test 23 / Time: 2362 ms
-------------------------------------------
java: Test 24 / Time: 33141 ms
vcpp: Test 24 / Time: 9390 ms
-------------------------------------------
>java: Test 25 / CTRL-C <took longer than 1 min>
vcpp: Test 25 / Time: 26390 ms
-------------------------------------------
java: Test 26 / CTRL-C <took longer than 1 min>
vcpp: Test 26 / CTRL-C <took longer than 1 min>
http://pastebin.com/m45f642a5 (c++) Is that the right one?
http://pastebin.com/f3f559ae2 (java)

cl /O2 /GL /D "NDEBUG" new.cpp /link /ltcg

java -server -Xmx1024m -Xms1024m -Xmn1023m Test n

java: Test 22 / Time: 1812 ms
cpp: Test 22 / Time: 2031 ms

java: Test 23 / Time: 3391 ms
cpp: Test 23 / Time: 4390 ms

java: Test 24 / Time: 7281 ms
cpp: Test 24 / Time: 10406 ms

java: Test 25 / Time: 14344 ms
cpp: Test 25 / Time: 22875 ms

java: Test 26 / CTRL-C
cpp: Test 26 / CTRL-C

I don't get the same results...
Jun 27 '08 #296
On Tue, 15 Apr 2008 16:34:33 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>java: Test 25 / CTRL-C <took longer than 1 min>
vcpp: Test 25 / Time: 26124 ms
Yes, with 512m ram the java is much slower probably because it uses
more memory than the c++ version.

Jun 27 '08 #297
On Tue, 15 Apr 2008 16:40:57 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>Razii, are there any build flags I can set to help the Java: Test 25? I
don't know why it takes longer than a minute.
It doesn't take longer on my comp with 1024m ram. With n = 25, it
probably uses just more memory than the C++ version and the OS starts
swapping memory.

Jun 27 '08 #298
On Tue, 15 Apr 2008 16:06:16 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>I have now spent a total of 25 minutes creating these very simplistic
allocator designs for this benchmark Razii. I could spend some more and
really make some marked improvement in the overall performance, but I don't
think I want to do that... Like "gpderetta" says, nobody is paying me...
But isn't the problem that this allocator works with only one object,
only this tree? You have overloaded both the delete and new. How would
this work with applications that have dozens of classes and objects
that need to be created dynamically?
Jun 27 '08 #299

"Razii" <DO*************@hotmail.comwrote in message
news:7d********************************@4ax.com...
On Tue, 15 Apr 2008 16:06:16 -0700, "Chris Thomasson"
<cr*****@comcast.net> wrote:
>>I have now spent a total of 25 minutes creating these very simplistic
allocator designs for this benchmark Razii. I could spend some more and
really make some marked improvement in the overall performance, but I
don't
think I want to do that... Like "gpderetta" says, nobody is paying me...

But isn't the problem that this allocator works with only one object,
only this tree? You have overloaded both the delete and new. How would
this work with applications that have dozens of classes and objects
that need to be created dynamically?
I said I only created these things for this benchmark.

Jun 27 '08 #300
