By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,288 Members | 3,027 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,288 IT Pros & Developers. It's quick & easy.

The SETT Programming Contest: The fastest set<T> implementation

P: n/a
The SETT Programming Contest: The fastest set<Timplementation

Write the fastest set<Timplementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Here, speed and correctness are the 2 most important factors.
Functionally it should behave similar to std::set,
have practically no limitation on the number of items,
and be significantly faster, either always or under specified
conditions (for example for random items etc.).

The class should be placed in an own namespace.
Supply also a benchmark routine which compares your
implementation with that of std::set, and briefly document
and comment these results.

Hint: Since items in a set are always in an ordered sequence
(as determined by operator<()) you need to use some tree
routines (AVL, red-black, splay, etc.) or use your own tree
method in the heart of your implementation.

See also this paper:
Ben Pfaff, "Performance Analysis of BST in System Software", Stanford University
http://www.stanford.edu/~blp/papers/libavl.pdf

Publish your finished project somewhere on the Web
(for example at your own homepage or at http://sourceforge.net/ )
and announce it in comp.lang.c++ .

There is no deadline on this contest. The fastest method
will be the winner of the SETT Programming Contest,
until someone comes up with a faster method.

Good Luck!

Jun 27 '08 #1
Share this Question
Share on Google+
22 Replies


P: n/a
"SETT Programming Contest" <se**@sett.programming.contest.org.invalidwrote
in message news:g2**********@aioe.org...
The SETT Programming Contest: The fastest set<Timplementation
What's the prize?

[...]

Jun 27 '08 #2

P: n/a
SETT Programming Contest wrote:
The SETT Programming Contest: The fastest set<Timplementation
Considerable speedups can be achieved by simply using the default
std::set with a custom memory allocator. I have done exactly that here:

http://warp.povusers.org/FSBAllocator/

I suppose that could be my entry.
Jun 27 '08 #3

P: n/a
On Jun 3, 4:55 pm, "SETT Programming Contest"
<s...@sett.programming.contest.org.invalidwrote:
The SETT Programming Contest: The fastest set<Timplementation

Write the fastest set<Timplementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Here, speed and correctness are the 2 most important factors.
Functionally it should behave similar to std::set,
have practically no limitation on the number of items,
and be significantly faster, either always or under specified
conditions (for example for random items etc.).

The class should be placed in an own namespace.
Supply also a benchmark routine which compares your
implementation with that of std::set, and briefly document
and comment these results.

Hint: Since items in a set are always in an ordered sequence
(as determined by operator<()) you need to use some tree
routines (AVL, red-black, splay, etc.) or use your own tree
method in the heart of your implementation.

See also this paper:
Ben Pfaff, "Performance Analysis of BST in System Software", Stanford Universityhttp://www.stanford.edu/~blp/papers/libavl.pdf

Publish your finished project somewhere on the Web
(for example at your own homepage or athttp://sourceforge.net/)
and announce it in comp.lang.c++ .

There is no deadline on this contest. The fastest method
will be the winner of the SETT Programming Contest,
until someone comes up with a faster method.

Good Luck!
Interesting contest. However, the assumption that there should be no
limitation on the number of items implies that the underlying memory
management model will be similar to the one used by std::set. It
appears that a change to this memory model (particularly within modern
multi-threaded environments) may have a more dramatic affect on
performance benchmarks, then the selection of the underlying data
structure, as demonstrated by the String library
(see: http://strinx.sourceforge.net/docs/strinx.html).

ShacharS.
Jun 27 '08 #4

P: n/a
"SETT Programming Contest" <se**@sett.programming.contest.org.invalidwrites :
The SETT Programming Contest: The fastest set<Timplementation

Write the fastest set<Timplementation using only standard C++/C.
Ideally it should have the same interface like std::set.
[snip]

What is "C++/C"?

Since C doesn't have C++-style templates or namespaces, the problem as
stated cannot be solved in C (though you could probably do the
equivalent).

Why not just say "standard C++" and leave C out of it?

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jun 27 '08 #5

P: n/a
As others have observed, the conditions stop C being used.

The basic issue for most set implmentations is that insert/delete is slowest
as memory allocation/deallocation occurs per-node.
If you double this, the perforamance of set doubles.
So an allocator geared towards allocation of nodes the same size is where
time should be spent.

Stephen Howe


Jun 27 '08 #6

P: n/a
"Keith Thompson" <ks***@mib.orgwrote:
"SETT Programming Contest" writes:

The SETT Programming Contest: The fastest set<Timplementation

Write the fastest set<Timplementation using only standard C++/C.
Ideally it should have the same interface like std::set.
[snip]

What is "C++/C"?

Since C doesn't have C++-style templates or namespaces, the problem as
stated cannot be solved in C (though you could probably do the
equivalent).

Why not just say "standard C++" and leave C out of it?
It means that you can also use pure C for the underlying methods,
ie. by using the 'extern "C"' interface mechanism of the C++ compiler.
Only the public interface must be in C++, the rest can also be
in portable standard C. For example if you already have some
working C code (for example the tree routines) for re-use here...

Jun 27 '08 #7

P: n/a
On Jun 4, 4:37*am, "SETT Programming Contest"
<s...@sett.programming.contest.org.invalidwrote:
"Richard Harter" <c...@tiac.netwrote:
"SETT Programming Contest" wrote:
>Write the fastest set<Timplementation using only standard C++/C.
>Here, speed and correctness are the 2 most important factors.

In-memory methods only; much like std::set
Too bad the contest doesn't include memory needs as a
criterion; I've a method with the same compaction as
Cleary's Compact Tree, but faster -- though not as fast as
non-compact methods.

Obviously, a compact method will be faster than a "fast"
method, if the former fits in core and the latter doesn't.

James Dow Allen
Jun 27 '08 #8

P: n/a
SETT Programming Contest wrote:
The SETT Programming Contest: The fastest set<Timplementation

Write the fastest set<Timplementation using only standard C++/C...
That basically rules out persistence and efficient set theoretic operations.

The set implementations in languages like OCaml are up to a million times
faster in practice but to be competitively performant without missing
features you would need to Greenspun an efficient GC in C++ (and
assembler). So why restrict your competition to archaic languages like C++?

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com/products/?u
Jun 27 '08 #9

P: n/a
On Jun 3, 3:55 pm, "SETT Programming Contest"
<s...@sett.programming.contest.org.invalidwrote:
The SETT Programming Contest: The fastest set<Timplementation
Write the fastest set<Timplementation using only standard
C++/C.
Fastest at what? On what machine? With which implementation of
C++?

[...]
See also this paper:
Ben Pfaff, "Performance Analysis of BST in System Software", Stanford Universityhttp://www.stanford.edu/~blp/papers/libavl.pdf
Which more or less establishes that "fastest" is meaningless
unless you define at what?

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #10

P: n/a

"SETT Programming Contest" <se**@sett.programming.contest.org.invalidwrote
in message news:g2**********@aioe.org...
The SETT Programming Contest: The fastest set<Timplementation

Write the fastest set<Timplementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.
I'm not going to enter this (since, until today, I'd never heard of set<T>
or BST).

But I'm interested in clearing up what the task is.

A set<Tis just an arbitrary collection of values (each unique) of type T?
Like a Pascal set? (OK a Pascal set is limited to scalars and has a pre-set
range).

And T can be anything at all, provided operator < is meaningful for it?

In this case it all sounds pretty straightforward, so why limit the contest
to C++ only? Is it to eliminate languages that already have very fast
implementations built-in?

--
Bartc
Jun 27 '08 #11

P: n/a
On Tue, 3 Jun 2008 21:57:52 +0100, "Stephen Howe"
<sjhoweATdialDOTpipexDOTcomwrote:
>As others have observed, the conditions stop C being used.

The basic issue for most set implmentations is that insert/delete is slowest
as memory allocation/deallocation occurs per-node.
If you double this, the perforamance of set doubles.
So an allocator geared towards allocation of nodes the same size is where
time should be spent.
Point well taken; however that part is fairly trivial (for
sufficiently difficult of versions of trivial, of course.) The
idea is to create a linked list of blocks of storage that are
used as the equivalent of automatic storage, along with a free
list of nodes. Insert snatches a node from the free list, delete
puts one back on the free list. If the free list is empty,
insert nibbles at the current block. The ime cost is a small
fixed cost per operation, negligible compared to the operation
time cost.

The space is returned by freeing the chained list of blocks one
at a time when the set is deleted.

Richard Harter, cr*@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.
Jun 27 '08 #12

P: n/a
On Tue, 3 Jun 2008 16:36:06 +0200, "SETT Programming Contest"
<se**@sett.programming.contest.org.invalidwrote:
>The SETT Programming Contest: The fastest set<Timplementation

Write the fastest set<Timplementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.
What about union, intersection, and set difference?
Richard Harter, cr*@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.
Jun 27 '08 #13

P: n/a
Point well taken; however that part is fairly trivial (for
sufficiently difficult of versions of trivial, of course.) The
idea is to create a linked list of blocks of storage that are
used as the equivalent of automatic storage, along with a free
list of nodes. Insert snatches a node from the free list, delete
puts one back on the free list. If the free list is empty,
insert nibbles at the current block. The ime cost is a small
fixed cost per operation, negligible compared to the operation
time cost.

The space is returned by freeing the chained list of blocks one
at a time when the set is deleted.
I wouldn't do that. The scheme above works fine providing you are recycling
nodes.
If you intend to insert nodes and the only point of deletion is when the set
is deleted, you have not gained anything.

I would looking to use a pool scheme, where large blocks of raw memory are
carved off, and sliced up per node.
Anything to reduce the cost of node allocation.

Stephen Howe


Jun 27 '08 #14

P: n/a
"SETT Programming Contest" <se**@sett.programming.contest.org.invalid>
writes:
Write the fastest set<Timplementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.
Wouldn't it be easier to compare the implementations if you
required them to meet all the requirements placed on std::set by
the C++ standard?
See also this paper:
Ben Pfaff, "Performance Analysis of BST in System Software", Stanford University
http://www.stanford.edu/~blp/papers/libavl.pdf
How amusing.
--
Ben Pfaff
http://benpfaff.org
Jun 27 '08 #15

P: n/a
I would looking to use a pool scheme, where large blocks of raw memory are
carved off, and sliced up per node.
Anything to reduce the cost of node allocation.
Perhaps run both schemes or similar. A reap springs to mind (heap inside a
region)

Stephen Howe
Jun 27 '08 #16

P: n/a
On Wed, 4 Jun 2008 17:31:39 +0100, "Stephen Howe"
<sjhoweATdialDOTpipexDOTcomwrote:
>Point well taken; however that part is fairly trivial (for
sufficiently difficult of versions of trivial, of course.) The
idea is to create a linked list of blocks of storage that are
used as the equivalent of automatic storage, along with a free
list of nodes. Insert snatches a node from the free list, delete
puts one back on the free list. If the free list is empty,
insert nibbles at the current block. The ime cost is a small
fixed cost per operation, negligible compared to the operation
time cost.

The space is returned by freeing the chained list of blocks one
at a time when the set is deleted.

I wouldn't do that. The scheme above works fine providing you are recycling
nodes.
If you intend to insert nodes and the only point of deletion is when the set
is deleted, you have not gained anything.

I would looking to use a pool scheme, where large blocks of raw memory are
carved off, and sliced up per node.
Anything to reduce the cost of node allocation.
What we have here is a failure to communicate. I read you as
saying, "Don't do that. It won't work. Instead do what you're
doing." :-)

No doubt the fault is mine for not spelling out more details.
The aforementioned blocks are big blocks, aka pools, large enough
to hold upwards of 100 nodes. (I sort of took it for granted
that saying "used as the equivalent of automatic storage" was
sufficient.) Blocks are in a structure that looks something like
this:

struct stgpool {
stgpool * link;
size_t pos;
char * space;
};

We have a routine to create a pool, one to get storage from the
pool, and one to delete the pool. The argument for the routine
to get storage from a pool is the amount of storage required.
The pool allocator adds the requested size to the position
(making due allowance for alignment) and returns the old
position. In case it is not clear, "space" is the big block of
space and "pos" is the index into "space". When a block is
created we malloc space for the struct, space for a block, link
in the new stgpool, and set pos = 0.

The freelist is a refinement, letting us recycle nodes if there
are inserts and deletes. This rather simple pool allocator is a
high water allocator, i.e., it keeps all of the storage it
mallocs until the pool delete routine is called.

Incidentally, you do need to chain the pools when you don't know
in advance how much space you are actually going to need. Also,
there is no need to realloc.

Richard Harter, cr*@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.
Jun 27 '08 #17

P: n/a
On Wed, 4 Jun 2008 19:34:09 +0100, "Stephen Howe"
<sjhoweATdialDOTpipexDOTcomwrote:
>I would looking to use a pool scheme, where large blocks of raw memory are
carved off, and sliced up per node.
Anything to reduce the cost of node allocation.

Perhaps run both schemes or similar. A reap springs to mind (heap inside a
region)
Actually, what I proposed was a combination of a pool allocator
and a freelist. Apparently that wasn't clear. I don't see that
a reap buys you anything.
Richard Harter, cr*@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.
Jun 27 '08 #18

P: n/a
In article <g2**********@aioe.org>, se**@sett.programming.contest.org.invalid says...
The SETT Programming Contest: The fastest set<Timplementation

Write the fastest set<Timplementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Here, speed and correctness are the 2 most important factors.
Does this mean that a totally incorect but exceedingly fast implementation can still get a 50% pass mark?
Jun 27 '08 #19

P: n/a
On Jun 4, 7:56 pm, Ben Pfaff <b...@cs.stanford.eduwrote:
"SETT Programming Contest" <s...@sett.programming.contest.org.invalid>
writes:
Write the fastest set<Timplementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Wouldn't it be easier to compare the implementations if you
required them to meet all the requirements placed on std::set by
the C++ standard?
See also this paper:
Ben Pfaff, "Performance Analysis of BST in System Software", Stanford University
http://www.stanford.edu/~blp/papers/libavl.pdf

How amusing.
--
Ben Pfaffhttp://benpfaff.org

As stated in previous post, this requirement alone assumes certain
memory model, which may affect the actual performance. Nevertheless, I
would like propose the following code fragment as possible candidate
for the actual test (after-all, we should start somewhere):
template <typename SetT>
void sett_contest(SetT& s, int n)
{
typedef typename SetT::value_type value_type;

int i;
std::vector<value_type v(n);

for (i = 0; i < n; ++i)
{
v[i] = value_type(i);
}

std::random_shuffle(v.begin(), v.end());
s.insert(v.begin(), v.end());

std::random_shuffle(v.begin(), v.end());
for (i = 0; i < n; ++i)
{
s.erase(s.find(v[i]));
}
}

ShacharS.
Jun 27 '08 #20

P: n/a
On Jun 3, 3:55 pm, "SETT Programming Contest"
<s...@sett.programming.contest.org.invalidwrote:
Write the fastest set<Timplementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.
Is that you Razii? What are you trying to prove?

Regards,
Anders
Jun 27 '08 #21

P: n/a
I don't see any specifications on iterator invalidation in the
original post.
I think the intention is it follows the same as std::set.
And that means an iterator to a element within the container is always valid
for as long as the element exists.
So any rebalancing does not affect the iterator.
That iterator validity is true for all nodal containers in C++ : std::list,
std::map etc.

Stephen Howe
Jun 27 '08 #22

P: n/a
"Anders Dalvander" <go****@dalvander.comwrote in message
news:84**********************************@c58g2000 hsc.googlegroups.com...
On Jun 3, 3:55 pm, "SETT Programming Contest"
<s...@sett.programming.contest.org.invalidwrote:
>Write the fastest set<Timplementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Is that you Razii?
:^)

What are you trying to prove?

Jun 27 '08 #23

This discussion thread is closed

Replies have been disabled for this discussion.