
I need a data structure with the following capabilities:
1. Fast access to the smallest element.
2. Fast insertion of a new element.
3. Fast deletion of an existing element.
I have been using a sorted immutable balanced binary tree set implementation
that provides the above operations each in O(log n) but it is extremely
slow on .NET, around 8x slower than OCaml+Linux.
I do not benefit from the use of an immutable data structure, so what
mutable data structure would a C# programmer use?

Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u  

SortedDictionary<,> is O(log n) for retrieval, insert and removal; see
"remarks" here: http://msdn.microsoft.com/en-us/library/f7fta44c.aspx
Note that this only allows one value per unique key. Of course, you
could always craft something grungy that looks like ILookup<,>, using a
SortedDictionary<Foo, List<Bar>> (not ideal).
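One way around the one-value-per-key restriction is to use the value slot as an occurrence count, turning SortedDictionary into a sorted multiset. A minimal sketch (class and member names are my own, not anything from the BCL):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: SortedDictionary<T, int> used as a sorted multiset, so
// duplicate "keys" are allowed (the value is an occurrence count).
class SortedMultiset<T>
{
    private readonly SortedDictionary<T, int> counts = new SortedDictionary<T, int>();

    public void Add(T item)
    {
        int n;
        counts.TryGetValue(item, out n);
        counts[item] = n + 1;
    }

    public bool Remove(T item)
    {
        int n;
        if (!counts.TryGetValue(item, out n)) return false;
        if (n == 1) counts.Remove(item); else counts[item] = n - 1;
        return true;
    }

    // Keys enumerate in sorted order, so the first key is the minimum.
    public T Min { get { return counts.Keys.First(); } }
}
```

Add, Remove and Min are all O(log n) here (Min walks the tree to its leftmost node via the key enumerator).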
Marc  

Hi Marc
Do you know how this differs from Dictionary<T,V> internally? I thought
that looking up an index based on a hash would be quicker than sorting the
keys. If you don't know OTTOYH I will spend some time later with Reflector
:)

Pete
==== http://mrpmorris.blogspot.com http://www.capableobjects.com  

Not off the top of my head; I would have to check with either Reflector
or the reference source symbol server.
Marc  

Jon Harrop wrote:
I need a data structure with the following capabilities:
1. Fast access to the smallest element.
2. Fast insertion of a new element.
3. Fast deletion of an existing element.
I have been using a sorted immutable balanced binary tree set implementation
that provides the above operations each in O(log n) but it is extremely
slow on .NET, around 8x slower than OCaml+Linux.
I do not benefit from the use of an immutable data structure, so what
mutable data structure would a C# programmer use?
How often do you need to get the smallest element compared to how often
you insert and delete elements?
You could create a wrapper that contains a Dictionary<Key,Value> with
all items, and a short array Key[] with a few of the smallest items,
that you keep sorted. When you insert an item, check if it should be
added to the array of smallest items, and when you delete an item, check
if it's in the array of smallest items.
If the array of smallest items happens to be empty when you try to
access the smallest item (i.e. you have deleted all of the smallest
items), you take a hit as you have to repopulate the array by looking
through the dictionary, otherwise it is an O(1) operation.
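A rough sketch of that wrapper, under my own naming (the buffer size, class and member names are all illustrative, and it inherits the scheme's property that the buffer is only refilled when it drains empty):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the scheme described above: a Dictionary holding all items
// plus a small sorted buffer caching the smallest keys.
class MinBuffered<TKey, TValue> where TKey : IComparable<TKey>
{
    private readonly Dictionary<TKey, TValue> items = new Dictionary<TKey, TValue>();
    private readonly List<TKey> smallest = new List<TKey>();  // kept sorted ascending
    private const int BufferSize = 100;

    public void Add(TKey key, TValue value)
    {
        items.Add(key, value);
        // Insert into the buffer if it belongs among the current smallest.
        int i = smallest.BinarySearch(key);
        if (i < 0) i = ~i;  // BinarySearch returns ~insertionIndex on a miss
        if (i < BufferSize)
        {
            smallest.Insert(i, key);
            if (smallest.Count > BufferSize) smallest.RemoveAt(BufferSize);
        }
    }

    public bool Remove(TKey key)
    {
        if (!items.Remove(key)) return false;
        int i = smallest.BinarySearch(key);
        if (i >= 0) smallest.RemoveAt(i);
        return true;
    }

    public TKey Min
    {
        get
        {
            if (smallest.Count == 0)  // the O(n) hit: rescan the whole dictionary
                smallest.AddRange(items.Keys.OrderBy(k => k).Take(BufferSize));
            return smallest[0];  // throws if the collection is empty
        }
    }
}
```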

Göran Andersson
_____ http://www.guffa.com  

I can see that the hash could be more expensive for a key that is a long
string, but I wouldn't have thought many objects need such a complicated
hash, do they? In my case there are no structs in my apps, so mostly my keys
would be
enums
short strings
integers
instances of an object
I'd suspect in this case the Dictionary<T,V> would probably be quicker?

Pete
==== http://mrpmorris.blogspot.com http://www.capableobjects.com  

I use a binary heap in my application. It gives immediate (i.e. O(1))
access to the smallest element, and O(log n) delete or insert of an
element. Perhaps another type of heap (like a soft heap) would help
you. See the chart here: http://en.wikipedia.org/wiki/Fibonacci_heap.
I coded a soft heap in C# before. Unfortunately it does two
allocations per insert. This was significantly more expensive than my
binary heap, which stores everything in an array, because lots of small
allocations are expensive in C#. (The binary heap grows the array and
copies when it runs out of room, similar to List<>.)
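For reference, an array-backed binary min-heap along those lines fits in a few dozen lines of C#; this is a generic sketch (not the poster's actual code), using a List<T> as the growing array:

```csharp
using System;
using System.Collections.Generic;

// Minimal array-backed binary min-heap: O(1) peek, O(log n) push/pop,
// everything stored in one growing array (here a List<T>).
class MinHeap<T> where T : IComparable<T>
{
    private readonly List<T> a = new List<T>();  // a[0] is the minimum

    public int Count { get { return a.Count; } }

    public void Push(T item)
    {
        a.Add(item);
        int i = a.Count - 1;
        while (i > 0)  // sift the new element up toward the root
        {
            int parent = (i - 1) / 2;
            if (a[parent].CompareTo(a[i]) <= 0) break;
            T tmp = a[parent]; a[parent] = a[i]; a[i] = tmp;
            i = parent;
        }
    }

    public T Peek() { return a[0]; }

    public T Pop()
    {
        T min = a[0];
        int last = a.Count - 1;
        a[0] = a[last];          // move the last element to the root...
        a.RemoveAt(last);
        int i = 0;
        while (true)             // ...and sift it back down
        {
            int l = 2 * i + 1, r = l + 1, m = i;
            if (l < a.Count && a[l].CompareTo(a[m]) < 0) m = l;
            if (r < a.Count && a[r].CompareTo(a[m]) < 0) m = r;
            if (m == i) break;
            T tmp = a[m]; a[m] = a[i]; a[i] = tmp;
            i = m;
        }
        return min;
    }
}
```

Because the backing store is one array, inserts do no per-element allocation; the only allocations are the occasional grow-and-copy, just as with List<>.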

Göran Andersson wrote:
Jon Harrop wrote:
>I need a data structure with the following capabilities:
1. Fast access to the smallest element.
2. Fast insertion of a new element.
3. Fast deletion of an existing element.
I have been using a sorted immutable balanced binary tree set implementation that provides the above operations each in O(log n) but it is extremely slow on .NET, around 8x slower than OCaml+Linux.
I do not benefit from the use of an immutable data structure, so what mutable data structure would a C# programmer use?
How often do you need to get the smallest element compared to how often
you insert and delete elements?
1 get smallest to 4 inserts and 2 removes.
You could create a wrapper that contains a Dictionary<Key,Value> with
all items, and a short array Key[] with a few of the smallest items,
that you keep sorted. When you insert an item, check if it should be
added to the array of smallest items, and when you delete an item, check
if it's in the array of smallest items.
If the array of smallest items happens to be empty when you try to
access the smallest item (i.e. you have deleted all of the smallest
items), you take a hit as you have to repopulate the array by looking
through the dictionary, otherwise it is an O(1) operation.
That is O(n) instead of O(log n) though, so it is definitely slower for
sufficiently large problems.

Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u  

Marc Gravell wrote:
SortedDictionary<,> is O(log n) for retrieval, insert and removal; see
"remarks" here:
http://msdn.microsoft.com/en-us/library/f7fta44c.aspx
Note that this only allows one value per unique key. Of course, you
could always craft something grungy that looks like ILookup<,>, using a
SortedDictionary<Foo, List<Bar>> (not ideal).
SortedDictionary does not appear to provide the minimum element. Also, its
second type parameter is superfluous in this case, so it will probably
waste space.

Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u  

Peter Morris wrote:
Hi Marc
Do you know how this differs from Dictionary<T,V> internally?
SortedDictionary is a balanced binary tree. This is essentially the
implementation I have been using but it is extremely slow on .NET.
I thought that looking up an index based on a hash would be quicker than
sorting the keys.
Only if you know the key you are looking for, which I do not: I need the
smallest element. So hashing is obviously worse than useless here.

Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u  

not_a_commie wrote:
I use a Binary Heap in my application. It is immediate (aka, O(1))
access to the smallest element. It is O(log n) to delete or insert an
element. Perhaps another type of heap (like a Soft Heap) would help
you. See the chart here: http://en.wikipedia.org/wiki/Fibonacci_heap.
I coded a Soft Heap in C# before. Unfortunately it does two
allocations per insert. This was significantly more expensive in
performance than my binary heap that stores everything in an array in
that lots of small allocations are expensive in C#. (The Binary Heap
changes the array size and copies when it runs out of room similar to
List<>).
Thanks for the lead. I'll check this out.

Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u  

SortedDictionary does not appear to provide the minimum element.
Since it is sorted, it'll be the first one.
But if the performance isn't up to it, then fine; just a thought...
Marc  

Marc Gravell wrote:
>SortedDictionary does not appear to provide the minimum element.
Since it is sorted, it'll be the first one.
Right. So you create an enumeration of the keys, get the first key and use
that to remove the first element:
ignore(heap.Remove(Seq.hd(heap.Keys)))
But if the performance isn't up to it, then fine; just a thought...
Actually, the performance of SortedDictionary is substantially better than
the performance of the data structure I was using.
Time taken to insert 10^6 floats and then remove them all by popping the
smallest:
F#.NET: 13.4s Set.Make
F#.NET: 5.6s System.Collections.Generic.SortedDictionary<float, obj>
F#.NET: 4.1s System.Collections.Generic.SortedDictionary<float, float>
F#.NET: 3.8s System.Collections.Generic.SortedDictionary<float, int>
OCaml: 2.3s Set.Make
The performance of SortedDictionary could be improved upon in this context
because the values are not used and could be removed, and it seems wasteful
to create an enumeration just to get the first element.
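For comparison, the same pop-the-minimum idiom in C# looks like this; the helper name `PopMin` is mine, and it relies only on the documented fact that SortedDictionary's key enumeration is in sorted order:

```csharp
using System;
using System.Collections.Generic;

class Program
{
    // Pop the smallest key from a SortedDictionary: take the first key
    // from the sorted key enumeration, then remove it. Removal is
    // O(log n); obtaining the first key still allocates an enumerator.
    public static TKey PopMin<TKey, TValue>(SortedDictionary<TKey, TValue> d)
    {
        using (var e = d.Keys.GetEnumerator())
        {
            e.MoveNext();
            TKey min = e.Current;
            d.Remove(min);
            return min;
        }
    }

    static void Main()
    {
        var d = new SortedDictionary<double, int> { { 3.0, 0 }, { 1.0, 0 }, { 2.0, 0 } };
        Console.WriteLine(PopMin(d)); // prints 1
    }
}
```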

Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u  

Jon Harrop wrote:
Marc Gravell wrote:
>>SortedDictionary does not appear to provide the minimum element.
Since it is sorted, it'll be the first one.
Right. So you create an enumeration of the keys, get the first key and use
that to remove the first element:
ignore(heap.Remove(Seq.hd(heap.Keys)))
>But if the performance isn't up to it, then fine; just a thought...
Actually, the performance of SortedDictionary is substantially better than
the performance of the data structure I was using.
Time taken to insert 10^6 floats and then remove them all by popping the
smallest:
F#.NET: 13.4s Set.Make
F#.NET: 5.6s System.Collections.Generic.SortedDictionary<float, obj>
F#.NET: 4.1s System.Collections.Generic.SortedDictionary<float, float>
F#.NET: 3.8s System.Collections.Generic.SortedDictionary<float, int>
OCaml: 2.3s Set.Make
Sorry, that should have been:
OCaml: 11.5s Set.Make
OCaml: 2.3s Jean-Christophe Filliâtre's imperative heap
So .NET is actually doing quite well here. What I don't understand is
why .NET does fine on this benchmark but is doing very poorly on my actual
application. I'll try using SortedDictionary anyway: thanks!

Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u  

Jon Harrop wrote:
What I don't understand is why .NET does fine on this benchmark but is
doing very poorly on my actual application.
I just discovered why: F# was generating a very inefficient comparison
function. Writing the comparison function by hand makes the whole program
7x faster!
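In C# the equivalent fix is to hand SortedDictionary an explicit IComparer<TKey> rather than falling back on the default comparison. A sketch, with illustrative type and field names of my own:

```csharp
using System;
using System.Collections.Generic;

// Sketch: a hand-written comparer passed explicitly to SortedDictionary,
// so ordering never goes through the default structural comparison.
struct Key : IComparable<Key>
{
    public double Priority;
    public int Id;

    public int CompareTo(Key other)
    {
        int c = Priority.CompareTo(other.Priority);
        return c != 0 ? c : Id.CompareTo(other.Id);  // Id breaks priority ties
    }
}

class KeyComparer : IComparer<Key>
{
    public int Compare(Key x, Key y) { return x.CompareTo(y); }
}

class Program
{
    static void Main()
    {
        var d = new SortedDictionary<Key, int>(new KeyComparer());
        d.Add(new Key { Priority = 2.0, Id = 1 }, 0);
        d.Add(new Key { Priority = 1.0, Id = 2 }, 0);
        foreach (var kv in d) { Console.WriteLine(kv.Key.Priority); break; } // smallest first
    }
}
```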

Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u  

Jon Harrop wrote:
Göran Andersson wrote:
>Jon Harrop wrote:
>>I need a data structure with the following capabilities:
1. Fast access to the smallest element.
2. Fast insertion of a new element.
3. Fast deletion of an existing element.
I have been using a sorted immutable balanced binary tree set implementation that provides the above operations each in O(log n) but it is extremely slow on .NET, around 8x slower than OCaml+Linux.
I do not benefit from the use of an immutable data structure, so what mutable data structure would a C# programmer use?
How often do you need to get the smallest element compared to how often you insert and delete elements?
1 get smallest to 4 inserts and 2 removes.
>You could create a wrapper that contains a Dictionary<Key,Value> with all items, and a short array Key[] with a few of the smallest items, that you keep sorted. When you insert an item, check if it should be added to the array of smallest items, and when you delete an item, check if it's in the array of smallest items.
If the array of smallest items happens to be empty when you try to access the smallest item (i.e. you have deleted all of the smallest items), you take a hit as you have to repopulate the array by looking through the dictionary, otherwise it is an O(1) operation.
That is O(n) instead of O(log n) though, so it is definitely slower for
sufficiently large problems.
Yes, that is slower when the array of smallest items is empty, but that
is far from every time you access the smallest item. If you for example
set the size of the array to 100, you have to delete at least 100 items
from the collection before you have to refill the array.

Göran Andersson
_____ http://www.guffa.com  

Göran Andersson wrote:
Jon Harrop wrote:
>That is O(n) instead of O(log n) though, so it is definitely slower for sufficiently large problems.
Yes, that is slower when the array of smallest items is empty, but that
is far from every time you access the smallest item. If you for example
set the size of the array to 100, you have to delete at least 100 items
from the collection before you have to refill the array.
And it will take 100x as long to repopulate the array every 100th call so
the overall cost is still O(n) whereas the original is O(log n).
Writing a real heap implementation, as not_a_commie suggested, is actually
O(1) for this operation and it runs a *lot* faster as a consequence. Once I
fixed my comparison function the performance of .NET was actually superb
using this technique...

Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u

date asked: Oct 17 '08
