Bytes IT Community

Efficient sorted heap


I need a data structure with the following capabilities:

1. Fast access to the smallest element.

2. Fast insertion of a new element.

3. Fast deletion of an existing element.

I have been using a sorted immutable balanced binary tree set implementation
that provides the above operations each in O(log n) but it is extremely
slow on .NET, around 8x slower than OCaml+Linux.

I do not benefit from the use of an immutable data structure, so what
mutable data structure would a C# programmer use?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #1
16 Replies


SortedDictionary<,> is O(log n) for retrieval, insert and removal; see
"remarks" here:

http://msdn.microsoft.com/en-us/library/f7fta44c.aspx

Note that this only allows one value per unique key. Of course, you
could always craft something grungy looking like ILookup<,>, using a
SortedDictionary<Foo, List<Bar>> (not ideal).

Marc
Oct 17 '08 #2
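For reference, a minimal C# sketch of Marc's suggestion (names are illustrative; the int value slot is used here as a duplicate counter, since SortedDictionary<,> allows only one entry per key):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: SortedDictionary<,> used as a min-multiset. The int value
// counts duplicates, because the dictionary allows one entry per key.
class SortedMinSet
{
    private readonly SortedDictionary<double, int> counts =
        new SortedDictionary<double, int>();

    public void Insert(double x)
    {
        int n;
        counts[x] = counts.TryGetValue(x, out n) ? n + 1 : 1;
    }

    public void Delete(double x)
    {
        int n;
        if (!counts.TryGetValue(x, out n)) return;
        if (n > 1) counts[x] = n - 1; else counts.Remove(x);
    }

    // Keys enumerate in comparer order, so the first key is the minimum.
    // Assumes the set is non-empty.
    public double Min { get { return counts.Keys.First(); } }
}
```

Keys enumerate in sorted order, so the first key is the minimum; note that fetching it this way still allocates an enumerator per call.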

Hi Marc

Do you know how this differs from Dictionary<T,V> internally? I thought
that looking up an index based on a hash would be quicker than sorting the
keys. If you don't know off the top of your head, I will spend some time
later with Reflector :-)

--
Pete
====
http://mrpmorris.blogspot.com
http://www.capableobjects.com

Oct 17 '08 #3

Not off the top of my head; I would have to check with either Reflector
or the reference source symbol server.

Marc
Oct 17 '08 #4

Jon Harrop wrote:
>I need a data structure with the following capabilities:
>
>1. Fast access to the smallest element.
>
>2. Fast insertion of a new element.
>
>3. Fast deletion of an existing element.
>
>I have been using a sorted immutable balanced binary tree set implementation
>that provides the above operations each in O(log n) but it is extremely
>slow on .NET, around 8x slower than OCaml+Linux.
>
>I do not benefit from the use of an immutable data structure, so what
>mutable data structure would a C# programmer use?

How often do you need to get the smallest element compared to how often
you insert and delete elements?

You could create a wrapper that contains a Dictionary<Key,Value> with
all items, and a short array Key[] with a few of the smallest items,
that you keep sorted. When you insert an item, check if it should be
added to the array of smallest items, and when you delete an item, check
if it's in the array of smallest items.

If the array of smallest items happens to be empty when you try to
access the smallest item (i.e. you have deleted all of the smallest
items), you take a hit as you have to repopulate the array by looking
through the dictionary, otherwise it is an O(1) operation.

--
Göran Andersson
_____
http://www.guffa.com
Oct 17 '08 #5
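A rough C# sketch of this wrapper (illustrative only; a HashSet<double> stands in for the Dictionary since the values are not needed here, and an inserted key larger than the cached maximum is conservatively left out of the cache):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the wrapper idea above: a hash set holds all keys, and a
// small sorted list caches the few smallest ones for O(1) Min access.
class MinCachedSet
{
    private readonly HashSet<double> all = new HashSet<double>();
    private readonly List<double> smallest = new List<double>(); // kept sorted
    private const int CacheSize = 100;

    public void Insert(double x)
    {
        if (!all.Add(x)) return;
        // Only keys below the cached maximum can safely enter the cache:
        // between the cached maximum and x there may be uncached keys.
        if (smallest.Count > 0 && x < smallest[smallest.Count - 1])
        {
            int i = smallest.BinarySearch(x);
            if (i < 0) i = ~i;               // complement gives insertion point
            smallest.Insert(i, x);
            if (smallest.Count > CacheSize)
                smallest.RemoveAt(smallest.Count - 1);
        }
    }

    public void Delete(double x)
    {
        if (!all.Remove(x)) return;
        int i = smallest.BinarySearch(x);
        if (i >= 0) smallest.RemoveAt(i);    // also evict from the cache
    }

    // O(1) unless the cache is exhausted, in which case an O(n log n)
    // refill scans the whole set. Assumes the set is non-empty.
    public double Min
    {
        get
        {
            if (smallest.Count == 0)
                smallest.AddRange(all.OrderBy(k => k).Take(CacheSize));
            return smallest[0];
        }
    }
}
```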

I can see that the hash could be more expensive for a key that is a long
string, but I wouldn't have thought many objects need such a complicated
hash, do they? In my case there are no structs in my apps, so mostly my keys
would be:

enums
short strings
integers
instances of an object

I'd suspect in this case the Dictionary<T,V> would probably be quicker?

--
Pete
====
http://mrpmorris.blogspot.com
http://www.capableobjects.com

Oct 17 '08 #6

I use a Binary Heap in my application. It gives immediate (i.e., O(1))
access to the smallest element. It is O(log n) to delete or insert an
element. Perhaps another type of heap (like a Soft Heap) would help
you. See the chart here: http://en.wikipedia.org/wiki/Fibonacci_heap.
I coded a Soft Heap in C# before. Unfortunately it does two
allocations per insert. This was significantly more expensive in
performance than my Binary Heap that stores everything in an array,
because lots of small allocations are expensive in C#. (The Binary Heap
resizes the array and copies when it runs out of room, similar to
List<>.)
Oct 17 '08 #7
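A minimal array-backed binary min-heap along these lines (a sketch, not not_a_commie's actual code; deleting an arbitrary element would additionally need an element-to-index map and is omitted):

```csharp
using System;
using System.Collections.Generic;

// Minimal array-backed binary min-heap: O(1) Min, O(log n) Insert and
// DeleteMin. Backed by a List<>, which doubles its array when full.
class BinaryMinHeap
{
    private readonly List<double> a = new List<double>();

    public int Count { get { return a.Count; } }

    public double Min { get { return a[0]; } }   // O(1); assumes non-empty

    public void Insert(double x)                 // O(log n): sift up
    {
        a.Add(x);
        int i = a.Count - 1;
        while (i > 0)
        {
            int parent = (i - 1) / 2;
            if (a[parent] <= a[i]) break;
            double t = a[parent]; a[parent] = a[i]; a[i] = t;
            i = parent;
        }
    }

    public double DeleteMin()                    // O(log n): sift down
    {
        double min = a[0];
        int last = a.Count - 1;
        a[0] = a[last];                          // move last element to root
        a.RemoveAt(last);
        int i = 0;
        while (true)
        {
            int l = 2 * i + 1, r = l + 1, s = i;
            if (l < a.Count && a[l] < a[s]) s = l;
            if (r < a.Count && a[r] < a[s]) s = r;
            if (s == i) break;                   // heap property restored
            double t = a[s]; a[s] = a[i]; a[i] = t;
            i = s;
        }
        return min;
    }
}
```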

Göran Andersson wrote:
>Jon Harrop wrote:
>>I need a data structure with the following capabilities:
>>
>>1. Fast access to the smallest element.
>>
>>2. Fast insertion of a new element.
>>
>>3. Fast deletion of an existing element.
>>
>>I have been using a sorted immutable balanced binary tree set
>>implementation that provides the above operations each in O(log n) but it
>>is extremely slow on .NET, around 8x slower than OCaml+Linux.
>>
>>I do not benefit from the use of an immutable data structure, so what
>>mutable data structure would a C# programmer use?
>
>How often do you need to get the smallest element compared to how often
>you insert and delete elements?

1 get smallest to 4 inserts and 2 removes.

>You could create a wrapper that contains a Dictionary<Key,Value> with
>all items, and a short array Key[] with a few of the smallest items,
>that you keep sorted. When you insert an item, check if it should be
>added to the array of smallest items, and when you delete an item, check
>if it's in the array of smallest items.
>
>If the array of smallest items happens to be empty when you try to
>access the smallest item (i.e. you have deleted all of the smallest
>items), you take a hit as you have to repopulate the array by looking
>through the dictionary, otherwise it is an O(1) operation.

That is O(n) instead of O(log n) though, so it is definitely slower for
sufficiently large problems.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #8

Marc Gravell wrote:
>SortedDictionary<,> is O(log n) for retrieval, insert and removal; see
>"remarks" here:
>
>http://msdn.microsoft.com/en-us/library/f7fta44c.aspx
>
>Note that this only allows one value per unique key. Of course, you
>could always craft something grungy looking like ILookup<,>, using a
>SortedDictionary<Foo, List<Bar>> (not ideal).

SortedDictionary does not appear to provide the minimum element. Also, its
second type parameter is superfluous in this case, so it will probably
waste space.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #9

Peter Morris wrote:
>Hi Marc
>
>Do you know how this differs from Dictionary<T,V> internally?

SortedDictionary is a balanced binary tree. This is essentially the
implementation I have been using but it is extremely slow on .NET.

>I thought that looking up an index based on a hash would be quicker than
>sorting the keys.

Only if you know the key you are looking for, which I do not: I need the
smallest element. So hashing is obviously worse than useless here.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #10

not_a_commie wrote:
>I use a Binary Heap in my application. It is immediate (aka, O(1))
>access to the smallest element. It is O(log n) to delete or insert an
>element. Perhaps another type of heap (like a Soft Heap) would help
>you. See the chart here: http://en.wikipedia.org/wiki/Fibonacci_heap.
>I coded a Soft Heap in C# before. Unfortunately it does two
>allocations per insert. This was significantly more expensive in
>performance than my binary heap that stores everything in an array in
>that lots of small allocations are expensive in C#. (The Binary Heap
>changes the array size and copies when it runs out of room similar to
>List<>).

Thanks for the lead. I'll check this out.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #11

>SortedDictionary does not appear to provide the minimum element.

Since it is sorted, it'll be the first one.

But if the performance isn't up to it, then fine; just a thought...

Marc
Oct 17 '08 #12

Marc Gravell wrote:
>>SortedDictionary does not appear to provide the minimum element.
>
>Since it is sorted, it'll be the first one.

Right. So you create an enumeration of the keys, get the first key and use
that to remove the first element:

ignore(Heap.Remove(Seq.hd(heap.Keys)))

>But if the performance isn't up to it, then fine; just a thought...

Actually, the performance of SortedDictionary is substantially better than
that of the data structure I was using.

Time taken to insert 10^6 floats and then remove them all by popping the
smallest:

F#.NET: 13.4s Set.Make
F#.NET: 5.6s System.Collections.Generic.SortedDictionary<float, obj>
F#.NET: 4.1s System.Collections.Generic.SortedDictionary<float, float>
F#.NET: 3.8s System.Collections.Generic.SortedDictionary<float, int>
OCaml: 2.3s Set.Make

The performance of SortedDictionary could be improved upon in this context
because the values are not used and could be dropped, and it seems wasteful
to create an enumeration just to get the first element.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #13
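The C# equivalent of that F# one-liner might look like this (illustrative sketch; assumes the dictionary is non-empty):

```csharp
using System.Collections.Generic;

static class SortedDictionaryExtras
{
    // Pop the smallest key from a SortedDictionary. Keys enumerate in
    // comparer order, so the first element the enumerator yields is the
    // minimum. Assumes the dictionary is non-empty.
    public static double PopMin(SortedDictionary<double, int> heap)
    {
        double min;
        using (var e = heap.Keys.GetEnumerator())
        {
            e.MoveNext();            // advance to the first (smallest) key
            min = e.Current;
        }                            // dispose before mutating the dictionary
        heap.Remove(min);
        return min;
    }
}
```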

Jon Harrop wrote:
>Marc Gravell wrote:
>>>SortedDictionary does not appear to provide the minimum element.
>>
>>Since it is sorted, it'll be the first one.
>
>Right. So you create an enumeration of the keys, get the first key and use
>that to remove the first element:
>
>ignore(Heap.Remove(Seq.hd(heap.Keys)))
>
>>But if the performance isn't up to it, then fine; just a thought...
>
>Actually, the performance of SortedDictionary is substantially better than
>the performance of the data structure I was using.
>
>Time taken to insert 10^6 floats and then remove them all by popping the
>smallest:
>
>F#.NET: 13.4s Set.Make
>F#.NET: 5.6s System.Collections.Generic.SortedDictionary<float, obj>
>F#.NET: 4.1s System.Collections.Generic.SortedDictionary<float, float>
>F#.NET: 3.8s System.Collections.Generic.SortedDictionary<float, int>
>OCaml: 2.3s Set.Make

Sorry, that should have been:

OCaml: 11.5s Set.Make
OCaml: 2.3s Jean-Christophe Filliâtre's imperative heap

So .NET is actually doing quite well here. What I don't understand is
why .NET does fine on this benchmark but is doing very poorly on my actual
application. I'll try using SortedDictionary anyway: thanks!

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #14

Jon Harrop wrote:
>What I don't understand is why .NET does fine on this benchmark but is
>doing very poorly on my actual application.
I just discovered why: F# was generating a very inefficient comparison
function. Writing the comparison function by hand makes the whole program
7x faster!

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #15
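In C# terms, the analogous fix is to hand SortedDictionary<,> an explicit IComparer<TKey> instead of relying on a generated or default comparison (illustrative sketch; ignores NaN ordering):

```csharp
using System;
using System.Collections.Generic;

// A hand-written comparer, passed to SortedDictionary<,> so every tree
// operation uses this direct comparison instead of a slower generated
// or reflective one. Assumes no NaN keys.
sealed class DoubleComparer : IComparer<double>
{
    public int Compare(double x, double y)
    {
        return x < y ? -1 : (x > y ? 1 : 0);
    }
}

// Usage: var heap = new SortedDictionary<double, int>(new DoubleComparer());
```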

Jon Harrop wrote:
>Göran Andersson wrote:
>>Jon Harrop wrote:
>>>I need a data structure with the following capabilities:
>>>
>>>1. Fast access to the smallest element.
>>>
>>>2. Fast insertion of a new element.
>>>
>>>3. Fast deletion of an existing element.
>>>
>>>I have been using a sorted immutable balanced binary tree set
>>>implementation that provides the above operations each in O(log n) but it
>>>is extremely slow on .NET, around 8x slower than OCaml+Linux.
>>>
>>>I do not benefit from the use of an immutable data structure, so what
>>>mutable data structure would a C# programmer use?
>>
>>How often do you need to get the smallest element compared to how often
>>you insert and delete elements?
>
>1 get smallest to 4 inserts and 2 removes.
>
>>You could create a wrapper that contains a Dictionary<Key,Value> with
>>all items, and a short array Key[] with a few of the smallest items,
>>that you keep sorted. When you insert an item, check if it should be
>>added to the array of smallest items, and when you delete an item, check
>>if it's in the array of smallest items.
>>
>>If the array of smallest items happens to be empty when you try to
>>access the smallest item (i.e. you have deleted all of the smallest
>>items), you take a hit as you have to repopulate the array by looking
>>through the dictionary, otherwise it is an O(1) operation.
>
>That is O(n) instead of O(log n) though, so it is definitely slower for
>sufficiently large problems.

Yes, that is slower when the array of smallest items is empty, but that
is far from every time you access the smallest item. If, for example, you
set the size of the array to 100, you have to delete at least 100 items
from the collection before you have to refill the array.

--
Göran Andersson
_____
http://www.guffa.com
Oct 18 '08 #16

Göran Andersson wrote:
>Jon Harrop wrote:
>>That is O(n) instead of O(log n) though, so it is definitely slower for
>>sufficiently large problems.
>
>Yes, that is slower when the array of smallest items is empty, but that
>is far from every time you access the smallest item. If you for example
>set the size of the array to 100, you have to delete at least 100 items
>from the collection before you have to refill the array.

And it will take 100x as long to repopulate the array every 100th call, so
the overall amortized cost is still O(n), whereas the original is O(log n).

Writing a real heap implementation, as not_a_commie suggested, is actually
O(1) for this operation and it runs a *lot* faster as a consequence. Once I
fixed my comparison function the performance of .NET was actually superb
using this technique...

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 18 '08 #17

This discussion thread is closed.