473,396 Members | 1,816 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

Efficient sorted heap


I need a data structure with the following capabilities:

1. Fast access to the smallest element.

2. Fast insertion of a new element.

3. Fast deletion of an existing element.

I have been using a sorted immutable balanced binary tree set implementation
that provides the above operations each in O(log n) but it is extremely
slow on .NET, around 8x slower than OCaml+Linux.

I do not benefit from the use of an immutable data structure, so what
mutable data structure would a C# programmer use?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #1
16 7678
SortedDictionary<,is O(log n) for retreival, insert and removal; see
"remarks" here:

http://msdn.microsoft.com/en-us/library/f7fta44c.aspx

Note that this only allows one value per unique key. Of course, you
could always craft something grungy looking like ILookup<,>, using a
SortedDictionary<Foo, List<Bar>(not ideal).

Marc
Oct 17 '08 #2
Hi Marc

Do you know how this differs from Dictionary<T,Vinternally? I thought
that looking up an index based on a hash would be quicker than sorting the
keys. If you don't know OTTOYH I will spend some time later with reflector
:-)

--
Pete
====
http://mrpmorris.blogspot.com
http://www.capableobjects.com

Oct 17 '08 #3
Not of the top of my head; I would have to check with either reflector
or the reference source symbol server.

Marc
Oct 17 '08 #4
Jon Harrop wrote:
I need a data structure with the following capabilities:

1. Fast access to the smallest element.

2. Fast insertion of a new element.

3. Fast deletion of an existing element.

I have been using a sorted immutable balanced binary tree set implementation
that provides the above operations each in O(log n) but it is extremely
slow on .NET, around 8x slower than OCaml+Linux.

I do not benefit from the use of an immutable data structure, so what
mutable data structure would a C# programmer use?
How often do you need to get the smallest element compared to how often
you insert and delete elements?

You could create a wrapper that contains a Dictionary<Key,Valuewith
all items, and a short array Key[] with a few of the smallest items,
that you keep sorted. When you insert an item, check if it should be
added to the array of smallest items, and when you delete an item, check
if it's in the array of smallest items.

If the array of smallest items happens to be empty when you try to
access the smallest item (i.e. you have deleted all of the smallest
items), you take a hit as you have to repopulate the array by looking
through the dictionary, otherwise it is an O(1) operation.

--
Göran Andersson
_____
http://www.guffa.com
Oct 17 '08 #5
I can see that the hash could be more expensive for a key that is a long
string, but I wouldn't have thought many objects need such a complicated
hash do that? In my case there are no structs in my apps so mostely my keys
would be

enums
short strings
integers
instance of an object
I'd suspect in this case the Dictionary<T,Vwould probably be quicker?

--
Pete
====
http://mrpmorris.blogspot.com
http://www.capableobjects.com

Oct 17 '08 #6
I use a Binary Heap in my application. It is immediate (aka, O(1))
access to the smallest element. It is O(log n) to delete or insert an
element. Perhaps another type of heap (like a Soft Heap) would help
you. See the chart here: http://en.wikipedia.org/wiki/Fibonacci_heap.
I coded a Soft Heap in C# before. Unfortunately it does two
allocations per insert. This was significantly more expensive in
performance than my binary heap that stores everything in an array in
that lots of small allocations are expensive in C#. (The Binary Heap
changes the array size and copies when it runs out of room similar to
List<>).
Oct 17 '08 #7
Göran Andersson wrote:
Jon Harrop wrote:
>I need a data structure with the following capabilities:

1. Fast access to the smallest element.

2. Fast insertion of a new element.

3. Fast deletion of an existing element.

I have been using a sorted immutable balanced binary tree set
implementation that provides the above operations each in O(log n) but it
is extremely slow on .NET, around 8x slower than OCaml+Linux.

I do not benefit from the use of an immutable data structure, so what
mutable data structure would a C# programmer use?

How often do you need to get the smallest element compared to how often
you insert and delete elements?
1 get smallest to 4 inserts and 2 removes.
You could create a wrapper that contains a Dictionary<Key,Valuewith
all items, and a short array Key[] with a few of the smallest items,
that you keep sorted. When you insert an item, check if it should be
added to the array of smallest items, and when you delete an item, check
if it's in the array of smallest items.

If the array of smallest items happens to be empty when you try to
access the smallest item (i.e. you have deleted all of the smallest
items), you take a hit as you have to repopulate the array by looking
through the dictionary, otherwise it is an O(1) operation.
That is O(n) instead of O(log n) though, so it is definitely slower for
sufficiently large problems.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #8
Marc Gravell wrote:
SortedDictionary<,is O(log n) for retreival, insert and removal; see
"remarks" here:

http://msdn.microsoft.com/en-us/library/f7fta44c.aspx

Note that this only allows one value per unique key. Of course, you
could always craft something grungy looking like ILookup<,>, using a
SortedDictionary<Foo, List<Bar>(not ideal).
SortedDictionary does not appear to provide the minimum element. Also, its
second type parameter is superfluous in this case, so it will probably
waste space.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #9
Peter Morris wrote:
Hi Marc

Do you know how this differs from Dictionary<T,Vinternally?
SortedDictionary is a balanced binary tree. This is essentially the
implementation I have been using but it is extremely slow on .NET.
I thought that looking up an index based on a hash would be quicker than
sorting the keys.
Only if you know the key you are looking for which I do not: I need the
smallest element. So hashing is obviously worse than useless here.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #10
not_a_commie wrote:
I use a Binary Heap in my application. It is immediate (aka, O(1))
access to the smallest element. It is O(log n) to delete or insert an
element. Perhaps another type of heap (like a Soft Heap) would help
you. See the chart here: http://en.wikipedia.org/wiki/Fibonacci_heap.
I coded a Soft Heap in C# before. Unfortunately it does two
allocations per insert. This was significantly more expensive in
performance than my binary heap that stores everything in an array in
that lots of small allocations are expensive in C#. (The Binary Heap
changes the array size and copies when it runs out of room similar to
List<>).
Thanks for the lead. I'll check this out.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #11
SortedDictionary does not appear to provide the minimum element.

Since it is sorted, it'll be the first one.

But if the performance isn't up to it, then fine; just a thought...

Marc
Oct 17 '08 #12
Marc Gravell wrote:
>SortedDictionary does not appear to provide the minimum element.

Since it is sorted, it'll be the first one.
Right. So you create an enumeration of the keys, get the first key and use
that to remove the first element:

ignore(Heap.Remove(Seq.hd(heap.Keys)))
But if the performance isn't up to it, then fine; just a thought...
Actually, the performance of SortedDictionary is substantially better than
the performance of the data structure I was using.

Time taken to insert 10^6 floats and then remove them all by popping the
smallest:

F#.NET: 13.4s Set.Make
F#.NET: 5.6s System.Collections.Generic.SortedDictionary<float, obj>
F#.NET: 4.1s System.Collections.Generic.SortedDictionary<float, float>
F#.NET: 3.8s System.Collections.Generic.SortedDictionary<float, int>
OCaml: 2.3s Set.Make

The performance of SortedDictionary could be improved upon in this context
because the values are not used and can be removed and it seems wasteful to
create an enumeration just to get the first element.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #13
Jon Harrop wrote:
Marc Gravell wrote:
>>SortedDictionary does not appear to provide the minimum element.

Since it is sorted, it'll be the first one.

Right. So you create an enumeration of the keys, get the first key and use
that to remove the first element:

ignore(Heap.Remove(Seq.hd(heap.Keys)))
>But if the performance isn't up to it, then fine; just a thought...

Actually, the performance of SortedDictionary is substantially better than
the performance of the data structure I was using.

Time taken to insert 10^6 floats and then remove them all by popping the
smallest:

F#.NET: 13.4s Set.Make
F#.NET: 5.6s System.Collections.Generic.SortedDictionary<float, obj>
F#.NET: 4.1s System.Collections.Generic.SortedDictionary<float, float>
F#.NET: 3.8s System.Collections.Generic.SortedDictionary<float, int>
OCaml: 2.3s Set.Make
Sorry, that should have been:

OCaml: 11.5s Set.Make
OCaml: 2.3s Jean-Christophe Fillaitre's Imperative Heap

So .NET is actually doing quite well here. What I don't understand is
why .NET does fine on this benchmark but is doing very poorly on my actual
application. I'll try using SortedDictionary anyway: thanks!

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #14
Jon Harrop wrote:
What I don't understand is why .NET does fine on this benchmark but is
doing very poorly on my actual application.
I just discovered why: F# was generating a very inefficient comparison
function. Writing the comparison function by hand makes the whole program
7x faster!

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 17 '08 #15
Jon Harrop wrote:
Göran Andersson wrote:
>Jon Harrop wrote:
>>I need a data structure with the following capabilities:

1. Fast access to the smallest element.

2. Fast insertion of a new element.

3. Fast deletion of an existing element.

I have been using a sorted immutable balanced binary tree set
implementation that provides the above operations each in O(log n) but it
is extremely slow on .NET, around 8x slower than OCaml+Linux.

I do not benefit from the use of an immutable data structure, so what
mutable data structure would a C# programmer use?
How often do you need to get the smallest element compared to how often
you insert and delete elements?

1 get smallest to 4 inserts and 2 removes.
>You could create a wrapper that contains a Dictionary<Key,Valuewith
all items, and a short array Key[] with a few of the smallest items,
that you keep sorted. When you insert an item, check if it should be
added to the array of smallest items, and when you delete an item, check
if it's in the array of smallest items.

If the array of smallest items happens to be empty when you try to
access the smallest item (i.e. you have deleted all of the smallest
items), you take a hit as you have to repopulate the array by looking
through the dictionary, otherwise it is an O(1) operation.

That is O(n) instead of O(log n) though, so it is definitely slower for
sufficiently large problems.
Yes, that is slower when the array of smallest items is empty, but that
is far from every time you access the smallest item. If you for example
set the size of the array to 100, you have to delete at least 100 items
from the collection before you have to refill the array.

--
Göran Andersson
_____
http://www.guffa.com
Oct 18 '08 #16
Göran Andersson wrote:
Jon Harrop wrote:
>That is O(n) instead of O(log n) though, so it is definitely slower for
sufficiently large problems.

Yes, that is slower when the array of smallest items is empty, but that
is far from every time you access the smallest item. If you for example
set the size of the array to 100, you have to delete at least 100 items
from the collection before you have to refill the array.
And it will take 100x as long to repopulate the array every 100th call so
the overall cost is still O(n) whereas the original is O(log n).

Writing a real heap implementation, as not_a_commie suggested, is actually
O(1) for this operation and it runs a *lot* faster as a consequence. Once I
fixed my comparison function the performance of .NET was actually superb
using this technique...

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Oct 18 '08 #17

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

10
by: Raghavendra Mahuli | last post by:
i need to store a lot of integers(say 10,000) in a datastructure. The constraints are: 1. It should be fast. 2. It should be orderded or sorted. 3.Insterting and deleting should be fast. 4. The...
24
by: Lasse Vågsæther Karlsen | last post by:
I need to merge several sources of values into one stream of values. All of the sources are sorted already and I need to retrieve the values from them all in sorted order. In other words: s1 = ...
1
by: Dan H. | last post by:
Hello, I have an application that requires a collection that is sorted by a double field first, and then by a long field. So, lets say I have an class that has 2 fields called time, priority. I...
3
by: Daniel Weinand | last post by:
hello ng, i have a problem and a imho an insufficient method of solution. strings should be sorted by specific text pattern and dispayed in groups. the strings are stored in a db and have the...
11
by: hoopsho | last post by:
Hi Everyone, I am trying to write a program that does a few things very fast and with efficient use of memory... a) I need to parse a space-delimited file that is really large, upwards fo a...
22
by: Curious | last post by:
Hi, I am searching for a data structure that stores key-value pairs in it. This data structure is to hold large amounts of key-value pairs, and so needs to be efficient both in insertion and...
1
by: KK | last post by:
Hi, I'm trying to come up with an efficient way to solve the following problem. Let A = { (1, b), (2, d), (3, a), (4, c) ,... } B = { (1, d), (2, a), (3, e), (4, f),... } where a,b,c......
74
by: copx | last post by:
In "Learning Standard C++ as a New Language" Bjarne Stroustrup claims that properly written C++ outperforms C code. I will just copy his first example here, which is supposed to demonstrate how C++...
0
by: sturlamolden | last post by:
On May 25, 8:02 pm, Rodrigo Lazo <rlazo....@gmail.comwrote: Heap is the data structure to use for 'fast (nearly) sorted inserts'. But heapq do not support (as far as I know) deletion of...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.