
Stable priority queue

Hello,

I understand that an easy way to make the standard std::priority_q ueue
stable is by including an integer stamp with each node that is
incremented each time a new node is pushed into the queue. However, no
matter what reasonably-sized type I use for the stamp, eventually the
stamp will 'wrap around' and possibly cause incorrect ordering of
elements. For my application, which queues elements very quickly and
runs for an indefinite amount of time, this scenario is a real concern,
and will eventually cause incorrect results.

Is there any easy and painless way of correcting this order-stamp
wraparound problem?
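
For concreteness, the stamp approach I mean looks roughly like this (a simplified, untested sketch; the names are just for illustration):

#include <functional>
#include <queue>
#include <vector>

// Simplified sketch of a stamped priority queue: each pushed element carries
// a monotonically increasing stamp, and the comparator breaks priority ties
// on the stamp so equal-priority elements pop in FIFO order. The eventual
// wraparound of 'counter_' is exactly the problem described above.
template <typename T, typename Compare = std::less<T> >
class stable_priority_queue {
    struct entry {
        T value;
        unsigned long long stamp;       // wraps around eventually
    };
    struct entry_compare {
        Compare cmp;
        bool operator()(const entry& a, const entry& b) const {
            if (cmp(a.value, b.value)) return true;
            if (cmp(b.value, a.value)) return false;
            return a.stamp > b.stamp;   // on a tie, the older stamp wins
        }
    };
    std::priority_queue<entry, std::vector<entry>, entry_compare> queue_;
    unsigned long long counter_;
public:
    stable_priority_queue() : counter_(0) {}
    void push(const T& v) { entry e = { v, counter_++ }; queue_.push(e); }
    const T& top() const { return queue_.top().value; }
    void pop() { queue_.pop(); }
    bool empty() const { return queue_.empty(); }
};
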
Aaron W. LaFramboise

Jul 22 '05
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote:
Siemel Naran wrote:
Then maybe we can use the protected member 'c'.


'std::priority_queue' actually just uses 'std::make_heap', etc. Anyway,
accessing the member directly is no good: the internal organization of
the heap essentially makes immediate stability without some tool
impossible.


I don't understand what you mean. The protected member 'c' already has
the data organized in exactly that manner.


The problem is that you cannot get stable behavior of the priority queue
simply by manipulating the member 'c' of an 'std::priority_queue'. The way
elements are arranged in a priority queue simply does not provide any useful
form of stability - unless you somehow encode the order in the key used
for comparison.
Thanks for jogging my memory. I implemented it once. It was an array of
fixed length (2^n)-1, and was essentially a perfectly balanced binary tree
stored in a regular C array.
I implemented a bunch of heap variations, including two array-based versions
(one allowing key modification, the other not allowing it; the one allowing
key modifications is a *lot* slower!). The implementation should be available
from <http://www.dietmar-kuehl.de/cxxrt/heaps.tar.gz>.
Now out of curiosity, how much faster is a heap implemented through
std::priority_queue<T, std::vector<T>> using make_heap, over one implemented
through std::set<T>?
It depends somewhat on 'T': if 'T' is expensive to copy or swap, the vector
version will probably eventually become slower with increasing costs. For
'T' being something like 'int' the vector version is something like ten
times faster than typical node based approaches, including sets. At least,
this is what I seem to remember from running very naive tests.
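
For reference, the two variants being compared boil down to something like this (an untested sketch; I use 'std::multiset' so that duplicates survive):

#include <queue>
#include <set>
#include <vector>

// Vector-backed binary heap: pop the largest element.
int pop_from_heap(std::priority_queue<int, std::vector<int> >& pq) {
    int top = pq.top();
    pq.pop();                 // contiguous storage, no per-element allocation
    return top;
}

// Node-based ordered container used as a priority queue: pop the largest element.
int pop_from_set(std::multiset<int>& s) {
    std::multiset<int>::iterator it = s.end();
    --it;                     // largest element
    int top = *it;
    s.erase(it);              // every element is a separately allocated node
    return top;
}
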
A related question. Have you or anyone measured the time improvement of
inserting N elements into a std::vector<T> using reserve(N) and
push_back(const value_type&) and calling std::sort, versus inserting N
elements into a std::multimap<T> by calling insert(const value_type&)?
I don't know for the inserts themselves but if the set of elements is fixed,
using an array with binary search is *much* faster than using 'std::map' or
'std::multimap' (again, I don't have current figures...). Likewise, a hash
(see e.g. the library TR's 'std::tr1::unordered_map' and
'std::tr1::unordered_multimap') is much faster. Of course, these figures also
depend somewhat on the data types.
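
To illustrate the "array with binary search" alternative, something like this (again an untested sketch with made-up names):

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<int, std::string> entry;

bool key_less(const entry& a, const entry& b) { return a.first < b.first; }

// Look up the first entry with the given key in a vector that was sorted
// once with std::sort(table.begin(), table.end(), key_less): the contiguous
// layout avoids the per-node allocations and pointer chasing of std::multimap.
const std::string* find_first(const std::vector<entry>& table, int key) {
    std::vector<entry>::const_iterator it =
        std::lower_bound(table.begin(), table.end(),
                         entry(key, std::string()), key_less);
    if (it != table.end() && it->first == key)
        return &it->second;
    return 0;   // not found
}
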
This form of heap is called a "d-heap" in its general form where "d"
is the arity of the tree: the same approach works with non-binary
trees, too, although the computation of child and parent nodes changes
slightly. When I compared different priority queues, I found that
"d == 5" had the best performance for integer values stored in the
heap. An even bigger 'd' is probably better if the nodes themselves become
bigger. Of course, this was based on a few tests with my own
implementation and probably does not hold up to a scientific analysis.


What does this term "arity" mean? Perhaps the heap I implemented has arity
= 2 because each parent has 2 children?


Yes, this is what I mean by "arity": the [maximum] number of children of a
node. Of course, in a d-heap only leaves and at most one other node have less
than 'd' children.
What do you mean by the "nodes itself become bigger"? Is it that sizeof(T)
where T is the object stored in the heap gets bigger, or that the number of
nodes increases?
I mean 'sizeof(T)', i.e. the data stored in a node: with bigger 'd' the number
of necessary moves when modifying the heap becomes smaller. If the type 'T'
is "bigger" or at least more expensive to move, the cost of the moves
dominates the costs of the other operations.
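
The index arithmetic for a d-heap stored in a flat array is just this (0-based indices; a rough sketch, not lifted from my implementation):

#include <cstddef>

// For d == 2 these reduce to the usual binary-heap formulas.
inline std::size_t parent(std::size_t i, std::size_t d) {
    return (i - 1) / d;       // not meaningful for the root (i == 0)
}

inline std::size_t first_child(std::size_t i, std::size_t d) {
    return d * i + 1;         // children occupy indices d*i+1 .. d*i+d
}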

[relating to Fibonacci-heaps]
Thanks for the info. Changing a key in O(1) time is fascinating.


It is not O(1), it is *amortized* O(1): occasionally a higher cost is
incurred, but averaged over multiple operations the cost per operation
is still O(1). Of course, changing the key involves reordering
the corresponding node such that the other performance guarantees still
hold. Just the actual change of the key is trivially O(1) but doing the
reordering in amortized O(1) is somewhat tricky.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<www.contendix.com> - Software Development & Consulting
Jul 22 '05 #31
Claudio Puviani wrote:
Gaa! Do people no longer do basic back-of-the-envelope calculations?
Yes, in fact.

If a 5GHz computer had an integer increment that took only one CPU cycle and
did NOTHING but increment a 64-bit int 5 billion times per second 24 hours a
day without interruption, it would take 117 _YEARS_ before you ran out of
numbers!!! If you throw in the obligatory jump to implement the loop and
made the jump also take a single CPU cycle, the duration leaps to a whopping
234 years.
Well, my original question was related to a rewrite of some fairly
critical support code within a library that my organization intends to
reuse for many projects in the future. It is the intent that the code
be written such that its useful lifetime may be maximized. Certainly,
the code is expected to be useful for at least a decade.

Let's say this code is eventually applied to some sort of computer
intended to run for a year without restarting. (Not coincidentally,
this is the goal for some of our present systems.) If a 64-bit integer
was used, as I assume you are suggesting, that would mean the processor
or processors handling this queue could handle no more than about
600,000,000,000 elements per second. You are quite right to note that
the best processor available presently at a reasonable price is safe by
a margin of several orders of magnitude. However, in ten years, do you
think you will still be safe? What about multiprocessor systems, which,
by the way, many are predicting will someday be commonplace in even
ordinary PCs? I don't know about you, but ten years ago, I think I was
using a 32MHz 386.

The most generous thing I can say about using a 128-bit integer for this
purpose is that it's absurdly over-engineered. There's a lot worse that
could be said about the idea.


Speaking of absurd engineering:

unsigned long long stamp; /* XXX FIXME hello if you are a programmer
working on this code in 2014, could you please check to make sure this
is still big enough? */

I have been trained--and I believe the training to be good--to not make
assumptions in code I write that may well not be true in the foreseeable
future. That is largely why I posted to this group.

On the other hand, I would feel a lot better about using 96 bits or 128
bits. By the time those integers would overflow, with any luck, I'll be
a rich, retired man. I was concerned, though, that the stamp portion of
the queue element was a lot larger than the rest of the data, and this
would have an unfortunate effect on efficiency. Oh well. :)
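
If I do go the wide-stamp route, what I have in mind is roughly a pair of 64-bit words (an untested sketch):

// 128-bit stamp built from two 64-bit words: the low word is incremented on
// every push and the high word absorbs the (rare) carry, so comparisons stay
// correct long after a single 64-bit counter would have wrapped.
struct wide_stamp {
    unsigned long long hi;
    unsigned long long lo;

    void increment() {
        if (++lo == 0)        // low word wrapped: carry into the high word
            ++hi;
    }
};

inline bool operator<(const wide_stamp& a, const wide_stamp& b) {
    return a.hi < b.hi || (a.hi == b.hi && a.lo < b.lo);
}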

I wonder if people posting concerns about Y2K wraparound in the 1980's
were also flamed?
Aaron W. LaFramboise

Jul 22 '05 #32
"Dietmar Kuehl" <di***********@ yahoo.com> wrote in message
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote:
Then maybe we can use the protected member 'c'. 'std::priority_ queue' actually just uses 'std::make_heap ', etc. Anyway,
accessing the member directly is no good: the internal organization of
the heap essentially makes immediate stability without some tool
impossible.
I don't understand what you mean. The protected member 'c' already has the data organized in exactly that manner.
The problem is that you cannot get stable behavior of the priority queue
simply by manipulating the member 'c' of an 'std::priority_queue'. The way
elements are arranged in a priority queue simply does not provide any useful
form of stability - unless you somehow encode the order in the key used
for comparison.


Why not? Yes, it could be a problem if we write the container 'c' in one
platform/compiler that implements heaps one way, then read the container 'c'
in another platform/compiler that implements heaps another way. But if we
read and write on the same platform it should be OK, right?

Thanks for jogging my memory. I implemented it once. It was an array of fixed length (2^n)-1, and was essentially a perfectly balanced binary tree stored in a regular C array.


I implemented a bunch of heap variations, including two array-based
versions (one allowing key modification, the other not allowing it; the one allowing key modifications is a *lot* slower!). The implementation should be available from <http://www.dietmar-kuehl.de/cxxrt/heaps.tar.gz>.
Thanks for this and all the other info.

I think if T objects get big we could use shared_ptr<T> objects instead.
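
Something like this is what I mean (an untested sketch; spelled std::shared_ptr here, but std::tr1::shared_ptr or boost::shared_ptr would do):

#include <memory>
#include <queue>
#include <vector>

// Sketch: keep shared_ptr<T> in the heap instead of T itself, so the heap's
// swaps and copies shuffle small pointers rather than big objects.
struct BigThing {
    int priority;
    char payload[4096];       // expensive to copy
};

struct by_priority {
    bool operator()(const std::shared_ptr<BigThing>& a,
                    const std::shared_ptr<BigThing>& b) const {
        return a->priority < b->priority;   // top() is the highest priority
    }
};

typedef std::priority_queue<std::shared_ptr<BigThing>,
                            std::vector<std::shared_ptr<BigThing> >,
                            by_priority> big_queue;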

Now out of curiosity, how much faster is a heap implemented through
std::priority_queue<T, std::vector<T>> using make_heap, over one implemented through std::set<T>?


It depends somewhat on 'T': if 'T' is expensive to copy or swap, the
vector version will probably eventually become slower with increasing costs. For
'T' being something like 'int' the vector version is something like ten
times faster than typical node based approaches, including sets. At least,
this is what I seem to remember from running very naive tests.
A related question. Have you or anyone measured the time improvement of
inserting N elements into a std::vector<T> using reserve(N) and
push_back(const value_type&) and calling std::sort, versus inserting N
elements into a std::multimap<T> by calling insert(const value_type&)?
I don't know for the inserts themselves but if the set of elements is
fixed, using an array with binary search is *much* faster than using 'std::map' or 'std::multimap' (again, I don't have current figures...). Likewise, a hash
(see e.g. the library TR's 'std::tr1::unordered_map' and
'std::tr1::unordered_multimap') is much faster. Of course, these figures also depend somewhat on the data types.
This form of heap is called a "d-heap" in its general form where "d"
is the arity of the tree: the same approach works with non-binary
trees, too, although the computation of child and parent nodes changes
slightly. When I compared different priority queues, I found that
"d == 5" had the best performance for integer values stored in the
heap. An even bigger 'd' is probably better if the nodes themselves become
bigger. Of course, this was based on a few tests with my own
implementation and probably does not hold up to a scientific analysis.
What does this term "arity" mean? Perhaps the heap I implemented has arity = 2 because each parent has 2 children?


Yes, this is what I mean by "arity": the [maximum] number of children of
a node. Of course, in a d-heap only leaves and at most one other node have less than 'd' children.
What do you mean by the "nodes itself become bigger"? Is it that sizeof(T) where T is the object stored in the heap gets bigger, or that the number of nodes increases?
I mean 'sizeof(T)', i.e. the data stored in a node: with bigger 'd' the
number of necessary moves when modifying the heap becomes smaller. If the type 'T' is "bigger" or at least more expensive to move, the cost of the moves
dominates the costs of the other operations.

[relating to Fibonacci-heaps]
Thanks for the info. Changing a key in O(1) time is fascinating.


It is not O(1), it is *amortized* O(1): occasionally a higher cost is
incurred, but averaged over multiple operations the cost per operation
is still O(1). Of course, changing the key involves reordering
the corresponding node such that the other performance guarantees still
hold. Just the actual change of the key is trivially O(1) but doing the
reordering in amortized O(1) is somewhat tricky.

Jul 22 '05 #34
"Dietmar Kuehl" <di***********@ yahoo.com> wrote in message
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote:
Then maybe we can use the protected member 'c'. 'std::priority_ queue' actually just uses 'std::make_heap ', etc. Anyway,
accessing the member directly is no good: the internal organization of
the heap essentially makes immediate stability without some tool
impossible.
I don't understand what you mean. The protected member 'c' is already has the data organized in the exact manner.
The problem is, that you cannot get stable behavior of the priority queue
through simply manipulating the member 'c' in an 'std::priority_ queue'.
The way elements in a priority queue simple does not provide any useful
form of stability - unless you somehow encode the order in the key used
for comparison.


Why not? Yes, it could be a problem if we write the container 'c' in one
platform/compiler that implements heaps one way, then read the container 'c'
in another platform/compiler that implements heaps another way. But if we
read and write on the same platform it should be OK, right?

Thanks for jogging my memory. I implemented it once. It was an array of fixed length (2^n)-1, and was essentially a perfectly balanced binary tree stored in a regular C array.


I implemented a bunch of heap variations, including two array based

version (one allowing key modification, the other not allowing it; the one allowing key modifications is a *lot* slower!). The implementation should be available from <http://www.dietmar-kuehl.de/cxxrt/heaps.tar.gz>.
Thanks for this and all the other info.

I think if T objects get big we could use shared_ptr<T> objects instead.

Now out of curiosity, how much faster is a heap implemented through
std::priority_q ueue<T, std::vector<T>> using make_heap, over one implemented through std::set<T>.


It depends somewhat on 'T': if 'T' is expensive to copy or swap, the

vector version will probably eventually become slower with increasing costs. For
'T' being something like 'int' the vector version is something like ten
times faster than typical node based approaches, including sets. At least,
this is what I seem to remember from running very naive tests.
A related question. Have you or anyone measured the time improvement of
inserting N elements into a std::vector<T> using reserve(N) and
push_back(const value_type&) and calling strd::sort, versus inserting N
elements into a std::multimap<T > by calling insert(const value_type&)?
I don't know for the inserts themselves but if the set of elements is

fixed, using an array with binary search is *much* faster than using 'std::map' or 'std::multimap' (again, I don't have current figures...). Likewise, a hash
(see eg. the library TR's 'std::tr1::unor dered_map' and
'std::tr1::unor dered_multimap' ) is much faster. Of course, these figure also depend somewhat on the data types.
This form of heap is called a "d-heap" in its general form where "d"
is the arity of the tree: the same approach works with non-binary
trees, too, although the computation of child and parent nodes changes
slightly. When I compared different priority queues, I found that
"d == 5" had the best performance for integer values stored in the
heap. An even bigger 'd' is probably better if the nodes itself become
bigger. Of course, this was based on a few tests with my own
implementation and probably does not hold up to a scientific analysis.
What does this term "arity" mean? Perhaps the heap I implemented has arity = 2 because each parent has 2 children?


Yes, this is what I mean with "arity": the [maximum] number of children of

a node. Of course, in a d-heap only leaves and at most one other node have less than 'd' children.
What do you mean by the "nodes itself become bigger"? Is it that sizeof(T) where T is the object stored in the heap gets bigger, or that the number of nodes increases?
I mean 'sizeof(T)', ie. the data stored in a node: with bigger 'd' the

number of necessary moves when modifying the heap becomes smaller. If the type 'T' is "bigger" or at least more expensive to move, the cost of the moves
are more dominant than the costs of other operations.

[relating to Fibonacci-heaps]
Thanks for the info. Changing a key in O(1) time is fascinating.


It is not O(1), it is *amortized* O(1): occasionally a higher costs is
incurred but if this cost can be averaged over multiple operations and
then becomes O(1) again. Of course, changing the key involves reordering
the corresponding node such that the other performance guarantees still
hold. Just the actual change of the key is trivially O(1) but doing the
reordering in amortized O(1) is somewhat tricky.

Jul 22 '05 #35
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote in message
news:FcQcc.3016 6
We could also use std::bitset<64> , but it doesn't have an operator++ or
operator+.


Anyone know how to implement a quick operator++ for std::bitset?
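
One possibility (an untested sketch) is to propagate the carry bit by bit with a free function, since std::bitset cannot be given a member operator++:

#include <bitset>
#include <cstddef>

// Flip bits from the least significant end until a 0 is flipped to 1, which
// absorbs the carry; if every bit was 1, the value wraps around to all zeros.
template <std::size_t N>
std::bitset<N>& increment(std::bitset<N>& bits) {
    for (std::size_t i = 0; i < N; ++i) {
        if (bits[i]) {
            bits[i] = false;  // 1 + carry -> 0, keep carrying
        } else {
            bits[i] = true;   // 0 + carry -> 1, done
            return bits;
        }
    }
    return bits;              // overflow: wrapped to all zeros
}

// Usage: std::bitset<64> stamp; increment(stamp);
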
Jul 22 '05 #36
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote in message
news:FcQcc.3016 6
We could also use std::bitset<64> , but it doesn't have an operator++ or
operator+.


Anyway know how to implement a quick operator++ for std::bitset?
Jul 22 '05 #37
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote
"Claudio Puviani" <pu*****@hotmai l.com> wrote
If a 5GHz computer had an integer increment that
took only one CPU cycle and did NOTHING but increment
a 64-bit int 5 billion times per second 24 hours a day without
interruption, it would take 117 _YEARS_ before you ran out
of numbers!!! If you throw in the obligatory jump to implement
the loop and made the jump also take a single CPU cycle, the
duration leaps to a whopping 234 years.


How do you get 117 years? In a 64-bit integer there are 2^64 =
1.84... * 10^19 possible numbers. In one year we can perform
86,400 seconds * 5 billion increments/second = 4.32 * 10^14
increments. I believe 1 billion = 10^9. Then 2^64 / 4.32e14 =
42700 years. What am I doing wrong in my calculation?


Ops/sec = 5e9
Ops/min = 5e9 ops/sec * 60 sec/min = 3e11
Ops/hr = 3e11 ops/min * 60 min/hr = 1.8e13
Ops/day = 1.8e13 ops/hr * 24 hr/day = 4.32e14
Ops/yr = 4.32e14 ops/day * 365 day/year = 1.577e17
Ops = 2^64 = 1.84e19
Duration = 1.84e19 ops / 1.577e17 ops/yr = 117 years

It looks like you just forgot to factor in the 365 days per year.
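
The same figure drops out of a throwaway two-liner (a sketch, assuming 5e9 increments per second):

#include <cmath>
#include <cstdio>

int main() {
    const double range  = std::pow(2.0, 64);            // 2^64 distinct values
    const double per_yr = 5e9 * 60 * 60 * 24 * 365;     // increments per year at 5 GHz
    std::printf("%.1f years\n", range / per_yr);        // prints about 117.0
}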

Claudio Puviani
Jul 22 '05 #38
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote
"Claudio Puviani" <pu*****@hotmai l.com> wrote
If a 5GHz computer had an integer increment that
took only one CPU cycle and did NOTHING but increment
a 64-bit int 5 billion times per second 24-hours a day without
interruption, it would take 117 _YEARS_ before you ran out
of numbers!!! If you throw in the obligatory jump to implement
the loop and made the jump also take a single CPU cycle, the
duration leaps to a whopping 234 years.


How do you get 117 years? In a 64 integer there are 2^64 =
1.84... * 10^19 possible numbers. In one year we can perform
86,400 seconds * 5 billion increments/second = 4.32 * 10^14
increments. I believe 1 billion = 10^9. Then 2^64 / 4.32e14 =
42700 years. What am I doing wrong in my calculation?


Ops/sec = 5e9
Ops/min = 5e9 ops/sec * 60 sec/min = 3e11
Ops/hr = 3e11 ops/min * 60 min/hr = 1.8e13
Ops/day = 1.8e13 ops/hr * 24 hr/day = 4.32e14
Ops/yr = 4.32e14 ops/day * 365 day/year = 1.577e17
Ops = 2^64 = 1.84e19
Duration = 1.84e19 ops / 1.577e17 ops/yr = 117 years

It looks like you just forgot to factor in the 365 days per year.

Claudio Puviani
Jul 22 '05 #39
