
Stable priority queue

Hello,

I understand that an easy way to make the standard std::priority_q ueue
stable is by including an integer stamp with each node that is
incremented each time a new node is pushed into the queue. However, no
matter what reasonably-sized type I use for the stamp, eventually the
stamp will 'wrap around' and possibly cause incorrect ordering of
elements. For my application, which queues elements very quickly and
runs for an indefinite amount of time, this scenario is a real concern,
and will eventually cause incorrect results.

Is there any easy and painless way of correcting this order-stamp
wraparound problem?
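
For concreteness, the stamp approach I mean looks roughly like this (a simplified, untested sketch; the names are just for illustration):

#include <functional>
#include <queue>
#include <vector>

// Simplified sketch of a stamped priority queue: each pushed element carries
// a monotonically increasing stamp, and the comparator breaks priority ties
// on the stamp so equal-priority elements pop in FIFO order. The eventual
// wraparound of 'counter_' is exactly the problem described above.
template <typename T, typename Compare = std::less<T> >
class stable_priority_queue {
    struct entry {
        T value;
        unsigned long long stamp;       // wraps around eventually
    };
    struct entry_compare {
        Compare cmp;
        bool operator()(const entry& a, const entry& b) const {
            if (cmp(a.value, b.value)) return true;
            if (cmp(b.value, a.value)) return false;
            return a.stamp > b.stamp;   // on a tie, the older stamp wins
        }
    };
    std::priority_queue<entry, std::vector<entry>, entry_compare> queue_;
    unsigned long long counter_;
public:
    stable_priority_queue() : counter_(0) {}
    void push(const T& v) { entry e = { v, counter_++ }; queue_.push(e); }
    const T& top() const { return queue_.top().value; }
    void pop() { queue_.pop(); }
    bool empty() const { return queue_.empty(); }
};
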
Aaron W. LaFramboise

Jul 22 '05
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote:
Siemel Naran wrote:
Then maybe we can use the protected member 'c'.


'std::priority_queue' actually just uses 'std::make_heap', etc. Anyway,
accessing the member directly is no good: the internal organization of
the heap essentially makes immediate stability without some tool
impossible.


I don't understand what you mean. The protected member 'c' already has
the data organized in exactly that manner.


The problem is that you cannot get stable behavior of the priority queue
simply by manipulating the member 'c' of an 'std::priority_queue'. The way
elements are arranged in a priority queue simply does not provide any useful
form of stability - unless you somehow encode the order in the key used
for comparison.
Thanks for jogging my memory. I implemented it once. It was an array of
fixed length (2^n)-1, and was essentially a perfectly balanced binary tree
stored in a regular C array.
I implemented a bunch of heap variations, including two array-based versions
(one allowing key modification, the other not allowing it; the one allowing
key modifications is a *lot* slower!). The implementation should be available
from <http://www.dietmar-kuehl.de/cxxrt/heaps.tar.gz>.
Now out of curiosity, how much faster is a heap implemented through
std::priority_queue<T, std::vector<T>> using make_heap, over one implemented
through std::set<T>?
It depends somewhat on 'T': if 'T' is expensive to copy or swap, the vector
version will probably eventually become slower with increasing costs. For
'T' being something like 'int' the vector version is something like ten
times faster than typical node based approaches, including sets. At least,
this is what I seem to remember from running very naive tests.
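
For reference, the two variants being compared boil down to something like this (an untested sketch; I use 'std::multiset' so that duplicates survive):

#include <queue>
#include <set>
#include <vector>

// Vector-backed binary heap: pop the largest element.
int pop_from_heap(std::priority_queue<int, std::vector<int> >& pq) {
    int top = pq.top();
    pq.pop();                 // contiguous storage, no per-element allocation
    return top;
}

// Node-based ordered container used as a priority queue: pop the largest element.
int pop_from_set(std::multiset<int>& s) {
    std::multiset<int>::iterator it = s.end();
    --it;                     // largest element
    int top = *it;
    s.erase(it);              // every element is a separately allocated node
    return top;
}
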
A related question. Have you or anyone measured the time improvement of
inserting N elements into a std::vector<T> using reserve(N) and
push_back(const value_type&) and calling std::sort, versus inserting N
elements into a std::multimap<T> by calling insert(const value_type&)?
I don't know for the inserts themselves but if the set of elements is fixed,
using an array with binary search is *much* faster than using 'std::map' or
'std::multimap' (again, I don't have current figures...). Likewise, a hash
(see e.g. the library TR's 'std::tr1::unordered_map' and
'std::tr1::unordered_multimap') is much faster. Of course, these figures also
depend somewhat on the data types.
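
To illustrate the "array with binary search" alternative, something like this (again an untested sketch with made-up names):

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<int, std::string> entry;

bool key_less(const entry& a, const entry& b) { return a.first < b.first; }

// Look up the first entry with the given key in a vector that was sorted
// once with std::sort(table.begin(), table.end(), key_less): the contiguous
// layout avoids the per-node allocations and pointer chasing of std::multimap.
const std::string* find_first(const std::vector<entry>& table, int key) {
    std::vector<entry>::const_iterator it =
        std::lower_bound(table.begin(), table.end(),
                         entry(key, std::string()), key_less);
    if (it != table.end() && it->first == key)
        return &it->second;
    return 0;   // not found
}
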
This form of heap is called a "d-heap" in its general form where "d"
is the arity of the tree: the same approach works with non-binary
trees, too, although the computation of child and parent nodes changes
slightly. When I compared different priority queues, I found that
"d == 5" had the best performance for integer values stored in the
heap. An even bigger 'd' is probably better if the nodes themselves become
bigger. Of course, this was based on a few tests with my own
implementation and probably does not hold up to a scientific analysis.


What does this term "arity" mean? Perhaps the heap I implemented has arity
= 2 because each parent has 2 children?


Yes, this is what I mean by "arity": the [maximum] number of children of a
node. Of course, in a d-heap only leaves and at most one other node have less
than 'd' children.
What do you mean by the "nodes itself become bigger"? Is it that sizeof(T)
where T is the object stored in the heap gets bigger, or that the number of
nodes increases?
I mean 'sizeof(T)', i.e. the data stored in a node: with bigger 'd' the number
of necessary moves when modifying the heap becomes smaller. If the type 'T'
is "bigger" or at least more expensive to move, the cost of the moves
dominates the costs of the other operations.
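
The index arithmetic for a d-heap stored in a flat array is just this (0-based indices; a rough sketch, not lifted from my implementation):

#include <cstddef>

// For d == 2 these reduce to the usual binary-heap formulas.
inline std::size_t parent(std::size_t i, std::size_t d) {
    return (i - 1) / d;       // not meaningful for the root (i == 0)
}

inline std::size_t first_child(std::size_t i, std::size_t d) {
    return d * i + 1;         // children occupy indices d*i+1 .. d*i+d
}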

[relating to Fibonacci-heaps]
Thanks for the info. Changing a key in O(1) time is fascinating.


It is not O(1), it is *amortized* O(1): occasionally a higher cost is
incurred, but averaged over multiple operations the cost per operation
is still O(1). Of course, changing the key involves reordering
the corresponding node such that the other performance guarantees still
hold. Just the actual change of the key is trivially O(1) but doing the
reordering in amortized O(1) is somewhat tricky.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<www.contendix.com> - Software Development & Consulting
Jul 22 '05 #31
Claudio Puviani wrote:
Gaa! Do people no longer do basic back-of-the-envelope calculations?
Yes, in fact.

If a 5GHz computer had an integer increment that took only one CPU cycle and
did NOTHING but increment a 64-bit int 5 billion times per second 24 hours a
day without interruption, it would take 117 _YEARS_ before you ran out of
numbers!!! If you throw in the obligatory jump to implement the loop and
made the jump also take a single CPU cycle, the duration leaps to a whopping
234 years.
Well, my original question was related to a rewrite of some fairly
critical support code within a library that my organization intends to
reuse for many projects in the future. It is the intent that the code
be written such that its useful lifetime may be maximized. Certainly,
the code is expected to be useful for at least a decade.

Let's say this code is eventually applied to some sort of computer
intended to run for a year without restarting. (Not coincidentally,
this is the goal for some of our present systems.) If a 64-bit integer
was used, as I assume you are suggesting, that would mean the processor
or processors handling this queue could handle no more than about
600,000,000,000 elements per second. You are quite right to note that
the best processor available presently at a reasonable price is safe by
a margin of several orders of magnitude. However, in ten years, do you
think you will still be safe? What about multiprocessor systems, which,
by the way, many are predicting will someday be commonplace in even
ordinary PCs? I don't know about you, but ten years ago, I think I was
using a 32MHz 386.

The most generous thing I can say about using a 128-bit integer for this
purpose is that it's absurdly over-engineered. There's a lot worse that
could be said about the idea.


Speaking of absurd engineering:

unsigned long long stamp; /* XXX FIXME hello if you are a programmer
working on this code in 2014, could you please check to make sure this
is still big enough? */

I have been trained--and I believe the training to be good--to not make
assumptions in code I write that may well not be true in the foreseeable
future. That is largely why I posted to this group.

On the other hand, I would feel a lot better about using 96 bits or 128
bits. By the time those integers would overflow, with any luck, I'll be
a rich, retired man. I was concerned, though, that the stamp portion of
the queue element was a lot larger than the rest of the data, and this
would have an unfortunate effect on efficiency. Oh well. :)
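
If I do go the wide-stamp route, what I have in mind is roughly a pair of 64-bit words (an untested sketch):

// 128-bit stamp built from two 64-bit words: the low word is incremented on
// every push and the high word absorbs the (rare) carry, so comparisons stay
// correct long after a single 64-bit counter would have wrapped.
struct wide_stamp {
    unsigned long long hi;
    unsigned long long lo;

    void increment() {
        if (++lo == 0)        // low word wrapped: carry into the high word
            ++hi;
    }
};

inline bool operator<(const wide_stamp& a, const wide_stamp& b) {
    return a.hi < b.hi || (a.hi == b.hi && a.lo < b.lo);
}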

I wonder if people posting concerns about Y2K wraparound in the 1980's
were also flamed?
Aaron W. LaFramboise

Jul 22 '05 #32
"Dietmar Kuehl" <di***********@ yahoo.com> wrote in message
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote:
Then maybe we can use the protected member 'c'. 'std::priority_ queue' actually just uses 'std::make_heap ', etc. Anyway,
accessing the member directly is no good: the internal organization of
the heap essentially makes immediate stability without some tool
impossible.
I don't understand what you mean. The protected member 'c' already has the data organized in exactly that manner.
The problem is that you cannot get stable behavior of the priority queue
simply by manipulating the member 'c' of an 'std::priority_queue'. The way
elements are arranged in a priority queue simply does not provide any useful
form of stability - unless you somehow encode the order in the key used
for comparison.


Why not? Yes, it could be a problem if we write the container 'c' in one
platform/compiler that implements heaps one way, then read the container 'c'
in another platform/compiler that implements heaps another way. But if we
read and write on the same platform it should be OK, right?

Thanks for jogging my memory. I implemented it once. It was an array of fixed length (2^n)-1, and was essentially a perfectly balanced binary tree stored in a regular C array.


I implemented a bunch of heap variations, including two array-based
versions (one allowing key modification, the other not allowing it; the one allowing key modifications is a *lot* slower!). The implementation should be available from <http://www.dietmar-kuehl.de/cxxrt/heaps.tar.gz>.
Thanks for this and all the other info.

I think if T objects get big we could use shared_ptr<T> objects instead.
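
Something like this is what I mean (an untested sketch; spelled std::shared_ptr here, but std::tr1::shared_ptr or boost::shared_ptr would do):

#include <memory>
#include <queue>
#include <vector>

// Sketch: keep shared_ptr<T> in the heap instead of T itself, so the heap's
// swaps and copies shuffle small pointers rather than big objects.
struct BigThing {
    int priority;
    char payload[4096];       // expensive to copy
};

struct by_priority {
    bool operator()(const std::shared_ptr<BigThing>& a,
                    const std::shared_ptr<BigThing>& b) const {
        return a->priority < b->priority;   // top() is the highest priority
    }
};

typedef std::priority_queue<std::shared_ptr<BigThing>,
                            std::vector<std::shared_ptr<BigThing> >,
                            by_priority> big_queue;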

Now out of curiosity, how much faster is a heap implemented through
std::priority_queue<T, std::vector<T>> using make_heap, over one implemented through std::set<T>?


It depends somewhat on 'T': if 'T' is expensive to copy or swap, the
vector version will probably eventually become slower with increasing costs. For
'T' being something like 'int' the vector version is something like ten
times faster than typical node based approaches, including sets. At least,
this is what I seem to remember from running very naive tests.
A related question. Have you or anyone measured the time improvement of
inserting N elements into a std::vector<T> using reserve(N) and
push_back(const value_type&) and calling std::sort, versus inserting N
elements into a std::multimap<T> by calling insert(const value_type&)?
I don't know for the inserts themselves but if the set of elements is
fixed, using an array with binary search is *much* faster than using 'std::map' or 'std::multimap' (again, I don't have current figures...). Likewise, a hash
(see e.g. the library TR's 'std::tr1::unordered_map' and
'std::tr1::unordered_multimap') is much faster. Of course, these figures also depend somewhat on the data types.
This form of heap is called a "d-heap" in its general form where "d"
is the arity of the tree: the same approach works with non-binary
trees, too, although the computation of child and parent nodes changes
slightly. When I compared different priority queues, I found that
"d == 5" had the best performance for integer values stored in the
heap. An even bigger 'd' is probably better if the nodes themselves become
bigger. Of course, this was based on a few tests with my own
implementation and probably does not hold up to a scientific analysis.
What does this term "arity" mean? Perhaps the heap I implemented has arity = 2 because each parent has 2 children?


Yes, this is what I mean by "arity": the [maximum] number of children of
a node. Of course, in a d-heap only leaves and at most one other node have less than 'd' children.
What do you mean by the "nodes itself become bigger"? Is it that sizeof(T) where T is the object stored in the heap gets bigger, or that the number of nodes increases?
I mean 'sizeof(T)', i.e. the data stored in a node: with bigger 'd' the
number of necessary moves when modifying the heap becomes smaller. If the type 'T' is "bigger" or at least more expensive to move, the cost of the moves
dominates the costs of the other operations.

[relating to Fibonacci-heaps]
Thanks for the info. Changing a key in O(1) time is fascinating.


It is not O(1), it is *amortized* O(1): occasionally a higher cost is
incurred, but averaged over multiple operations the cost per operation
is still O(1). Of course, changing the key involves reordering
the corresponding node such that the other performance guarantees still
hold. Just the actual change of the key is trivially O(1) but doing the
reordering in amortized O(1) is somewhat tricky.

Jul 22 '05 #34
"Dietmar Kuehl" <di***********@ yahoo.com> wrote in message
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote:
Then maybe we can use the protected member 'c'. 'std::priority_ queue' actually just uses 'std::make_heap ', etc. Anyway,
accessing the member directly is no good: the internal organization of
the heap essentially makes immediate stability without some tool
impossible.
I don't understand what you mean. The protected member 'c' is already has the data organized in the exact manner.
The problem is, that you cannot get stable behavior of the priority queue
through simply manipulating the member 'c' in an 'std::priority_ queue'.
The way elements in a priority queue simple does not provide any useful
form of stability - unless you somehow encode the order in the key used
for comparison.


Why not? Yes, it could be a problem if we write the container 'c' in one
platform/compiler that implements heaps one way, then read the container 'c'
in another platform/compiler that implements heaps another way. But if we
read and write on the same platform it should be OK, right?

Thanks for jogging my memory. I implemented it once. It was an array of fixed length (2^n)-1, and was essentially a perfectly balanced binary tree stored in a regular C array.


I implemented a bunch of heap variations, including two array based

version (one allowing key modification, the other not allowing it; the one allowing key modifications is a *lot* slower!). The implementation should be available from <http://www.dietmar-kuehl.de/cxxrt/heaps.tar.gz>.
Thanks for this and all the other info.

I think if T objects get big we could use shared_ptr<T> objects instead.

Now out of curiosity, how much faster is a heap implemented through
std::priority_q ueue<T, std::vector<T>> using make_heap, over one implemented through std::set<T>.


It depends somewhat on 'T': if 'T' is expensive to copy or swap, the

vector version will probably eventually become slower with increasing costs. For
'T' being something like 'int' the vector version is something like ten
times faster than typical node based approaches, including sets. At least,
this is what I seem to remember from running very naive tests.
A related question. Have you or anyone measured the time improvement of
inserting N elements into a std::vector<T> using reserve(N) and
push_back(const value_type&) and calling strd::sort, versus inserting N
elements into a std::multimap<T > by calling insert(const value_type&)?
I don't know for the inserts themselves but if the set of elements is

fixed, using an array with binary search is *much* faster than using 'std::map' or 'std::multimap' (again, I don't have current figures...). Likewise, a hash
(see eg. the library TR's 'std::tr1::unor dered_map' and
'std::tr1::unor dered_multimap' ) is much faster. Of course, these figure also depend somewhat on the data types.
This form of heap is called a "d-heap" in its general form where "d"
is the arity of the tree: the same approach works with non-binary
trees, too, although the computation of child and parent nodes changes
slightly. When I compared different priority queues, I found that
"d == 5" had the best performance for integer values stored in the
heap. An even bigger 'd' is probably better if the nodes itself become
bigger. Of course, this was based on a few tests with my own
implementation and probably does not hold up to a scientific analysis.
What does this term "arity" mean? Perhaps the heap I implemented has arity = 2 because each parent has 2 children?


Yes, this is what I mean with "arity": the [maximum] number of children of

a node. Of course, in a d-heap only leaves and at most one other node have less than 'd' children.
What do you mean by the "nodes itself become bigger"? Is it that sizeof(T) where T is the object stored in the heap gets bigger, or that the number of nodes increases?
I mean 'sizeof(T)', ie. the data stored in a node: with bigger 'd' the

number of necessary moves when modifying the heap becomes smaller. If the type 'T' is "bigger" or at least more expensive to move, the cost of the moves
are more dominant than the costs of other operations.

[relating to Fibonacci-heaps]
Thanks for the info. Changing a key in O(1) time is fascinating.


It is not O(1), it is *amortized* O(1): occasionally a higher costs is
incurred but if this cost can be averaged over multiple operations and
then becomes O(1) again. Of course, changing the key involves reordering
the corresponding node such that the other performance guarantees still
hold. Just the actual change of the key is trivially O(1) but doing the
reordering in amortized O(1) is somewhat tricky.

Jul 22 '05 #35
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote in message
news:FcQcc.3016 6
We could also use std::bitset<64> , but it doesn't have an operator++ or
operator+.


Anyone know how to implement a quick operator++ for std::bitset?
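
One possibility (an untested sketch) is to propagate the carry bit by bit with a free function, since std::bitset cannot be given a member operator++:

#include <bitset>
#include <cstddef>

// Flip bits from the least significant end until a 0 is flipped to 1, which
// absorbs the carry; if every bit was 1, the value wraps around to all zeros.
template <std::size_t N>
std::bitset<N>& increment(std::bitset<N>& bits) {
    for (std::size_t i = 0; i < N; ++i) {
        if (bits[i]) {
            bits[i] = false;  // 1 + carry -> 0, keep carrying
        } else {
            bits[i] = true;   // 0 + carry -> 1, done
            return bits;
        }
    }
    return bits;              // overflow: wrapped to all zeros
}

// Usage: std::bitset<64> stamp; increment(stamp);
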
Jul 22 '05 #36
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote in message
news:FcQcc.3016 6
We could also use std::bitset<64> , but it doesn't have an operator++ or
operator+.


Anyway know how to implement a quick operator++ for std::bitset?
Jul 22 '05 #37
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote
"Claudio Puviani" <pu*****@hotmai l.com> wrote
If a 5GHz computer had an integer increment that
took only one CPU cycle and did NOTHING but increment
a 64-bit int 5 billion times per second 24 hours a day without
interruption, it would take 117 _YEARS_ before you ran out
of numbers!!! If you throw in the obligatory jump to implement
the loop and made the jump also take a single CPU cycle, the
duration leaps to a whopping 234 years.


How do you get 117 years? In a 64-bit integer there are 2^64 =
1.84... * 10^19 possible numbers. In one year we can perform
86,400 seconds * 5 billion increments/second = 4.32 * 10^14
increments. I believe 1 billion = 10^9. Then 2^64 / 4.32e14 =
42700 years. What am I doing wrong in my calculation?


Ops/sec = 5e9
Ops/min = 5e9 ops/sec * 60 sec/min = 3e11
Ops/hr = 3e11 ops/min * 60 min/hr = 1.8e13
Ops/day = 1.8e13 ops/hr * 24 hr/day = 4.32e14
Ops/yr = 4.32e14 ops/day * 365 day/year = 1.577e17
Ops = 2^64 = 1.84e19
Duration = 1.84e19 ops / 1.577e17 ops/yr = 117 years

It looks like you just forgot to factor in the 365 days per year.
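
The same figure drops out of a throwaway two-liner (a sketch, assuming 5e9 increments per second):

#include <cmath>
#include <cstdio>

int main() {
    const double range  = std::pow(2.0, 64);            // 2^64 distinct values
    const double per_yr = 5e9 * 60 * 60 * 24 * 365;     // increments per year at 5 GHz
    std::printf("%.1f years\n", range / per_yr);        // prints about 117.0
}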

Claudio Puviani
Jul 22 '05 #38
"Siemel Naran" <Si*********@RE MOVE.att.net> wrote
"Claudio Puviani" <pu*****@hotmai l.com> wrote
If a 5GHz computer had an integer increment that
took only one CPU cycle and did NOTHING but increment
a 64-bit int 5 billion times per second 24-hours a day without
interruption, it would take 117 _YEARS_ before you ran out
of numbers!!! If you throw in the obligatory jump to implement
the loop and made the jump also take a single CPU cycle, the
duration leaps to a whopping 234 years.


How do you get 117 years? In a 64 integer there are 2^64 =
1.84... * 10^19 possible numbers. In one year we can perform
86,400 seconds * 5 billion increments/second = 4.32 * 10^14
increments. I believe 1 billion = 10^9. Then 2^64 / 4.32e14 =
42700 years. What am I doing wrong in my calculation?


Ops/sec = 5e9
Ops/min = 5e9 ops/sec * 60 sec/min = 3e11
Ops/hr = 3e11 ops/min * 60 min/hr = 1.8e13
Ops/day = 1.8e13 ops/hr * 24 hr/day = 4.32e14
Ops/yr = 4.32e14 ops/day * 365 day/year = 1.577e17
Ops = 2^64 = 1.84e19
Duration = 1.84e19 ops / 1.577e17 ops/yr = 117 years

It looks like you just forgot to factor in the 365 days per year.

Claudio Puviani
Jul 22 '05 #39
