How to make this program more efficient?

Bill David

SUBJECT: How to make this program more efficient?

In my program, a thread will check update from server periodically and
generate a stl::map for other part of this program to read data from.
Let's name the update method as doUpdate and stl::map read methods as
getData and copyData.
Since stl::map is not thread-safe, we should do synchronization by
ourselves. A usable solution is to create a boost::mutex::scoped_lock
object in all above methods to make the access to all methods
synchronized. But since the doUpdate method will be executed
periodically and the interval may be 1 hour or longer, while all other
parts can only read the stl::map, the above solution may do
synchronization too much. We only need synchronization on all methods
when we are doUpdating.
Then is it possible to make this program more efficient?
I have thought out some other solution like add some updating flag.
But it's not safe since stl::map may be updated when the following
part of read methods is executing. And the operation on the flag may
also be interrupted by other threads.

Sep 13 '08 #1

Subscribe Post Reply

3593

James Kanze

On Sep 13, 3:18 am, Bill David <billdavi...@gmail.comwrote:

SUBJECT: How to make this program more efficient?

In my program, a thread will check update from server
periodically and generate a stl::map for other part of this
program to read data from. Let's name the update method as
doUpdate and stl::map read methods as getData and copyData.
Since stl::map is not thread-safe, we should do
synchronization by ourselves. A usable solution is to create a
boost::mutex::scoped_lock object in all above methods to make
the access to all methods synchronized. But since the doUpdate
method will be executed periodically and the interval may be 1
hour or longer, while all other parts can only read the
stl::map, the above solution may do synchronization too much.
We only need synchronization on all methods when we are
doUpdating.

If there's any thread modifying the object, then all accesses
must be synchronized. Period. Given the usage pattern, it may
be more interesting to use a rwlock than a mutex, but this
really depends on how long the readers hold the mutex, and how
many of them there are.

Then is it possible to make this program more efficient?

Probably. Where does the profiler show that there are problems?

I have thought out some other solution like add some updating
flag. But it's not safe since stl::map may be updated when
the following part of read methods is executing. And the
operation on the flag may also be interrupted by other
threads.

You need some sort of synchronization, including for all of the
reads. (Don't forget, too, that std::map returns references to
internal data, and you need to hold the lock as long as these
references are valid.)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Sep 13 '08 #2

Jon Harrop

James Kanze wrote:

On Sep 13, 3:18 am, Bill David <billdavi...@gmail.comwrote:
>SUBJECT: How to make this program more efficient?

>In my program, a thread will check update from server
periodically and generate a stl::map for other part of this
program to read data from. Let's name the update method as
doUpdate and stl::map read methods as getData and copyData.
Since stl::map is not thread-safe, we should do
synchronization by ourselves. A usable solution is to create a
boost::mutex::scoped_lock object in all above methods to make
the access to all methods synchronized. But since the doUpdate
method will be executed periodically and the interval may be 1
hour or longer, while all other parts can only read the
stl::map, the above solution may do synchronization too much.
We only need synchronization on all methods when we are
doUpdating.

If there's any thread modifying the object, then all accesses
must be synchronized. Period.

That is incorrect. Caches are an obvious counter example where the writer
can write a single word (a pointer to the updated immutable cache)
atomically and readers can read the word atomically safely without any
locking at all.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Sep 13 '08 #3

Jon Harrop

Bill David wrote:

SUBJECT: How to make this program more efficient?

Unless you mean "within the confines of C++" the most obvious improvement
would be to use an immutable map because they greatly reduce the locking
required and can even completely eliminate it.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Sep 13 '08 #4

courpron

On 13 sep, 12:28, Jon Harrop <j...@ffconsultancy.comwrote:

James Kanze wrote:
On Sep 13, 3:18 am, Bill David <billdavi...@gmail.comwrote:
SUBJECT: How to make this program more efficient?

In my program, a thread will check update from server
periodically and generate a stl::map for other part of this
program to read data from. *Let's name the update method as
doUpdate and stl::map read methods as getData and copyData.
Since stl::map is not thread-safe, we should do
synchronization by ourselves. A usable solution is to create a
boost::mutex::scoped_lock object in all above methods to make
the access to all methods synchronized. But since the doUpdate
method will be executed periodically and the interval may be 1
hour or longer, while all other parts can only read the
stl::map, the above solution may do synchronization too much.
We only need synchronization on all methods when we are
doUpdating.

If there's any thread modifying the object, then all accesses
must be synchronized. *Period.

That is incorrect. Caches are an obvious counter example where the writer
can write a single word (a pointer to the updated immutable cache)
atomically and readers can read the word atomically safely without any
locking at all.

This is semantically wrong. In fact, many programmers believe that
"synchonizing threads to a shared ressource" necessarily means that a
lock of some sort is needed, whereas there are many synchronization
techniques that avoids the use of locks. What you are refering to
seems to be an instance of the RCU algorithm, a kind of lock-free
*synchronization* algorithm.

Back to the original question, the RCU algorithm might indeed be, for
the given problem, more efficient than the suggested RWL by James
Kanz, but since the data update happens every hour or so, the
performance degradation of using a RWL should be minimal. Moreover a
RWL is easier to use, and answers in a straightforward way to the OP's
question which asks for a way to lock only when the ressource is
written. Now, if the OP wants extreme speed, RCU *might* be a better
choice but a profiling is needed since RCU has downsides too ( keeping
copies of old data until all readers have finished, adding a level of
indirection to the data, ...).

Alexandre Courpron.

Sep 13 '08 #5

Jerry Coffin

In article <Tc******************************@bt.com>,
jo*@ffconsultancy.com says...

Bill David wrote:
SUBJECT: How to make this program more efficient?

Unless you mean "within the confines of C++" the most obvious improvement
would be to use an immutable map because they greatly reduce the locking
required and can even completely eliminate it.

It's entirely possible to create an immutable map in C++, and from a
viewpoint of thread safety, it can have the same basic properties as an
immutable map in any other language.

The problem is that from what he's said, he specifically wants to change
the map, which clearly doesn't fit very well with it being immutable --
and, again, the language in which it's implemented doesn't change this.
--
Later,
Jerry.

The universe is a figment of its own imagination.

Sep 13 '08 #6

Jerry Coffin

In article <d7d947bf-a634-4b9b-9d51-
75**********@s1g2000pra.googlegroups.com>, bi*********@gmail.com says...

SUBJECT: How to make this program more efficient?

In my program, a thread will check update from server periodically and
generate a stl::map for other part of this program to read data from.
Let's name the update method as doUpdate and stl::map read methods as
getData and copyData.
Since stl::map is not thread-safe, we should do synchronization by
ourselves. A usable solution is to create a boost::mutex::scoped_lock
object in all above methods to make the access to all methods
synchronized. But since the doUpdate method will be executed
periodically and the interval may be 1 hour or longer, while all other
parts can only read the stl::map, the above solution may do
synchronization too much. We only need synchronization on all methods
when we are doUpdating.
Then is it possible to make this program more efficient?
I have thought out some other solution like add some updating flag.
But it's not safe since stl::map may be updated when the following
part of read methods is executing. And the operation on the flag may
also be interrupted by other threads.

I'd collect the data into a separate map (or perhaps vector or whatever)
and only when you've collected all the data for a single update, copy it
all into the map. This way the map only needs to be locked for the
duration of the copy.

If this is still longer than you like, you can access the map via a
pointer. When you're going to do an update, you create a copy of the
map, update the copy, and then update the pointer to refer to the newly
updated copy of the map (then dispose of the old map). This way, you
only need a "lock" for the duration of one pointer update -- but most
architectures will support an atomic swap operation without needing any
other locks.

This, however, is one of the rare cases where using volatile is
necessary for thread safety -- the other threads need to treat the
pointer as volatile, so as soon as the pointer changes, they use the new
pointer value and refer only to the new map. If any of the other threads
"stores" the value of the pointer (at all, even just passing it as a
parameter) you'll need to synchronize access to the pointer, only
disposing of the old map after all other threads have signaled that
they're finished using it.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Sep 13 '08 #7

Jon Harrop

Jerry Coffin wrote:

In article <Tc******************************@bt.com>,
jo*@ffconsultancy.com says...
>Bill David wrote:
SUBJECT: How to make this program more efficient?

Unless you mean "within the confines of C++" the most obvious improvement
would be to use an immutable map because they greatly reduce the locking
required and can even completely eliminate it.

It's entirely possible to create an immutable map in C++, and from a
viewpoint of thread safety, it can have the same basic properties as an
immutable map in any other language.

Efficient immutable data structures rely heavily upon efficient allocation
and garbage collection. C++ has neither and, consequently, it is
practically impossible to solve this problem in C++. Particularly in the
case of multicore programming, you would have to write your own concurrent
run-time (the core of the JVM or CLR) which is just not feasible.

Multicore programming is precisely the buy-in of modern languages for
precisely this reason. In a modern programming environment, this problem is
trivially solved.

The problem is that from what he's said, he specifically wants to change
the map, which clearly doesn't fit very well with it being immutable --
and, again, the language in which it's implemented doesn't change this.

On the contrary, it fits perfectly with a mutable reference to an immutable
map because reads and writes are word-sized and, therefore, atomic. You can
even remove all locks when applicable for huge performance gains.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Sep 13 '08 #8

Jerry Coffin

In article <xq*********************@bt.com>, jo*@ffconsultancy.com
says...

[ ... ]

Efficient immutable data structures rely heavily upon efficient allocation
and garbage collection.

Nonsense! They depend on nothing of the sort. Fast allocation is handy,
but hardly necessary. Garbage collection is even less often useful, to
the point of barely being useful under the circumstances at all.

C++ has neither and, consequently, it is
practically impossible to solve this problem in C++.

Even more complete nonense! While your posts indicate a level if bigotry
and ignorance that probably make it practically impossible for _you_ to
solve the problem in C++ (or probably just about anything else, really)
somebody with a reasonably open mind and a clue of how to program can
solve is in C++ quite easily.

Particularly in the
case of multicore programming, you would have to write your own concurrent
run-time (the core of the JVM or CLR) which is just not feasible.

You clearly haven't even a clue. If anything, the JVM (to use one of
your examples), far from providing a solution, is actually a serious
hindrance to doing the job well at all.

Multicore programming is precisely the buy-in of modern languages for
precisely this reason.

Based on the sentence (using the term loosely) above, your true calling
is in marketing, not engineering.

In a modern programming environment, this problem is trivially solved.

In C++, the problem is fairly easy to solve. In _most_ of the languages
that claim to be more modern, it's trivial to solve as well -- but
solving it nearly as _well_ is far from trivial in those languages. In
fact, nearly the only way to solve it well is solve it in something like
C++, and then use whatever mechanism that language provides for linking
to the C++.

The problem is that from what he's said, he specifically wants to change
the map, which clearly doesn't fit very well with it being immutable --
and, again, the language in which it's implemented doesn't change this.

On the contrary, it fits perfectly with a mutable reference to an immutable
map because reads and writes are word-sized and, therefore, atomic. You can
even remove all locks when applicable for huge performance gains.

You display your ignorance yet again. Pointers and references are not
necessarily word sized, and even if they were they wouldn't necesssarily
be atomic -- just for example, on the x86, some word-sized reads and
writes aren't atomic.

So, at its most basic level, you have an idea that can be workable --
but everything else you had to say was wrong, and everything you added
in your follow up was even more completely wrong. You clearly belong in
a marketing department.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Sep 14 '08 #9

Ian Collins

Jerry Coffin wrote:

In article <xq*********************@bt.com>, jo*@ffconsultancy.com
says...

[ ... ]

>Efficient immutable data structures rely heavily upon efficient allocation
and garbage collection.

Nonsense! They depend on nothing of the sort. Fast allocation is handy,
but hardly necessary. Garbage collection is even less often useful, to
the point of barely being useful under the circumstances at all.

Try not to feed the troll.

--
Ian Collins.

Sep 14 '08 #10

Bill David

On 9ÔÂ13ÈÕ, ÏÂÎç9Ê±59·Ö, Jerry Coffin <jcof...@taeus.comwrote:

In article <d7d947bf-a634-4b9b-9d51-
7541de1d1...@s1g2000pra.googlegroups.com>, billdavi...@gmail.com says...

SUBJECT: How to make this program more efficient?

In my program, a thread will check update from server periodically and
generate a stl::map for other part of this program to read data from.
Let's name the update method as doUpdate and stl::map read methods as
getData and copyData.
Since stl::map is not thread-safe, we should do synchronization by
ourselves. A usable solution is to create a boost::mutex::scoped_lock
object in all above methods to make the access to all methods
synchronized. But since the doUpdate method will be executed
periodically and the interval may be 1 hour or longer, while all other
parts can only read the stl::map, the above solution may do
synchronization too much. We only need synchronization on all methods
when we are doUpdating.
Then is it possible to make this program more efficient?
I have thought out some other solution like add some updating flag.
But it's not safe since stl::map may be updated when the following
part of read methods is executing. And the operation on the flag may
also be interrupted by other threads.

I'd collect the data into a separate map (or perhaps vector or whatever)
and only when you've collected all the data for a single update, copy it
all into the map. This way the map only needs to be locked for the
duration of the copy.

If this is still longer than you like, you can access the map via a
pointer. When you're going to do an update, you create a copy of the
map, update the copy, and then update the pointer to refer to the newly
updated copy of the map (then dispose of the old map). This way, you
only need a "lock" for the duration of one pointer update -- but most
architectures will support an atomic swap operation without needing any
other locks.

This, however, is one of the rare cases where using volatile is
necessary for thread safety -- the other threads need to treat the
pointer as volatile, so as soon as the pointer changes, they use the new
pointer value and refer only to the new map. If any of the other threads
"stores" the value of the pointer (at all, even just passing it as a
parameter) you'll need to synchronize access to the pointer, only
disposing of the old map after all other threads have signaled that
they're finished using it.

--
Later,
Jerry.

The universe is a figment of its own imagination.- Òþ²Ø±»ÒýÓÃÎÄ×Ö -

- ÏÔÊ¾ÒýÓÃµÄÎÄ×Ö -

Sorry, I am still not so clear about what you mean. I am not sure if
it's a little like RCU algorithom mentioned by Jon Harrop. But as I
know RCU is also based on some lock.
I can collect the data into a separate map and try to copy it to old
map after update. But for readers, how could they know the map is in
updating or not if they don't try to check some condition object (will
fail to work if condition check pass but map is updated during
following access) or retrieve a scoped_lock before they read data from
map? Then it will be almost same as the original lock solution I have
said.
To atomic swap, although I don't know how to implement it, I still
wonder if it's safe enough to the following scenario:
1) Thread 1 is reading data from the map via the old pointer of map,
then it is suspended by OS.
2) Thread 2 begin to update and swap the pointer to the map to a new
one.
Then can Thread 1 work well if it's copying data from the original map
when it's interrupted?

To stl::map or some other implementation of map, read the old map may
cause some memory access violation error.

Sep 14 '08 #11

Jon Harrop

Jerry Coffin wrote:

In article <xq*********************@bt.com>, jo*@ffconsultancy.com
says...
>Efficient immutable data structures rely heavily upon efficient
allocation and garbage collection.

Nonsense! They depend on nothing of the sort. Fast allocation is handy,
but hardly necessary. Garbage collection is even less often useful, to
the point of barely being useful under the circumstances at all.

Yet your own proposed solution depends upon lots of copying and long global
locks, both of which are hugely inefficient.

>C++ has neither and, consequently, it is
practically impossible to solve this problem in C++.

Even more complete nonense! While your posts indicate a level if bigotry
and ignorance that probably make it practically impossible for _you_ to
solve the problem in C++ (or probably just about anything else, really)
somebody with a reasonably open mind and a clue of how to program can
solve is in C++ quite easily.

Given how ubiquitous this problem is on modern multicore machines you would
expect that a C++ solution would already exist if it is half as easy to
implement as you claim. Yet no efficient solutions exist in C++ and the
standard library provides nothing of use.

>In a modern programming environment, this problem is trivially solved.

In C++, the problem is fairly easy to solve. In _most_ of the languages
that claim to be more modern, it's trivial to solve as well -- but
solving it nearly as _well_ is far from trivial in those languages.

That is incorrect: Scala and F# bundle complete solutions in their standard
libraries.

In
fact, nearly the only way to solve it well is solve it in something like
C++, and then use whatever mechanism that language provides for linking
to the C++.

Nobody has ever solved this problem efficiently in C++: it is woefully
inadequate for this (and most other) tasks.

>On the contrary, it fits perfectly with a mutable reference to an
immutable map because reads and writes are word-sized and, therefore,
atomic. You can even remove all locks when applicable for huge
performance gains.

You display your ignorance yet again. Pointers and references are not
necessarily word sized, and even if they were they wouldn't necesssarily
be atomic -- just for example, on the x86, some word-sized reads and
writes aren't atomic.

Atomicity is assured by the VM:

http://en.csharp-online.net/ECMA-334...ble_references

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Sep 14 '08 #12

Jon Harrop

co******@gmail.com wrote:

...keeping copies of old data until all readers have finished...

Automated by a concurrent GC.

...adding a level of indirection to the data...

Replacing a struct with a pointer to an object will not have a significant
effect on performance.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Sep 14 '08 #13

=?UTF-8?B?RXJpayBXaWtzdHLDtm0=?=

On 2008-09-14 07:27, Bill David wrote:

On 9æœˆ13æ—¥, ä¸‹åˆ9æ—¶59åˆ†, Jerry Coffin <jcof...@taeus.comwrote:
>In article <d7d947bf-a634-4b9b-9d51-
7541de1d1...@s1g2000pra.googlegroups.com>, billdavi...@gmail.com says...

SUBJECT: How to make this program more efficient?

In my program, a thread will check update from server periodically and
generate a stl::map for other part of this program to read data from.
Let's name the update method as doUpdate and stl::map read methods as
getData and copyData.
Since stl::map is not thread-safe, we should do synchronization by
ourselves. A usable solution is to create a boost::mutex::scoped_lock
object in all above methods to make the access to all methods
synchronized. But since the doUpdate method will be executed
periodically and the interval may be 1 hour or longer, while all other
parts can only read the stl::map, the above solution may do
synchronization too much. We only need synchronization on all methods
when we are doUpdating.
Then is it possible to make this program more efficient?
I have thought out some other solution like add some updating flag.
But it's not safe since stl::map may be updated when the following
part of read methods is executing. And the operation on the flag may
also be interrupted by other threads.

I'd collect the data into a separate map (or perhaps vector or whatever)
and only when you've collected all the data for a single update, copy it
all into the map. This way the map only needs to be locked for the
duration of the copy.

If this is still longer than you like, you can access the map via a
pointer. When you're going to do an update, you create a copy of the
map, update the copy, and then update the pointer to refer to the newly
updated copy of the map (then dispose of the old map). This way, you
only need a "lock" for the duration of one pointer update -- but most
architectures will support an atomic swap operation without needing any
other locks.

This, however, is one of the rare cases where using volatile is
necessary for thread safety -- the other threads need to treat the
pointer as volatile, so as soon as the pointer changes, they use the new
pointer value and refer only to the new map. If any of the other threads
"stores" the value of the pointer (at all, even just passing it as a
parameter) you'll need to synchronize access to the pointer, only
disposing of the old map after all other threads have signaled that
they're finished using it.

--
Later,
Jerry.

The universe is a figment of its own imagination.- éšè—è¢«å¼•ç”¨æ–‡å*— -

- æ˜¾ç¤ºå¼•ç”¨çš„æ–‡å*— -

Sorry, I am still not so clear about what you mean. I am not sure if
it's a little like RCU algorithom mentioned by Jon Harrop. But as I
know RCU is also based on some lock.
I can collect the data into a separate map and try to copy it to old
map after update. But for readers, how could they know the map is in
updating or not if they don't try to check some condition object (will
fail to work if condition check pass but map is updated during
following access) or retrieve a scoped_lock before they read data from
map? Then it will be almost same as the original lock solution I have
said.
To atomic swap, although I don't know how to implement it, I still
wonder if it's safe enough to the following scenario:
1) Thread 1 is reading data from the map via the old pointer of map,
then it is suspended by OS.
2) Thread 2 begin to update and swap the pointer to the map to a new
one.
Then can Thread 1 work well if it's copying data from the original map
when it's interrupted?

To stl::map or some other implementation of map, read the old map may
cause some memory access violation error.

There are two solutions, the first is to use a lock (rwlock or something
like it) and replace the content in the map with the new content. The
lock will assure that no problems occur and this solution is quite easy
to implement. Just make sure that the threads hold the lock for the
whole duration of the operations it need the map for.

The other solution is to let each thread access the map through a
pointer, when you get the new data all you have to do is to change the
pointer to point to the new map. If you use a smart-pointer you can even
ensure that two consecutive reads to the map will always give the same
value as long as you read using the same pointer. To make sure that the
update of the pointer is atomic you can use either an atomic operation
(might not work with a smart pointer) or a rwlock. This solution is a
bit more complex but should scale much better than the first.

--
Erik WikstrÃ¶m

Sep 14 '08 #14

Jerry Coffin

In article <p9*********************@bt.com>, jo*@ffconsultancy.com
says...

[ ... ]

Nonsense! They depend on nothing of the sort. Fast allocation is handy,
but hardly necessary. Garbage collection is even less often useful, to
the point of barely being useful under the circumstances at all.

Yet your own proposed solution depends upon lots of copying and long global
locks, both of which are hugely inefficient.

Further discussion is pointless until you have at least some clue of the
subject matter. You claim I offered one solution that involved a long
global lock.

In reality, I offered two solutions, neither of which involved a long
global lock. Given that you can't even count to two correctly, a brick
wall is at least a capable as you are of offering intelligent discussion
of multithreaded programming. Based on your comments so far, it's more
capable: it never says anything, which at least avoids spouting all
sorts of blatantly false nonsense.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Sep 14 '08 #15

Jerry Coffin

In article <692dead8-f8f7-43d5-99c4-
58**********@r15g2000prh.googlegroups.com>, bi*********@gmail.com
says...

[ Please don't quote signatures ... ]

Sorry, I am still not so clear about what you mean. I am not sure if
it's a little like RCU algorithom mentioned by Jon Harrop. But as I
know RCU is also based on some lock.
I can collect the data into a separate map and try to copy it to old
map after update. But for readers, how could they know the map is in
updating or not if they don't try to check some condition object (will
fail to work if condition check pass but map is updated during
following access) or retrieve a scoped_lock before they read data from
map? Then it will be almost same as the original lock solution I have
said.

Yes, you still need a lock. All you're doing is shortening the time
during which the lock is needed -- while you're collecting the data
(which typically tends to be fairly slow) the original map remains
unlocked. Only after you've accumulated all the data for an update, you
lock the map, update it, and then unlock it. If accumulating the data
for an update is almost as fast as, or faster than, updating the map
itself, the gain is quite small. More often, however, accumulating the
data is fairly slow, and the update itself is fairly fast, in which case
this can gain quite a bit.

To atomic swap, although I don't know how to implement it, I still
wonder if it's safe enough to the following scenario:
1) Thread 1 is reading data from the map via the old pointer of map,
then it is suspended by OS.
2) Thread 2 begin to update and swap the pointer to the map to a new
one.
Then can Thread 1 work well if it's copying data from the original map
when it's interrupted?

To stl::map or some other implementation of map, read the old map may
cause some memory access violation error.

The atomic swap was involved in an entirely separate solution. You start
by creating a copy of the original map. You then update your copy. You
then (and this is where the atomic swap comes into play) substitute the
newly updated map for the old one.

To do that, you start with a pointer to the original map. You then have
a pointer to your modified map. You do an atomic swap to exchange the
two, updating the pointer to point to the new map, and getting a pointer
to the old map so you can dispose of it.

This is the one point at which John Harrop's (otherwise nonsensical)
statements did give a glimmer of truth. Knowing when you can dispose of
the old map requires knowing when nobody else is accessing it. You can
do that a number of different ways. One is to use garbage collection.
This is simple, but creates the possibility of a map existing long after
it's no longer really in use. Another is to use something like reference
counting so you can determinstically dispose of the map as soon as it's
no longer in use. Depending on usage pattern, you can often reduce the
reference count to a single bit for each thread (basically a mutex).

--
Later,
Jerry.

The universe is a figment of its own imagination.

Sep 14 '08 #16

courpron

On 14 sep, 11:32, Jon Harrop <j...@ffconsultancy.comwrote:

courp...@gmail.com wrote:
...keeping copies of old data until all readers have finished...

Automated by a concurrent GC.

That is not the point. The point is that the RCU algorithm can
drastically increase memory usage over the RWL solution (from what
I've seen about the given problem, this consideration should be
minimal though).

>
...adding a level of indirection to the data...

Replacing a struct with a pointer to an object will not have a significant
effect on performance.

Probably not, but that ultimately depends on the program + the
architecture it should run on.

Alexandre Courpron.

Sep 14 '08 #17

courpron

On 14 sep, 07:27, Bill David <billdavi...@gmail.comwrote:

[...]

Sorry, I am still not so clear about what you mean. I am not sure if
it's a little like RCU algorithom mentioned by Jon Harrop. But as I
know RCU is also based on some lock.
I can collect the data into a separate map and try to copy it to old
map after update. But for readers, how could they know the map is in
updating or not if they don't try to check some condition object (will
fail to work if condition check pass but map is updated during
following access) or retrieve a scoped_lock before they read data from
map? Then it will be almost same as the original lock solution I have
said.
To atomic swap, although I don't know how to implement it, I still
wonder if it's safe enough to the following scenario:
1) Thread 1 is reading data from the map via the old pointer of map,
then it is suspended by OS.
2) Thread 2 begin to update and swap the pointer to the map to a new
one.
Then can Thread 1 work well if it's copying data from the original map
when it's interrupted?

To stl::map or some other implementation of map, read the old map may
cause some memory access violation error.

There are many possible solutions to your problem, but there are 2
well-known techniques to improve efficiency in your situation :
- the RWL "Read Write Lock" solution (and specifically a RWL, not
another lock), it is designed to allow concurrent, non-blocking, read
accesses, while ensuring that a write access is mutually exclusive
with any other access.
- the RCU "Read Copy Update" solution, which allows concurrent, non-
blocking, read and write accesses.

The RWL solution is easier to implement; the RCU solution should be
more efficient (in terms of speed, not memory usage), but is more
complex to implement. Don't judge the complexity of RCU by the
descriptions found in this discussion, they are oversimplifications of
the algorithm. Knowing that, you should document yourself about one or
both of these techniques. My advice would be to follow James Kanze's
suggestion of using a RWL, it's a fast solution to implement ( or you
may even find some RWL C++ class on the net ) and answers directly to
your question. Since the update is not frequent, the potential speed
gain of using a RCU over a RWL is reduced.

Alexandre Courpron.

Sep 14 '08 #18

James Kanze

On Sep 13, 12:28 pm, Jon Harrop <j...@ffconsultancy.comwrote:

James Kanze wrote:
On Sep 13, 3:18 am, Bill David <billdavi...@gmail.comwrote:
SUBJECT: How to make this program more efficient?

In my program, a thread will check update from server
periodically and generate a stl::map for other part of this
program to read data from. Let's name the update method as
doUpdate and stl::map read methods as getData and copyData.
Since stl::map is not thread-safe, we should do
synchronization by ourselves. A usable solution is to
create a boost::mutex::scoped_lock object in all above
methods to make the access to all methods synchronized. But
since the doUpdate method will be executed periodically and
the interval may be 1 hour or longer, while all other parts
can only read the stl::map, the above solution may do
synchronization too much. We only need synchronization on
all methods when we are doUpdating.

If there's any thread modifying the object, then all
accesses must be synchronized. Period.

That is incorrect.

Not according to Posix. And not on some of the more recent
architectures.

Caches are an obvious counter example where the writer can
write a single word (a pointer to the updated immutable cache)
atomically and readers can read the word atomically safely
without any locking at all.

First, there's formally no guarantee about writing even a single
word. But that's not the issue; the issue is one of ordering.
And the upcoming C++ standard will be clear here: if more than
one thread accesses the container, and any thread modifies it,
then all accesses must be synchronized.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Sep 14 '08 #19

James Kanze

On Sep 13, 2:33 pm, courp...@gmail.com wrote:

On 13 sep, 12:28, Jon Harrop <j...@ffconsultancy.comwrote:
James Kanze wrote:
On Sep 13, 3:18 am, Bill David <billdavi...@gmail.comwrote:
>SUBJECT: How to make this program more efficient?

>In my program, a thread will check update from server
>periodically and generate a stl::map for other part of this
>program to read data from. Let's name the update method as
>doUpdate and stl::map read methods as getData and copyData.
>Since stl::map is not thread-safe, we should do
>synchronization by ourselves. A usable solution is to create a
>boost::mutex::scoped_lock object in all above methods to make
>the access to all methods synchronized. But since the doUpdate
>method will be executed periodically and the interval may be 1
>hour or longer, while all other parts can only read the
>stl::map, the above solution may do synchronization too much.
>We only need synchronization on all methods when we are
>doUpdating.

If there's any thread modifying the object, then all accesses
must be synchronized. Period.

That is incorrect. Caches are an obvious counter example
where the writer can write a single word (a pointer to the
updated immutable cache) atomically and readers can read the
word atomically safely without any locking at all.

This is semantically wrong. In fact, many programmers believe
that "synchonizing threads to a shared ressource" necessarily
means that a lock of some sort is needed, whereas there are
many synchronization techniques that avoids the use of locks.
What you are refering to seems to be an instance of the RCU
algorithm, a kind of lock-free *synchronization* algorithm.

It is certainly possible to design some types of containers to
be lock-free. If you're using the standard containers, however,
and just wrapping them, I don't think that a lock-free algorithm
can be made to work.

And of course, as you say, it's a lot more work, and rarely
necessary.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Sep 14 '08 #20

James Kanze

On Sep 13, 6:38 pm, Jon Harrop <j...@ffconsultancy.comwrote:

Jerry Coffin wrote:
In article <TcSdnUV3p4REGlbVnZ2dneKdnZydn...@bt.com>,
j...@ffconsultancy.com says...
Bill David wrote:
SUBJECT: How to make this program more efficient?

Unless you mean "within the confines of C++" the most
obvious improvement would be to use an immutable map
because they greatly reduce the locking required and can
even completely eliminate it.

It's entirely possible to create an immutable map in C++,
and from a viewpoint of thread safety, it can have the same
basic properties as an immutable map in any other language.

Efficient immutable data structures rely heavily upon
efficient allocation and garbage collection. C++ has neither

If you don't want it to. I've used garbage collection with C++,
and it works quite well. For almost all applications, it's a
definite win in terms of development times and robustness, and
for a lot of applications, it's also a win in terms of
performance, and for others, it's a win in terms of perceived
performance, even if total performance isn't as good.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Sep 14 '08 #21

James Kanze

On Sep 13, 3:59 pm, Jerry Coffin <jcof...@taeus.comwrote:

In article <d7d947bf-a634-4b9b-9d51-
7541de1d1...@s1g2000pra.googlegroups.com>, billdavi...@gmail.com says...
SUBJECT: How to make this program more efficient?

In my program, a thread will check update from server periodically and
generate a stl::map for other part of this program to read data from.
Let's name the update method as doUpdate and stl::map read methods as
getData and copyData.
Since stl::map is not thread-safe, we should do synchronization by
ourselves. A usable solution is to create a boost::mutex::scoped_lock
object in all above methods to make the access to all methods
synchronized. But since the doUpdate method will be executed
periodically and the interval may be 1 hour or longer, while all other
parts can only read the stl::map, the above solution may do
synchronization too much. We only need synchronization on all methods
when we are doUpdating.
Then is it possible to make this program more efficient?
I have thought out some other solution like add some updating flag.
But it's not safe since stl::map may be updated when the following
part of read methods is executing. And the operation on the flag may
also be interrupted by other threads.

I'd collect the data into a separate map (or perhaps vector or
whatever) and only when you've collected all the data for a
single update, copy it all into the map. This way the map only
needs to be locked for the duration of the copy.

All accesses, including reads, still need to be synchronized.

If this is still longer than you like, you can access the map
via a pointer. When you're going to do an update, you create a
copy of the map, update the copy, and then update the pointer
to refer to the newly updated copy of the map (then dispose of
the old map). This way, you only need a "lock" for the
duration of one pointer update -- but most architectures will
support an atomic swap operation without needing any other
locks.

Provided you do it in assembler. std::swap doesn't guarantee
atomicity and synchronization. (And you likely need additional
instructions for synchronization of the swap on any architecture
other than 80x86.)

This, however, is one of the rare cases where using volatile
is necessary for thread safety -- the other threads need to
treat the pointer as volatile, so as soon as the pointer
changes, they use the new pointer value and refer only to the
new map.

Volatile, at least as implemented in the compilers I know (Sun
CC, g++ and VC++) doesn't guarantee this, although arguably,
given the expressed intent behind the keyword in the C standard,
they should. You typically need some sort of fence or memory
barrier, which compilers don't generate.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Sep 14 '08 #22

James Kanze

On Sep 14, 7:27 am, Bill David <billdavi...@gmail.comwrote:

On 9æœˆ13æ—¥, ä¸‹åˆ9æ—¶59åˆ†, Jerry Coffin <jcof...@taeus.comwrote:
In article <d7d947bf-a634-4b9b-9d51-
7541de1d1...@s1g2000pra.googlegroups.com>, billdavi...@gmail.com says....

Sorry, I am still not so clear about what you mean. I am not
sure if it's a little like RCU algorithom mentioned by Jon
Harrop. But as I know RCU is also based on some lock. I can
collect the data into a separate map and try to copy it to old
map after update. But for readers, how could they know the map
is in updating or not if they don't try to check some
condition object (will fail to work if condition check pass
but map is updated during following access) or retrieve a
scoped_lock before they read data from map? Then it will be
almost same as the original lock solution I have said.

I'm not sure which of Jerry's suggestions you were referring to,
but if contention were a real problem, and the readers had very
tight real time requirments, I would (in this case, I actually
did once) use rwlocks, with the updating thread first taking
just a read lock and copying the entire map, then updating the
copy, and finally releasing the read lock and requesting a write
lock to swap the pointer to the map, so that later readers would
see the newly updated lock. (In the actual case, the updating
thread made the copy as soon as it released the updated version,
so when the event which triggered the update occured, it already
had a complete copy.)

Since the only critical point is swapping the pointers, this can
be done without locking if some form of atomic swap is used, AND
measures are taken to ensure that the object designated by the
old pointer is not modified or freed until all threads using
that pointer have finished with it. The easiest way to ensure
this is to use dynamically allocated maps and garbage
collection; without garbage collection, it's a bit more
difficult, but far from impossible. (In our case, we didn't try
this, and we used two, statically allocated maps---in our case,
the "threads" were in fact different processes, and the maps
were in shared memory, so classical dynamic allocation wasn't an
alternative.)

To atomic swap, although I don't know how to implement it, I
still wonder if it's safe enough to the following scenario:
1) Thread 1 is reading data from the map via the old pointer of map,
then it is suspended by OS.
2) Thread 2 begin to update and swap the pointer to the map to a new
one.
Then can Thread 1 work well if it's copying data from the
original map when it's interrupted?

As long as the original copy isn't deleted or otherwise
modified. That's the key to making this work.

Given that the updates are not very frequent, either garbage
collection or some sort of reference counting could be used.
Reference counting means you also need some sort of dynamic
increment and decrement, but most platforms also support this,
if you're willing to use a bit of assembler. (It's easy to get
wrong, however.)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientÃ©e objet/
Beratung in objektorientierter Datenverarbeitung
9 place SÃ©mard, 78210 St.-Cyr-l'Ã‰cole, France, +33 (0)1 30 23 00 34

Sep 14 '08 #23

James Kanze

On Sep 14, 5:00 pm, Jerry Coffin <jcof...@taeus.comwrote:

In article <692dead8-f8f7-43d5-99c4-
58534dfff...@r15g2000prh.googlegroups.com>, billdavi...@gmail.com
says...

[ Please don't quote signatures ... ]

Sorry, I am still not so clear about what you mean. I am not
sure if it's a little like RCU algorithom mentioned by Jon
Harrop. But as I know RCU is also based on some lock. I can
collect the data into a separate map and try to copy it to
old map after update. But for readers, how could they know
the map is in updating or not if they don't try to check
some condition object (will fail to work if condition check
pass but map is updated during following access) or retrieve
a scoped_lock before they read data from map? Then it will
be almost same as the original lock solution I have said.

Yes, you still need a lock. All you're doing is shortening the
time during which the lock is needed -- while you're
collecting the data (which typically tends to be fairly slow)
the original map remains unlocked.

More correctly, it remains read-locked. You only need the write
lock when updating the image. Since many threads can acquire
the read lock simultaneously, however, this doesn't really slow
up the readers much.

Only after you've accumulated all the data for an update, you
lock the map, update it, and then unlock it. If accumulating
the data for an update is almost as fast as, or faster than,
updating the map itself, the gain is quite small. More often,
however, accumulating the data is fairly slow, and the update
itself is fairly fast, in which case this can gain quite a
bit.

To atomic swap, although I don't know how to implement it, I
still wonder if it's safe enough to the following scenario:
1) Thread 1 is reading data from the map via the old pointer of map,
then it is suspended by OS.
2) Thread 2 begin to update and swap the pointer to the map to a new
one.
Then can Thread 1 work well if it's copying data from the original map
when it's interrupted?

To stl::map or some other implementation of map, read the old map may
cause some memory access violation error.

The atomic swap was involved in an entirely separate solution.
You start by creating a copy of the original map. You then
update your copy. You then (and this is where the atomic swap
comes into play) substitute the newly updated map for the old
one.

To do that, you start with a pointer to the original map. You
then have a pointer to your modified map. You do an atomic
swap to exchange the two, updating the pointer to point to the
new map, and getting a pointer to the old map so you can
dispose of it.

This is the one point at which John Harrop's (otherwise
nonsensical) statements did give a glimmer of truth. Knowing
when you can dispose of the old map requires knowing when
nobody else is accessing it. You can do that a number of
different ways. One is to use garbage collection. This is
simple, but creates the possibility of a map existing long
after it's no longer really in use.

But only if no one else needs the memory it's using. Typically,
that's not a problem. However, I've argued strongly in favor of
garbage collection before, but in this case, the main difference
compared to reference counting is that it's already implemented.
Because you're not making many copies of the root pointer, and
only the root of the map (and not each individual allocated
block) needs to be reference counted, I doubt that garbage
collection will gain much other than the fact that the thread
safe implementation is already there. And since the upcoming
standard will almost certainly require a thread safe
implementation of boost::shared_ptr, that could really be a
viable alternative. (Another advantage of garbage collection
here is that the garbage collector I use was written by Hans
Boehm. Someone who really understands threading issues. I feel
a bit less sure about Boost---I'd like to be proven wrong, but
I've seen so many problems with supposedly thread safe solutions
to different problems, that I've become exceedingly sceptical.)

Another is to use something like reference counting so you can
determinstically dispose of the map as soon as it's no longer
in use. Depending on usage pattern, you can often reduce the
reference count to a single bit for each thread (basically a
mutex).

You'll have to explain that one to me. (I agree that this is a
pattern in which reference counting works extremely well, but
I'm pretty sure you still need more than a single bit to
implement it.)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Sep 14 '08 #24

Bill David

On 9æœˆ15æ—¥, ä¸Šåˆ12æ—¶21åˆ†, James Kanze <james.ka...@gmail.comwrote:

On Sep 13, 6:38 pm, Jon Harrop <j...@ffconsultancy.comwrote:

Jerry Coffin wrote:
In article <TcSdnUV3p4REGlbVnZ2dneKdnZydn...@bt.com>,
j...@ffconsultancy.com says...
>Bill David wrote:
SUBJECT: How to make this program more efficient?
>Unless you mean "within the confines of C++" the most
>obvious improvement would be to use an immutable map
>because they greatly reduce the locking required and can
>even completely eliminate it.
It's entirely possible to create an immutable map in C++,
and from a viewpoint of thread safety, it can have the same
basic properties as an immutable map in any other language.
Efficient immutable data structures rely heavily upon
efficient allocation and garbage collection. C++ has neither

If you don't want it to. Â*I've used garbage collection with C++,
and it works quite well. Â*For almost all applications, it's a
definite win in terms of development times and robustness, and
for a lot of applications, it's also a win in terms of
performance, and for others, it's a win in terms of perceived
performance, even if total performance isn't as good.

--
James Kanze (GABI Software) Â* Â* Â* Â* Â* Â* email:james.ka...@gmail.com
Conseils en informatique orientÃ©e objet/
Â* Â* Â* Â* Â* Â* Â* Â* Â* Â*Beratung in objektorientierter Datenverarbeitung
9 place SÃ©mard, 78210 St.-Cyr-l'Ã‰cole, France, +33 (0)1 30 23 00 34

Thank all of you for your suggestion.
to the following important scenario I have mentioned:
To the original solution based on stl::map, both getData (will call
map::find) and copyData (will iterate the whole map) will face some
errors if we don't lock between all methods that access (read/update)
the map.
To above problem, it seems garbage collection based solution is the
only one that can solve it with more memory consuming. A getData/
copyData request that has started before the update may still use the
original map pointer and content, while requests after update will use
the new one.
To RCU, I am afraid it's not fit for this scenario, although it can be
lock-free sometimes. I have some RCU implementation experience, we add
a timestamp with data, we read data out and try to update it. But
before we really update data to DB, we will check the current
timestamp again. If it's modified, the update request will fail.
Otherwise, we will update data with new timestamp. But how could we
apply it in above scenario, I have no idea.
To RWL, if it can permit concurrent read without problem we may face
in the scenario I have mentioned above, it will also be a great
enhancement to performance. But I am not sure about this solution. I
will think about it in the following hours, and come back again, :)

Sep 15 '08 #25

Bill David

On 9æœˆ15æ—¥, ä¸Šåˆ12æ—¶21åˆ†, James Kanze <james.ka...@gmail.comwrote:

On Sep 13, 6:38 pm, Jon Harrop <j...@ffconsultancy.comwrote:

Jerry Coffin wrote:
In article <TcSdnUV3p4REGlbVnZ2dneKdnZydn...@bt.com>,
j...@ffconsultancy.com says...
>Bill David wrote:
SUBJECT: How to make this program more efficient?
>Unless you mean "within the confines of C++" the most
>obvious improvement would be to use an immutable map
>because they greatly reduce the locking required and can
>even completely eliminate it.
It's entirely possible to create an immutable map in C++,
and from a viewpoint of thread safety, it can have the same
basic properties as an immutable map in any other language.
Efficient immutable data structures rely heavily upon
efficient allocation and garbage collection. C++ has neither

If you don't want it to. Â*I've used garbage collection with C++,
and it works quite well. Â*For almost all applications, it's a
definite win in terms of development times and robustness, and
for a lot of applications, it's also a win in terms of
performance, and for others, it's a win in terms of perceived
performance, even if total performance isn't as good.

--
James Kanze (GABI Software) Â* Â* Â* Â* Â* Â* email:james.ka...@gmail.com
Conseils en informatique orientÃ©e objet/
Â* Â* Â* Â* Â* Â* Â* Â* Â* Â*Beratung in objektorientierter Datenverarbeitung
9 place SÃ©mard, 78210 St.-Cyr-l'Ã‰cole, France, +33 (0)1 30 23 00 34

Sep 15 '08 #26

Jerry Coffin

In article <c06aa603-7443-489b-a1be-231c4f929341@
59g2000hsb.googlegroups.com>, ja*********@gmail.com says...

[ ... ]

I'd collect the data into a separate map (or perhaps vector or
whatever) and only when you've collected all the data for a
single update, copy it all into the map. This way the map only
needs to be locked for the duration of the copy.

All accesses, including reads, still need to be synchronized.

Well, yes. For the moment, I'm ignoring the read lock because it'll be
needed for _any_ solution that shares access to the map, regardless of
how the updating is done, and an arbitrary number of threads can acquire
read locks simultaneously anyway.

If this is still longer than you like, you can access the map
via a pointer. When you're going to do an update, you create a
copy of the map, update the copy, and then update the pointer
to refer to the newly updated copy of the map (then dispose of
the old map). This way, you only need a "lock" for the
duration of one pointer update -- but most architectures will
support an atomic swap operation without needing any other
locks.

Provided you do it in assembler. std::swap doesn't guarantee
atomicity and synchronization. (And you likely need additional
instructions for synchronization of the swap on any architecture
other than 80x86.)

You can't do it entirely portably, but you rarely (if ever) have to use
assembly language either -- nearly any platform that provides threading
of any sort is also going to include some synchronization primitives,
and one of them is almost certain to be adequate to this purpose. For
one obvious example, on Win32 you'd use InterlockedExchange (or a
variation such as InterlockedExchange64). Under C++ 0x it will be
provided under the name atomic_exchange().

--
Later,
Jerry.

The universe is a figment of its own imagination.

Sep 16 '08 #27

Jerry Coffin

In article <f9c317c4-329c-4951-922d-0b1125a67ef9
@k13g2000hse.googlegroups.com>, ja*********@gmail.com says...

[ ... ]

Another is to use something like reference counting so you can
determinstically dispose of the map as soon as it's no longer
in use. Depending on usage pattern, you can often reduce the
reference count to a single bit for each thread (basically a
mutex).

You'll have to explain that one to me. (I agree that this is a
pattern in which reference counting works extremely well, but
I'm pretty sure you still need more than a single bit to
implement it.)

As I said, it depends on the usage -- in a pretty fair number of cases,
a thread isn't capable of creating more than one reference to the
contents of the collection at any given time. When that's the case, the
maximum reference count is one...

--
Later,
Jerry.

The universe is a figment of its own imagination.

Sep 16 '08 #28

James Kanze

On Sep 16, 6:22 am, Jerry Coffin <jcof...@taeus.comwrote:

In article <f9c317c4-329c-4951-922d-0b1125a67ef9
@k13g2000hse.googlegroups.com>, james.ka...@gmail.com says...

[ ... ]

Another is to use something like reference counting so you
can determinstically dispose of the map as soon as it's no
longer in use. Depending on usage pattern, you can often
reduce the reference count to a single bit for each thread
(basically a mutex).

You'll have to explain that one to me. (I agree that this
is a pattern in which reference counting works extremely
well, but I'm pretty sure you still need more than a single
bit to implement it.)

As I said, it depends on the usage -- in a pretty fair number
of cases, a thread isn't capable of creating more than one
reference to the contents of the collection at any given time.
When that's the case, the maximum reference count is one...

But in that case, you don't need a reference count at all. Any
time you would decrement the reference count, you're guaranteed
that the result will be 0, and you'll have to delete, so you
might as well just delete immediately, without bothering with
the reference count.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Sep 16 '08 #29

Jerry Coffin

In article <3d3f2630-2d72-4106-bc60-79bb8b415271@
25g2000hsx.googlegroups.com>, ja*********@gmail.com says...

[ ... ]

As I said, it depends on the usage -- in a pretty fair number
of cases, a thread isn't capable of creating more than one
reference to the contents of the collection at any given time.
When that's the case, the maximum reference count is one...

But in that case, you don't need a reference count at all. Any
time you would decrement the reference count, you're guaranteed
that the result will be 0, and you'll have to delete, so you
might as well just delete immediately, without bothering with
the reference count.

No so -- each thread has only a single use, but an arbitrary number of
threads can have access, so you need to wait until all the threads clear
their bits before you dispose of the map.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Sep 16 '08 #30

James Kanze

On Sep 16, 3:10 pm, Jerry Coffin <jcof...@taeus.comwrote:

In article <3d3f2630-2d72-4106-bc60-79bb8b415271@
25g2000hsx.googlegroups.com>, james.ka...@gmail.com says...

[ ... ]

As I said, it depends on the usage -- in a pretty fair
number of cases, a thread isn't capable of creating more
than one reference to the contents of the collection at
any given time. When that's the case, the maximum
reference count is one...

But in that case, you don't need a reference count at all.
Any time you would decrement the reference count, you're
guaranteed that the result will be 0, and you'll have to
delete, so you might as well just delete immediately,
without bothering with the reference count.

No so -- each thread has only a single use, but an arbitrary
number of threads can have access, so you need to wait until
all the threads clear their bits before you dispose of the
map.

So it's one bit per thread, rather than a single bit. So for n
threads, you need n bits, rather than just lg(n).

There's something I'm not understanding.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Sep 16 '08 #31

Jerry Coffin

In article <a9c3da76-0b29-46c0-b2ef-510d42322355
@c65g2000hsa.googlegroups.com>, ja*********@gmail.com says...

[ ... ]

So it's one bit per thread, rather than a single bit. So for n
threads, you need n bits, rather than just lg(n).

There's something I'm not understanding.

Yes -- my original article was rather misleading, but this does not
optimize space. Rather, it optimizes speed at the expense of a little
extra space. That is, you avoid all your reader threads contending for a
single resource (the reference count). Instead, each has its own
"reference count" and only has to contend with the writer thread for its
use. From a viewpoint of space utilization this typically won't be any
different from having a full-blown reference count for each thread --
i.e. you want to store the bits unpacked so each word can be accessed
independently of the others.

With a reference count, the a single use of the map (or whatever
resource you're sharing) involves the following steps:

1) obtain mutex on reference count.
2) read, modify and write the reference count.
3) release the mutex on the reference count.
4) Use the shared resource.
5) obtain mutex on reference count.
6) read, modify and write the reference count.
7) release the mutex on the reference count.

With a mutex for each thread the cycle is:

1) set the mutex
2) Use the shared resource
3) clear the mutex

The number of steps can be somewhat misleading though: steps 2 and 6 of
the original set each involve an RMW cycle by themselves, which is
completely absent in the second set. In addition, obtaining the mutex
(steps 1 and 5) takes longer as the number of threads grows, another
feature that's absent from the second set.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Sep 17 '08 #32

Jon Harrop

James Kanze wrote:

And of course, as you say, it's a lot more work...

Only if you are limited to languages like C++. In F#, for example, it is
literally just:

bindings <- Map.add key value bindings

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Sep 19 '08 #33

Jon Harrop

James Kanze wrote:

On Sep 13, 12:28 pm, Jon Harrop <j...@ffconsultancy.comwrote:
>James Kanze wrote:
If there's any thread modifying the object, then all
accesses must be synchronized. Period.

>That is incorrect.

Not according to Posix.

Or the Bible, which is also irrelevant.

And not on some of the more recent architectures.

Any major ones?

>Caches are an obvious counter example where the writer can
write a single word (a pointer to the updated immutable cache)
atomically and readers can read the word atomically safely
without any locking at all.

First, there's formally no guarantee about writing even a single
word.

That is incorrect. There are, of course, already guarantees such as the one
given by .NET as I have already said.

But that's not the issue; the issue is one of ordering.
And the upcoming C++ standard will be clear here: if more than
one thread accesses the container, and any thread modifies it,
then all accesses must be synchronized.

That is just another serious limitation of the new C++.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Sep 19 '08 #34

courpron

On 20 sep, 13:37, Jon Harrop <j...@ffconsultancy.comwrote:

[...]

But that's not the issue; the issue is one of ordering. *And
the upcoming C++ standard will be clear here: if more than
one thread accesses the container, and any thread modifies
it, then all accesses must be synchronized.

That is just another serious limitation of the new C++.

That's because it's a limitation of real hardware.

Another misconception.

Not at all.

Accessing an (aligned) pointer atomically can still lead to some
problems (the same problems you can have with the volatile keyword).
Basically, on a SMP architecture :
- instructions can be reordered at runtime ; and there is no
dependence analysis as with parallel instructions in a susperscalar
monoprocessor.
- the cache of the concerned CPU may be outdated (even in a cache-
coherent architecture, depending on the cache-coherent system)

In such context, the only portable way to access safely a memory word
is to use memory barriers.

Alexandre Courpron.

Sep 20 '08 #35

peter koch

On 20 Sep., 15:47, courp...@gmail.com wrote:

On 20 sep, 13:37, Jon Harrop <j...@ffconsultancy.comwrote:

[...]
But that's not the issue; the issue is one of ordering. *And
the upcoming C++ standard will be clear here: if more than
one thread accesses the container, and any thread modifies
it, then all accesses must be synchronized.

>That is just another serious limitation of the new C++.

That's because it's a limitation of real hardware.

Another misconception.

Not at all.

[snip]

In such context, the only portable way to access safely a memory word
is to use memory barriers.

Or use a mutex. In both cases, synchronisation takes place - which I
believe must be evident for everybody except perhaps for Jon Harrop.

/Peter

Sep 20 '08 #36

Jon Harrop

co******@gmail.com wrote:

I brought this precison to show that memory barriers are ultimately
needed...

That is not true in general.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Sep 20 '08 #37

Jon Harrop

co******@gmail.com wrote:

On 20 sep, 13:37, Jon Harrop <j...@ffconsultancy.comwrote:
Another misconception.

Not at all.

Accessing an (aligned) pointer atomically can still lead to some
problems (the same problems you can have with the volatile keyword).
Basically, on a SMP architecture :
- instructions can be reordered at runtime ; and there is no
dependence analysis as with parallel instructions in a susperscalar
monoprocessor.
- the cache of the concerned CPU may be outdated (even in a cache-
coherent architecture, depending on the cache-coherent system)

In such context, the only portable way to access safely a memory word
is to use memory barriers.

Only if you add portability as a new requirement. That is extremely and
needlessly inefficient on platforms which support atomic pointer writes
though (which ARM, AMD and Intel all do). In practice, you would use
specialized code generation whereever possible because it will generate
vastly faster code. That is exactly what the JVM and CLR do, of course.
They are not going to put a lock around every pointer write if they can
possibly avoid it...

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Sep 20 '08 #38

courpron

On 20 sep, 20:16, Jon Harrop <j...@ffconsultancy.comwrote:

courp...@gmail.com wrote:
I brought this precison to show that memory barriers are ultimately
needed...

That is not true in general.

The answer to this is contained in the part of my message you cut (i.
e. : "If the implementation of a mutex on a specific architecture
doesn't need memory barriers then a simple atomic access probably
doesn't need memory barriers too." )

Alexandre Courpron.

Sep 20 '08 #39

courpron

On 20 sep, 20:18, Jon Harrop <j...@ffconsultancy.comwrote:

courp...@gmail.com wrote:
On 20 sep, 13:37, Jon Harrop <j...@ffconsultancy.comwrote:
Another misconception.

Not at all.

Accessing an (aligned) pointer atomically can still lead to some
problems (the same problems you can have with the volatile keyword).
Basically, on a SMP architecture :
- instructions can be reordered at runtime ; and there is no
dependence analysis as with parallel instructions in a susperscalar
monoprocessor.
- the cache of the concerned CPU may be outdated (even in a cache-
coherent architecture, depending on the cache-coherent system)

In such context, the only portable way to access safely a memory word
is to use memory barriers.

Only if you add portability as a new requirement. That is extremely and
needlessly inefficient on platforms which support atomic pointer writes
though (which ARM, AMD and Intel all do). In practice, you would use
specialized code generation whereever possible because it will generate
vastly faster code. That is exactly what the JVM and CLR do, of course.
They are not going to put a lock around every pointer write if they can
possibly avoid it...

You are making some confusion about the atomicity of an operation and
the fact that such operation, **even if they are atomic**, still needs
to be *memory* synchronized ,at least on SMP architectures, to
overcome the ordering and cache problems which I described earlier.

Alexandre Courpron.

Sep 20 '08 #40

Jon Harrop

co******@gmail.com wrote:

On 20 sep, 20:18, Jon Harrop <j...@ffconsultancy.comwrote:
>Only if you add portability as a new requirement. That is extremely and
needlessly inefficient on platforms which support atomic pointer writes
though (which ARM, AMD and Intel all do). In practice, you would use
specialized code generation whereever possible because it will generate
vastly faster code. That is exactly what the JVM and CLR do, of course.
They are not going to put a lock around every pointer write if they can
possibly avoid it...

You are making some confusion about the atomicity of an operation and
the fact that such operation, **even if they are atomic**, still needs
to be *memory* synchronized ,at least on SMP architectures, to
overcome the ordering and cache problems which I described earlier.

Within the confines of your assumptions I agree but, in general, that is not
true and the distinction can be useful.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Sep 20 '08 #41

courpron

On 21 sep, 00:12, Jon Harrop <j...@ffconsultancy.comwrote:

courp...@gmail.com wrote:
On 20 sep, 20:18, Jon Harrop <j...@ffconsultancy.comwrote:
Only if you add portability as a new requirement. That is extremely and
needlessly inefficient on platforms which support atomic pointer writes
though (which ARM, AMD and Intel all do). In practice, you would use
specialized code generation whereever possible because it will generate
vastly faster code. That is exactly what the JVM and CLR do, of course.
They are not going to put a lock around every pointer write if they can
possibly avoid it...

You are making some confusion about the atomicity of an operation and
the fact that such operation, **even if they are atomic**, still needs
to be *memory* synchronized ,at least on SMP architectures, to
overcome the ordering and cache problems which I described earlier.

Within the confines of your assumptions I agree but, in general, that is not
true

What is exactly wrong ? that you get the ordering and caching problem
on most SMP architectures ?

Alexandre Courpron.

Sep 21 '08 #42

courpron

On 21 sep, 00:12, Jon Harrop <j...@ffconsultancy.comwrote:

courp...@gmail.com wrote:
On 20 sep, 20:18, Jon Harrop <j...@ffconsultancy.comwrote:
Only if you add portability as a new requirement. That is extremely and
needlessly inefficient on platforms which support atomic pointer writes
though (which ARM, AMD and Intel all do). In practice, you would use
specialized code generation whereever possible because it will generate
vastly faster code. That is exactly what the JVM and CLR do, of course.
They are not going to put a lock around every pointer write if they can
possibly avoid it...

You are making some confusion about the atomicity of an operation and
the fact that such operation, **even if they are atomic**, still needs
to be *memory* synchronized ,at least on SMP architectures, to
overcome the ordering and cache problems which I described earlier.

[...] the distinction can be useful.

Do you mean the distinction between atomic operations and memory
synchronization ?
You did not make it when you say for example, about memory barriers :

>That is extremely and
needlessly inefficient on platforms which support atomic pointer writes
though (which ARM, AMD and Intel all do ).

Did you know that all of those CPUS, in a SMP configuration, have a
weak memory ordering model ?

Alexandre Courpron

Sep 21 '08 #43

Jon Harrop

co******@gmail.com wrote:

On 21 sep, 00:12, Jon Harrop <j...@ffconsultancy.comwrote:
>Within the confines of your assumptions I agree but, in general, that is
not true

What is exactly wrong? that you get the ordering and caching problem on
most SMP architectures?

The question is whether or not those are always a problem. They are a
problem if determinism is required but determinism is not always required,
e.g. when caching the results of pure computations.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Sep 21 '08 #44

Chris M. Thomasson

[comp.programming.threads added]
"James Kanze" <ja*********@gmail.comwrote in message
news:6e**********************************@d1g2000h sg.googlegroups.com...
On Sep 13, 3:18 am, Bill David <billdavi...@gmail.comwrote:

SUBJECT: How to make this program more efficient?

In my program, a thread will check update from server
periodically and generate a stl::map for other part of this
program to read data from. Let's name the update method as
doUpdate and stl::map read methods as getData and copyData.
Since stl::map is not thread-safe, we should do
synchronization by ourselves. A usable solution is to create a
boost::mutex::scoped_lock object in all above methods to make
the access to all methods synchronized. But since the doUpdate
method will be executed periodically and the interval may be 1
hour or longer, while all other parts can only read the
stl::map, the above solution may do synchronization too much.
We only need synchronization on all methods when we are
doUpdating.

If there's any thread modifying the object, then all accesses
must be synchronized. Period.

Did you know that readers threads can access data-structures without sync?
Or perhaps, any explicit sync whatsoever? No atomic RMW, no membars... Think
about it. Virtually zero-overhead of reader threads indeed. Sounds to good
to be true? Why? Perhaps I can clarify.

Given the usage pattern, it may
be more interesting to use a rwlock than a mutex, but this
really depends on how long the readers hold the mutex, and how
many of them there are.

Sure. For read-mostly access patterns, WHY use a rwmutex? Did you know that
rwmutex can be totally suboptimal even in the presence of concurrent
mutators?

[...]

Sep 21 '08 #45

courpron

On 21 sep, 14:27, Jon Harrop <j...@ffconsultancy.comwrote:

courp...@gmail.com wrote:
On 21 sep, 00:12, Jon Harrop <j...@ffconsultancy.comwrote:
Within the confines of your assumptions I agree but, in general, that is
not true

What is exactly wrong? that you get the ordering and caching problem on
most SMP architectures?

The question is whether or not those are always a problem. They are a
problem if determinism is required but determinism is not always required,
e.g. when caching the results of pure computations.

I said you get the ordering problem and caching problem on most SMP
architectures (i.e they are inherent problems of SMP architectures), I
didn't say that you get those problems on all algorithms. In fact, I
can write a simple algorithm where no synchronization at all is needed
because the shared object is accessed(read and written) concurrently
but never used.
Back to the topic, without any specific details about the
architecture, the RCU, by default, needs memory barriers to access the
pointer.

Alexandre Courpron.

Sep 21 '08 #46

Chris M. Thomasson

<co******@gmail.comwrote in message
news:6c**********************************@w7g2000h sa.googlegroups.com...

On 21 sep, 14:27, Jon Harrop <j...@ffconsultancy.comwrote:
>courp...@gmail.com wrote:
On 21 sep, 00:12, Jon Harrop <j...@ffconsultancy.comwrote:
Within the confines of your assumptions I agree but, in general, that
is
not true

What is exactly wrong? that you get the ordering and caching problem on
most SMP architectures?

The question is whether or not those are always a problem. They are a
problem if determinism is required but determinism is not always
required,
e.g. when caching the results of pure computations.

I said you get the ordering problem and caching problem on most SMP
architectures (i.e they are inherent problems of SMP architectures), I
didn't say that you get those problems on all algorithms. In fact, I
can write a simple algorithm where no synchronization at all is needed
because the shared object is accessed(read and written) concurrently
but never used.
Back to the topic, without any specific details about the
architecture, the RCU, by default, needs memory barriers to access the
pointer.

RCU, from the point of reader threads, does NOT need explicit membars to
access and/or deference the pointer on virtually all major archs out there;
DEC Alpha aside for a moment. Dependant load barrier anyone?

;^D

Sep 21 '08 #47

courpron

On 21 sep, 14:36, "Chris M. Thomasson" <n...@spam.invalidwrote:

<courp...@gmail.comwrote in message

news:6c**********************************@w7g2000h sa.googlegroups.com...

On 21 sep, 14:27, Jon Harrop <j...@ffconsultancy.comwrote:
courp...@gmail.com wrote:
On 21 sep, 00:12, Jon Harrop <j...@ffconsultancy.comwrote:
Within the confines of your assumptions I agree but, in general, that
is
not true

What is exactly wrong? that you get the ordering and caching problem on
most SMP architectures?

The question is whether or not those are always a problem. They are a
problem if determinism is required but determinism is not always
required,
e.g. when caching the results of pure computations.

I said you get the ordering problem and caching problem on most SMP
architectures (i.e they are inherent problems of SMP architectures), I
didn't say that you get those problems on all algorithms. In fact, I
can write a simple algorithm where no synchronization at all is needed
because the shared object is accessed(read and written) concurrently
but never used.
Back to the topic, without any specific details about the
architecture, the RCU, by default, needs memory barriers to access the
pointer.

RCU, from the point of reader threads, does NOT need explicit membars to
access and/or deference the pointer on virtually all major archs out there;
DEC Alpha aside for a moment. Dependant load barrier anyone?

Or on any non-CC architectures. As I said, *without any specific
details about the architecture*, RCU needs, by default, membars for
reads and writes. However, for read accesses, membars should indeed
not be needed on most common architectures.

Alexandre Courpron.

Sep 21 '08 #48

James Kanze

On Sep 21, 2:19 pm, "Chris M. Thomasson" <n...@spam.invalidwrote:

[comp.programming.threads added]

"James Kanze" <james.ka...@gmail.comwrote in message

news:6e**********************************@d1g2000h sg.googlegroups.com...
On Sep 13, 3:18 am, Bill David <billdavi...@gmail.comwrote:

SUBJECT: How to make this program more efficient?
In my program, a thread will check update from server
periodically and generate a stl::map for other part of this
program to read data from. Let's name the update method as
doUpdate and stl::map read methods as getData and copyData.
Since stl::map is not thread-safe, we should do
synchronization by ourselves. A usable solution is to create a
boost::mutex::scoped_lock object in all above methods to make
the access to all methods synchronized. But since the doUpdate
method will be executed periodically and the interval may be 1
hour or longer, while all other parts can only read the
stl::map, the above solution may do synchronization too much.
We only need synchronization on all methods when we are
doUpdating.
If there's any thread modifying the object, then all accesses
must be synchronized. Period.

Did you know that readers threads can access data-structures
without sync?

Not according to Posix, and not on many machines. Reading
without a sync will probably give you some value, but not
necessarily the last written.

Or perhaps, any explicit sync whatsoever? No atomic RMW, no
membars... Think about it. Virtually zero-overhead of reader
threads indeed. Sounds to good to be true? Why? Perhaps I can
clarify.

Given the usage pattern, it may be more interesting to use a
rwlock than a mutex, but this really depends on how long the
readers hold the mutex, and how many of them there are.

Sure. For read-mostly access patterns, WHY use a rwmutex? Did
you know that rwmutex can be totally suboptimal even in the
presence of concurrent mutators?

And? A rwlock, even when acquired for just reading, ensures
memory synchronization.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Sep 21 '08 #49

James Kanze

On Sep 21, 5:57 am, Jerry Coffin <jcof...@taeus.comwrote:

In article <8d804b66-88fa-4b59-b3be-1d9e9cd89e88
@m44g2000hsc.googlegroups.com>, james.ka...@gmail.com says...

[...]

Interestingly, the layout I'm talking about will generally work
correctly even on an architecture with virtually _no_ support for atomic
operations -- modifying one bit in a word will normally be atomic
regardless, simply because it's essentially impossible to spread a write
fo a single bit over more than one clock cycle.

Writing a single bit is NOT atomic on any of the machines I'm
familiar with. Because it is essentially impossible to write a
single bit, so writing a single bit is implemented as
read/modify/write.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Sep 21 '08 #50

How to make this program more efficient?

Similar topics