Bytes | Software Development & Data Engineering Community

Re: atomically thread-safe Meyers singleton impl (fixed)...

You need compiler barriers (_ReadWriteBarrier() in MSVC) to ensure
things don't get rearranged across your atomic access
functions. There's no need to drop to assembler either: you're not
doing anything more complicated than a simple MOV.
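A minimal sketch of such access functions, assuming GCC or Clang: the empty `asm volatile` statement with a `"memory"` clobber is the GCC-style counterpart of MSVC's `_ReadWriteBarrier()`. It emits no instruction but stops the compiler from moving memory accesses across it, and the accesses themselves compile to the plain MOV mentioned above. The helper names are hypothetical, and this is only enough on x86, where ordinary loads and stores already have acquire and release semantics in hardware; weaker architectures also need real fence instructions.

```cpp
#include <cassert>

// Compiler-only barrier (GCC/Clang syntax): no instruction is emitted, but
// the compiler may not move memory accesses from one side to the other.
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

// Hypothetical helper: load with acquire ordering (sufficient on x86 only).
template <typename T>
T load_acquire(const volatile T* p) {
    T value = *p;        // a plain MOV on x86
    COMPILER_BARRIER();  // later accesses cannot be hoisted above the load
    return value;
}

// Hypothetical helper: store with release ordering (sufficient on x86 only).
template <typename T>
void store_release(volatile T* p, T value) {
    COMPILER_BARRIER();  // earlier accesses cannot sink below the store
    *p = value;          // a plain MOV on x86
}
```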

Anyway, if I was writing this (and I wouldn't be, because I really
dislike singletons), I'd just use boost::call_once. It doesn't use a
lock unless it has to and is portable across pthreads and win32
threads.
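Boost cannot be assumed in a self-contained example, so the sketch below swaps in `std::call_once` from C++11, which gives the same run-exactly-once guarantee in standard C++. It guards the construction of a lazily created singleton; the `Config` class is hypothetical, and the instance is deliberately never deleted. (In C++11 a plain function-local static, the Meyers singleton itself, is also guaranteed to be initialized thread-safely.)

```cpp
#include <cassert>
#include <mutex>

class Config {
public:
    // The first caller runs the lambda; concurrent callers block until it
    // has finished; later callers skip it.
    static Config& instance() {
        std::call_once(init_flag_, [] { instance_ = new Config(); });
        return *instance_;
    }
    int construction_count() const { return constructions_; }

private:
    Config() { ++constructions_; }
    static std::once_flag init_flag_;
    static Config* instance_;   // intentionally leaked, never destroyed
    static int constructions_;
};

std::once_flag Config::init_flag_;
Config* Config::instance_ = nullptr;
int Config::constructions_ = 0;
```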

Oh, and one other thing: you don't need inline assembler for atomic
ops with gcc from version 4.2 onwards, as the compiler has built-in
functions for atomic operations.
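For example, GCC's `__sync` family of builtins covers the usual cases without any inline assembler (Clang accepts the same builtins). A short sketch; the helper names are hypothetical. According to the GCC manual, most of these builtins also act as a full memory barrier, with `__sync_lock_test_and_set` (acquire only) and `__sync_lock_release` (release only) as notable exceptions.

```cpp
#include <cassert>

// Atomically adds 1 to *counter and returns the value it had *before*.
long atomic_increment(long* counter) {
    return __sync_fetch_and_add(counter, 1L);
}

// Atomically: if *state == 0, set it to 1. Returns true if this call did so.
bool try_claim(int* state) {
    return __sync_bool_compare_and_swap(state, 0, 1);
}

// Explicit full memory barrier, for hand-written protocols.
inline void full_fence() {
    __sync_synchronize();
}
```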

Anthony
--
Anthony Williams | Just Software Solutions Ltd
Custom Software Development | http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
Jul 30 '08 #1
On 30 Jul, 11:20, Anthony Williams <anthony....@gmail.com> wrote:
> You need compiler barriers (_ReadWriteBarrier() in MSVC) to ensure
> things don't get rearranged across your atomic access
> functions. There's no need to drop to assembler either: you're not
> doing anything more complicated than a simple MOV.

In MSVC one can just use volatile variables: accesses to volatile
variables are guaranteed to be 'load-acquire' and 'store-release'. I.e. on
Itanium/PPC MSVC will emit hardware memory fences (along with compiler
fences).

http://msdn.microsoft.com/en-us/library/12a04hfd.aspx
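For code that must not depend on this Microsoft-specific meaning of `volatile`, the same load-acquire/store-release pairing can be requested explicitly with C++11 atomics. This is a swapped-in, portable technique rather than what MSVC does internally; `payload`, `ready`, `publish` and `wait_and_read` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // ordinary data, published through `ready`
std::atomic<bool> ready(false);   // plays the role of the volatile flag

void publish(int value) {
    payload = value;                                // plain write
    ready.store(true, std::memory_order_release);   // store-release
}

int wait_and_read() {
    while (!ready.load(std::memory_order_acquire))  // load-acquire
        std::this_thread::yield();
    return payload;  // the write to payload is visible once ready is seen
}
```

Run `publish` on one thread and `wait_and_read` on another: the acquire load that observes `true` synchronizes with the release store, so the plain write to `payload` is guaranteed to be visible.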

Dmitriy V'jukov
Jul 30 '08 #2
"Dmitriy V'jukov" <dv*****@gmail.com> writes:
> On 30 Jul, 11:20, Anthony Williams <anthony....@gmail.com> wrote:
>> You need compiler barriers (_ReadWriteBarrier() in MSVC) to ensure
>> things don't get rearranged across your atomic access
>> functions. There's no need to drop to assembler either: you're not
>> doing anything more complicated than a simple MOV.
>
> In MSVC one can use just volatile variables, accesses to volatile
> variables guaranteed to be 'load-acquire' and 'store-release'. I.e. on
> Itanium/PPC MSVC will emit hardware memory fences (along with compiler
> fences).
>
> http://msdn.microsoft.com/en-us/library/12a04hfd.aspx

I am aware of this. However, the description only says it orders
accesses to "global and static data", and that it refers to objects
*declared* as volatile. I haven't tested it enough to be confident
that it is entirely equivalent to _ReadWriteBarrier(), or that it
works on variables *cast* to volatile.

Anthony
Jul 30 '08 #3

"Anthony Williams" <an*********@gmail.com> wrote in message
news:u6***********@gmail.com...
> "Chris M. Thomasson" <no@spam.invalid> writes:
> [...]
> The algorithm used by boost::call_once on pthreads platforms is
> described here:
>
> http://www.open-std.org/jtc1/sc22/wg...007/n2444.html
>
>>> It doesn't use a lock unless it has to and is portable across
>>> pthreads and win32 threads.
>>
>> The code I posted does not use a lock unless it absolutely has to
>> because it attempts to efficiently take advantage of the double
>> checked locking pattern.
>
> Oh yes, I realise that: the code for call_once is similar. However, it
> attempts to avoid contention on the mutex by using thread-local
> storage. If you have atomic ops, you can go even further in
> eliminating the mutex, e.g. using compare_exchange and fetch_add.
[...]
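Anthony's remark that atomic operations let you go further in eliminating the mutex can be illustrated with a minimal sketch. This is not the Boost or N2444 algorithm: a single compare_exchange elects the initializing thread and the others yield until it has finished, so this simplified form needs no fetch_add. All names are hypothetical, and an initializer that throws would leave the waiters spinning forever, which a real implementation must handle.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical once flag: 0 = not started, 1 = running, 2 = done.
struct spin_once_flag {
    std::atomic<int> state{0};
};

template <typename F>
void spin_call_once(spin_once_flag& flag, F f) {
    // Fast path: already done. Acquire pairs with the release store below,
    // so everything f() wrote is visible to this thread.
    if (flag.state.load(std::memory_order_acquire) == 2) return;

    int expected = 0;
    if (flag.state.compare_exchange_strong(expected, 1,
                                           std::memory_order_acq_rel)) {
        f();                                             // exactly one thread
        flag.state.store(2, std::memory_order_release);  // publish completion
    } else {
        // Another thread won the race: wait for it without any mutex.
        while (flag.state.load(std::memory_order_acquire) != 2)
            std::this_thread::yield();
    }
}
```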

Before I reply to your entire post I should point out that:

http://groups.google.com/group/comp....9c7aff738f9102

the Boost mechanism is not 100% portable, but is elegant in practice. It
uses a technique similar to the one a certain distributed reference
counting algorithm I created relies on:

http://groups.google.com/group/comp....16861ac568b2be

http://groups.google.com/group/comp....hreads&q=vzoom

Not 100% portable, but _highly_ portable indeed!

Jul 30 '08 #4
"Chris M. Thomasson" <no@spam.invalid> writes:
> "Anthony Williams" <an*********@gmail.com> wrote in message
> news:u6***********@gmail.com...
> [...]
> Before I reply to your entire post I should point out that:
>
> http://groups.google.com/group/comp....9c7aff738f9102
>
> the Boost mechanism is not 100% portable, but is elegant in
> practice.

Yes. If you look at the whole thread, you'll see a comment by me there
where I admit as much.

> It uses a similar technique that a certain distributed
> reference counting algorithm I created claims:

I wasn't aware that you were using something similar in vZOOM.

Anthony
Jul 30 '08 #5
"Anthony Williams" <an*********@gmail.com> wrote in message
news:uh***********@gmail.com...
> "Chris M. Thomasson" <no@spam.invalid> writes:
> [...]
>> the Boost mechanism is not 100% portable, but is elegant in
>> practice.
>
> Yes. If you look at the whole thread, you'll see a comment by me there
> where I admit as much.

Does the following line:

__thread fast_pthread_once_t _fast_pthread_once_per_thread_epoch;

explicitly set `_fast_pthread_once_per_thread_epoch' to zero? If so, is it
guaranteed?
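For comparison, standard C++11 settles the question for its own `thread_local` keyword: objects with thread storage duration are zero-initialized before any other initialization, exactly like static objects, and each thread gets its own copy. Whether a particular compiler's `__thread` extension behaves the same way is a separate question. A minimal check, where `epoch_t` is a hypothetical stand-in for `fast_pthread_once_t`:

```cpp
#include <cassert>

typedef unsigned long epoch_t;  // hypothetical stand-in for fast_pthread_once_t

// No initializer: zero-initialized (like a static object), one copy per thread.
thread_local epoch_t per_thread_epoch;

// An explicit initializer removes any doubt, whatever the compiler.
thread_local epoch_t per_thread_epoch_explicit = 0;
```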

>> It uses a similar technique that a certain distributed
>> reference counting algorithm I created claims:
>
> I wasn't aware that you were using something similar in vZOOM.

Humm, now that I think about it, it seems like I am totally mistaken. The
"most portable" version of vZOOM relies on the assumptions that pointer
loads/stores are atomic, that unlocking a mutex executes at least a
release barrier, and that loading a shared variable executes at least a
data-dependent load barrier; this is very similar to RCU without the explicit
#LoadStore | #StoreStore before storing into a shared pointer location...
Something like:

__________________________________________________________________
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

struct foo {
    int a;
};

static foo* shared_f = NULL;

// Returns the calling thread's dedicated mutex (not shown in the post).
pthread_mutex_t* get_per_thread_mutex();

// single producer thread
void producer() {
    foo* local_f = new foo;
    pthread_mutex_t* lock = get_per_thread_mutex();
    pthread_mutex_lock(lock);
    local_f->a = 666;
    pthread_mutex_unlock(lock);
    shared_f = local_f;  // plain store: relies on the unlock acting as a release barrier
}

// single consumer thread
void consumer() {
    foo* local_f;
    while (! (local_f = shared_f)) {  // plain load: relies on a data-dependent load barrier
        sched_yield();
    }
    assert(local_f->a == 666);
    delete local_f;
}
__________________________________________________________________

If the `pthread_mutex_unlock()' function does not execute at least a
release barrier in the producer, or if the load of the shared variable does
not execute at least a data-dependent load barrier in the consumer, the
"most portable" version of vZOOM will NOT work on that platform in any way,
shape or form; it will need a platform-dependent version. However, the only
platform I can think of where these intra-node memory visibility requirements
do not hold is the Alpha... For multi-node supercomputers, inter-node
communication is adapted to use MPI.
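With C++11 atomics, the same single-producer/single-consumer hand-off can be written without relying on assumptions about what a mutex unlock or a plain pointer load does. This is a swapped-in technique, not vZOOM's code: `foo` and the value 666 come from the example above, while making `shared_f` a `std::atomic<foo*>` with release/acquire ordering is the assumption.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct foo {
    int a;
};

// The shared pointer is atomic, so publication no longer depends on what
// pthread_mutex_unlock() happens to do to memory ordering.
std::atomic<foo*> shared_f(nullptr);

// single producer thread
void producer() {
    foo* local_f = new foo;
    local_f->a = 666;
    // Release: the write to local_f->a cannot be reordered after this store.
    shared_f.store(local_f, std::memory_order_release);
}

// single consumer thread
int consumer() {
    foo* local_f;
    // Acquire: seeing the pointer guarantees seeing everything written before
    // the matching release store. memory_order_consume would express the
    // weaker data-dependent ordering, but mainstream compilers currently
    // implement it as acquire.
    while (!(local_f = shared_f.load(std::memory_order_acquire)))
        std::this_thread::yield();
    shared_f.store(nullptr, std::memory_order_relaxed);  // hand-off complete
    int a = local_f->a;
    delete local_f;
    return a;
}
```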

Jul 30 '08 #6
"Chris M. Thomasson" <no@spam.invalid> writes:
> "Anthony Williams" <an*********@gmail.com> wrote in message
> news:uh***********@gmail.com...
> [...]
> Does the following line:
>
> __thread fast_pthread_once_t _fast_pthread_once_per_thread_epoch;
>
> explicitly set `_fast_pthread_once_per_thread_epoch' to zero? If so,
> is it guaranteed?

The algorithm assumes it does, but it depends on which compiler you
use. In the Boost implementation, the value is explicitly
initialized (to ~0: I found it worked better with exception
handling to count backwards).
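The thread-local epoch scheme discussed here can be sketched as follows. This is a simplified illustration under stated assumptions, not the Boost or N2444 code: it counts upwards from zero instead of downwards from ~0, does not handle an initializer that throws, and every name in it is hypothetical. The fast path is one relaxed load compared against a `thread_local` value; a thread only takes the mutex when it meets a flag completed after it last synchronized on that mutex, and that earlier lock acquisition is what makes the initializer's writes visible.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <condition_variable>
#include <mutex>

// Flag values: ONCE_INIT (not started), BEING_INITIALIZED, or the epoch
// number at which initialization completed (always below BEING_INITIALIZED).
typedef unsigned long epoch_t;
const epoch_t ONCE_INIT = ULONG_MAX;
const epoch_t BEING_INITIALIZED = ULONG_MAX - 1;

struct epoch_once_flag {
    std::atomic<epoch_t> value{ONCE_INIT};
};

static std::mutex once_mutex;
static std::condition_variable once_cv;
static epoch_t global_epoch = 0;             // guarded by once_mutex
thread_local epoch_t per_thread_epoch = 0;   // last epoch this thread saw

template <typename F>
void epoch_call_once(epoch_once_flag& flag, F f) {
    // Fast path: completed at an epoch this thread has already synchronized
    // with through once_mutex, so no further synchronization is needed.
    if (flag.value.load(std::memory_order_relaxed) <= per_thread_epoch) return;

    std::unique_lock<std::mutex> lock(once_mutex);
    if (flag.value.load(std::memory_order_relaxed) == ONCE_INIT) {
        flag.value.store(BEING_INITIALIZED, std::memory_order_relaxed);
        lock.unlock();
        f();                                  // run without holding the mutex
        lock.lock();
        ++global_epoch;
        flag.value.store(global_epoch, std::memory_order_relaxed);
        once_cv.notify_all();
    } else {
        while (flag.value.load(std::memory_order_relaxed) == BEING_INITIALIZED)
            once_cv.wait(lock);
    }
    per_thread_epoch = global_epoch;          // synchronized up to this epoch
}
```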

>>> It uses a similar technique that a certain distributed
>>> reference counting algorithm I created claims:
>>
>> I wasn't aware that you were using something similar in vZOOM.
>
> Humm, now that I think about it, it seems like I am totally
> mistaken. The "most portable" version of vZOOM relies on an assumption
> that pointer load/stores are atomic and the unlocking of a mutex
> executes at least a release-barrier, and the loading of a shared
> variable executes at least a data-dependant load-barrier; very similar
> to RCU without the explicit #LoadStore | #StoreStore before storing
> into a shared pointer location... Something like:
>
> // single producer thread {
> foo* local_f = new foo;
> pthread_mutex_t* lock = get_per_thread_mutex();
> pthread_mutex_lock(lock);
> local_f->a = 666;
> pthread_mutex_unlock(lock);
> shared_f = local_f;

So you're using the lock just for the barrier properties. Interesting
idea.

Anthony
Jul 30 '08 #7

"Anthony Williams" <an*********@gmail.com> wrote in message
news:ud***********@gmail.com...
> "Chris M. Thomasson" <no@spam.invalid> writes:
> [...]
>> // single producer thread {
>> foo* local_f = new foo;
>> pthread_mutex_t* lock = get_per_thread_mutex();
>> pthread_mutex_lock(lock);
>> local_f->a = 666;
>> pthread_mutex_unlock(lock);
>> shared_f = local_f;
>
> So you're using the lock just for the barrier properties. Interesting
> idea.

Yes. Actually, I did not show the whole algorithm. The code above is busted
because I forgot to show it all; STUPID ME!!! It's busted because the store
to shared_f can legally be hoisted up above the unlock. Here is the whole
picture... Each thread has a special dedicated mutex which is locked from
its birth... Here is exactly how production of an object can occur:
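
static foo* volatile shared_f = NULL;

// Returns this thread's dedicated mutex, locked since the thread's birth
// (not shown in the post).
pthread_mutex_t* get_per_thread_mem_mutex();

// single producer thread
void producer() {
    /* 00 */ foo* local_f;
    /* 01 */ pthread_mutex_t* const mem_mutex = get_per_thread_mem_mutex();
    /* 02 */ local_f = new foo;
    /* 03 */ local_f->a = 666;
    /* 04 */ pthread_mutex_unlock(mem_mutex);
    /* 05 */ pthread_mutex_lock(mem_mutex);
    /* 06 */ shared_f = local_f;
}
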
Here are the production rules with respect to POSIX:

1. Steps 02-03 CANNOT sink below step 04
2. Step 06 CANNOT rise above step 05
3. vZOOM assumes that step 04 has a release barrier

Those __two guarantees and single assumption__ ensure that the ordering and
visibility of the operations are correct. After that, the consumer can do:

// single consumer thread
void consumer() {
    /* 00 */ foo* local_f;
    /* 01 */ while (! (local_f = shared_f)) {
    /* 02 */     sched_yield();
             }
    /* 03 */ assert(local_f->a == 666);
    /* 04 */ delete local_f;
}

Consumption rules:

01: vZOOM assumes that the load from `shared_f' has an implied
data-dependent load barrier.

BTW, here is a brief outline of how the "most portable" version of vZOOM
distributed reference counting works with the above idea:

http://groups.google.ru/group/comp.p...e9b6e427b4a144

http://groups.google.com/group/comp....24fe99f742ce6e
(an __excellent__ question from Dmitriy...)


What do you think, Anthony?

Jul 30 '08 #8
[...]
>
> BTW, here is a brief outline of how the "most portable" version of vZOOM
> distributed reference counting works with the above idea:
>
> http://groups.google.ru/group/comp.p...e9b6e427b4a144

Take note of the per-thread memory lock. It's vital to vZOOM.

> http://groups.google.com/group/comp....24fe99f742ce6e
> (an __excellent__ question from Dmitriy...)
>
> What do you think, Anthony?
Jul 30 '08 #9


