Re: atomically thread-safe Meyers singleton impl (fixed)...

Anthony Williams

You need compiler barriers (_ReadWriteBarrier() in MSVC) to ensure
things don't get rearranged across your atomic access
functions. There's no need to drop to assembler either: you're not
doing anything more complicated than a simple MOV.

Anyway, if I was writing this (and I wouldn't be, because I really
dislike singletons), I'd just use boost::call_once. It doesn't use a
lock unless it has to and is portable across pthreads and win32
threads.
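
For illustration only (this is not code from the thread): a minimal sketch of one-time initialisation through a call_once-style API. It uses C++11's std::call_once and std::once_flag as a stand-in, on the assumption that boost::call_once is used in much the same way. The Config type and all function names are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical singleton type, used only for this example.
struct Config {
    int value;
    Config() : value(42) {}
};

namespace {
std::once_flag config_once;         // guards the one-time construction
Config* config_instance = nullptr;  // set exactly once by init_config()
int config_ctor_calls = 0;          // how many times the initialiser ran

void init_config() {
    config_instance = new Config;   // deliberately leaked: lives for the process
    ++config_ctor_calls;
}
}  // namespace

// Every caller goes through call_once; only the first one runs init_config().
Config& get_config() {
    std::call_once(config_once, init_config);
    return *config_instance;
}

// Calls get_config() from several threads and returns how many times the
// initialiser actually executed.
int run_call_once_demo(int thread_count) {
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([] { (void)get_config(); });
    }
    for (auto& t : threads) t.join();
    return config_ctor_calls;
}
```

Calling run_call_once_demo(8) should report a single initialisation no matter how the threads interleave.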

Oh, and one other thing: you don't need inline assembler for atomic
ops with gcc from version 4.2 onwards, as the compiler has built-in
functions for atomic operations.
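
A small sketch of the kind of built-ins meant here, assuming GCC's legacy __sync_* family (which Clang also accepts). The variable and function names are made up for the example. GCC documents most of these built-ins as implying a full barrier.

```cpp
#include <cassert>

// Illustrative shared state; the names are not from the original post.
static long counter = 0;
static int ready = 0;

// Atomically adds n to counter and returns the value it held before.
long add_and_fetch_previous(long n) {
    return __sync_fetch_and_add(&counter, n);
}

// Atomically flips ready from 0 to 1; returns true only for the caller
// that performed the change.
bool try_set_ready() {
    return __sync_bool_compare_and_swap(&ready, 0, 1);
}

// Stand-alone full memory barrier.
void full_barrier() {
    __sync_synchronize();
}
```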

Anthony
--
Anthony Williams | Just Software Solutions Ltd
Custom Software Development | http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

Jul 30 '08 #1


Dmitriy V'jukov

On 30 Jul, 11:20, Anthony Williams <anthony....@gmail.com> wrote:

> You need compiler barriers (_ReadWriteBarrier() in MSVC) to ensure
> things don't get rearranged across your atomic access
> functions. There's no need to drop to assembler either: you're not
> doing anything more complicated than a simple MOV.

In MSVC one can just use volatile variables: accesses to volatile
variables are guaranteed to be 'load-acquire' and 'store-release'. I.e., on
Itanium/PPC, MSVC will emit hardware memory fences (along with compiler
fences).

http://msdn.microsoft.com/en-us/library/12a04hfd.aspx

Dmitriy V'jukov

Jul 30 '08 #2

Anthony Williams

"Dmitriy V'jukov" <dv*****@gmail.com> writes:

> On 30 Jul, 11:20, Anthony Williams <anthony....@gmail.com> wrote:
>> You need compiler barriers (_ReadWriteBarrier() in MSVC) to ensure
>> things don't get rearranged across your atomic access
>> functions. There's no need to drop to assembler either: you're not
>> doing anything more complicated than a simple MOV.
>
> In MSVC one can use just volatile variables, accesses to volatile
> variables guaranteed to be 'load-acquire' and 'store-release'. I.e. on
> Itanium/PPC MSVC will emit hardware memory fences (along with compiler
> fences).
>
> http://msdn.microsoft.com/en-us/library/12a04hfd.aspx

I am aware of this. However, the description only says it orders
accesses to "global and static data", and that it refers to objects
*declared* as volatile. I haven't tested it enough to be confident
that it is entirely equivalent to _ReadWriteBarrier(), and that it
works on variables *cast* to volatile.

Anthony

Jul 30 '08 #3

Chris M. Thomasson

"Anthony Williams" <an*********@gmail.com> wrote in message
news:u6***********@gmail.com...

> "Chris M. Thomasson" <no@spam.invalid> writes:
>
> [...]
>
> The algorithm used by boost::call_once on pthreads platforms is
> described here:
>
> http://www.open-std.org/jtc1/sc22/wg...007/n2444.html
>
>>> It doesn't use a
>>> lock unless it has to and is portable across pthreads and win32
>>> threads.
>>
>> The code I posted does not use a lock unless it absolutely has to
>> because it attempts to efficiently take advantage of the double
>> checked locking pattern.
>
> Oh yes, I realise that: the code for call_once is similar. However, it
> attempts to avoid contention on the mutex by using thread-local
> storage. If you have atomic ops, you can go even further in
> eliminating the mutex, e.g. using compare_exchange and fetch_add.
>
> [...]

Before I reply to your entire post I should point out that:

http://groups.google.com/group/comp....9c7aff738f9102

the Boost mechanism is not 100% portable, but is elegant in practice. It
uses a similar technique that a certain distributed reference counting
algorithm I created claims:

http://groups.google.com/group/comp....16861ac568b2be

http://groups.google.com/group/comp....hreads&q=vzoom

Not 100% portable, but _highly_ portable indeed!
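
As a rough illustration of the quoted remark about going further with compare_exchange and fetch_add: a mutex-free once routine built on C++11 std::atomic. This is only a sketch under stated assumptions, not the Boost algorithm and not the code discussed in this thread. spin_once_flag and spin_call_once are hypothetical names, waiters simply spin with yield(), and there is no handling for an initialiser that throws.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical flag type; not the Boost or standard once_flag.
struct spin_once_flag {
    std::atomic<int> state{0};  // 0 = not started, 1 = running, 2 = done
};

template <typename F>
void spin_call_once(spin_once_flag& flag, F f) {
    // Fast path: initialisation already finished and published.
    if (flag.state.load(std::memory_order_acquire) == 2) return;

    int expected = 0;
    if (flag.state.compare_exchange_strong(expected, 1,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire)) {
        f();                                             // this thread won the race
        flag.state.store(2, std::memory_order_release);  // publish completion
        return;
    }
    // Another thread is (or was) initialising: wait without a mutex.
    while (flag.state.load(std::memory_order_acquire) != 2) {
        std::this_thread::yield();
    }
}

// Runs spin_call_once from several threads and returns how often the
// initialiser actually executed.
int count_spin_once_runs(int thread_count) {
    spin_once_flag flag;
    std::atomic<int> runs(0);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([&] {
            spin_call_once(flag, [&] { runs.fetch_add(1); });
        });
    }
    for (auto& t : threads) t.join();
    return runs.load();
}
```

The trade-off is that losers of the race burn CPU while waiting, which a mutex or thread-local epoch scheme avoids.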

Jul 30 '08 #4

Anthony Williams

"Chris M. Thomasson" <no@spam.invalid> writes:

> "Anthony Williams" <an*********@gmail.com> wrote in message
> news:u6***********@gmail.com...
>> "Chris M. Thomasson" <no@spam.invalid> writes:
>> [...]
>> The algorithm used by boost::call_once on pthreads platforms is
>> described here:
>>
>> http://www.open-std.org/jtc1/sc22/wg...007/n2444.html
>>
>>>> It doesn't use a
>>>> lock unless it has to and is portable across pthreads and win32
>>>> threads.
>>>
>>> The code I posted does not use a lock unless it absolutely has to
>>> because it attempts to efficiently take advantage of the double
>>> checked locking pattern.
>>
>> Oh yes, I realise that: the code for call_once is similar. However, it
>> attempts to avoid contention on the mutex by using thread-local
>> storage. If you have atomic ops, you can go even further in
>> eliminating the mutex, e.g. using compare_exchange and fetch_add.
>> [...]
>
> Before I reply to your entire post I should point out that:
>
> http://groups.google.com/group/comp....9c7aff738f9102
>
> the Boost mechanism is not 100% portable, but is elegant in
> practice.

Yes. If you look at the whole thread, you'll see a comment by me there
where I admit as much.

> It uses a similar technique that a certain distributed
> reference counting algorithm I created claims:

I wasn't aware that you were using something similar in vZOOM.

Anthony

Jul 30 '08 #5

Chris M. Thomasson

"Anthony Williams" <an*********@gmail.com> wrote in message
news:uh***********@gmail.com...

> "Chris M. Thomasson" <no@spam.invalid> writes:
>
>> "Anthony Williams" <an*********@gmail.com> wrote in message
>> news:u6***********@gmail.com...
>>> "Chris M. Thomasson" <no@spam.invalid> writes:
>>> [...]
>>> The algorithm used by boost::call_once on pthreads platforms is
>>> described here:
>>>
>>> http://www.open-std.org/jtc1/sc22/wg...007/n2444.html
>>>
>>>>> It doesn't use a
>>>>> lock unless it has to and is portable across pthreads and win32
>>>>> threads.
>>>>
>>>> The code I posted does not use a lock unless it absolutely has to
>>>> because it attempts to efficiently take advantage of the double
>>>> checked locking pattern.
>>>
>>> Oh yes, I realise that: the code for call_once is similar. However, it
>>> attempts to avoid contention on the mutex by using thread-local
>>> storage. If you have atomic ops, you can go even further in
>>> eliminating the mutex, e.g. using compare_exchange and fetch_add.
>>> [...]
>>
>> Before I reply to your entire post I should point out that:
>>
>> http://groups.google.com/group/comp....9c7aff738f9102
>>
>> the Boost mechanism is not 100% portable, but is elegant in
>> practice.
>
> Yes. If you look at the whole thread, you'll see a comment by me there
> where I admit as much.

Does the following line:

__thread fast_pthread_once_t _fast_pthread_once_per_thread_epoch;

explicitly set `_fast_pthread_once_per_thread_epoch' to zero? If so, is it
guaranteed?

>> It uses a similar technique that a certain distributed
>> reference counting algorithm I created claims:
>
> I wasn't aware that you were using something similar in vZOOM.

Humm, now that I think about it, it seems like I am totally mistaken. The
"most portable" version of vZOOM relies on an assumption that pointer
load/stores are atomic and the unlocking of a mutex executes at least a
release-barrier, and the loading of a shared variable executes at least a
data-dependant load-barrier; very similar to RCU without the explicit
#LoadStore | #StoreStore before storing into a shared pointer location...
Something like:

____________________________________________________________________
struct foo {
  int a;
};

static foo* shared_f = NULL;

// single producer thread {
  foo* local_f = new foo;
  pthread_mutex_t* lock = get_per_thread_mutex();
  pthread_mutex_lock(lock);
  local_f->a = 666;
  pthread_mutex_unlock(lock);
  shared_f = local_f;
}

// single consumer thread {
  foo* local_f;
  while (! (local_f = shared_f)) {
    sched_yield();
  }
  assert(local_f->a == 666);
  delete local_f;
}
____________________________________________________________________

If the `pthread_mutex_unlock()' function does not execute at least a
release-barrier in the producer, and if the load of the shared variable does
not execute at least a data-dependant load-barrier in the consumer, the
"most portable" version of vZOOM will NOT work on that platform in any way,
shape, or form; it will need a platform-dependent version. However, the only
platform I can think of where the intra-node memory visibility requirements
do not hold is the Alpha... For multi-node super-computers, inter-node
communication is adapted to using MPI.

Jul 30 '08 #6

Anthony Williams

"Chris M. Thomasson" <no@spam.invalid> writes:

> "Anthony Williams" <an*********@gmail.com> wrote in message
> news:uh***********@gmail.com...
>> "Chris M. Thomasson" <no@spam.invalid> writes:
>>
>>> "Anthony Williams" <an*********@gmail.com> wrote in message
>>> news:u6***********@gmail.com...
>>>> "Chris M. Thomasson" <no@spam.invalid> writes:
>>>> [...]
>>>> The algorithm used by boost::call_once on pthreads platforms is
>>>> described here:
>>>>
>>>> http://www.open-std.org/jtc1/sc22/wg...007/n2444.html
>>>>
>>>>>> It doesn't use a
>>>>>> lock unless it has to and is portable across pthreads and win32
>>>>>> threads.
>>>>>
>>>>> The code I posted does not use a lock unless it absolutely has to
>>>>> because it attempts to efficiently take advantage of the double
>>>>> checked locking pattern.
>>>>
>>>> Oh yes, I realise that: the code for call_once is similar. However, it
>>>> attempts to avoid contention on the mutex by using thread-local
>>>> storage. If you have atomic ops, you can go even further in
>>>> eliminating the mutex, e.g. using compare_exchange and fetch_add.
>>>> [...]
>>>
>>> Before I reply to your entire post I should point out that:
>>>
>>> http://groups.google.com/group/comp....9c7aff738f9102
>>>
>>> the Boost mechanism is not 100% portable, but is elegant in
>>> practice.
>>
>> Yes. If you look at the whole thread, you'll see a comment by me there
>> where I admit as much.
>
> Does the following line:
>
> __thread fast_pthread_once_t _fast_pthread_once_per_thread_epoch;
>
> explicitly set `_fast_pthread_once_per_thread_epoch' to zero? If so,
> is it guaranteed?

The algorithm assumes it does, but it depends on which compiler you
use. In the Boost implementation, the value is explicitly
initialized (to ~0 --- I found it worked better with exception
handling to count backwards).
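
To illustrate the point about explicit initialisation: a sketch using C++11 thread_local rather than the GCC-specific __thread, with a hypothetical epoch_t standing in for fast_pthread_once_t. Giving the variable an explicit initialiser means every thread starts from a known value (here all bits set, mirroring the ~0 mentioned above) instead of relying on whatever the compiler does by default. This is not the Boost source, only a demonstration of the idea.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for fast_pthread_once_t.
typedef unsigned long epoch_t;

// Each thread gets its own copy, explicitly initialised to all ones.
thread_local epoch_t per_thread_epoch = ~epoch_t(0);

// Starts a fresh thread and reports the initial value that thread sees.
epoch_t read_epoch_in_new_thread() {
    epoch_t seen = 0;
    std::thread t([&seen] { seen = per_thread_epoch; });
    t.join();
    return seen;
}
```

Changing per_thread_epoch in one thread does not affect the value a newly started thread observes.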

>>> It uses a similar technique that a certain distributed
>>> reference counting algorithm I created claims:
>>
>> I wasn't aware that you were using something similar in vZOOM.
>
> Humm, now that I think about it, it seems like I am totally
> mistaken. The "most portable" version of vZOOM relies on an assumption
> that pointer load/stores are atomic and the unlocking of a mutex
> executes at least a release-barrier, and the loading of a shared
> variable executes at least a data-dependant load-barrier; very similar
> to RCU without the explicit #LoadStore | #StoreStore before storing
> into a shared pointer location... Something like:
>
> // single producer thread {
> foo* local_f = new foo;
> pthread_mutex_t* lock = get_per_thread_mutex();
> pthread_mutex_lock(lock);
> local_f->a = 666;
> pthread_mutex_unlock(lock);
> shared_f = local_f;

So you're using the lock just for the barrier properties. Interesting
idea.

Anthony

Jul 30 '08 #7

Chris M. Thomasson

"Anthony Williams" <an*********@gmail.com> wrote in message
news:ud***********@gmail.com...

> "Chris M. Thomasson" <no@spam.invalid> writes:
>
> [...]
>
>>>> the Boost mechanism is not 100% portable, but is elegant in
>>>> practice.
>>>
>>> Yes. If you look at the whole thread, you'll see a comment by me there
>>> where I admit as much.
>>
>> Does the following line:
>>
>> __thread fast_pthread_once_t _fast_pthread_once_per_thread_epoch;
>>
>> explicitly set `_fast_pthread_once_per_thread_epoch' to zero? If so,
>> is it guaranteed?
>
> The algorithm assumes it does, but it depends on which compiler you
> use. In the Boost implementation, the value is explicitly
> initialized (to ~0 --- I found it worked better with exception
> handling to count backwards).
>
>>>> It uses a similar technique that a certain distributed
>>>> reference counting algorithm I created claims:
>>>
>>> I wasn't aware that you were using something similar in vZOOM.
>>
>> Humm, now that I think about it, it seems like I am totally
>> mistaken. The "most portable" version of vZOOM relies on an assumption
>> that pointer load/stores are atomic and the unlocking of a mutex
>> executes at least a release-barrier, and the loading of a shared
>> variable executes at least a data-dependant load-barrier; very similar
>> to RCU without the explicit #LoadStore | #StoreStore before storing
>> into a shared pointer location... Something like:
>>
>> // single producer thread {
>> foo* local_f = new foo;
>> pthread_mutex_t* lock = get_per_thread_mutex();
>> pthread_mutex_lock(lock);
>> local_f->a = 666;
>> pthread_mutex_unlock(lock);
>> shared_f = local_f;
>
> So you're using the lock just for the barrier properties. Interesting
> idea.

Yes. Actually, I did not show the whole algorithm. The code above is busted
because I forgot to show it all; STUPID ME!!! It's busted because the store
to shared_f can legally be hoisted up above the unlock. Here is the whole
picture... Each thread has a special dedicated mutex which is locked from
its birth... Here is exactly how production of an object can occur:

static foo* volatile shared_f = NULL;

// single producer thread {
00: foo* local_f;
01: pthread_mutex_t* const mem_mutex = get_per_thread_mem_mutex();
02: local_f = new foo;
03: local_f->a = 666;
04: pthread_mutex_unlock(mem_mutex);
05: pthread_mutex_lock(mem_mutex);
06: shared_f = local_f;
}

Here are the production rules wrt POSIX:

1. Steps 02-03 CANNOT sink below step 04
2. Step 06 CANNOT rise above step 05
3. vZOOM assumes that step 04 has a release barrier

Those __two__guarantees__and__single__assumption__ ensure the ordering and
visibility of the operations are correct. After that, the consumer can do:

// single consumer thread {
00: foo* local_f;
01: while (! (local_f = shared_f)) {
02:   sched_yield();
    }
03: assert(local_f->a == 666);
04: delete local_f;
}

Consumption rules:

01: vZOOM assumes that the load from `shared_f' will have implied
data-dependant load-barrier.
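
For comparison, the same single-producer/single-consumer hand-off can be written with C++11 atomics, which state the required ordering directly instead of relying on the mutex unlock/lock pair. This is a sketch of the ordering requirement only, not vZOOM itself. It uses an acquire load, which is stronger than the data-dependent load barrier described above, and the function names are invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct foo {
    int a;
};

static std::atomic<foo*> shared_f(nullptr);

// Producer: fill in the object, then publish the pointer with release
// semantics so the write to `a` cannot be reordered after the store.
void produce() {
    foo* local_f = new foo;
    local_f->a = 666;
    shared_f.store(local_f, std::memory_order_release);
}

// Consumer: spin until the pointer appears. The acquire load guarantees
// the producer's write to `a` is visible once the pointer is seen.
int consume() {
    foo* local_f;
    while (!(local_f = shared_f.load(std::memory_order_acquire))) {
        std::this_thread::yield();
    }
    int value = local_f->a;
    delete local_f;
    return value;
}

// Runs one producer and one consumer; returns the value the consumer read.
int run_handoff() {
    shared_f.store(nullptr, std::memory_order_relaxed);  // reset before each run
    int seen = 0;
    std::thread consumer([&seen] { seen = consume(); });
    std::thread producer(produce);
    producer.join();
    consumer.join();
    return seen;
}
```

Here the release store plays the role of steps 04-06 in the producer, and the acquire load plays the role of the assumed load barrier in the consumer.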

BTW, here is a brief outline of how the "most portable" version of vZOOM
distributed reference counting works with the above idea:

http://groups.google.ru/group/comp.p...e9b6e427b4a144

http://groups.google.com/group/comp....24fe99f742ce6e
(an __excellent__ question from Dmitriy...)

What do you think, Anthony?

Jul 30 '08 #8

Chris M. Thomasson

[...]

> BTW, here is a brief outline of how the "most portable" version of vZOOM
> distributed reference counting works with the above idea:
>
> http://groups.google.ru/group/comp.p...e9b6e427b4a144

Take note of the per-thread memory lock. It's vital to vZOOM.

> http://groups.google.com/group/comp....24fe99f742ce6e
> (an __excellent__ question from Dmitriy...)
>
> What do you think, Anthony?

Jul 30 '08 #9
