Bytes | Software Development & Data Engineering Community
Re: atomically thread-safe Meyers singleton impl (fixed)...

You need compiler barriers (_ReadWriteBarrier() in MSVC) to ensure
things don't get rearranged across your atomic access
functions. There's no need to drop to assembler either: you're not
doing anything more complicated than a simple MOV.

Anyway, if I was writing this (and I wouldn't be, because I really
dislike singletons), I'd just use boost::call_once. It doesn't use a
lock unless it has to and is portable across pthreads and win32
threads.

Oh, and one other thing: you don't need inline assembler for atomic
ops with gcc from version 4.2 onwards, as the compiler has built-in
functions for atomic operations.

Anthony
--
Anthony Williams | Just Software Solutions Ltd
Custom Software Development | http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
Jul 30 '08 #1
On 30 Jul, 11:20, Anthony Williams <anthony....@gmail.com> wrote:
You need compiler barriers (_ReadWriteBarrier() in MSVC) to ensure
things don't get rearranged across your atomic access
functions. There's no need to drop to assembler either: you're not
doing anything more complicated than a simple MOV.

In MSVC one can use just volatile variables; accesses to volatile
variables are guaranteed to be 'load-acquire' and 'store-release'. I.e. on
Itanium/PPC MSVC will emit hardware memory fences (along with compiler
fences).

http://msdn.microsoft.com/en-us/library/12a04hfd.aspx

Dmitriy V'jukov
Jul 30 '08 #2
"Dmitriy V'jukov" <dv*****@gmail.com> writes:
On 30 Jul, 11:20, Anthony Williams <anthony....@gmail.com> wrote:
>You need compiler barriers (_ReadWriteBarrier() in MSVC) to ensure
things don't get rearranged across your atomic access
functions. There's no need to drop to assembler either: you're not
doing anything more complicated than a simple MOV.


In MSVC one can use just volatile variables, accesses to volatile
variables guaranteed to be 'load-acquire' and 'store-release'. I.e. on
Itanium/PPC MSVC will emit hardware memory fences (along with compiler
fences).

http://msdn.microsoft.com/en-us/library/12a04hfd.aspx
I am aware of this. However, the description only says it orders
accesses to "global and static data", and that it refers to objects
*declared* as volatile. I haven't tested it enough to be confident
that it is entirely equivalent to _ReadWriteBarrier(), and that it
works on variables *cast* to volatile.

Anthony
Jul 30 '08 #3

"Anthony Williams" <an*********@gmail.com> wrote in message
news:u6***********@gmail.com...
"Chris M. Thomasson" <no@spam.invalid> writes:
[...]
The algorithm used by boost::call_once on pthreads platforms is
described here:

http://www.open-std.org/jtc1/sc22/wg...007/n2444.html
>>It doesn't use a
lock unless it has to and is portable across pthreads and win32
threads.

The code I posted does not use a lock unless it absolutely has to
because it attempts to efficiently take advantage of the double
checked locking pattern.

Oh yes, I realise that: the code for call_once is similar. However, it
attempts to avoid contention on the mutex by using thread-local
storage. If you have atomic ops, you can go even further in
eliminating the mutex, e.g. using compare_exchange and fetch_add.
[...]

Before I reply to your entire post I should point out that:

http://groups.google.com/group/comp....9c7aff738f9102

the Boost mechanism is not 100% portable, but is elegant in practice. It
uses a technique similar to one that a certain distributed reference
counting algorithm I created claims:

http://groups.google.com/group/comp....16861ac568b2be

http://groups.google.com/group/comp....hreads&q=vzoom

Not 100% portable, but _highly_ portable indeed!

Jul 30 '08 #4
"Chris M. Thomasson" <no@spam.invalid> writes:
"Anthony Williams" <an*********@gmail.com> wrote in message
news:u6***********@gmail.com...
>"Chris M. Thomasson" <no@spam.invalid> writes:
[...]
>The algorithm used by boost::call_once on pthreads platforms is
described here:

http://www.open-std.org/jtc1/sc22/wg...007/n2444.html
>>>It doesn't use a
lock unless it has to and is portable across pthreads and win32
threads.

The code I posted does not use a lock unless it absolutely has to
because it attempts to efficiently take advantage of the double
checked locking pattern.

Oh yes, I realise that: the code for call_once is similar. However, it
attempts to avoid contention on the mutex by using thread-local
storage. If you have atomic ops, you can go even further in
eliminating the mutex, e.g. using compare_exchange and fetch_add.
[...]

Before I reply to your entire post I should point out that:

http://groups.google.com/group/comp....9c7aff738f9102

the Boost mechanism is not 100% portable, but is elegant in
practice.
Yes. If you look at the whole thread, you'll see a comment by me there
where I admit as much.
It uses a similar technique that a certain distributed
reference counting algorithm I created claims:
I wasn't aware that you were using something similar in vZOOM.

Anthony
Jul 30 '08 #5
"Anthony Williams" <an*********@gmail.com> wrote in message
news:uh***********@gmail.com...
"Chris M. Thomasson" <no@spam.invalid> writes:
>"Anthony Williams" <an*********@gmail.com> wrote in message
news:u6***********@gmail.com...
>>"Chris M. Thomasson" <no@spam.invalid> writes:
[...]
>>The algorithm used by boost::call_once on pthreads platforms is
described here:

http://www.open-std.org/jtc1/sc22/wg...007/n2444.html

It doesn't use a
lock unless it has to and is portable across pthreads and win32
threads.

The code I posted does not use a lock unless it absolutely has to
because it attempts to efficiently take advantage of the double
checked locking pattern.

Oh yes, I realise that: the code for call_once is similar. However, it
attempts to avoid contention on the mutex by using thread-local
storage. If you have atomic ops, you can go even further in
eliminating the mutex, e.g. using compare_exchange and fetch_add.
[...]

Before I reply to your entire post I should point out that:

http://groups.google.com/group/comp....9c7aff738f9102

the Boost mechanism is not 100% portable, but is elegant in
practice.

Yes. If you look at the whole thread, you'll see a comment by me there
where I admit as much.
Does the following line:

__thread fast_pthread_once_t _fast_pthread_once_per_thread_epoch;

explicitly set `_fast_pthread_once_per_thread_epoch' to zero? If so, is it
guaranteed?

>It uses a similar technique that a certain distributed
reference counting algorithm I created claims:

I wasn't aware that you were using something similar in vZOOM.
Humm, now that I think about it, it seems like I am totally mistaken. The
"most portable" version of vZOOM relies on an assumption that pointer
load/stores are atomic, that the unlocking of a mutex executes at least a
release-barrier, and that the loading of a shared variable executes at
least a data-dependent load-barrier; very similar to RCU without the
explicit #LoadStore | #StoreStore before storing into a shared pointer
location... Something like:


____________________________________________________________________
struct foo {
    int a;
};

static foo* shared_f = NULL;

// single producer thread
{
    foo* local_f = new foo;
    pthread_mutex_t* lock = get_per_thread_mutex();
    pthread_mutex_lock(lock);
    local_f->a = 666;
    pthread_mutex_unlock(lock);
    shared_f = local_f;
}

// single consumer thread
{
    foo* local_f;
    while (! (local_f = shared_f)) {
        sched_yield();
    }
    assert(local_f->a == 666);
    delete local_f;
}
____________________________________________________________________

If the `pthread_mutex_unlock()' function does not execute at least a
release-barrier in the producer, or if the load of the shared variable does
not execute at least a data-dependent load-barrier in the consumer, the
"most portable" version of vZOOM will NOT work on that platform in any way,
shape or form; it will need a platform-dependent version. However, the only
platform I can think of where the intra-node memory visibility requirements
do not hold is the Alpha... For multi-node super-computers, inter-node
communication is adapted to use MPI.

Jul 30 '08 #6
"Chris M. Thomasson" <no@spam.invalid> writes:
"Anthony Williams" <an*********@gmail.com> wrote in message
news:uh***********@gmail.com...
>"Chris M. Thomasson" <no@spam.invalid> writes:
>>"Anthony Williams" <an*********@gmail.com> wrote in message
news:u6***********@gmail.com...
"Chris M. Thomasson" <no@spam.invalid> writes:
[...]
The algorithm used by boost::call_once on pthreads platforms is
described here:

http://www.open-std.org/jtc1/sc22/wg...007/n2444.html

>It doesn't use a
>lock unless it has to and is portable across pthreads and win32
>threads.
>
The code I posted does not use a lock unless it absolutely has to
because it attempts to efficiently take advantage of the double
checked locking pattern.

Oh yes, I realise that: the code for call_once is similar. However, it
attempts to avoid contention on the mutex by using thread-local
storage. If you have atomic ops, you can go even further in
eliminating the mutex, e.g. using compare_exchange and fetch_add.
[...]

Before I reply to your entire post I should point out that:

http://groups.google.com/group/comp....9c7aff738f9102

the Boost mechanism is not 100% portable, but is elegant in
practice.

Yes. If you look at the whole thread, you'll see a comment by me there
where I admit as much.

Does the following line:

__thread fast_pthread_once_t _fast_pthread_once_per_thread_epoch;

explicitly set `_fast_pthread_once_per_thread_epoch' to zero? If so,
is it guaranteed?
The algorithm assumes it does, but it depends on which compiler you
use. In the Boost implementation, the value is explicitly
initialized (to ~0 --- I found it worked better with exception
handling to count backwards).
>
>>It uses a similar technique that a certain distributed
reference counting algorithm I created claims:

I wasn't aware that you were using something similar in vZOOM.

Humm, now that I think about it, it seems like I am totally
mistaken. The "most portable" version of vZOOM relies on an assumption
that pointer load/stores are atomic and the unlocking of a mutex
executes at least a release-barrier, and the loading of a shared
variable executes at least a data-dependant load-barrier; very similar
to RCU without the explicit #LoadStore | #StoreStore before storing
into a shared pointer location... Something like:

// single producer thread {
foo* local_f = new foo;
pthread_mutex_t* lock = get_per_thread_mutex();
pthread_mutex_lock(lock);
local_f->a = 666;
pthread_mutex_unlock(lock);
shared_f = local_f;
So you're using the lock just for the barrier properties. Interesting
idea.

Anthony
Jul 30 '08 #7

"Anthony Williams" <an*********@gmail.com> wrote in message
news:ud***********@gmail.com...
"Chris M. Thomasson" <no@spam.invalid> writes:
[...]
>>>the Boost mechanism is not 100% portable, but is elegant in
practice.

Yes. If you look at the whole thread, you'll see a comment by me there
where I admit as much.

Does the following line:

__thread fast_pthread_once_t _fast_pthread_once_per_thread_epoch;

explicitly set `_fast_pthread_once_per_thread_epoch' to zero? If so,
is it guaranteed?

The algorithm assumes it does, but it depends which compiler you
user. In the Boost implementation, the value is explicitly
initialized (to ~0 --- I found it worked better with exception
handling to count backwards).
>>
>>>It uses a similar technique that a certain distributed
reference counting algorithm I created claims:

I wasn't aware that you were using something similar in vZOOM.

Humm, now that I think about it, it seems like I am totally
mistaken. The "most portable" version of vZOOM relies on an assumption
that pointer load/stores are atomic and the unlocking of a mutex
executes at least a release-barrier, and the loading of a shared
variable executes at least a data-dependant load-barrier; very similar
to RCU without the explicit #LoadStore | #StoreStore before storing
into a shared pointer location... Something like:

// single producer thread {
foo* local_f = new foo;
pthread_mutex_t* lock = get_per_thread_mutex();
pthread_mutex_lock(lock);
local_f->a = 666;
pthread_mutex_unlock(lock);
shared_f = local_f;

So you're using the lock just for the barrier properties. Interesting
idea.
Yes. Actually, I did not show the whole algorithm. The code above is busted
because I forgot to show it all; STUPID ME!!! It's busted because the store
to shared_f can legally be hoisted up above the unlock. Here is the whole
picture... Each thread has a special dedicated mutex which is locked from
its birth... Here is exactly how production of an object can occur:
static foo* volatile shared_f = NULL;

// single producer thread {
00: foo* local_f;
01: pthread_mutex_t* const mem_mutex = get_per_thread_mem_mutex();
02: local_f = new foo;
03: local_f->a = 666;
04: pthread_mutex_unlock(mem_mutex);
05: pthread_mutex_lock(mem_mutex);
06: shared_f = local_f;
}
Here are the production rules wrt POSIX:

1. Steps 02-03 CANNOT sink below step 04
2. Step 06 CANNOT rise above step 05
3. vZOOM assumes that step 04 has a release barrier

Those __two__ guarantees and __single__ assumption ensure the ordering and
visibility of the operations is correct. After that, the consumer can do:
// single consumer thread {
00: foo* local_f;
01: while (! (local_f = shared_f)) {
02: sched_yield();
}
03: assert(local_f->a == 666);
04: delete local_f;
}
Consumption rules:

01: vZOOM assumes that the load from `shared_f' will have an implied
data-dependent load-barrier.

BTW, here is a brief outline of how the "most portable" version of vZOOM
distributed reference counting works with the above idea:

http://groups.google.ru/group/comp.p...e9b6e427b4a144

http://groups.google.com/group/comp....24fe99f742ce6e
(an __excellent__ question from Dmitriy...)


What do you think Anthony?

Jul 30 '08 #8
[...]
>
BTW, here is a brief outline of how the "most portable" version of vZOOM
distributed reference counting works with the above idea:

http://groups.google.ru/group/comp.p...e9b6e427b4a144
Take note of the per-thread memory lock. It's vital to vZOOM.
>
http://groups.google.com/group/comp....24fe99f742ce6e
(an __excellent__ question from Dmitriy...)


What do you think Anthony?
Jul 30 '08 #9
