Bytes | Software Development & Data Engineering Community

Memory barrier note

Here is an interesting writing on memory barriers. Not sure if this helps
my understanding or raises more questions, but interesting...

http://discuss.develop.com/archives/...=DOTNET&P=R375

--
William Stacey, MVP
Nov 16 '05 #1
William Stacey [MVP] <st***********@mvps.org> wrote:
Here is an interesting writing on memory barriers. Not sure if this helps
my understanding or raises more questions, but interesting...

http://discuss.develop.com/archives/...=DOTNET&P=R375


Thanks for the link - one to add to the (currently empty) list of
resources at the end of my own threading article. It's *nearly*
finished now (ish!)

http://www.pobox.com/~skeet/csharp/multithreading.html

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #2
William Stacey [MVP] <st***********@mvps.org> wrote:
Here is an interesting writing on memory barriers. Not sure if this helps
my understanding or raises more questions, but interesting...

http://discuss.develop.com/archives/...=DOTNET&P=R375


Hmm... I've now read it through, and while it mostly confirms what I've
understood before, I'm still not convinced I can see a problem in the
first singleton implementation, because the variable is volatile.

A volatile write (like a lock release) ensures that no earlier memory
write can move to after it. Thus the write to "val" shouldn't be able to
move to after the write to "singleton".
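For reference, the implementation being discussed looks roughly like this (a sketch: the field names "singleton" and "val" come from the posts above; the lock object and overall shape are assumed):

```csharp
// A sketch of the first singleton implementation under discussion.
// Because "singleton" is volatile, the write that publishes it has
// release semantics, so the constructor's write to "val" cannot be
// observed to happen after it.
public sealed class Singleton
{
    private static volatile Singleton singleton;
    private static readonly object padlock = new object();

    public readonly int val;

    private Singleton()
    {
        val = 42; // must be visible before "singleton" becomes non-null
    }

    public static Singleton Instance
    {
        get
        {
            if (singleton == null)          // volatile read: acquire
            {
                lock (padlock)
                {
                    if (singleton == null)
                        singleton = new Singleton(); // volatile write: release
                }
            }
            return singleton;
        }
    }
}
```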

I've already bothered Vance on more than one occasion, so I'm reluctant
to do so again - anyone else have any insight into this?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #3
Jon Skeet [C# MVP] <sk***@pobox.com> wrote:
Hmm... I've now read it through, and while it mostly confirms what I've
understood before, I'm still not convinced I can see a problem in the
first singleton implementation, because the variable is volatile.

A volatile write (like a lock release) ensures that no earlier memory
write can move to after it. Thus the write to "val" shouldn't be able to
move to after the write to "singleton".

I've already bothered Vance on more than one occasion, so I'm reluctant
to do so again - anyone else have any insight into this?


Aha - I've now read the rest of the thread, and indeed I was right (as
were others reading the article). Vance clears it up in another post:

<quote>
Arrgg!

The original posting has a most unfortunate typo in it. The first
example should have NO volatile variables in it. (This is what I get
for cutting and pasting too much). The example with the volatile
'singleton' variable was ANOTHER way of fixing the memory issue. I
toyed with explaining this fix, but decided against it because it is
not as good as the MemoryBarrier() fix. This is because it forces the
JIT to do a read memory barrier in many places where it is not necessary
(like the critical path that does not take locks).

Thus while this solves the problem, I do not recommend it as a
solution.
</quote>

Hooray - I can sleep easy once more :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #4
line 1) System.Threading.Thread.WriteMemoryBarrier();
line 2) singleton = newObj;

So the above ensures, 110% and without question, that the "if (singleton ==
null)" test will never find singleton to be a partially completed
assignment, and ensures some kind of read barrier so that I read the
"singleton" ref correctly?

Put another way, is it possible to have a thread switch before line 2 has
fully assigned newObj to singleton, which may cause thread1 to see that
singleton is not null while the ref is not right either, hence an error?
--
William Stacey, MVP

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
William Stacey [MVP] <st***********@mvps.org> wrote:
Here is an interesting writing on memory barriers. Not sure if this helps my understanding or raises more questions, but interesting...

http://discuss.develop.com/archives/...=DOTNET&P=R375


Hmm... I've now read it through, and while it mostly confirms what I've
understood before, I'm still not convinced I can see a problem in the
first singleton implementation, because the variable is volatile.

A volatile write (like a lock release) ensures that no memory write can
move after it. Thus the write to "val" shouldn't be able to move to
after the write to "singleton".

I've already bothered Vance on more than one occasion, so I'm reluctant
to do so again - anyone else have any insight into this?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too


Nov 16 '05 #5
William Stacey [MVP] <st***********@mvps.org> wrote:
line 1) System.Threading.Thread.WriteMemoryBarrier();
line 2) singleton = newObj;

So the above ensures, 110% and without question, that the "if (singleton ==
null)" test will never find singleton to be a partially completed
assignment, and ensures some kind of read barrier so that I read the
"singleton" ref correctly?
Yes - but so would just

singleton = new Singleton();

when singleton is declared to be volatile.
Put another way, is it possible to have a thread switch before line 2 has
fully assigned newObj to singleton, which may cause thread1 to see that
singleton is not null while the ref is not right either, hence an error?


No. Reference assignments are always atomic, I believe. If they weren't
that would be a serious security problem, IMO.

(I've only seen it specified when the memory is properly aligned, but
as I say, anything else would be a huge security problem.)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #6
If I read the following link (IBM) right, it says Vance's code is a classic
example of a "fix" that is broken. He needs a read barrier to do it that
way, or should just use the darn lock and do it right for all cases on all
machines (it is really not that expensive) :) Cheers.
http://www.cs.umd.edu/~pugh/java/mem...edLocking.html

BTW. I would guess you have these, but here are some other links on this
topic for others.

Brad Abrams blog
http://blogs.msdn.com/brada/archive/...12/130935.aspx

Vance Morrison (CLR team)
http://discuss.develop.com/archives/...=DOTNET&P=R375

Exploring the Singleton Design Pattern
http://msdn.microsoft.com/library/de...tondespatt.asp

Scott Allen's blog
http://odetocode.com/Blogs/scott/arc...05/13/242.aspx

Chris Brumme
http://weblogs.asp.net/cbrumme/archi.../17/51445.aspx
http://blogs.gotdotnet.com/cbrumme/P...b-c69f01d7ff2b

The "Double-Checked Locking is Broken" Declaration
http://www.cs.umd.edu/~pugh/java/mem...edLocking.html

Various
http://www.google.com/groups?q=g:thl...ose.com&rnum=3

http://www.javaworld.com/javaworld/j...9-toolbox.html

==================================
Alexei Zakharov Writes:
# re: volatile and MemoryBarrier()...
I think the implementation without using volatile is missing one memory
barrier. According to
http://www.google.com/groups?q=g:thl...ose.com&rnum=3
memory barriers are required for both read and write code paths. The read
path extracted from the code is:

if ( Singleton.value == null ) // false
{
    // not executed
}
return Singleton.value;

There is no memory barrier on this path. In the CLR memory model as
described in Chris Brumme's blog
(http://blogs.msdn.com/cbrumme/archiv...17/51445.aspx), only volatile
loads are considered "acquire", but normal loads can be reordered.

The correct implementation will be:

public sealed class Singleton {
    private Singleton() {}
    private static Singleton value;
    private static object sync = new object();

    public static Singleton Value {
        get {
            Singleton temp = Singleton.value;
            System.Threading.Thread.MemoryBarrier(); // this is important

            if ( temp == null ) {
                lock ( sync ) {
                    if ( Singleton.value == null ) {
                        temp = new Singleton();
                        System.Threading.Thread.MemoryBarrier();
                        Singleton.value = temp;
                    }
                }
            }

            return Singleton.value;
        }
    }
}

Let me expand on the performance of the two implementations of the double
checked locking pattern. Obviously we want to make the read path faster and
don't care about the write path because the write path is taken only once.
The read path extracted from the code is:

// using volatile (Singleton.value is volatile)
get {
    if ( Singleton.value == null ) {
        // ... not taken
    }
    return Singleton.value;
}

// using memory barriers
get {
    Singleton temp = Singleton.value;
    System.Threading.Thread.MemoryBarrier();
    if ( temp == null ) {
        // ... not taken
    }
    return Singleton.value;
}

The volatile load in the first code has the acquire semantics and is
equivalent to the non-volatile load plus the memory barrier in the second
code. There are two volatile loads in the first code and only one memory
barrier in the second. So I expect the code with memory barriers to perform
faster than the code that uses volatiles. But like any performance
speculation, it has to be taken with a grain of salt. I haven't done any
measurements here.
==================================

--
William Stacey, MVP

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
William Stacey [MVP] <st***********@mvps.org> wrote:
line 1) System.Threading.Thread.WriteMemoryBarrier();
line 2) singleton = newObj;

So the above insures 110%, without question that the "if (singleton ==
null)" test will never find singleton to be a partially completed assignment and insures some kind of read barrier so that I read "singleton" ref
correctly?


Yes - but so would just

singleton = new Singleton();

when singleton is declared to be volatile.
Put another way, is it possible to have a thread switch before line 2 has fully assigned newObj to singleton, which may cause thread1 to see singleton is not null, but ref is not right either, so error?


No. Reference assignments are always atomic, I believe. If they weren't
that would be a serious security problem, IMO.

(I've only seen it specified when the memory is properly aligned, but
as I say, anything else would be a huge security problem.)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too


Nov 16 '05 #7
Oops, forgot two of the best ones...
Andrew Birrell (http://birrell.org/andrew/papers/)
"An Introduction to Programming with C# Threads"
http://research.microsoft.com/~birre...eadsCSharp.pdf (At
Microsoft)
http://birrell.org/andrew/papers/035-Threads.pdf (First version at Compaq)

Andrew seems to suggest the only way to do this right is using a lock
around all the tests and assignments (which a correct memory barrier
implementation would do for you as well, I guess, but you need to think
really hard about it each time, and about each new twist.)

--
William Stacey, MVP
Nov 16 '05 #8
William Stacey [MVP] <st***********@mvps.org> wrote:
If I read the following link (IBM) right, it says Vance's code is a classic
example of a "fix" that is broken. He needs a read barrier to do it that
way, or should just use the darn lock and do it right for all cases on all
machines (it is really not that expensive) :) Cheers.
http://www.cs.umd.edu/~pugh/java/mem...edLocking.html
Do you mean the code with the explicit write barrier but no read
barrier? I see what you're saying, but I'm not sure either way. The
thing is, the reading thread can't have read (and therefore cached) any
information about the object before it gets the reference - so I
*think* it's okay so long as all the *writes* are performed in a way
that ensures that all the information is available as soon as the
reference itself becomes available.

I certainly favour the "use a lock" approach when using static
initialisers doesn't quite have the desired semantics (for whatever
reason).
BTW. I would guess you have these, but here are some other links on this
topic for others.


I certainly don't have *all* of them - I'll include various ones in the
article.

<snip>

Cheers!

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #9
William Stacey [MVP] <st***********@mvps.org> wrote:
oops. Forgot two of the best ones....
Andrew Birrell (http://birrell.org/andrew/papers/)
"An Introduction to Programming with C# Threads"
http://research.microsoft.com/~birre...eadsCSharp.pdf (At
Microsoft)
http://birrell.org/andrew/papers/035-Threads.pdf (First version at Compaq)

Andrew seems to suggest the only way to do this right is using a lock
around all the tests and assignments (which a correct memory barrier
implementation would do for you as well, I guess, but you need to think
really hard about it each time, and about each new twist.)


I don't think he's right about that though. The .NET memory model
provides a memory barrier on a volatile read/write, whereas the Java
model doesn't. That's why the "various fixes" tried with Java don't
work - but I believe making the variable volatile in .NET *does*.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #10
One more link to add to the list:

Raymond Chen: High-performance multithreading is very hard
http://blogs.msdn.com/oldnewthing/ar...28/143769.aspx

--
Scott
http://www.OdeToCode.com
Nov 16 '05 #11
Thanks. And another one that was interesting. He also concludes that using
the lock before the test is the way to go.
http://www.nwcpp.org/Downloads/2004/DCLP_notes.pdf

"Back Where We Started"...

public class Keyboard
{
    private static Keyboard pInstance = null;
    private static readonly object syncRoot = new object();

    internal Keyboard()
    {
    }

    public static Keyboard GetInstance()
    {
        lock (syncRoot) // read and write barrier
        {
            if (pInstance == null)
                pInstance = new Keyboard();
        }
        return pInstance;
    }
}

Note: Access to shared data is now inside a critical section.
Conclusion: There is no portable way to implement DCLP in C++. This may
apply to C#; not sure.

My Comments:
On my slow box, 10 million locks and releases take 375ms. So is *not*
locking really worth the fuss, or the danger of not doing something right
using some other fancy method? I can't see it.
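The quoted figure can be reproduced with a loop along these lines (a rough sketch; the class name is made up, and absolute numbers vary wildly with the machine and especially the processor count):

```csharp
using System;
using System.Diagnostics;

// Time 10 million uncontended lock/release pairs on a single thread.
class LockCost
{
    static readonly object sync = new object();

    static void Main()
    {
        const int N = 10000000;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++)
        {
            lock (sync) { } // acquire and immediately release, nothing else
        }
        sw.Stop();
        Console.WriteLine(N + " lock/release pairs took " + sw.ElapsedMilliseconds + " ms");
    }
}
```

Note this measures the uncontended single-processor-friendly case; contended locks on a multi-processor box behave very differently, as discussed below.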

This raises a question: what about collections and syncing on an object
(like syncRoot)? You protect concurrent access to the internal array (for
example, or queue), but you don't have all the shared data inside a
critical section, as far as I can see.
lock (syncRoot)
    array.Add(new object());   // write path

lock (syncRoot)
    return array[0];           // read path

We don't lock the array, and we don't lock element 0 at all. Element 0 has
not been protected by any barrier that I can see, so is this subject to
issues also? Cheers!

--
William Stacey, MVP

"Scott Allen" <bitmask@[nospam].fred.net> wrote in message
news:f0********************************@4ax.com...
One more link to add to the list:

Raymond Chen: High-performance multithreading is very hard
http://blogs.msdn.com/oldnewthing/ar...28/143769.aspx

--
Scott

On Thu, 3 Jun 2004 18:34:01 -0400, "William Stacey [MVP]"
<st***********@mvps.org> wrote:
If I read following link (IBM) right, it says Vances code is classic exampleof a "fix" that is broke. He needs a read barrier to do it that way or justuse the darn lock and do it right for all cases on all machines (it is
really not that expensive) :) Cheers.
http://www.cs.umd.edu/~pugh/java/mem...edLocking.html

BTW. I would guess you have these, but here are some other links on this
topic for others.

Brad Abrams blog
http://blogs.msdn.com/brada/archive/...12/130935.aspx

Vance Morrison (CLR team)
http://discuss.develop.com/archives/...=DOTNET&P=R375

Exploring the Singleton Design Pattern


http://msdn.microsoft.com/library/de...-us/dnbda/html

/singletondespatt.asp

Scott Allen's blog
http://odetocode.com/Blogs/scott/arc...05/13/242.aspx

Chris Brumme
http://weblogs.asp.net/cbrumme/archi.../17/51445.aspx


http://blogs.gotdotnet.com/cbrumme/P...a8-4694-96db-c

69f01d7ff2b

The "Double-Checked Locking is Broken" Declaration
http://www.cs.umd.edu/~pugh/java/mem...edLocking.html

Various


http://www.google.com/groups?q=g:thl...&ie=UTF-8&selm

=1998May28.082712%40bose.com&rnum=3

http://www.javaworld.com/javaworld/j...9-toolbox.html


--
Scott
http://www.OdeToCode.com


Nov 16 '05 #12
William Stacey [MVP] <st***********@mvps.org> wrote:
Thanks. And another one that was interesting. He also concludes that using
the lock before the test is the way to go.
http://www.nwcpp.org/Downloads/2004/DCLP_notes.pdf
<snip>

Interesting. I'll have to read through it carefully to work out whether
or not to include it in my list - it's not .NET-specific, so could be
misleading - but may well have good general points to make.
Note: Access to shared data is now inside a critical section.
Conclusion: There is no portable way to implement DCLP in C++. This may
apply to c#, not sure.
I'm not at all surprised that there's no portable way to do it in
unmanaged C++ - basically you've got a potentially different memory
model on every platform.
My Comments:
On my slow box, 10 million locks and releases take 375ms. So is *not*
locking really worth the fuss, or the danger of not doing something right
using some other fancy method? I can't see it.
How many processors does your box have though? Memory barriers and
locks are *vastly* more expensive on multi-processor machines, I
believe. I still advocate simplicity unless you're absolutely *sure*
about the performance cost, but I'm just pointing out that there may
well be more of a performance cost than the figures you quote imply.
This begs a question. What about collections and syncing on an object (like
syncRoot.) You protect concurrent access to internal array (for example, or
queue), but you don't have all the shared data inside a critical section
that I see.
1) lock(syncRoot)
2) array.Add(new object())
3) unlock
4) lock(syncRoot)
5) return array[0];
6) unlock

We don't lock the array, and we don't lock element 0 at all. element 0 has
not been protected by any barrier that I can see, so is this subject to
issues also?? Cheers!


Element 0 has been protected by exactly the same memory barrier as
everything else. There aren't different kinds of memory barrier - it's
not like a memory barrier only applies to the object you lock on. It
applies to *everything*.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #13
> Element 0 has been protected by exactly the same memory barrier as
everything else. There aren't different kinds of memory barrier - it's
not like a memory barrier only applies to the object you lock on. It
applies to *everything*.


You 100% on that? I thought I read that it had to do with individual memory
locations, not sure. I assume you have a paper on this question, and would
really appreciate the link. Cheers!

--
William Stacey, MVP
Nov 16 '05 #14
William Stacey [MVP] <st***********@mvps.org> wrote:
Element 0 has been protected by exactly the same memory barrier as
everything else. There aren't different kinds of memory barrier - it's
not like a memory barrier only applies to the object you lock on. It
applies to *everything*.


You 100% on that? I thought I read that it had to do with individual memory
locations, not sure. I assume you have a paper on this question, and would
really appreciate the link. Cheers!


No paper - just the spec.

<quote>
A volatile read has acquire semantics, meaning that the read is
guaranteed to occur prior to any references to memory that occur after
the read instruction in the CIL instruction sequence. A volatile write
has release semantics, meaning that the write is guaranteed to happen
after any memory references prior to the write instruction in the CIL
instruction sequence.
</quote>

No mention there of which bit of memory is affected.

Also, if you think about it - if your concern was correct, the
singleton pattern of

lock (foo)
{
    if (instance == null)
    {
        instance = new Singleton();
    }
}
return instance;

wouldn't work...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #15
I wouldn't say that the memory barrier protects anything, per se. Its
purpose is to place limitations on memory re-ordering.

Tell me what you think of this example:

class A
{
    int x = 1;
    int y = 2;
    int z = 3;

    public void Foo()
    {
        x = 0;
        y = z;
    }
}

The sequence for Foo could be:

Write 0 to x
Read z (3)
Assign 3 to y

But the compiler or processor may decide it is better to do:

Read z (3)
Write 0 to x
Write 3 to y

Now if we wrote Foo as:

public void Foo()
{
    x = 0;
    System.Threading.Thread.MemoryBarrier();
    y = z;
}

Now we are guaranteeing the write to x occurs (and is visible) before
reading z.

Of course in this example nothing exciting happens if the re-ordering
occurs, but I could imagine another contrived example where we add
another method to class A which writes to z, then start two threads to
work on the same instance. In this case the memory barrier could
change the outcome of the value stored to y.

--
Scott
http://www.OdeToCode.com
Nov 16 '05 #16
"A volatile" read. Are they talking about vars that have volatile key word?
Also, if you think about it - if your concern was correct, the
singleton pattern of

lock (foo)
{
    if (instance == null)
    {
        instance = new Singleton();
    }
}
return instance;


I ~think that is because anything "inside" the lock has a barrier. Other
things can be "optimized" into this barrier, which can cause other
unexpected behavior. I will have to find the paper that talked about this.

--
William Stacey, MVP
Nov 16 '05 #17
William Stacey [MVP] <st***********@mvps.org> wrote:
"A volatile" read. Are they talking about vars that have volatile key word?


Yes. Reading a volatile variable involves performing a volatile read
(and likewise writing, in the obvious way).
Also, if you think about it - if your concern was correct, the
singleton pattern of

lock (foo)
{
    if (instance == null)
    {
        instance = new Singleton();
    }
}
return instance;


I ~think that is because anything "inside" the lock has a barrier. Other
things can be "optimized" into this barrier, which can cause other
unexpected behavior. I will have to find the paper that talked about this.


No reads within the lock can be performed before the lock is acquired.
No writes within the lock can be performed after the lock is released.

I believe, however, that writes *after* the lock can be performed
before the lock is released. Not sure on that one - but if everything
is synchronized properly, that shouldn't matter.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #18
Not sure all this ordering stuff is the key (however, it is also important.)
The memory barrier (AFAICT) flushes the cache on all processors. This
makes sure that when you set a var on one processor, another processor will
read that same memory location and not a cached version of that location
that could be pointing to something else or null. The lock ensures the
ordering and sync; the barrier ensures the cache does not become an issue.
Please correct if in error. Cheers!

--
William Stacey, MVP

"Scott Allen" <bitmask@[nospam].fred.net> wrote in message
news:e1********************************@4ax.com...
I wouldn't say that the memory barrier protects anything, per se. It's
purpose is to place limitations on memory re-ordering.

Tell me what you think of this example:

class A
{
int x = 1;
int y = 2;
int z = 3;

public void Foo()
{
x = 0;
y = z;
}
}

The sequence for Foo could be:

Write 0 to x
Read z (3)
Assign 3 to y

But the compiler or processor may decide it is better to do:

Read z (3)
Write 0 to x
Write 3 to y

Now if we wrote Foo as:

public void Foo()
{
x = 0;
System.Threading.Thread.MemoryBarrier();
y = z;
}

Now we are guaranteeing the write to x occurs (and is visible) before
reading z.

Of course in this example nothing exciting happens if the re-ordering
occurs, but I could imagine another contrived example where we add
another method to class A which writes to z, then start two threads to
work on the same instance. In this case the memory barrier could
change the outcome of the value stored to y.

--
Scott

On Fri, 4 Jun 2004 19:11:22 -0400, "William Stacey [MVP]"
<st***********@mvps.org> wrote:
Element 0 has been protected by exactly the same memory barrier as
everything else. There aren't different kinds of memory barrier - it's
not like a memory barrier only applies to the object you lock on. It
applies to *everything*.


You 100% on that? I thought I read that it had to do with individual memorylocations, not sure. I assume you have a paper on this question, and wouldreally appreciate the link. Cheers!


--
Scott
http://www.OdeToCode.com


Nov 16 '05 #19
William Stacey [MVP] <st***********@mvps.org> wrote:
Not sure all this ordering stuff is the key (however is also important.)
Ordering is *precisely* the point of memory barriers.
The memory barrier (AFAICT), flushes the cache on all processors. This
makes sure that when you set a var on one processor, another processor will
read that same memory location and not a cached version of that location
that could be pointing to something else or null.
Although that may be the physical effect, the effect in terms of the
memory model is specified by ordering. It's not just how the compiler
might reorder things - it's the order in which the operations *appear*
to have taken place due to caching.
The lock ensures the
ordering and sync; the barrier ensures the cache does not become an issue.
Please correct if in error. Cheers!


Well, unfortunately it's not very clearly defined. The CLR has a very
definite idea of two different types of barrier - a volatile read which
has acquire semantics, and a volatile write which has release
semantics. Each only affects things in one direction, however. The
memory barrier you've described is a sort of bidirectional barrier, so
that memory accesses can't move either side of it. It would help if the
docs for MemoryBarrier defined it in CLR terms...
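In CLR terms, the two one-way barriers and the full barrier can be sketched like this (a sketch; the class and field names are assumed). Thread.VolatileWrite has release semantics, Thread.VolatileRead has acquire semantics, and Thread.MemoryBarrier is the bidirectional version:

```csharp
using System;
using System.Threading;

class Barriers
{
    static int data;
    static int flag;

    static void Writer()
    {
        data = 42;                          // ordinary write
        Thread.VolatileWrite(ref flag, 1);  // release: the write to "data"
                                            // cannot move below this line
    }

    static void Reader()
    {
        if (Thread.VolatileRead(ref flag) == 1)  // acquire: the read of "data"
        {                                        // cannot move above this line
            Console.WriteLine("data = " + data);
        }

        // Thread.MemoryBarrier() is the full, bidirectional barrier: no
        // read or write on this thread may cross it in either direction.
        Thread.MemoryBarrier();
    }

    static void Main()
    {
        Writer();
        Reader(); // on a single thread this trivially prints "data = 42"
    }
}
```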

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #20
I think the CPU instruction issued to flush the cache is the same (not one
for reads and one for writes.) If there are both, please advise or provide
a link for detail. Cheers.

--
William Stacey, MVP

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
William Stacey [MVP] <st***********@mvps.org> wrote:
Not sure all this ordering stuff is the key (however is also important.)


Ordering is *precisely* the point of memory barriers.
The memory barrier (AFAICT), flushes the cache on all processors. This
makes sure that when you set a var on one processor, another processor will read that same memory location and not a cached version of that location
that could be pointing to something else or null.


Although that may be the physical affect, the effect in terms of the
memory model is specified by ordering. It's not just how the compiler
might reorder things - it's the order in which the operations *appear*
to have taken place due to caching.
The lock insures the
ordering and sync, the barrier insures the cache does not become an issue. Please correct if in error. Cheers!


Well, unfortunately it's not very clearly defined. The CLR has a very
definite idea of two different types of barrier - a volatile read which
has acquire semantics, and a volatile write which has release
semantics. Each only affects things in one direction, however. The
memory barrier you've described is a sort of bidirectional barrier, so
that memory accesses can't move either side of it. It would help if the
docs for MemoryBarrier defined it in CLR terms...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too


Nov 16 '05 #21
William Stacey [MVP] <st***********@mvps.org> wrote:
I think the cpu instruction sent is the same to flush the cache (not one for
read and one write.) If both, please advise or provide link for detail.


It may well be a single CPU instruction for x86, which has a fairly
strong memory model. That's no guarantee about what will happen
elsewhere though.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #22
I posted this to Brad Abrams' blog and Chris Brumme's blog. Posting here
to get more eyes.

Does this spin version work? Why or why not? Cheers!

public sealed class Singleton
{
    private static int spinLock = 0; // lock not owned.
    private static Singleton value = null;

    private Singleton() {}

    public static Singleton Value()
    {
        // Get spin lock.
        while ( Interlocked.Exchange(ref spinLock, 1) != 0 )
            Thread.Sleep(0);

        // Do we have any mbarrier issues?
        if ( value == null )
            value = new Singleton();

        Interlocked.Exchange(ref spinLock, 0);
        return value;
    }
}

This would help answer a few related questions for me on how Interlocked
works with mem barriers and cache, etc. TIA -- William

--
William Stacey, MVP

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
William Stacey [MVP] <st***********@mvps.org> wrote:
I think the cpu instruction sent is the same to flush the cache (not one for read and one write.) If both, please advise or provide link for detail.


It may well be a single CPU instruction for x86, which has a fairly
strong memory model. That's no guarantee about what will happen
elsewhere though.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too


Nov 16 '05 #23
Hi William:

As Jon points out, it really is all about ordering. The lock can only
ensure a consistent view of the memory if everyone follows the
protocol: acquire lock, work with shared memory, release lock.

The problem with double check locking is that only one thread will
ever follow the protocol, everyone else cheats and tries to look at
shared memory without acquiring the same lock. Because of this we have
to strictly control the ordering of the memory operations inside of
the lock. Other threads will be peeking at our work while we still
have work in progress.

We can use a memory barrier to force a strong order - all memory
writes will be seen by an external observer to happen in the same
order as we programmed them. That's really what it's all about.
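To make that concrete, here is roughly what double-checked locking looks like
with the ordering made explicit. This is a sketch only — it assumes
Thread.MemoryBarrier is available, and marking the field volatile is the more
common way to get the same effect:

```csharp
using System.Threading;

public sealed class LazySingleton
{
    private static readonly object syncRoot = new object();
    private static LazySingleton instance;
    private LazySingleton() {}

    public static LazySingleton Instance
    {
        get
        {
            if (instance == null)             // cheap read outside the lock
            {
                lock (syncRoot)
                {
                    if (instance == null)     // re-check under the lock
                    {
                        LazySingleton temp = new LazySingleton();
                        // Barrier: the constructor's writes must be visible
                        // before the reference is published, because other
                        // threads read 'instance' without taking the lock.
                        Thread.MemoryBarrier();
                        instance = temp;
                    }
                }
            }
            return instance;
        }
    }
}
```

The barrier sits between "fill in the object" and "publish the reference" —
exactly the ordering that the cheating unlocked readers depend on.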

--
Scott

On Wed, 9 Jun 2004 20:18:27 -0400, "William Stacey [MVP]"
<st***********@mvps.org> wrote:
Not sure all this ordering stuff is the key (however is also important.)
The memory barrier (AFAICT), flushes the cache on all processors. This
makes sure that when you set a var on one processor, another processor will
read that same memory location and not a cached version of that location
that could be pointing to something else or null. The lock ensures the
ordering and sync, the barrier ensures the cache does not become an issue.
Please correct if in error. Cheers!


Nov 16 '05 #24
William Stacey [MVP] <st***********@mvps.org> wrote:
I posted this to Brad Abrams' blog and Chris Brumme's blog. Posting here to
get more eyes.

Does this spin version work? Why or why not? Cheers!

public sealed class Singleton
{
private static int spinLock = 0; // lock not owned.
private static Singleton value = null;
private Singleton() {}

public static Singleton Value()
{
// Get spin lock.
while ( Interlocked.Exchange(ref spinLock, 1) != 0 )
Thread.Sleep(0);

// Do we have any mbarrier issues?
if ( value == null )
value = new Singleton();

Interlocked.Exchange(ref spinLock, 0);
return value;
}
}

This would help answer a few related questions for me on how Interlocked
works with mem barriers and cache, etc. TIA -- William


I *suspect* it will work if Interlocked.Exchange performs a full
bidirectional memory barrier (which it sounds like it does).

I suspect it performs no better than using a lock every time, but I guess
that wasn't what you were interested in :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #25
On P4s and above there is:

MFENCE (full memory barrier)
LFENCE (read (load) barrier)
SFENCE (write (store) barrier)

Note that none of these instructions "flush the cache" of the
processor they execute on or any other processor. They strongly order
instructions. It's up to the cache coherency protocols in Intel
systems to ensure consistency, barriers and locks don't mean "cache
flush".
See:
http://developer.intel.com/design/pe...als/253666.htm
and
http://developer.intel.com/design/pe...als/253667.htm

--
Scott
On Thu, 10 Jun 2004 15:40:15 +0100, Jon Skeet [C# MVP]
<sk***@pobox.com> wrote:
William Stacey [MVP] <st***********@mvps.org> wrote:
I think the cpu instruction sent is the same to flush the cache (not one for
read and one write.) If both, please advise or provide link for detail.


It may well be a single CPU instruction for x86, which has a fairly
strong memory model. That's no guarantee about what will happen
elsewhere though.


Nov 16 '05 #26
On IA-32 architectures I'm pretty sure this would be using cmpxchg8b
with a lock prefix for MP machines. This instruction provides an
atomic read/compare/store operation and acts as a full memory
barrier. A lock(syncRoot) would boil down to the same instruction.

--
Scott
On Thu, 10 Jun 2004 17:19:26 +0100, Jon Skeet [C# MVP]
<sk***@pobox.com> wrote:
William Stacey [MVP] <st***********@mvps.org> wrote:
I posted this to badbrams block and chrisbrumme blog. Post here to get more
eyes.

Does this spin version work? Why or why not? Cheers!

public sealed class Singleton
{
private static int spinLock = 0; // lock not owned.
private static Singleton value = null;
private Singleton() {}

public static Singleton Value()
{
// Get spin lock.
while ( Interlocked.Exchange(ref spinLock, 1) != 0 )
Thread.Sleep(0);

// Do we have any mbarrier issues?
if ( value == null )
value = new Singleton();

Interlocked.Exchange(ref spinLock, 0);
return value;
}
}

This would help answer a few related questions for me on how Interlocked
works with mem barriers and cache, etc. TIA -- William


I *suspect* it will work if Interlocked.Exchange performs a full
bidirectional memory barrier (which it sounds like it does).

I suspect it performs no better than using a lock every time, but I guess
that wasn't what you were interested in :)


--
Scott
http://www.OdeToCode.com
Nov 16 '05 #27
Thanks Scott. That helps.

--
William Stacey, MVP

"Scott Allen" <bitmask@[nospam].fred.net> wrote in message
news:m6********************************@4ax.com...
Hi William:

As Jon points out, it really is all about ordering. The lock can only
ensure a consistent view of the memory if everyone follows the
protocol: acquire lock, work with shared memory, release lock.

The problem with double check locking is that only one thread will
ever follow the protocol, everyone else cheats and tries to look at
shared memory without acquiring the same lock. Because of this we have
to strictly control the ordering of the memory operations inside of
the lock. Other threads will be peaking at our work while we still
have a work in progress.

We can use a memory barrier to force a strong order - all memory
writes will be seen by an external observer to happen in the same
order as we programmed them. That's really what it's all about.

--
Scott

On Wed, 9 Jun 2004 20:18:27 -0400, "William Stacey [MVP]"
<st***********@mvps.org> wrote:
Not sure all this ordering stuff is the key (however is also important.)
The memory barrier (AFAICT), flushes the cache on all processors. This
makes sure that when you set a var on one processor, another processor will
read that same memory location and not a cached version of that location
that could be pointing to something else or null. The lock ensures the
ordering and sync, the barrier ensures the cache does not become an issue.
Please correct if in error. Cheers!


Nov 16 '05 #28
I *suspect* it will work if Interlocked.Exchange performs a full
bidirectional memory barrier (which it sounds like it does).
Thanks Jon. That is what I hoped was going on. Otherwise I would be more
confused.
I suspect it performs no better than using a lock every time, but I guess
that wasn't what you were interested in :)


Other than the fact that this is non-blocking after the first creation of
the singleton, and CompareExchange is faster than taking out a lock before
every test. Not that I would normally do this, but it helps in understanding
some different threading problems. Cheers!
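Along the same non-blocking lines, another pattern worth sketching (an
illustrative example, not something from the thread): drop the lock word
entirely and let racing threads each construct a candidate, with the class
overload of CompareExchange publishing exactly one. This only makes sense
when the constructor is cheap and side-effect free, since losing threads
simply discard their instance:

```csharp
using System.Threading;

public sealed class RacySingleton
{
    private static RacySingleton value = null;
    private RacySingleton() {}

    public static RacySingleton Value()
    {
        if (value == null)
        {
            RacySingleton candidate = new RacySingleton();
            // Publish 'candidate' only if 'value' is still null. The
            // interlocked op is a full barrier, so the candidate's fields
            // are visible before the reference itself is.
            Interlocked.CompareExchange(ref value, candidate, null);
        }
        // Either our candidate won, or we return the winner's instance.
        return value;
    }
}
```

Unlike the spin-lock version, no thread ever blocks here; the trade-off is
that more than one Singleton may briefly be constructed under contention.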

--
William Stacey, MVP
Nov 16 '05 #29
Thanks Scott. Glad I posted this. Do you have any paper you've written on this?

--
William Stacey, MVP

"Scott Allen" <bitmask@[nospam].fred.net> wrote in message
news:5g********************************@4ax.com...
On IA-32 architectures I'm pretty sure this would be using cmpxchg8b
with a lock prefix for MP machines. This instruction provides an
atomic read/compare/store operation and acts as a full memory
barrier. A lock(syncRoot) would boil down to the same instruction.

--
Scott
On Thu, 10 Jun 2004 17:19:26 +0100, Jon Skeet [C# MVP]
<sk***@pobox.com> wrote:
William Stacey [MVP] <st***********@mvps.org> wrote:
I posted this to Brad Abrams' blog and Chris Brumme's blog. Posting here to get more eyes.

Does this spin version work? Why or why not? Cheers!

public sealed class Singleton
{
private static int spinLock = 0; // lock not owned.
private static Singleton value = null;
private Singleton() {}

public static Singleton Value()
{
// Get spin lock.
while ( Interlocked.Exchange(ref spinLock, 1) != 0 )
Thread.Sleep(0);

// Do we have any mbarrier issues?
if ( value == null )
value = new Singleton();

Interlocked.Exchange(ref spinLock, 0);
return value;
}
}

This would help answer a few related questions for me on how Interlocked works with mem barriers and cache, etc. TIA -- William


I *suspect* it will work if Interlocked.Exchange performs a full
bidirectional memory barrier (which it sounds like it does).

I suspect it performs no better than using a lock every time, but I guess
that wasn't what you were interested in :)


--
Scott
http://www.OdeToCode.com


Nov 16 '05 #30
Also, so I take it (assuming my singleton example) that I would also not
have any issue with instance vars inside the singleton that were created
during construction? Say a ref var that was another object. This
interlocked "fence" should protect everything between the fence start and
fence end (assuming no other lazy init is going on inside the first class)?

--
William Stacey, MVP

"Scott Allen" <bitmask@[nospam].fred.net> wrote in message
news:5g********************************@4ax.com...
On IA-32 architectures I'm pretty sure this would be using cmpxchg8b
with a lock prefix for MP machines. This instruction provides an
atomic read/compare/store operation and acts as a full memory
barrier. A lock(syncRoot) would boil down to the same instruction.

--
Scott
On Thu, 10 Jun 2004 17:19:26 +0100, Jon Skeet [C# MVP]
<sk***@pobox.com> wrote:
William Stacey [MVP] <st***********@mvps.org> wrote:
I posted this to Brad Abrams' blog and Chris Brumme's blog. Posting here to get more eyes.

Does this spin version work? Why or why not? Cheers!

public sealed class Singleton
{
private static int spinLock = 0; // lock not owned.
private static Singleton value = null;
private Singleton() {}

public static Singleton Value()
{
// Get spin lock.
while ( Interlocked.Exchange(ref spinLock, 1) != 0 )
Thread.Sleep(0);

// Do we have any mbarrier issues?
if ( value == null )
value = new Singleton();

Interlocked.Exchange(ref spinLock, 0);
return value;
}
}

This would help answer a few related questions for me on how Interlocked works with mem barriers and cache, etc. TIA -- William


I *suspect* it will work if Interlocked.Exchange performs a full
bidirectional memory barrier (which it sounds like it does).

I suspect it performs no better than using a lock every time, but I guess
that wasn't what you were interested in :)


--
Scott
http://www.OdeToCode.com


Nov 16 '05 #31
No, I'm afraid not, but I'm sure you can find some if you dig around.
There has to be someone left still slinging code in assembly - I gave
it up about 7 years ago :)

One reason I remember the cmpxchg8b instruction so well is because it
was the instruction involved in the dreaded Pentium F00F bug - you
could lock up the CPU from user mode code:

http://www.google.com/search?hl=en&l...=cmpxchg8b+bug

--
Scott

On Thu, 10 Jun 2004 16:41:17 -0400, "William Stacey [MVP]"
<st***********@mvps.org> wrote:
Thanks Scott. Glad I posted this. You have any paper you write on this?


--
Scott
http://www.OdeToCode.com
Nov 16 '05 #32

