Threading in new C++ standard

Rather than create a new way of doing things:
http://www.open-std.org/jtc1/sc22/wg...008/n2497.html
why not just pick up ACE into the existing standard:
http://www.cse.wustl.edu/~schmidt/ACE.html
the same way that the STL (and subsequently BOOST) have been subsumed?
Since it already runs on zillions of platforms, they have obviously worked
most of the kinks out of the generalized threading and processes idea (along
with many other useful abstractions).

Even more interesting than generalized threading would be generalized
software transactions. The Intel compiler has an experimental version that
does this:
http://softwarecommunity.intel.com/a...s/eng/1460.htm

As we scale to larger and larger numbers of CPUs, the software transaction
model is the one that gains traction. This document is very illuminating in
that regard:
http://internap.dl.sourceforge.net/s..._submitted.pdf


Jun 27 '08
On 25 Apr, 00:56, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
On Apr 20, 11:12 pm, peter koch <peter.koch.lar...@gmail.com> wrote:
On 20 Apr., 19:17, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
On Apr 20, 6:36 pm, James Kanze <james.ka...@gmail.com> wrote:
The answer to your questions above is obvious (it is in a library),

At least someone who can admit the obvious.
but this still does not change the fact that threading for a large
part is a language issue.

It should be a language issue but where are those language elements in
C++0x? You say it is a large part of it. It is fact, you say. Can you
show here just a small part of it? Factually.
I think the problem is that Szabolcs Ferenczi thinks "it's in a
library" means it has no effect on the core language, and "it affects
the core language" means there must be some new syntax, such as a new
language keyword (hey, why not overload static! :-) )

There are many things in the C library (and hence in C++) that cannot
be implemented in pure C (or C++). The standard merely specifies that
they *can* be implemented. For instance a pragma or some formally
undefined construct could have defined behaviour **in that
particular implementation**.

Things such as offsetof(), stdarg, setjmp, signal

Even i/o and malloc() require extra-standard support from
the implementation even though their semantics are well
defined.

They appear to be specifying libraries for things
like std::semaphore, but the compiler will have to
be tweaked to support these libraries. Hence the core
language definition needs to be modified.
--
Nick Keighley

In a sense, there is no such thing as a random number;
for example, is 2 a random number?
(D.E.Knuth)
Jun 27 '08 #51
On Apr 25, 1:43 pm, Pete Becker <p...@versatilecoding.com> wrote:
On 2008-04-25 04:33:22 -0400, Szabolcs Ferenczi
<szabolcs.feren...@gmail.com> said:
...
Look at the section entitled "Multi-threaded executions and
data races".
Thanks for the hint. I had a look at it. Altogether, that
section considers how the compiler should react to an
incorrect concurrent program. By incorrect concurrent
program I mean, again, what I described above.
Not just that. The terms that it defines are used in other
places to specify the meaning of a valid C++ program.
I think that's the key. Szabolcs keeps harping about an
"incorrect concurrent program", but without something in the
language itself, we have no means of determining whether a
program is correct or not.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #52
On 2008-04-25 10:33, Szabolcs Ferenczi wrote:
On Apr 24, 7:05 pm, Pete Becker <p...@versatilecoding.com> wrote:
>On 2008-04-24 12:37:40 -0400, Szabolcs Ferenczi
<szabolcs.feren...@gmail.com> said:
>...
I remember having seen a number of arguments that day going that the
library solution is not adequate (with reference to the paper of
Boehm) and that C++0x goes for a language-based approach instead.

That's not correct, as several people have told you.

Several people keep saying that there are language elements for multi-
threading in C++0x but nobody could enumerate any single language
element so far.
They have, you just have to realise that a language element does not
have to be a keyword.
Several people did not reply to my request to point out what are the
language elements for concurrency in the proposed C++0x standard. When
I make a list for them, asking (1) what are the language elements for
starting a thread of computation, (2) by which language elements
mutual exclusion is specified, (3) what are the language elements for
making the threads synchronised---the answer is silence or ignorance
from several people. (Ok, one of them has admitted that these are at
the library level rather than at the language level.)
"This International Standard specifies requirements for implementations
of the C++ programming language." These are the very first words in
the C++ standard document, since the very same document specifies the
C++ standard library I have to conclude that the library is a *part* of
the language. There are a number of things in the standard library that
can not be written in pure C++, these are things that the committee
decided would be better off in the library instead of using new keywords,
that does not make them any less a part of the language.

I understand that you are not satisfied with the syntax/semantics the
committee have decided on for solving the concurrency issues, but that
does not mean that the solution is not a language solution.

--
Erik Wikström
Jun 27 '08 #53
On Apr 25, 7:23 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
I understand that you are not satisfied with the syntax/semantics the
committee have decided on for solving the concurrency issues, but that
does not mean that the solution is not a language solution.
Good that you mention the committee. That must be the key. So it is a
brave new multi-threaded C++ designed by a committee. Just like a
horse, which is designed by a committee. Although, to the external
observer it looks like a camel, it must be a horse because that was on
the agenda of the committee.

That explains a lot. Thanks.

Best Regards,
Szabolcs
Jun 27 '08 #54
On Apr 25, 11:57 am, James Kanze <james.ka...@gmail.com> wrote:
...
No. Several people have pointed out several times where the
language addresses concurrency issues.
Oh yes, you are one of them who continuously keep saying that C++0x
addresses concurrency issues at the language level but who fail to put
here just a single language element for it.

Just like the Bandar-log in The Jungle Book. "We are great. We are
free. We are wonderful. ... We all say so, and so it must be true."

The Bandar-log never complete anything just like the several people
here who keep talking about the brave new language level concurrency
elements in C++0x but nobody can point out any humble element.

Nobody is able to show a single code example here:
http://groups.google.com/group/comp....07b37a3b0323f3
Several times I have asked for it but nobody is able to do any exact
work except keep talking like Bandar-log.

Just keep saying the Bandar-log song: Yes, we have it, we have it at
the language level and it is true because we say so.

Bravo.

Best Regards,
Szabolcs
Jun 27 '08 #55
"Szabolcs Ferenczi" <sz***************@gmail.comwrote in message
news:d7**********************************@24g2000h sh.googlegroups.com...
On Apr 25, 7:23 pm, Erik Wikstrm <Erik-wikst...@telia.comwrote:
I understand that you are not satisfied with the syntax/semantics the
committee have decided on for solving the concurrency issues, but that
does not mean that the solution is not a language solution.
Good that you mention the committee. That must be the key. So it is a
new brave multi-threaded C++ designed by a committee. Just like a
horse, which is designed by a committee. Although, to the external
observer it looks like a camel, it must be a horse because that was on
the agenda of the committee.
That explans a lot. Thanks.
Do you have ANY idea who is on that committee? Did you know that Paul
McKenney is working with them?

http://www.rdrop.com/users/paulmck

He, and some others are pushing for very relaxed memory barriers. Thanks to
them, we will be able to do a standard user-space RCU implementation in C++.
Pretty darn cool if you ask me!

:^)

Jun 27 '08 #56
"Szabolcs Ferenczi" <sz***************@gmail.comwrote in message
news:ff**********************************@27g2000h sf.googlegroups.com...
On Apr 25, 11:57 am, James Kanze <james.ka...@gmail.comwrote:
...
No. Several people have pointed out several times where the
language addresses concurrency issues.
Oh yes, you are one of them who continously keep saying that C++0x
adresses concurrency issues at the language level but who fail to put
here just a single language element for it.
[...]

There is a VERY CLOSE relationship between the language and the library. C++
does not need any new keyword to define low-level high-performance threading
semantics.

Jun 27 '08 #57
Szabolcs Ferenczi wrote:
On Apr 25, 11:57 am, James Kanze <james.ka...@gmail.comwrote:
>...
No. Several people have pointed out several times where the
language addresses concurrency issues.

Oh yes, you are one of them who continously keep saying that C++0x
adresses concurrency issues at the language level but who fail to
put here just a single language element for it.
That's the beauty of C++. :-)

Seriously, the library is part of the language specification. An
implementation has to supply both a compiler (or an interpreter) and a
complete library implementation. It also has to assure that it works
according to spec.

Is it a flaw that the standard doesn't explain exactly how this is to
be done?
Bo Persson
Jun 27 '08 #58
On Apr 26, 9:48 am, "Bo Persson" <b...@gmb.dk> wrote:
Szabolcs Ferenczi wrote:
On Apr 25, 11:57 am, James Kanze <james.ka...@gmail.comwrote:
...
No. Several people have pointed out several times where the
language addresses concurrency issues.
Oh yes, you are one of them who continously keep saying that C++0x
adresses concurrency issues at the language level but who fail to
put here just a single language element for it.

That's the beauty of C++. :-)
Really? I only hope that's not the only beauty of C++.
Seriously, the library is part of the language specification.
"Parallel programs are particularly prone to time-dependent errors,
which
either cannot be detected by program testing nor by run-time checks.
It is therefore very important that a high-level language designed for
this purpose should provide complete security against time-dependent
errors by means of a compile-time check."
C. A. R. Hoare, Towards a Theory of Parallel Programming (1971)

You cannot provide this kind of compiler support with any library-
based approach. This is one of the failures of Boehm's paper: he
completely ignored this issue, and now the ignorant trendy fans take
his paper as a Bible.
An
implementation has to supply both a compiler (or an interpreter) and a
complete library implementation. It also has to assure that it works
according to spec.
An implementation is not identical with the language. Implementation
and language are two different although related issues. The
implementation may provide libraries though they never belong to the
language itself, even if many people erroneously think so.
Is it a flaw that the standard doesn't explain exactly how this is to
be done?
Not only that. What is claimed about C++0x does not match what is
provided. Language level multi-threading is claimed but library level
multi-threading is provided. It would be more honest to claim the
truth that C++0x provides no more than what is available now in
average library-based approaches.

Someone mentioned that it is a committee who decides on it: So, if the
committee claims they are working on a beautiful super horse but
external observers can see an average camel coming out, well, that is
discordant. However, if the committee says they are designing a camel,
well, that is fair and honest.

Best Regards,
Szabolcs
Jun 27 '08 #59
Szabolcs Ferenczi wrote:
On Apr 26, 9:48 am, "Bo Persson" <b...@gmb.dkwrote:
>Szabolcs Ferenczi wrote:
>>On Apr 25, 11:57 am, James Kanze <james.ka...@gmail.comwrote:
...
No. Several people have pointed out several times where the
language addresses concurrency issues.
>>Oh yes, you are one of them who continously keep saying that C++0x
adresses concurrency issues at the language level but who fail to
put here just a single language element for it.

That's the beauty of C++. :-)

Really? I only hope not only that's the beauty of C++.
>Seriously, the library is part of the language specification.

"Parallel programs are particularly prone to time-dependent errors,
which
either cannot be detected by program testing nor by run-time checks.
It is therefore very important that a high-level language designed
for this purpose should provide complete security against
time-dependent errors by means of a compile-time check."
C. A. R. Hoare, Towards a Theory of Parallel Programming (1971)

You cannot provide this kind of compiler support with any library-
based approach. This is one of the failure in Boehm's paper that he
completely ignored this issue and now the ignorant trendy fans take
his paper as a Bible.
It isn't library-based support, it is a library interface that
requires compiler support for the implementation. The standard
document describes the interface to the features, not the
implementation.

Is that your problem?

>
>An
implementation has to supply both a compiler (or an interpreter)
and a complete library implementation. It also has to assure that
it works according to spec.

An implementation is not identical with the language. Implementation
and language are two different although related issues. The
implementation may provide libraries though they never belong to the
language itself, even if many people erroneously think so.
The standard requires an implementation to supply some libraries.
These certainly belong to the language.

Let's quote paragraph 1 of the standard document:

"This International Standard specifies requirements for
implementations of the C++ programming language. The first such
requirement is that they implement the language, and so this
International Standard also defines C++."

>
>Is it a flaw that the standard doesn't explain exactly how this is
to be done?

Not only that. What is claimed about C++0x does not match with what
is provided. Language level multi-threading is claimed but library
level multi-threading is provided. It would be more honest to claim
the truth that C++0x provides no more than what is available now in
the average library-based approaches.
It is defined as a set of interfaces, library style. It certainly will
need some compiler support, just like the type_info class of the
library does.

You have noticed, haven't you, that

#include <mutex>

will let you use mutexes in C++0x, but it doesn't require <mutex> to be
a file; it could be built into the compiler (in whole, or in part).

Generally, the C++ library defines an interface for the features, not
the implementation. The features, like std::vector, are allowed, but
not required, to be implemented as a library. The same goes for the
new threading primitives - they are allowed, but not required, to be
implemented in the compiler or in the library, or as a combination.
>
Someone mentioned that it is a committee who decides on it: So, if
the committee claims they are working on a beautiful super horse but
external observers can see an average camel coming out, well, that
is discordant. However, if the committee says they are designing a
camel, well, that is fair and honest.
The committee tries to define the common interface to a, possibly four
legged, animal that can carry your goods. It doesn't prescribe the
number of humps the animal must have. Perhaps even a small mule will
work, for embedded systems?
Bo Persson
Jun 27 '08 #60

"Erik Wikstrm" <Er***********@telia.comwrote in message
news:8s*****************@newsb.telia.net...
On 2008-04-25 10:33, Szabolcs Ferenczi wrote:
<...>
>Several people did not reply to my request to point out what are the
language elements for concurrency in the proposed C++0x standard. When
I make a list for them, asking (1) what are the language elements for
starting a thread of compulation, (2) by which language elements
mutual exclusion is specified, (3) what are the language elements for
making the threads synchronised---the answer is silence or ignorance
from several people. (Ok, one of them has admitted that these are at
the library level rather than at the language level.)

"This International Standard specifies requirements for implementations
of the C + + programming language." These are the very first words in
the C++ standard document, since the very same document specifies the
C++ standard library I have to conclude that the library is a *part* of
the language. There are a number of things in the standard library that
can not be written in pure C++, these are things that the committee
decided would be better of in the library instead of using new keywords,
that does not make them any less a part of the language.

I understand that you are not satisfied with the syntax/semantics the
committee have decided on for solving the concurrency issues, but that
does not mean that the solution is not a language solution.
FWIW, an interesting talk on C++0x from last year, including concurrency
issues, by Lawrence Crowl

http://www.youtube.com/watch?v=ZAG5txfYnW4
regards
Andy Little


Jun 27 '08 #61
On Apr 25, 11:04 pm, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
On Apr 25, 11:57 am, James Kanze <james.ka...@gmail.comwrote:
...
No. Several people have pointed out several times where the
language addresses concurrency issues.
Oh yes, you are one of them who continously keep saying that
C++0x adresses concurrency issues at the language level but
who fail to put here just a single language element for it.
We've been pointing out language issues constantly. The fact
that you don't read them isn't our problem.
Just like tha Bandar-log in The Jungle Book. "We are great. We
are free. We are wonderful. ... We all say so, and so it must
be true."
The Bandar-log never complete anything just like the several
people here who keep talking about the new brave language
level concurrency elements in C++0x but nobody can point out
any humble element.
The definition of the memory model. The most important aspect
of multithreading.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #62
I think (1) and often (2) are essential for a useful concurrent
language. But languages designed for concurrency from the start
didn't always get them right either.
Are linkers also part of related considerations?
Are specifications needed to prevent unwanted instruction reordering during the
linking process?

Regards,
Markus
Jun 27 '08 #63
On Apr 28, 11:03 pm, Markus Elfring <Markus.Elfr...@web.de> wrote:
I think (1) and often (2) are essential for a useful concurrent
language. But languages designed for concurrency from the start
didn't always get them right either.
Are linkers also part of related considerations? Are
specifications needed to prevent unwanted instruction
reordering during the linking process?
The C++ standard does not distinguish linking as a separate
operation. It's one of the "phases of translation". The C++
standard specifies behavior for a legal program, and gives the
programmer certain guarantees, without regard to who does what
in any particular implementation's translation process.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #64
The C++ standard does not distinguish linking as a separate
operation. It's one of the "phases of translation". The C++
standard specifies behavior for a legal program, and gives the
programmer certain guarantees, without regard to who does what
in any particular implementation's translation process.
Are you generally looking for instruction reordering prevention that should be
supported by compilers and various linkers?
Are there still any dangers for the concurrency memory model because of
potential optimisations?

Regards,
Markus
Jun 27 '08 #65
Markus Elfring wrote:
>The C++ standard does not distinguish linking as a separate
operation. It's one of the "phases of translation". The C++
standard specifies behavior for a legal program, and gives the
programmer certain guarantees, without regard to who does what
in any particular implementation's translation process.

Are you generally looking for instruction reordering prevention that
should be supported by compilers and various linkers?
Are there still any dangers for the concurrency memory model because of
potential optimisations?
Please retain the attributions, it's rude to snip them and it makes
following the thread difficult.

--
Ian Collins.
Jun 27 '08 #66
On Apr 28, 8:42 pm, "Boehm, Hans" <hans.bo...@hp.com> wrote:

Let me put forward that all your problems are coming from the facts
that in C++0x:

(1) you are still going to solve multi-threading at the library level;
and

(2) your only concern is the tuning of the OPTIMISATION of the
compilers which is developed for the SEQUENTIAL execution in the first
place.
On Apr 21, 10:07 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
Concurrency library + adjusted sequential language != concurrent
language

In my mind, it depends on the adjustments, and the concurrent
language.
You are right: If the adjustments are just made for the sake of
changing the *sequential* optimisation rules of the compiler that
supports the language, it is not very adequate. In that case no matter
how hard you try to adjust, it will always remain to be a sequential
language, i.e. until you address concurrency at the language level.

You say it depends on the concurrent language. Well, a concurrent
language must contain language elements for issues coming from the
concurrent, i.e. simultaneous execution of some parts of the program.
These are:

1) Defining and starting the threads of computation

2) Separating the shared resources and the resources local to a single
process

3) Synchronisation and communication of the simultaneously executed
parts
3.a) Must provide means for mutual exclusion at the language level
3.b) Must handle non-determinism at the language level

Item (1): In a procedural language this can be a kind of a parallel
statement, e.g.

parallel S;
else (int i=0..pmax) P(i);
else {Q; L;}
end

In object-oriented languages the threads of computation can be
combined with objects resulting in some form of active objects, see
for instance the language proposal Distributed Processes.
http://brinch-hansen.net/papers/1978a.pdf

If the process is marked at the language level, the compiler can check
whether the process accesses local and shared resources properly.
Item (2): In a procedural language a single keyword like `shared' or
`resource' may help as a property to the types. In object-oriented
languages the natural unit of marking with the shared property is the
class.

If the shared variables are marked, the compiler can check whether the
processes access the shared resources properly, i.e. excluding each
other.
Item (3): In a well designed concurrent language most probably you can
find an adapted form of Dijkstra's Guarded Commands to deal with non-
determinism (see:
http://www.cs.utexas.edu/users/EWD/t...xx/EWD418.html)
GC has been already adapted to message communication (see
Communicating Sequential Processes and its language realisation OCCAM)
as well as to shared memory communication (see Edison or Distributed
Processes).
http://brinch-hansen.net/papers/1981b.pdf
http://brinch-hansen.net/papers/1978a.pdf

You can find GC adapted in Ada too.

In an object-oriented language, Guarded Commands could be combined
with classes and Conditional Critical Regions (C. A. R. Hoare, Towards
a Theory of Parallel Programming, 1971), something like this:

shared class A {
int i, k;
public:
A() : i(0), k(0) {}
void foo() {
when (i>10) {
S;
}
else (i < k) {
P;
}
else (k < i) {
Q;
}
}
...
};

Class A being a shared class means that private members `i' and `k'
are shared variables and public methods are Critical Regions already.
So without classes, in a C-like language it would look something like
this:

shared int i=0, k=0;
void foo() {
with (i,k) {
when (i>10) {
S;
}
else (i < k) {
P;
}
else (k < i) {
Q;
}
}
}

Note that if some notations like the ones shown above are used, the
compiler can easily check whether a shared variable is accessed in a
wrong way or in a proper way.

The compiler can also optimise how it translates a Conditional
Critical Region in the most suitable way for a given platform. This is,
however, not sequential optimisation any more. Neither is it about
suppressing sequential optimisations here and there. However,
sequential optimisations can be used unrestricted in the parts of the
processes that work on local variables only. In a
concurrent language it is clear to the compiler what parts these are.

What I have shown above are just examples of how concurrency can be
addressed at the language level. I am not claiming that you should
include exactly these elements in C++0x. However, I do claim that
you should address concurrent programming at the language level in
C++0x, otherwise C++ will lag behind in concurrent programming.

Now, as you can see:

Concurrency library + adjusted sequential language != concurrent
language

Quod erat demonstrandum

Best Regards,
Szabolcs
Jun 27 '08 #67
On Apr 30, 7:55 am, Markus Elfring <Markus.Elfr...@web.de> wrote:
The C++ standard does not distinguish linking as a separate
operation. It's one of the "phases of translation". The C++
standard specifies behavior for a legal program, and gives the
programmer certain guarantees, without regard to who does what
in any particular implementation's translation process.
Are you generally looking for instruction reordering
prevention that should be supported by compilers and various
linkers? Are there still any dangers for the concurrency
memory model because of potential optimisations?
I'm not sure I understand the question. The current rules allow
considerable reordering, and don't take threading issues into
consideration. Which means that you can't write concurrent code
without additional, implementation defined guarantees.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #68
Szabolcs Ferenczi wrote:
Let me put forward that all your problems are coming from the facts
that in C++0x:
(1) you are still going to solve multi-threading at the library level;
and

(2) your only concern is the tuning of the OPTIMISATION of the
compilers which is developed for the SEQUENTIAL execution in the first
place.
Why does the concurrency memory model not seem to be an important part of the
software development game in your view?

Regards,
Markus
Jun 27 '08 #69
On Apr 30, 9:35 pm, Markus Elfring <Markus.Elfr...@web.de> wrote:
...
Why does the concurrency memory model not seem to be an important part of the
software development game in your view?
Can you summarise here the so-called concurrency memory model?

If you do so, we can talk about it but first we should agree what we
are talking about.

How does the concurrency memory model in your view make it possible
for the compiler to make basic checks about whether shared variables
are accessed inside or outside Critical Regions?

How does the concurrency memory model in your view help the compiler
determine whether a piece of code is part of a particular thread of
execution?

Best Regards,
Szabolcs
Jun 27 '08 #70
On Apr 30, 3:23 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
On Apr 28, 8:42 pm, "Boehm, Hans" <hans.bo...@hp.com> wrote:

Let me put forward that all your problems are coming from the facts
that in C++0x:

(1) you are still going to solve multi-threading at the library level;
and
No, though we still have library calls to create threads and for
synchronization operations.
>
(2) your only concern is the tuning of the OPTIMISATION of the
compilers which is developed for the SEQUENTIAL execution in the first
place.
No. I am concerned about optimizations (hardware and compiler). But
even in most more commonly used concurrent languages, it is tricky to
define what optimizations are allowed, and what the user can rely on.
See the recent work on the Java memory model, for example. And, as I
said in the last message, I don't think Ada has gotten this quite
right in the presence of atomic variables.

I think you're advocating a particular kind of concurrent language
that does not provide any kind of variable that can be safely accessed
concurrently by multiple threads, i.e. nothing like Java volatiles or
C++ atomics. Everything must be protected by locks or a similar
mechanism. That does simplify things. I would argue that the result
is often impractical. If you don't believe that, try reference
counting with the count operations protected by locks.

However, even if you go this route (and for many programs that's
fine), the problem does not go away completely. Consider

Thread 1:
x = 42;
lock(l);

Thread 2:
while (trylock(l) == SUCCESS) unlock(l);
r1 = x;

Is this allowable? Does it guarantee r1 == 42? The answer can have
substantial effect on the cost of the lock() implementation. C++0x
resolves it in a new and interesting way.
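
For concreteness, a rough rendering of the example in terms of the proposed
C++0x <mutex> interface might look like this (a sketch; the function names
are illustrative, and the comments reflect one reading of the working paper):

#include <mutex>

int x;
std::mutex l;

void thread1() {
    x = 42;      // ordinary, non-atomic write
    l.lock();    // acquire l after the write, in program order
}

void thread2() {
    // Spin until try_lock() fails, i.e. until l appears to be held.
    while (l.try_lock())
        l.unlock();
    int r1 = x;  // Is r1 == 42 guaranteed? Under the working paper, no:
                 // try_lock() is allowed to fail spuriously, so its failure
                 // does not let thread 2 conclude that thread 1 has locked l.
    (void)r1;
}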

You can of course make this problem go away too by moving to ever more
restrictive languages, in which you can't express something like
trylock(), or cannot even express code that might involve races. I
think neither is practical, in that it doesn't let me write code that
commonly needs to be written, unless we abandon the shared-memory,
lock-based programming model altogether. I think all widely used
concurrent languages and thread libraries allow me to write both
trylock (or a lock with a timeout) and data races.

For example, I need to be able to write code that initializes an
object in one thread without holding a lock, makes a shared pointer p
point to it, reads p in another thread, and then access the referenced
object in the second thread, again without holding a lock. (The
accesses to p may be lock protected.) This involves no data race.
But it's hard to tell that statically.
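
Under the C++0x working paper, one way to express this publication pattern
without a data race is an atomic pointer with release/acquire ordering (a
sketch; Obj and the function names are placeholders, not code from the paper):

#include <atomic>

struct Obj { int value; };

std::atomic<Obj*> p(nullptr);

void producer() {
    Obj* o = new Obj;
    o->value = 42;                          // initialise with no lock held
    p.store(o, std::memory_order_release);  // publish the pointer
}

void consumer() {
    Obj* o = p.load(std::memory_order_acquire);
    if (o) {
        int v = o->value;  // safe: the acquire load synchronises with the release store
        (void)v;
    }
}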

I also need to be able to protect a bunch of objects with a smaller
set of locks, by hashing the object address to an entry in an array of
locks, or do hand-over-hand locking on a linked list.

That does mean that we would like other tools to detect data races,
since the compiler can't do so. Unfortunately that seems to be hard
precisely because syntactic disciplines that preclude races are too
restrictive.

Hans
Jun 27 '08 #71
On Apr 25, 5:04 pm, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
On Apr 25, 11:57 am, James Kanze <james.ka...@gmail.com> wrote:
...
No. Several people have pointed out several times where the
language addresses concurrency issues.

Oh yes, you are one of them who continously keep saying that C++0x
adresses concurrency issues at the language level but who fail to put
here just a single language element for it.
If the only "language element" your world-view will admit is a new
syntactic construct, then you're right and the new C++ standard does
not contain any language elements to support threading. However,
that's an extremely limiting definition, not shared by very many
people.

A comprehensive memory model is *required* for correct threaded code,
and it's something C++ does not yet have. At the most basic level, a
memory model is a set of rules dictating what writes are and are not
allowed to affect a given read. In the following simple example:

struct foo {
short a;
short b;
};

foo a_foo;

a memory model provides hard and fast rules about whether or not a
read of a_foo.b is allowed to be affected by writes to a_foo.a, or to
a_foo itself. Note that this doesn't necessarily need to involve
threads in the definition: the rules will hold under all (valid)
executions of the program possible for a conforming implementation,
including multithreaded executions.

If there is a comprehensive memory model that allows it (as with the
one in the upcoming C++ standard), *then* a library can provide the
threading primitives that are correct with respect to that memory
model. Without the rules a memory model provides, you can't state
that

struct bar {
short a;
short b;

pthread_mutex_t a_mtx;
pthread_mutex_t b_mtx;
};

bar a_bar;

void thread_a () {
pthread_mutex_lock (&a_bar.a_mtx);
a_bar.a++;
pthread_mutex_unlock (&a_bar.a_mtx);
}

void thread_b () {
pthread_mutex_lock (&a_bar.b_mtx);
a_bar.b = 5;
std::cout << a_bar.b << std::endl;
pthread_mutex_unlock (&a_bar.b_mtx);
}

is either correct *or* incorrect if thread_a and thread_b are called
on different threads, because nothing guarantees that the write to
a_bar.a will not affect a_bar.b.
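
For comparison, here is the same example recast against the proposed C++0x
library (a sketch, assuming the <thread> and <mutex> components as proposed).
Under the C++0x memory model, bar::a and bar::b are distinct memory locations,
so the two threads below have no data race even though they touch adjacent
members of the same struct:

#include <iostream>
#include <mutex>
#include <thread>

struct bar {
    short a;
    short b;
    std::mutex a_mtx;
    std::mutex b_mtx;
};

bar a_bar;   // static storage: a and b start out zero

void thread_a() {
    std::lock_guard<std::mutex> lock(a_bar.a_mtx);
    ++a_bar.a;
}

void thread_b() {
    std::lock_guard<std::mutex> lock(a_bar.b_mtx);
    a_bar.b = 5;
    std::cout << a_bar.b << std::endl;
}

int main() {
    std::thread ta(thread_a), tb(thread_b);
    ta.join();
    tb.join();
}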

*That's* the nature of the language support being added. It's not
about syntax: it's about semantics and rules. The tools for creating
and synchronizing threads are being added to the library because there
is no need to modify the language to support them, and because
modifying the C++ language itself is fraught with peril. The language
is being extended to provide rules that allow the library to be both
portable and correct.

-o
Jun 27 '08 #72
Szabolcs Ferenczi wrote:
Can you summarise here the so-called concurrency memory model?
It specifies the rules under which actions are correctly performed on memory
locations in the context of multi-threaded executions.

How does the concurrency memory model in your view makes it possible
for the compiler to make basic checks about whether shared variables
are accessed inside or outside Critical Regions?
The tool will check if function/method implementations are compliant to the
fundamental rules.

How does the concurrency memory model in your view helps the compiler
determining whether a piece of code is part of a particular thread of
execution?
I guess that the tool can only determine this relationship if whole program
analysis would be applied.

Regards,
Markus
Jun 27 '08 #73
On May 1, 7:21 am, "Boehm, Hans" <hans.bo...@hp.com> wrote:
On Apr 30, 3:23 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote: On Apr 28, 8:42 pm, "Boehm, Hans" <hans.bo...@hp.com> wrote:
Let me put forward that all your problems are coming from the facts
that in C++0x:
(1) you are still going to solve multi-threading at the library level;
and

No, though we still have library calls to create threads and for
synchronization operations.
If you `still have library calls to create threads and for
synchronization operations' that exactly means: threading is at the
library level. (I cannot understand why you have to start your
comments with the word `no' even if you confirm what I claim.)

That you support threading at the library level is too bad since with
respect to the concurrency it is like some assembly language
programming:

MT library calls = assembly programming

In words: Threading at the library level can be compared to assembly
programming. In both cases you miss the support of a compiler. In both
cases you think you are in control of the details. In both cases the
control of the details prevents you from making large applications.
(2) your only concern is the tuning of the OPTIMISATION of the
compilers which is developed for the SEQUENTIAL execution in the first
place.

No. I am concerned about optimizations (hardware and compiler).
What do you deny here again with `no'? Your statement says the same
as point (2).
>But
even in most more commonly used concurrent languages, it is tricky to
define what optimizations are allowed, and what the user can rely on.
You mention the commonly used concurrent languages. What are they?

Please note that Java is not designed properly for concurrency either.
So it is just false to prove something with another badly designed
language.

Let me call your attention to that not only I claim that Java is not
well designed for concurrency. See Java's Insecure Parallelism, Per
Brinch Hansen (1999).
http://brinch-hansen.net/papers/1999b.pdf

"The author concludes that Java ignores the last twenty-five years of
research in parallel programming languages."

Therefore, it is simply funny that some guys on this discussion list
refer to the state-of-the-art in concurrency and they mean the same
things Java has.
See the recent work on the Java memory model, for example.
I am afraid that in C++0x you are trying to copy the wrong direction
in which Java is developing. Besides being badly designed for
concurrency, Java brings in this false concern which they refer to as
the memory model. The same mistake is there in Java from the
concurrency point of view. It is exactly what you are dealing
with: in Java, with the so-called memory model, they try to fix a
buggy concurrent program where shared variables are accessed without
any protection.

The concern about the memory model is a low level consideration that
can be taken into account in the implementation of the high level
elements of a programming language.
>And, as I
said in the last message, I don't think Ada has gotten this quite
right in the presence of atomic variables.
Yes, you have said that really. However, you did not tell whether you
consider an erroneous Ada program or a correct one from the concurrent
programming point of view. Please write an example in Ada illustrating
what you mean.
I think you're advocating a particular kind of concurrent language
that does not provide any kind of variable that can be safely accessed
concurrently by multiple threads, i.e. nothing like Java volatiles or
C++ atomics.
Well, the small grained atomic operations are just a very small part of
the problem domain in concurrent programming. Java volatiles do not
provide atomicity for you except for atomic write and atomic read in
the case of the most simple types. For instance, the increment of a
volatile is not atomic even with the simple type integer.

The usability of the proposed C++ atomics is limited to certain kinds
of problems. They cannot be regarded as general language means. Atomic
operations start to be useful when the check and the set are atomic
together. That is, however, a built-in small Critical Region. You see, in
this case the Critical Region may not need to be implemented with locking.
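
For reference, this "check and set together" is what the proposed C++0x
atomics expose as single read-modify-write operations (a sketch; the names
are illustrative):

#include <atomic>

std::atomic<int> counter(0);

void increment() {
    counter.fetch_add(1);  // one atomic read-modify-write, unlike ++ on a Java volatile
}

bool claim_if_zero() {
    int expected = 0;
    // Atomically: if counter == 0, set it to 1 and report success.
    return counter.compare_exchange_strong(expected, 1);
}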
>Everything must be protected by locks or a similar
mechanism. That does simplify things. I would argue that the result
is often impractical.
I would not say that `everything must be protected by locks'.

In fact locks should not appear at the language level. Locks are
library level elements. You do not need them at the language level.
Well, at least you do not need them in a well designed concurrent
language.

What I claim is that a concurrent language must provide some language
level means to specify that the intention of the programmer is to
define some block where he or she wants no data race but mutual
exclusion. Usually the language level means for this is the Critical
Region. A Critical Region need not necessarily be implemented by
locking. It can be at the discretion of the compiler (optimisation)
what is the most appropriate implementation of the language level
Critical Region on a given architecture.

Best Regards,
Szabolcs
Jun 27 '08 #74
On May 1, 7:21 am, "Boehm, Hans" <hans.bo...@hp.com> wrote:
On Apr 30, 3:23 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote: On Apr 28, 8:42 pm, "Boehm, Hans" <hans.bo...@hp.com> wrote:
[...]
However, even if you go this route (and for many programs that's
fine), the problem does not go away completely. Consider

Thread 1:
x = 42;
lock(l);

Thread 2:
while (trylock(l) == SUCCESS) unlock(l);
r1 = x;

Is this allowable?
Not at all. It is not only a buggy concurrent program but a very
inefficient one too. It locks and unlocks an object while busy
waiting.
>Does it guarantee r1 == 42?
Not in all circumstances. It is timing dependent, which is a typical
concurrent bug. Who can guarantee that just when one thread is
locking `l', another one will not overwrite `x' with another value?
The `x' is meant to be a shared variable but it is not accessed in a
safe way. It is a timing dependent construction and therefore an
unsafe one.
>The answer can have
substantial effect on the cost of the lock() implementation.
My answer is that it has a concurrency bug and it must be fixed.
Besides, I do not think that the cost of an operation should have any
effect on the functional behaviour.
>C++0x
resolves it in a new and interesting way.
How does C++0x resolve it then? I am curious.

How will you prevent another thread changing the value `x' as I
described above?
You can of course make this problem go away too by moving to ever more
restrictive languages, in which you can't express something like
trylock(), or cannot even express code that might involve races.
The lock does not belong to the language level unless you are dealing
with some kind of assembly level programming. In a high level language
you do not instruct the computer what to do; rather, you express the
conditions you would like to achieve. Leave it to the compiler to
instruct the machine at a low level.

"There are two views of programming. In the old view it is regarded as
the purpose of our programs to instruct our machines; in the new one
it will be the purpose of our machines to execute our programs."
E.W. Dijkstra, Comments at a Symposium (1975)
https://www.cs.utexas.edu/users/EWD/...xx/EWD512.html
>I
think neither is practical, in that it doesn't let me write code that
commonly needs to be written, unless we abandon the shared-memory,
lock-based programming model altogether.
What you mean by a `restrictive language' is one that restricts
you from committing the most common concurrent programming errors.

"Parallel programs are particularly prone to time-dependent errors,
which
either cannot be detected by program testing nor by run-time checks.
It is therefore very important that a high-level language designed for
this purpose should provide complete security against time-dependent
errors by means of a compile-time check."
C. A. R. Hoare, Towards a Theory of Parallel Programming (1971)

"Well, if we cannot make concurrent programs work by proofreading
or testing, then I can see only one other effective method at the
moment:
to write all concurrent programs in a programming language that is so
structured that you can specify exactly what processes can do to
shared
variables and depend on a compiler to check that the programs satisfy
these assumptions. Concurrent Pascal is the first language that makes
this possible."
Per Brinch Hansen: The Architecture of Concurrent Programs (1977)
>I think all widely used
concurrent languages and thread libraries allow me to write both
trylock (or a lock with a timeout) and data races.
I would say that no decent concurrent programming language would allow
you to instruct the machine such a low level as locking. On the other
hand, low level thread libraries do allow you to write both trylock
and data races.
For example, I need to be able to write code that initializes an
object in one thread without holding a lock, makes a shared pointer p
point to it, reads p in another thread, and then access the referenced
object in the second thread, again without holding a lock. (The
accesses to p may be lock protected.) This involves no data race.
But it's hard to tell that statically.
It does involve a data race and there is a way to solve it. So, your
requirement is that one thread constructs an object, passes its
address to another thread, and the other thread has exclusive access
to it. Let us re-state the problem in a high level language notation:

shared struct {
Ob *x;
bool constructed;
} res {NULL, false};

Thread 1:
with (res) {
constructed = true;
x = new Ob();
}

Thread 2:
with (res) {
when (constructed == true) {
// manipulate x
}
}

Now it is up to an optimising compiler to insert locks or any other
means to implement mutual exclusion. We have just expressed by
language means what we are going to achieve. The rest is up to the
compiler.
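
For comparison, the same handshake written against the proposed C++0x library
would use a mutex and a condition variable (a sketch; Ob, Res and the function
names are placeholders, not Szabolcs's notation or committee text):

#include <condition_variable>
#include <mutex>

struct Ob { /* ... */ };

struct Res {
    std::mutex m;
    std::condition_variable cv;
    Ob* x = nullptr;
    bool constructed = false;
} res;

void thread1() {
    std::lock_guard<std::mutex> lock(res.m);
    res.x = new Ob();
    res.constructed = true;
    res.cv.notify_one();   // wake the waiting thread
}

void thread2() {
    std::unique_lock<std::mutex> lock(res.m);
    res.cv.wait(lock, [] { return res.constructed; });
    // manipulate res.x here, still under the lock
}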
I also need to be able to protect a bunch of objects with a smaller
set of locks, by hashing the object address to an entry in an array of
locks, or do hand-over-hand locking on a linked list.
I am 100% sure that the problem to be solved does not specify that you
must use locks.

The problem might require that you arrange your algorithm in a way
that this bunch of objects is accessed in a mutually exclusive
way. Then it is better if the language allows you to express the real
requirement and you do not have to over-specify the solution. Nothing
prevents you from applying a smaller granularity of mutual exclusion
specification as required by the problem you are solving.
That does mean that we would like other tools to detect data races,
since the compiler can't do so. Unfortunately that seems to be hard
precisely because syntactic disciplines that preclude races are too
restrictive.
The compiler can do a lot of checks for you including preventing data
races provided the language is well designed with respect to
concurrency. The language must include means to mark shared variables
and Critical Regions. That is the key to it.

Best Regards,
Szabolcs
Jun 27 '08 #75
On May 1, 10:12 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
On May 1, 7:21 am, "Boehm, Hans" <hans.bo...@hp.com> wrote:
On Apr 30, 3:23 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote: On Apr 28, 8:42 pm, "Boehm, Hans" <hans.bo...@hp.com> wrote:
Let me put forward that all your problems are coming from the facts
that in C++0x:
(1) you are still going to solve multi-threading at the library level;
and
No, though we still have library calls to create threads and for
synchronization operations.

If you `still have library calls to create threads and for
synchronization operations' that exactly means: threading is at the
library level. (I cannot understand why do you have to start your
comments with the word `no' even if you confirm what I claim.)
We're arguing about terminology here. The point is that no matter
what thread creation etc. looks like, the language semantics, and in
particular, the semantics of shared variables, have to talk about
concurrency. In the C++0x working paper, they do. If you call that
"threading at the library level", fine.
>
That you support threading at the library level is too bad since with
respect to the concurrency it is like some assembly language
programming:

MT library calls = assembly programming

In words: Threading at the library level can be compared to assembly
programming. In both cases you miss the support of a compiler. In both
cases you think you are in control of the details. In both cases the
control of the details prevents you to make large applications.
The point is that I don't know how to get useful compiler support
without severely restricting the utility of the language. You do want
the compiler to ensure, for example, that locks are released when an
exception is thrown. But that's easily handled in C++0x without
additional language syntax. And things like avoiding data races are
hard, whether or not you add language syntax.
>
(2) your only concern is the tuning of the OPTIMISATION of the
compilers which is developed for the SEQUENTIAL execution in the first
place.
No. I am concerned about optimizations (hardware and compiler).

What do you deny here again with `no'? Your statement tells the same
as point (2).
No, the problem is not limited to sequential languages:
>
But
even in most more commonly used concurrent languages, it is tricky to
define what optimizations are allowed, and what the user can rely on.

You mention the commonly used concurrent languages. What are they?

Please note that Java is not designed properly for concurrency either.
So it is just false to prove something with another badly designed
language.
Clearly Java is a concurrent language. See the original Java Language
Specification. Whether or not you think it's well designed is a
different issue. But it seems to me it's probably the most widely
used one, probably followed by C# and then Ada as a distant third.
>
Let me call your attention to that not only I claim that Java is not
well designed for concurrency. See Java's Insecure Parallelism, Per
Brinch Hansen (1999). http://brinch-hansen.net/papers/1999b.pdf

"The author concludes that Java ignores the last twenty-five years of
research in parallel programming languages."

Therefore, it is simply funny that some guys on this discussion list
refer to the state-of-the-art in concurrency and they mean the same
things Java has.
Thanks for the reference.

However, it seems to me that there is really a pretty straightforward
trade-off between expressivity and safety here. You can statically
detect data races (Brinch Hansen's approach), or you can have a
language that's roughly as expressive as the mainstream concurrent
languages, and can express things like non-nested locking, hashing of
objects to locks, reuse of existing sequential libraries inside
critical sections, etc. I wish I knew how to do both, but I don't.
If you want Concurrent Pascal fine. But the standards committee can't
turn C++ into Concurrent Pascal.
>
See the recent work on the Java memory model, for example.

I am afraid that in C++0x you are trying to copy the wrong way Java
develops. Besides being badly designed for concurrency, Java brings in
this false concern what they refer to as the memory model. The same
mistake is there in Java from the concurrency point of view. Exactly
the same what you are dealing with: In Java with the so-called memory
model they try to fix a buggy concurrent program where shared
variables are accessed without any protection.

The concern about memory model is some low level consideration that
can be taken into account in the implementation of the high level
elements of a programming language.
At a serious cost in what you can express in the resulting language.
And not a cost that any mainstrean languages have been willing to
incur.
>
And, as I
said in the last message, I don't think Ada has gotten this quite
right in the presence of atomic variables.

Yes, you have said that really. However, you did not tell whether you
consider an erroneous Ada program or a correct one from the concurrent
programming point of view. Please write an example in Ada illustrating
what do you mean.
I am not an Ada programmer, and hence won't attempt the syntax,
especially in a C++ newsgroup. As I said in my earlier message:

Thread 1:
x = 42; x_init = true;

Thread 2:
while (!x_init); assert(x == 42);

where x_init is declared atomic with the appropriate pragma, is an
interesting example. I believe the assignment in thread 1 to x_init
does not "signal" (in the sense of 9.10, Ada95 RM) the final read of
x_init in thread 2. Hence the accesses to x are not sequential, and
this is erroneous. I'd argue that this is (a) easy to implement, (b)
unexpected, and (c) fairly useless to programmers. For example, it
means that Ada atomic variables cannot be used to implement double-
checked locking. (Double-checked locking is both a sufficient
performance win and sufficiently widely used, that I think it needs to
be supported without resorting to assembly code. Of course, you can't
write it in Concurrent Pascal either.)
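
For what it's worth, double-checked locking is expressible with the proposed
C++0x atomics along these lines (a sketch; Widget and the names are
placeholders, and this reflects one reading of the working paper):

#include <atomic>
#include <mutex>

struct Widget { int data; };

std::atomic<Widget*> instance(nullptr);
std::mutex init_mutex;

Widget* get_instance() {
    Widget* w = instance.load(std::memory_order_acquire);  // first check, no lock
    if (!w) {
        std::lock_guard<std::mutex> lock(init_mutex);
        w = instance.load(std::memory_order_relaxed);       // second check, under the lock
        if (!w) {
            w = new Widget;
            instance.store(w, std::memory_order_release);   // publish
        }
    }
    return w;
}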

[Discussion of atomics omitted. See last message.]
>
What I claim is that a concurrent language must provide some language
level means to specify that the intention of the programmer is to
define some block where he or she wants no data race but mutual
exclusion. Usually the language level means for this is the Critical
Region. A Critical Region must not necessarily be implemented by
locking. It can be at the discretion of the compiler (optimisation)
what is the most appropriate implementation of the language level
Critical Region on a given architecture.
We agree here that that's a goal. The problem is that we don't know
how to do this without sacrificing a lot of very useful (necessary if
you want to get the standard approved) flexibility. Even translating
a simple critical section containing only an increment to an atomic
increment instruction (plus fences) is nontrivial in a C++-like
setting.

Hans
>
Best Regards,
Szabolcs
Jun 27 '08 #76
On 18 , 01:31, "Chris Thomasson" <cris...@comcast.net> wrote:
Indeed. That's a major plus for me. The really cool thing, for me at least,
is that the next C++ will allow me to create a 100% standard implementation
of my AppCore library <http://appcore.home.comcast.net> which currently uses
POSIX Threads and X86 assembly. The fact that C++ has condition variables,
mutexes, atomic operations and fine-grain memory barriers means a lot to me.
Finally, there will be such a thing as 100% portable non-blocking
algorithms. I think that's so neat.

There are no compiler ordering barriers in C++0x.
So you can't implement VZOOM object lifetime management in
autodetection mode, you need compiler_acquire_barrier and
compiler_release_barrier.
Also you can't implement SMR+RCU, asymmetric reader-writer mutex,
asymmetric eventcount etc. Basically any algorithm where you want to
eliminate all hardware barriers from fast-path.

There are no specific barriers in C++0x. Like 'load-release-wrt-loads'
which you need to implement sequence mutex (SeqLock). In C++0x you can
use 'load-release' barrier as replacement for 'load-release-wrt-
loads'. But 'load-release' includes #StoreLoad. 'load-release-wrt-
loads' doesn't include #StoreLoad.
Thus, as I understand it now, C++0x will be suitable only for 'basic'
synchronization algorithms. If you want to implement 'advanced'
synchronization algorithms you still will have to implement 'my own
atomic library' and port it to every compiler and hardware platform
manually. Yikes!

Dmitriy V'jukov
Jun 27 '08 #77
"Dmitriy V'jukov" <dv*****@gmail.comwrote in message
news:0d**********************************@k13g2000 hse.googlegroups.com...
On 18 , 01:31, "Chris Thomasson" <cris...@comcast.net> wrote:
Indeed. That's a major plus for me. The really cool thing, for me at
least,
is that the next C++ will allow me to create a 100% standard
implementation
of my AppCore library <http://appcore.home.comcast.net> which currently
uses
POSIX Threads and X86 assembly. The fact that C++ has condition
variables,
mutexs, atomic operations and fine-grain memory barriers means a lot to
me.
Finally, there will be such a thing as 100% portable non-blocking
algorithms. I think that's so neat.
There are no compiler ordering barriers in C++0x.
That's what I thought.

So you can't implement VZOOM object lifetime management in
autodetection mode, you need compiler_acquire_barrier and
compiler_release_barrier.
No way could I do highly platform dependent auto-detection with C++0x. You
can get it to work with signals, but that's a little crazy:

http://groups.google.com/group/comp....45d3e17806ccfe

;^)

Also you can't implement SMR+RCU, asymmetric reader-writer mutex,
asymmetric eventcount etc. Basically any algorithm where you want to
eliminate all hardware barriers from fast-path.
Probably not.

There are no specific barriers in C++Ox. Like 'load-release-wrt-loads'
which you need to implement sequence mutex (SeqLock). In C++0x you can
use 'load-release' barrier as replacement for 'load-release-wrt-
loads'. But 'load-release' includes #StoreLoad. 'load-release-wrt-
loads' doesn't inlcude #StoreLoad.
Perhaps you should post this over on the cpp-threads mailing list:

http://www.decadentplace.org.uk/cgi-...fo/cpp-threads

Thus, as I understand it now, C++0x will be suitable only for 'basic'
synchronization algorithms. If you want to implement 'advanced'
synchronization algorithms you still will have to implement 'my own
atomic library' and port it to every compiler and hardware platform
manually. Yikes!
For automatic epoch detection, and all that "type" of stuff, I think you're
correct.

Jun 27 '08 #78
On May 1, 2:26 pm, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
On Apr 28, 8:42 pm, "Boehm, Hans" <hans.bo...@hp.comwrote:
On Apr 21, 10:07 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
[...]
If you need low-level atomics, things are very tricky anyway,

I do not need low level atomics because I can write correct concurrent
programs with Conditional Critical Regions. An optimising compiler may
implement some Critical Regions with atomics. However, there are
hackers around who are just crying for atomics and low level means.
And unfortunately for good reason. If your optimizer can generate
something like the Linux RCU implementation, or even double-checked
locking, from code based on critical regions, I'm impressed. And
unfortunately, if we're writing parallel code, it's often because we
need the performance.

We have had a lot of arguments about the need for "low-level"
atomics. But I don't think there's been much question about the need
for some form of atomics to supplement critical regions / locks.
>
...
Ada95 also has atomic operations, but if I read the spec correctly,
they seem to have largely overlooked the memory ordering issues.

From your point of view it may seem like an oversight but the truth is
that there is no such problem as memory ordering in a well designed
concurrent language. Just look at the comments to your following code
fragments.
In
particular, if I write the equivalent of
Thread 1:
x = 42;
x_init = true;
Thread 2:
while (!x_init);
assert(x == 42);
this can fail (and in fact is incorrect) even if x_init is atomic.

Of course it is incorrect. Here both `x' and `x_init' are shared
variables but you miss to declare them as such. Furthermore, you are
trying to access them `in sequential manner' i.e. as if they were
variables in a sequential program. However, then, why are you
wondering that an incorrect concurrent program can fail?
This is correct in Java and the C++0x working paper if x_init is
volatile(Java)/atomic(C++). The variable x cannot be simultaneously
accessed, hence there is no data race on x. And x_init is effectively
declared shared.
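To make that concrete, here is roughly how the example reads with the
working-paper atomics (a sketch only; header and enum names may still
shift, and the point is only that x_init's accesses default to
sequentially consistent ordering):

    #include <atomic>
    #include <cassert>

    int x = 0;
    std::atomic<bool> x_init(false);

    void thread1() {
        x = 42;                 // ordinary store; no race, since thread 2
                                // does not touch x until x_init is true
        x_init.store(true);     // "release" side of the hand-off
    }

    void thread2() {
        while (!x_init.load())  // "acquire" side; spin until published
            ;
        assert(x == 42);        // the store to x happens-before this
    }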

In a C-like environment, it doesn't work to declare x shared in cases
like this. Often x = 42 is really an initialization of some library
that has no idea whether it will be called from a single thread,
possibly in a single threaded application, or from multiple threads
which serialize access to the library. The library doesn't know
whether its variables are shared; and it doesn't matter.
>
I guess the meaning of your fragment is that you would like Thread 2 to
proceed only when the shared variable becomes 42:

shared int x = 0;

Thread 1:
with (x) {x = 42;}

Thread 2:
with (x) when (x == 42) {
// do whatever you want to do when x==42

}

Let me note that if you want Thread 2 to do anything when x==42, you
should specify that action inside the Critical Region. It is because
in a concurrent programming environment the change of the shared
variable must be regarded as a non-deterministic event.
But now you've effectively been forced to move code that cannot
participate in a race into a critical section. As a result you're
holding locks much longer than you need to, reducing concurrency, at
least in a lock-based implementation.
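For concreteness, a lock-based rendering of that region with the proposed
standard library components would look roughly like this (a sketch; class
names follow the current working paper and may change):

    std::mutex m;
    std::condition_variable cv;
    int x = 0;

    // Thread 1
    {
        std::unique_lock<std::mutex> lk(m);
        x = 42;
        cv.notify_all();
    }

    // Thread 2
    {
        std::unique_lock<std::mutex> lk(m);
        while (x != 42)
            cv.wait(lk);
        // "do whatever you want to do when x == 42"
    }

Everything in the Thread 2 body executes with m held, which is the loss of
concurrency I'm referring to.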

Hans
Jun 27 '08 #79
On May 1, 2:08 pm, "Chris Thomasson" <cris...@comcast.netwrote:
"Dmitriy V'jukov" <dvyu...@gmail.comwrote in message

news:0d**********************************@k13g2000 hse.googlegroups.com...
On 18 , 01:31, "Chris Thomasson" <cris...@comcast.netwrote:
Indeed. That's a major plus for me. The really cool thing, for me at
least,
is that the next C++ will allow me to create a 100% standard
implementation
of my AppCore library <http://appcore.home.comcast.net> which currently
uses
POSIX Threads and X86 assembly. The fact that C++ has condition
variables,
mutexes, atomic operations and fine-grain memory barriers means a lot to
me.
Finally, there will be such a thing as 100% portable non-blocking
algorithms. I think that's so neat.
There are no compiler ordering barriers in C++0x.

That's what I thought.
Currently not. There is still some low-key discussion about that. I
don't think that affects code using threads, though. It does affect code
using asynchronous signals.
>
So you can't implement VZOOM object lifetime management in
autodetection mode, you need compiler_acquire_barrier and
compiler_release_barrier.

No way could I do highly platform dependant auto-detection with C++0x. You
can get it to work with signals, but that's a little crazy:

http://groups.google.com/group/comp..../browse_frm/th...

;^)
Also you can't implement SMR+RCU, asymmetric reader-writer mutex,
asymmetric eventcount etc. Basically any algorithm where you want to
eliminate all hardware barriers from fast-path.
At least parts of this are under active discussion. See N2556. It
turns out that this is extremely tricky to do at the source language
level and actually get guaranteed correctness. RCU usually relies on
the hardware enforcing dependency-based memory ordering. But
compilers like to break dependencies because it shortens critical
paths. And not all hardware agrees on what constitutes a dependency
anyway. I think we finally know a reasonable way to do this, though.
>
Probably not.
There are no specific barriers in C++0x, like 'load-release-wrt-loads',
which you need to implement a sequence mutex (SeqLock). In C++0x you can
use a 'load-release' barrier as a replacement for 'load-release-wrt-
loads'. But 'load-release' includes #StoreLoad; 'load-release-wrt-
loads' doesn't include #StoreLoad.
load-release actually doesn't make much sense ...

We currently don't generally have ordering forms that apply to only
loads or only stores. See N2176 for a rationale.
>
Perhaps you should post this over on the cpp-threads mailing list:

http://www.decadentplace.org.uk/cgi-...fo/cpp-threads
Thus, as I understand it now, C++0x will be suitable only for 'basic'
synchronization algorithms. If you want to implement 'advanced'
synchronization algorithms you still will have to implement 'my own
atomic library' and port it to every compiler and hardware platform
manually. Yikes!

For automatic epoch detection, and all that "type" of stuff, I think you're
correct.
You can always implement using either sequentially consistent atomics,
or explicitly ordered atomics, by using the next stronger ordering
constraint. For sequentially consistent atomics, we know that it's
highly platform dependent how much performance you lose. There is a
hand-waving argument that on X86, it's probably not much. (The added
cost there is all in the stores, and in most cases, they are either
greatly outnumbered by loads, or typically encounter a coherence miss
anyway. This is not a 100% argument.)

I think that for explicitly ordered atomics, the performance cost is
generally quite small across most architectures. A few issues are
still under discussion. The chosen ordering constraints are a
compromise between simplicity and performance. But they were also
pruned a bit by observations that some of the other common ones are in
fact very hard to use correctly and/or describe.

Hans

Jun 27 '08 #80
Szabolcs Ferenczi wrote:
You have the same problem there: Your concern is not about any
concurrent language feature but about what if an _incorrect concurrent
program_ is optimised by the compiler which compiler optimises for the
_sequential execution._ You do not want to inform the compiler that it
should not optimise for sequential execution any more because it is
not a sequential program but rather a concurrent one. You could inform
the compiler if threading would be introduced at the language level
(see marking shared variables and marking critical sections at the
language level).
I imagine that this abstraction level will be a use case for meta-compilation.
http://en.wikipedia.org/wiki/OpenC++

In which implementation language would the keyword "shared" be realised to
specify such a class property?

I guess that proxies or interceptors can help to apply advanced properties or
features to objects.

Regards,
Markus
Jun 27 '08 #81
On May 2, 2:35*am, "Boehm, Hans" <hans.bo...@hp.comwrote:
On May 1, 2:26 pm, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:On Apr 28, 8:42 pm, "Boehm, Hans" <hans.bo...@hp.comwrote:
On Apr 21, 10:07 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
[...]
In
particular, if I write the equivalent of
Thread 1:
x = 42;
x_init = true;
Thread 2:
while (!x_init);
assert(x == 42);
this can fail (and in fact is incorrect) even if x_init is atomic.
Of course it is incorrect. Here both `x' and `x_init' are shared
variables but you miss to declare them as such. Furthermore, you are
trying to access them `in sequential manner' i.e. as if they were
variables in a sequential program. However, then, why are you
wondering that an incorrect concurrent program can fail?

This is correct in Java and the C++0x working paper if x_init is
volatile(Java)/atomic(C++). *The variable x cannot be simultaneously
accessed, hence there is no data race on x. *And x_init is effectively
declared shared.
First of all, can you please decide which statement of yours is what
you really hold:

1) `this can fail (and in fact is incorrect) even if x_init is
atomic.'

2) `This is correct in Java and the C++0x working paper if x_init is
volatile(Java)/atomic(C++).'

Both statements are made by you and both refer to the same code
fragment. The only difference is that the first statement is made by
you for your original code fragment and the second one is made in
response to my comments.

What do you really think about it? Is it correct or incorrect?

Besides, I can admit that if you declare `x_init' volatile(Java)/
atomic(C++), the access to `x_init' will be exclusive (as if there
would be a tiny Critical Region defined).

On the other hand, it is a typical concurrent bug to think that if
`x_init' is volatile(Java)/atomic(C++), that has any effect on
accessing another variable `x' subsequent to inspecting atomically the
volatile(Java)/atomic(C++) one. The atomicity is valid for one and
only one operation individually. Two atomic operations in a sequence
cannot be build on the result of each other. You need a Critical
Region for that.

In fact in concurrent programming you can assume any long delay
between two subsequent operations and if those two operations together
are not declared to be atomic (see Critical Region), anything may
happen in between.

Thread 2:
while (!x_init);
// <--- x_init may become false right here
// delay(1 day); <--- e.g. x=33
assert(x == 42);

In this case anything may happen with `x' during the delay because
`while (!x_init);' and `assert(x == 42);' are not inside the same
Critical Region. Thus, you cannot be sure about the assert.

On the other hand, let us consider

Thread 2:
with (x) when (x == 42) {
// delay(1 day); <--- x cannot be accessed
// do whatever you want to do when x==42
}

In this case anything may happen during the delay _except_ with `x'
since we expressed our intention that we want to do something with it
when the condition holds (see the Conditional Critical Region).

Best Regards,
Szabolcs
Jun 27 '08 #82
On 2008-05-02 13:35, Szabolcs Ferenczi wrote:
On May 2, 2:35 am, "Boehm, Hans" <hans.bo...@hp.comwrote:
>On May 1, 2:26 pm, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:On Apr 28, 8:42 pm, "Boehm, Hans" <hans.bo...@hp.comwrote:
On Apr 21, 10:07 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
[...]
In
particular, if I write the equivalent of
Thread 1:
x = 42;
x_init = true;
Thread 2:
while (!x_init);
assert(x == 42);
this can fail (and in fact is incorrect) even if x_init is atomic.
Of course it is incorrect. Here both `x' and `x_init' are shared
variables but you miss to declare them as such. Furthermore, you are
trying to access them `in sequential manner' i.e. as if they were
variables in a sequential program. However, then, why are you
wondering that an incorrect concurrent program can fail?

This is correct in Java and the C++0x working paper if x_init is
volatile(Java)/atomic(C++). The variable x cannot be simultaneously
accessed, hence there is no data race on x. And x_init is effectively
declared shared.

First of all, can you please decide which statement of yours is what
you really hold:

1) `this can fail (and in fact is incorrect) even if x_init is
atomic.'

2) `This is correct in Java and the C++0x working paper if x_init is
volatile(Java)/atomic(C++).'

Both statements are made by you and both refer to the same code
fragment. The only difference is that the first statement is made by
you for your original code fragment and the second one is made in
response to my comments.

What do you really think about it? Is it correct or incorrect?
Both are correct, it is just you who failed to quote all the relevant
parts of the original message. If you had you would see that the 'this
can fail...' part referred to that (kind of) code using Ada's memory
model, and the 'This is correct...' part is for the same code using
Java's and C++0x's memory model.

--
Erik Wikström
Jun 27 '08 #83
On May 2, 3:03 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
On 2008-05-02 13:35, Szabolcs Ferenczi wrote:
On May 2, 2:35 am, "Boehm, Hans" <hans.bo...@hp.comwrote:
On May 1, 2:26 pm, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:On Apr 28, 8:42 pm, "Boehm, Hans" <hans.bo...@hp.comwrote:
On Apr 21, 10:07 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
[...]
In
particular, if I write the equivalent of
Thread 1:
x = 42;
x_init = true;
Thread 2:
while (!x_init);
assert(x == 42);
this can fail (and in fact is incorrect) even if x_init is atomic.
Of course it is incorrect. Here both `x' and `x_init' are shared
variables but you miss to declare them as such. Furthermore, you are
trying to access them `in sequential manner' i.e. as if they were
variables in a sequential program. However, then, why are you
wondering that an incorrect concurrent program can fail?
This is correct in Java and the C++0x working paper if x_init is
volatile(Java)/atomic(C++). *The variable x cannot be simultaneously
accessed, hence there is no data race on x. *And x_init is effectively
declared shared.
First of all, can you please decide which statement of yours is what
you really hold:
1) `this can fail (and in fact is incorrect) even if x_init is
atomic.'
2) `This is correct in Java and the C++0x working paper if x_init is
volatile(Java)/atomic(C++).'
Both statements are made by you and both refer to the same code
fragment. The only difference is that the first statement is made by
you for your original code fragment and the second one is made in
response to my comments.
What do you really think about it? Is it correct or incorrect?

Both are correct, it is just you who failed to quote all the relevant
parts of the original message. If you had you would see that the 'this
can fail...' part referred to that (kind of) code using Ada's memory
model, and the 'This is correct...' part is for the same code using
Java's and C++0x's memory model.
Thank you for your effort. So you think both are correct. It is
correct and incorrect at the same time. Brilliant solution. Well done.

However, I think he can answer the question which was put to him.

If you are so ambitious, can you comment this "correct C++0x code"?

Thread 1:
x = 42;
x_init = true;

Thread 2:
while (!x_init);
// <--- x_init may become false right here
// delay(1 day); <--- e.g. x=33
assert(x == 42);

You are in an easy position if it is both correct and incorrect for
you.

Best Regards,
Szabolcs
Jun 27 '08 #84
On Fri, 02 May 2008 06:14:18 -0700, Szabolcs Ferenczi wrote:
On May 2, 3:03*pm, Erik Wikström <Erik-wikst...@telia.comwrote:
>On 2008-05-02 13:35, Szabolcs Ferenczi wrote:
On May 2, 2:35 am, "Boehm, Hans" <hans.bo...@hp.comwrote:
On May 1, 2:26 pm, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:On Apr 28, 8:42 pm, "Boehm, Hans" <hans.bo...@hp.comwrote:
On Apr 21, 10:07 am, Szabolcs Ferenczi
<szabolcs.feren...@gmail.comwrote:
[...]
In
particular, if I write the equivalent of
Thread 1:
x = 42;
x_init = true;
Thread 2:
while (!x_init);
assert(x == 42);
this can fail (and in fact is incorrect) even if x_init is
atomic.
Of course it is incorrect. Here both `x' and `x_init' are shared
variables but you miss to declare them as such. Furthermore, you
are trying to access them `in sequential manner' i.e. as if they
were variables in a sequential program. However, then, why are you
wondering that an incorrect concurrent program can fail?
>This is correct in Java and the C++0x working paper if x_init is
volatile(Java)/atomic(C++). *The variable x cannot be simultaneously
accessed, hence there is no data race on x. *And x_init is
effectively declared shared.
First of all, can you please decide which statement of yours is what
you really hold:
1) `this can fail (and in fact is incorrect) even if x_init is
atomic.'
2) `This is correct in Java and the C++0x working paper if x_init is
volatile(Java)/atomic(C++).'
Both statements are made by you and both refer to the same code
fragment. The only difference is that the first statement is made by
you for your original code fragment and the second one is made in
response to my comments.
What do you really think about it? Is it correct or incorrect?

Both are correct, it is just you who failed to quote all the relevant
parts of the original message. If you had you would see that the 'this
can fail...' part referred to that (kind of) code using Ada's memory
model, and the 'This is correct...' part is for the same code using
Java's and C++0x's memory model.

Thank you for your effort. So you think both are correct. It is correct
and incorrect at the same time.
No. Read Erik Wikström's answer again, then go back and read the context
you snipped.

--
Lionel B
Jun 27 '08 #85
On May 2, 3:51*pm, Lionel B <m...@privacy.netwrote:
On Fri, 02 May 2008 06:14:18 -0700, Szabolcs Ferenczi wrote:
On May 2, 3:03 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
On 2008-05-02 13:35, Szabolcs Ferenczi wrote:
On May 2, 2:35 am, "Boehm, Hans" <hans.bo...@hp.comwrote:
On May 1, 2:26 pm, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:On Apr 28, 8:42 pm, "Boehm, Hans" <hans.bo...@hp.comwrote:
On Apr 21, 10:07 am, Szabolcs Ferenczi
<szabolcs.feren...@gmail.comwrote:
[...]
In
particular, it I write the equivalent of
Thread 1:
x = 42;
x_init = true;
Thread 2:
while (!x_init);
assert(x == 42);
this can fail (and in fact is incorrect) even if x_init is
atomic.
Of course it is incorrect. Here both `x' and `x_init' are shared
variables but you miss to declare them as such. Furthermore, you
are trying to access them `in sequential manner' i.e. as if they
were variables in a sequential program. However, then, why are you
wondering that an incorrect concurrent program can fail?
This is correct in Java and the C++0x working paper if x_init is
volatile(Java)/atomic(C++). *The variable x cannot be simultaneously
accessed, hence there is no data race on x. *And x_init is
effectively declared shared.
First of all, can you please decide which statement of yours is what
you really hold:
1) `this can fail (and in fact is incorrect) even if x_init is
atomic.'
2) `This is correct in Java and the C++0x working paper if x_init is
volatile(Java)/atomic(C++).'
Both statements are made by you and both refer to the same code
fragment. The only difference is that the first statement is made by
you for your original code fragment and the second one is made in
response to my comments.
What do you really think about it? Is it correct or incorrect?
Both are correct, it is just you who failed to quote all the relevant
parts of the original message. If you had you would see that the 'this
can fail...' part referred to that (kind of) code using Ada's memory
model, and the 'This is correct...' part is for the same code using
Java's and C++0x's memory model.
Thank you for your effort. So you think both are correct. It is correct
and incorrect at the same time.

No. Read Erik Wikström's answer again, then go back and read the context
you snipped.

Thank you, Lionel. You have been most helpful. Can you in the meantime
look at the part you snipped:

If you are so ambitious, can you comment this "correct C++0x code"?

Thread 1:
x = 42;
x_init = true;

Thread 2:
while (!x_init);
// <--- x_init may become false right here
// delay(1 day); <--- e.g. x=33
assert(x == 42);

Thanks a lot.

Best Regards,
Szabolcs
Jun 27 '08 #86
On Fri, 02 May 2008 07:02:46 -0700, Szabolcs Ferenczi wrote:

[snip]
Thank you, Lionel. You have been most helpful. Can you in the meantime
look at the part you snipped:
Sure... uh-oh, where'd it go?
If you are so ambitious, can you comment this "correct C++0x code"?
Ambitious? Me?

[snip]

--
Lionel B
Jun 27 '08 #87
Szabolcs Ferenczi <sz***************@gmail.comwrites:
If you are so ambitious, can you comment this "correct C++0x code"?
OK, I know nothing about C++0x but it seems clear from what I've been
reading here that there has been a basic misunderstanding about the
purpose of Hans Boehm's example. It was posted as an example of how
atomic variables do not solve the problem (following your request for
such an example). It was never intended to be correct, but the
point it illustrates is not the one you've taken from it.
Thread 1:
x = 42;
x_init = true;

Thread 2:
while (!x_init);
// <--- x_init may become false right here
// delay(1 day); <--- e.g. x=33
assert(x == 42);
The point was to show that the correctness (if it is to be correct)
relies on more than the atomicity of x_init. To make the example show
what was intended we need thread 1 to exit and to assert that no other
threads are involved (so nothing else can affect x_init or x). Hence
I think the point was that, without extra guarantees:

Syntax to declare shared atomic int x_init = false;

Thread 1:
x = 42;
x_init = true;
exit_thread();

Thread 2:
while (!x_init);
assert(x == 42);

is *still* wrong since the assignment to x may be delayed, either by
the compiler or the hardware. Languages that need the above code to
work must restrict the compiler and use a relatively harsh memory
regime to ensure that the above does what is expected.

The claim is that C++0x will take a new route. In
<95**********************************@z24g2000prf. googlegroups.com>
Hans Boehm says:

| Consider
|
| Thread 1:
| x = 42;
| lock(l);
exit_thread(); /* Added for clarity */

| Thread 2:
| while (trylock(l) == SUCCESS) unlock(l);
| r1 = x;

| Is this allowable? Does it guarantee r1 == 42? The answer can have
| substantial effect on the cost of the lock() implementation. C++0x
| resolves it in a new and interesting way.

Again, we must assume that thread one exits (or at least does not
touch x or the lock again) and that no other threads are involved.
I don't know how C++0x resolves this, but the suggestion is that it
does so more cheaply than the obvious one.

Anyway, neither example had anything to do with x changing again. It would
have been clearer if this has been stated, but it is not reasonable to
assume that your correspondent is missing something as obvious as your
counter example.

--
Ben.
Jun 27 '08 #88
On 2 May, 01:08, "Chris Thomasson" <cris...@comcast.netwrote:
So you can't implement VZOOM object lifetime management in
autodetection mode, you need compiler_acquire_barrier and
compiler_release_barrier.

No way could I do highly platform dependant auto-detection with C++0x. You
can get it to work with signals, but that's a little crazy:
I am not talking about the auto-detection logic itself, but about the
vz_acquire()/vz_release() functions. I think they look something like
this:

void vz_acquire(void* p)
{
    per_thread_rc_array[hash(p)] += 1;
    compiler_acquire_barrier(); // <--------------
}

void vz_release(void* p)
{
    compiler_release_barrier(); // <--------------
    per_thread_rc_array[hash(p)] -= 1;
}

The question: will you have to manually implement and port to every
compiler compiler_acquire_barrier()/compiler_release_barrier()?

Dmitriy V'jukov
Jun 27 '08 #89
"Dmitriy V'jukov" <dv*****@gmail.comwrote in message
news:f4**********************************@25g2000h sx.googlegroups.com...
On 2 May, 01:08, "Chris Thomasson" <cris...@comcast.netwrote:
So you can't implement VZOOM object lifetime management in
autodetection mode, you need compiler_acquire_barrier and
compiler_release_barrier.
No way could I do highly platform dependant auto-detection with C++0x.
You
can get it to work with signals, but that's a little crazy:
I am saying not about auto-detection logic itself, but about
vz_acquire()/vz_release() functions. I think they look something like
this:
void vz_acquire(void* p)
{
per_thread_rc_array[hash(p)] += 1;
compiler_acquire_barrier(); // <--------------
}
void vz_release(void* p)
{
compiler_release_barrier(); // <--------------
per_thread_rc_array[hash(p)] -= 1;
}
The question: will you have to manually implement and port to every
compiler compiler_acquire_barrier()/compiler_release_barrier()?
The implementation of the function which mutates the array is externally
compiled:

http://groups.google.com/group/comp....0b291ee41a7fb5

and I document that link-time optimization level should be turned down, or
off... Oh well.

Jun 27 '08 #90
On May 2, 7:02 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
Thread 1:
x = 42;
x_init = true;

Thread 2:
while (!x_init);
// <--- x_init may become false right here
// delay(1 day); <--- e.g. x=33
assert(x == 42);
I assumed a convention here that I should have been clearer about. In
particular, no other threads execute code relevant to this, and this
is the entire code executed in these two threads. That's admittedly a
simplification, but a convenient one that's generally used in
presenting such examples. A more realistic setting would involve
multiple threads that behave like thread 2, but only a single thread
that sets x_init, and it is set only once and never reset. Very
similar cases occur with double-checked locking, or when passing
"ownership" to an object between threads through some sort of queue.
In the latter case, the queue is implemented using something like
critical regions, but the object is accessed outside of the critical
region, because it's only accessed by one thread at a time.
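A minimal version of that hand-off, with the queue collapsed to a single
atomic slot (sketch only; Widget and use() are just placeholders):

    std::atomic<Widget*> slot(0);    // stands in for the real queue

    // Producer thread
    Widget* w = new Widget;
    w->value = 42;                   // initialize while the object is private
    slot.store(w);                   // publish; default ordering is seq_cst

    // Consumer thread
    Widget* r;
    while ((r = slot.load()) == 0)
        ;                            // spin, as in the simplified examples
    use(r->value);                   // no further locking needed: only this
                                     // thread touches *r from here on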

Hans

Jun 27 '08 #91
On 2 May, 04:56, "Boehm, Hans" <hans.bo...@hp.comwrote:
There are no compiler ordering barriers in C++0x.
That's what I thought.

Currently not. There is still some low key discussion about that. I
don't think that affects threads use, though. It does affect code
using asynchronous signals.

There are synchronization algorithms where compiler ordering affects
exactly threads.

Asymmetric reader-writer mutex:
http://groups.google.ru/group/comp.p...6c19698964d1f6

SMR+RCU:
http://sourceforge.net/project/showf...roup_id=127837
(fastsmr package)

Those algorithms are very... exotic. They eliminate *all* hardware
fences from fast-path. But w/o correct compiler ordering it's
impossible.

Also you can't implement SMR+RCU, asymmetric reader-writer mutex,
asymmetric eventcount etc. Basically any algorithm where you want to
eliminate all hardware barriers from fast-path.

At least parts of this are under active discussion. See N2556. It
turns out that this is extremely tricky to do at the source language
level and actually get guaranteed correctness. RCU usually relies on
the hardware enforcing dependency-based memory ordering. But
compilers like to break dependencies because it shortens critical
paths. And not all hardware agrees on what constitutes a dependency
anyway. I think we finally know a reasonable way to do this, though.

I am not talking about data dependency (std::memory_order_consume).
Consider a 'classical' SMR implementation:

// pseudo-code
void* smr_acquire_reference
    (void** shared_object, void** hazard_pointer)
{
    for (;;)
    {
        void* object = *shared_object;   // read the shared pointer
        *hazard_pointer = object;        // publish it as a hazard pointer
        hardware_store_load_fence();     // publication must be globally
                                         // visible before the re-check
        void* object2 = *shared_object;  // re-read: still the same object?
        if (object == object2)
        {
            hardware_acquire_fence();
            return object;               // now protected by the hazard pointer
        }
    }
}

void smr_release_reference
    (void** hazard_pointer)
{
    hardware_release_fence();            // prior accesses complete first
    *hazard_pointer = 0;                 // clear the hazard pointer
}

In SMR+RCU they look like this:

// pseudo-code -- same structure, but every hardware fence has become a
// compiler-only fence
void* smr_rcu_acquire_reference
    (void** shared_object, void** hazard_pointer)
{
    for (;;)
    {
        void* object = *shared_object;
        *hazard_pointer = object;
        compiler_store_load_fence();
        void* object2 = *shared_object;
        if (object == object2)
        {
            compiler_acquire_fence();
            return object;
        }
    }
}

void smr_release_reference
    (void** hazard_pointer)
{
    compiler_release_fence();
    *hazard_pointer = 0;
}

All hardware fences are eliminated. But instead one needs:
compiler_store_load_fence()
compiler_acquire_fence()
compiler_release_fence()

As far as I understand, with current C++0x one has to resort to assembly/
compiler-specific things, and port this manually to every platform.

In gcc it's "__asm__ __volatile__ ("" : : :"memory")"
In msvc it's _ReadWriteBarrier().

It will be great if one will be able to write:

std::atomic_int x;
int l = x.load(std::memory_order_relaxed_but_compiler_acquire);
x.store(0, std::memory_order_relaxed_but_compiler_release);
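
Until then, the only option is a hand-maintained shim, something like this
(a sketch only; every additional compiler needs its own branch, which is
exactly the problem):

    // compiler-only fences: they constrain the compiler, not the hardware
    #if defined(__GNUC__)
    #  define compiler_fence()  __asm__ __volatile__ ("" : : : "memory")
    #elif defined(_MSC_VER)
    #  include <intrin.h>
    #  define compiler_fence()  _ReadWriteBarrier()
    #else
    #  error "port me: no compiler-only fence known for this compiler"
    #endif

    // conservatively map all three variants to a full compiler barrier
    #define compiler_acquire_fence()    compiler_fence()
    #define compiler_release_fence()    compiler_fence()
    #define compiler_store_load_fence() compiler_fence()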
There are no specific barriers in C++0x, like 'load-release-wrt-loads',
which you need to implement a sequence mutex (SeqLock). In C++0x you can
use a 'load-release' barrier as a replacement for 'load-release-wrt-
loads'. But 'load-release' includes #StoreLoad; 'load-release-wrt-
loads' doesn't include #StoreLoad.

load-release actually doesn't make much sense ...

I think that Sequence Lock (SeqLock):
http://en.wikipedia.org/wiki/Seqlock
must be implemented this way:

bool sequence_lock_rdunlock(seqlock* lock, int prev_seq)
{
    int seq = lock->seq.load(std::memory_order_load_release_wrt_loads);
    return seq == prev_seq;
}

On x86 memory_order_load_release_wrt_loads is a no-op (a plain load). But
if I use the 'next stronger ordering constraint', i.e.
memory_order_acq_rel, then it will be a locked RMW operation or an mfence
instruction. I think in most C++ implementations it will be a locked RMW
operation. And a locked RMW operation means taking ownership of the cache
line. This basically kills the whole idea of SeqLock...
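
I.e. reusing the seqlock type from above, the fallback looks like this
(a sketch only; it is exactly what a reader-side fast-path must avoid):

    bool sequence_lock_rdunlock(seqlock* lock, int prev_seq)
    {
        // no load-only release form exists, so use the next stronger thing:
        // an acq_rel read-modify-write.  fetch_add(0) is just a way to get
        // an ordered load -- on x86 it becomes a locked instruction, so the
        // "read-only" reader now takes ownership of the cache line.
        int seq = lock->seq.fetch_add(0, std::memory_order_acq_rel);
        return seq == prev_seq;
    }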

We currently don't generally have ordering forms that apply to only
loads or only stores. See N2176 for a rationale.

Yes, there are things of which most developers (including me) aren't
even aware :)
But I think (hope) that my example with sequence_lock_rdunlock() is
still correct, because SeqLock allows only 'purely read-only'
transactions on reader side.

I don't argue that it's easy stuff. It's extremely hard stuff. I don't
even hope that I got it all right. I just want to figure out whether one
can forget about assembly and manual porting at all, or whether one will
still have to resort to assembly and manual porting for the most advanced
things.
Dmitriy V'jukov
Jun 27 '08 #92

"Dmitriy V'jukov" <dv*****@gmail.comwrote in message
news:d9**********************************@34g2000h sf.googlegroups.com...
[...]
There are synchronization algorithms where compiler ordering affects
exactly threads.
[...]
I don't argue that it's easy stuff. It's extremely hard stuff. I don't
even hope that I get all right. I just want to figure out whether one
can forget about assembly and manual porting at all, or one still will
have to revert to assembly and manual porting for most-advanced
things.
I think you're still going to have to use assembly and manual porting for
efficient implementations of algorithms like SMR+RCU. Unless C++ provides a
'rcu_synchronize()' type function, well, I am not sure how you can get
passive sync-epoch detection. You could use a thread bound to each processor,
and a single polling thread that sends a message to each one and waits
for a response. Once all responses are in, a synchronization epoch has
elapsed across all the CPUs involved in the message broadcast. That is, each
CPU has executed something analogous to a store/load style memory barrier.
This has been proposed before:

http://groups.google.com/group/comp....8a80ea30b2e849
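
Something like this, very roughly (a sketch only; the actual binding of
workers to CPUs and when they get to poll is omitted, and NCPUS is just a
placeholder):

    #include <atomic>

    enum { NCPUS = 4 };               // placeholder for the real CPU count
    std::atomic<bool> ping[NCPUS];    // request from the polling thread
    std::atomic<bool> pong[NCPUS];    // reply from each CPU-bound worker

    // polling thread: when this returns, every CPU has passed through a
    // point equivalent to a store/load memory barrier since the broadcast
    void synchronize_epoch() {
        for (int i = 0; i < NCPUS; ++i) ping[i].store(true);
        for (int i = 0; i < NCPUS; ++i) {
            while (!pong[i].load())
                ;                     // spin (or yield) until CPU i replies
            ping[i].store(false);
            pong[i].store(false);
        }
    }

    // each worker, bound to CPU i, calls this from time to time
    void worker_poll(int i) {
        if (ping[i].load())
            pong[i].store(true);      // seq_cst store stands in for the barrier
    }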

Oh well. BTW, I really do like the idea of providing fine-grain compiler
barriers...

Jun 27 '08 #93
On May 2, 8:01*pm, Ben Bacarisse <ben.use...@bsb.me.ukwrote:
Szabolcs Ferenczi <szabolcs.feren...@gmail.comwrites:
If you are so ambitious, can you comment this "correct C++0x code"?

OK, I know nothing about C++0x but it seems clear from what I've been
reading here that there has been a basic misunderstanding about the
purpose of Hans Boehm's example. *It was posted as an example of how
atomic variables do not solve the problem (following your request for
such an example).
I think it will be best if Hans Boehm himself corrects you but I
never requested such an example from him nor from anyone else.

I would never request any example for something that is clear in
concurrent programming from 1965 on, namely that if you have variables
with atomic access (atomic read and atomic write) you cannot derive a
general synchronisation between N processes for critical sections. See
E.W. Dijkstra, Cooperating sequential processes
http://www.cs.utexas.edu/users/EWD/t...xx/EWD123.html

Besides, what I requested was that someone could show solutions in
C++0x for some canonical problems in concurrent programming:

http://groups.google.com/group/comp....07b37a3b0323f3

That, however, despite all the talk from the wise guys, has not happened
so far: no one in this discussion list could publish any example in the
C++0x notation for any of the canonical concurrent problems.

Best Regards,
Szabolcs
Jun 27 '08 #94
Szabolcs Ferenczi <sz***************@gmail.comwrites:
On May 2, 8:01*pm, Ben Bacarisse <ben.use...@bsb.me.ukwrote:
>Szabolcs Ferenczi <szabolcs.feren...@gmail.comwrites:
If you are so ambitious, can you comment this "correct C++0x code"?

OK, I know nothing about C++0x but it seems clear from what I've been
reading here that there has been a basic misunderstanding about the
purpose of Hans Boehm's example. *It was posted as an example of how
atomic variables do not solve the problem (following your request for
an such an example).

I think it will be best if Hans Boehm himself corrects you but I
never requested such an example from him nor from anyone else.
I see he has elsewhere in this thread.
I would never request any example for something that is clear in
concurrent programming from 1965 on,
Obviously, and I never said you did. You did ask for *an* example.

When it was given you could either choose to assume the author had
made a basic mistake that one would be embarrassed to make as a
student, or you could assume that is was illustrating a more subtle
point. It is possible to take the code fragments and put them in a
context in which they make sense -- you just need to assume that
nothing else happens. You chose to suggest that a beginner's mistake
had been made. I don't think that helped move this interesting
discussion forwards.

--
Ben.
Jun 27 '08 #95
Ben Bacarisse wrote:
>
When it was given you could either choose to assume the author had
made a basic mistake that one would be embarrassed to make as a
student, or you could assume that is was illustrating a more subtle
point. It is possible to take the code fragments put then in a
context in which they make sense -- you just need to assume that
nothing else happens. You chose to suggest that a beginner's mistake
had been made. I don't think that helped move this interesting
discussion forwards.
He created a long drawn out thread on c.p.threads by being equally rude
and condescending to everyone foolish enough to join it.

--
Ian Collins.
Jun 27 '08 #96
"Ian Collins" <ia******@hotmail.comwrote in message
news:68**************@mid.individual.net...
Ben Bacarisse wrote:
>>
When it was given you could either choose to assume the author had
made a basic mistake that one would be embarrassed to make as a
student, or you could assume that is was illustrating a more subtle
point. It is possible to take the code fragments put then in a
context in which they make sense -- you just need to assume that
nothing else happens. You chose to suggest that a beginner's mistake
had been made. I don't think that helped move this interesting
discussion forwards.
He created a long drawn out thread on c.p.threads by being equally rude
and condescending to everyone foolish enough to join it.
Humm... :^(

Jun 27 '08 #97
On 3 May, 00:06, "Chris Thomasson" <cris...@comcast.netwrote:
"Dmitriy V'jukov" <dvyu...@gmail.comwrote in message

news:f4**********************************@25g2000h sx.googlegroups.com...
On 2 May, 01:08, "Chris Thomasson" <cris...@comcast.netwrote:
So you can't implement VZOOM object lifetime management in
autodetection mode, you need compiler_acquire_barrier and
compiler_release_barrier.
No way could I do highly platform dependant auto-detection with C++0x.
You
can get it to work with signals, but that's a little crazy:
I am saying not about auto-detection logic itself, but about
vz_acquire()/vz_release() functions. I think they look something like
this:
void vz_acquire(void* p)
{
per_thread_rc_array[hash(p)] += 1;
compiler_acquire_barrier(); // <--------------
}
void vz_release(void* p)
{
compiler_release_barrier(); // <--------------
per_thread_rc_array[hash(p)] -= 1;
}
The question: will you have to manually implement and port to every
compiler compiler_acquire_barrier()/compiler_release_barrier()?

The implementation of the function which mutates the array is externally
compiled:

http://groups.google.com/group/comp....hread/1d0b291e...

and I document that link-time optimization level should be turned down, or
off... Oh well.

Link-time optimization can increase performance by 10-20%. And it's on
by default in release build of MSVC...

Btw, Joe Seigh in atomic-ptr uses the following:

#define fence() __asm__ __volatile__ ("" : : : "memory")
#define smrnull(hptr) \
    do { \
        fence(); \
        atomic_store(&hptr[0], 0); \
        atomic_store(&hptr[1], 0); \
    } while (0)
What do you think?

Dmitriy V'jukov
Jun 27 '08 #98
Dmitriy V'jukov wrote:
Link-time optimization can increase performance by 10-20%.
I am curious whether advanced linkers will pay attention to object-code
reordering constraints, because correctness of multi-threaded execution might
be affected. How much do such tools fiddle with memory barriers?

Regards,
Markus
Jun 27 '08 #99
Szabolcs Ferenczi wrote:
Finally, let me stress that I am not suggesting that you would make
either Concurrent Pascal, Edison, Ada or OCCAM out of C++. These are
just examples containing useful ideas with respect to concurrent
programming language features.
Do you like the approach "C++CSP2"?
http://www.cs.kent.ac.uk/projects/ofa/c++csp/

Regards,
Markus
Jun 27 '08 #100
