Bytes | Developer Community
How to make this program more efficient?


In my program, a thread checks for updates from the server periodically and
generates an std::map for other parts of the program to read data from.
Let's call the update method doUpdate and the std::map read methods
getData and copyData.
Since std::map is not thread-safe, we have to synchronize access
ourselves. A workable solution is to create a boost::mutex::scoped_lock
object in each of these methods so that all access to the map is
synchronized. But doUpdate runs only periodically (the interval may be
an hour or longer), while all the other parts only read the std::map, so
the above solution synchronizes far more often than necessary. We only
need synchronization on all the methods while doUpdate is running.
So: is it possible to make this program more efficient?
I have considered other solutions, such as adding an "updating" flag,
but that is not safe: the std::map may be updated while the rest of a
read method is still executing, and the operations on the flag itself
may be interrupted by other threads.
Sep 13 '08
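One conventional middle ground for the pattern the question describes (one rare writer, many readers) is a reader-writer lock: readers never block each other and only block during an update. The thread predates std::shared_mutex, so the following is a minimal modern sketch of the idea rather than anyone's code from the thread; doUpdate/getData are the hypothetical method names from the post, and the Boost equivalent at the time would have been boost::shared_mutex.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Shared table: one periodic writer, many concurrent readers.
class UpdateCache {
public:
    // Writer: runs rarely; takes the lock exclusively.
    void doUpdate(std::map<std::string, int> fresh) {
        std::unique_lock<std::shared_mutex> lock(mtx_);
        data_ = std::move(fresh);
    }

    // Reader: many threads may hold the shared lock at once.
    // Returns -1 for a missing key (illustrative convention only).
    int getData(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(mtx_);
        auto it = data_.find(key);
        return it == data_.end() ? -1 : it->second;
    }

private:
    mutable std::shared_mutex mtx_;
    std::map<std::string, int> data_;
};
```

As later posts in the thread point out, even a shared (read) lock still performs memory synchronization on every read, so this is not free; the atomic pointer-swap scheme debated later in the thread tries to avoid even that.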
James Kanze wrote:
> On Sep 21, 2:19 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>>> If there's any thread modifying the object, then all accesses
>>> must be synchronized. Period.
>> Did you know that reader threads can access data structures
>> without sync?
>
> Not according to Posix, and not on many machines. Reading
> without a sync will probably give you some value, but not
> necessarily the last written.

You are assuming that it is necessary to get the most recent value when, in
fact, it is not.

>> Sure. For read-mostly access patterns, WHY use a rwmutex? Did
>> you know that a rwmutex can be totally suboptimal even in the
>> presence of concurrent mutators?
>
> And? A rwlock, even when acquired for just reading, ensures
> memory synchronization.

Which can be unnecessary and slow.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Sep 21 '08 #51
On 21 Sep., 22:11, Jon Harrop <j...@ffconsultancy.com> wrote:
> James Kanze wrote:
>> On Sep 21, 2:19 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>>>> If there's any thread modifying the object, then all accesses
>>>> must be synchronized. Period.
>>> Did you know that reader threads can access data structures
>>> without sync?
>> Not according to Posix, and not on many machines. Reading
>> without a sync will probably give you some value, but not
>> necessarily the last written.
>
> You are assuming that it is necessary to get the most recent value when,
> in fact, it is not.

So you are describing a situation where a not-recent computation (or
no computation at all) is satisfactory. You are also describing a
situation where the result must be quite small: representable by
one word. If not, one could imagine one part of the result
representing a recent computation while another part represented an
old one.

I imagine there are some scenarios where this might be okay, but
it must be a very minor niche that fulfills these properties.

/Peter
Sep 21 '08 #52
peter koch wrote:
> On 21 Sep., 22:11, Jon Harrop <j...@ffconsultancy.com> wrote:
>> James Kanze wrote:
>>> On Sep 21, 2:19 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>>>>> If there's any thread modifying the object, then all accesses
>>>>> must be synchronized. Period.
>>>> Did you know that reader threads can access data structures
>>>> without sync?
>>> Not according to Posix, and not on many machines. Reading
>>> without a sync will probably give you some value, but not
>>> necessarily the last written.
>>
>> You are assuming that it is necessary to get the most recent value when,
>> in fact, it is not.
>
> So you are describing a situation where a not-recent computation (or
> no computation at all) is satisfactory.

Yes.

> You are also describing a situation where the result must be quite
> small: representable by one word.

No. The one word can be a pointer to an arbitrarily large data structure.
The pointer is mutable and the data structure is typically persistent and
immutable.

> I imagine there are some scenarios where this might be okay, but
> it must be a very minor niche that fulfills these properties.

On the contrary, there are a huge number of applications, ranging from
high-performance numerics in scientific computing to ordinary databases.

For example, if I post a blog comment just as someone downloads the blog
article, it makes no practical difference whether or not they see the latest
comment. The only thing that matters is that nobody gets corrupted data
(e.g. half of the old pointer and half of the new pointer), which requires
an atomic word-size write but no locks, and that updates propagate
sufficiently quickly in practice.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Sep 21 '08 #53
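Harrop's "mutable pointer to an immutable data structure" scheme can be sketched in later (C++11) terms with the atomic free-function overloads for shared_ptr, which also solve the reclamation problem he alludes to (the GC). The names here are illustrative, not from the thread, and note these overloads are not guaranteed lock-free (C++20's std::atomic<std::shared_ptr> supersedes them).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// An immutable snapshot: built once, then never modified.
using Snapshot = const std::map<std::string, int>;

// The single mutable "word": a pointer to the current snapshot.
std::shared_ptr<Snapshot> g_current = std::make_shared<Snapshot>();

// Writer: build a complete new snapshot, then publish it atomically.
// Old snapshots stay alive until the last reader drops its reference.
void publish(std::map<std::string, int> fresh) {
    std::atomic_store(&g_current,
        std::shared_ptr<Snapshot>(std::make_shared<Snapshot>(std::move(fresh))));
}

// Reader: grab whichever snapshot is current; it can never change
// underneath us, so no further locking is needed while using it.
std::shared_ptr<Snapshot> snapshot() {
    return std::atomic_load(&g_current);
}
```

A reader may see a slightly stale snapshot, which is exactly the trade-off Harrop argues is acceptable for read-mostly data.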
In article <580cf4b1-0903-43a7-beaf-6ce74640f671
@a70g2000hsh.googlegroups.com>, ja*********@gmail.com says...
> On Sep 21, 5:57 am, Jerry Coffin <jcof...@taeus.com> wrote:
>> In article <8d804b66-88fa-4b59-b3be-1d9e9cd89e88
>> @m44g2000hsc.googlegroups.com>, james.ka...@gmail.com says...
>>
>> [...]
>> Interestingly, the layout I'm talking about will generally work
>> correctly even on an architecture with virtually _no_ support for atomic
>> operations -- modifying one bit in a word will normally be atomic
>> regardless, simply because it's essentially impossible to spread a write
>> of a single bit over more than one clock cycle.
>
> Writing a single bit is NOT atomic on any of the machines I'm
> familiar with. Because it is essentially impossible to write a
> single bit, writing a single bit is implemented as
> read/modify/write.

Sorry, poor wording on my part. If you tried to change only that bit, it
wouldn't be atomic -- but here you're writing an entire word, of which
only one bit ever changes. Even on a machine where writing the whole
word wasn't atomic (e.g. a 64-bit word on a 32-bit machine that could
only update 32 bits at a time), you'd end up writing the word in two
pieces -- but one of those pieces never changes the value. The only
part that matters is the piece that actually changes, and since that is
only a single bit, it will always be contained in a single unit of the
size the processor can write atomically.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Sep 22 '08 #54
On 22 Sep., 01:51, Jon Harrop <j...@ffconsultancy.com> wrote:
> peter koch wrote:
>> On 21 Sep., 22:11, Jon Harrop <j...@ffconsultancy.com> wrote:
>>> James Kanze wrote:
>>>> On Sep 21, 2:19 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>>>>>> If there's any thread modifying the object, then all accesses
>>>>>> must be synchronized. Period.
>>>>> Did you know that reader threads can access data structures
>>>>> without sync?
>>>> Not according to Posix, and not on many machines. Reading
>>>> without a sync will probably give you some value, but not
>>>> necessarily the last written.
>>> You are assuming that it is necessary to get the most recent value
>>> when, in fact, it is not.
>> So you are describing a situation where a not-recent computation (or
>> no computation at all) is satisfactory.
>
> Yes.
>
>> You are also describing a situation where the result must be quite
>> small: representable by one word.
>
> No. The one word can be a pointer to an arbitrarily large data structure.
> The pointer is mutable and the data structure is typically persistent and
> immutable.
>
>> I imagine there are some scenarios where this might be okay, but
>> it must be a very minor niche that fulfills these properties.
>
> On the contrary, there are a huge number of applications, ranging from
> high-performance numerics in scientific computing to ordinary databases.
>
> For example, if I post a blog comment just as someone downloads the blog
> article, it makes no practical difference whether or not they see the
> latest comment. The only thing that matters is that nobody gets corrupted
> data (e.g. half of the old pointer and half of the new pointer), which
> requires an atomic word-size write but no locks, and that updates
> propagate sufficiently quickly in practice.

But I do not see how you avoid the problem that half of the result has
been propagated to the other thread together with the new pointer
value, while the other half of the result has not:

struct result
{
    result(int v, int r) : value(v), res(r) {}
    int value;
    int res;
};

result result_vec[2] = { result(1, 1), result(0, 0) };

// global
result* pres = &result_vec[0];

// thread 1:
result temp(2, func(2));
result_vec[1] = temp;
pres = result_vec + 1;

// thread 2:
result* local = pres;
std::cout << "Result: " << local->value << ", " << local->res;

What prevents thread 2 from outputting e.g. a result(2,0)?

/Peter
Sep 22 '08 #55
peter koch wrote:
> On 22 Sep., 01:51, Jon Harrop <j...@ffconsultancy.com> wrote:
>> The pointer is mutable and the data structure is typically persistent
>> and immutable.
>
> But I do not see how you avoid the problem that half of the result has
> been propagated to the other thread together with the new pointer
> value, while the other half of the result has not:
>
> struct result
> {
>     result(int v, int r) : value(v), res(r) {}
>     int value;
>     int res;
> };
>
> result result_vec[2] = { result(1, 1), result(0, 0) };
>
> // global
> result* pres = &result_vec[0];
>
> // thread 1:
> result temp(2, func(2));
> result_vec[1] = temp;
> pres = result_vec + 1;
>
> // thread 2:
> result* local = pres;
> std::cout << "Result: " << local->value << ", " << local->res;
>
> What prevents thread 2 from outputting e.g. a result(2,0)?

You are representing the value as an unboxed mutable struct. You need to
represent it as a mutable pointer to an immutable data structure:

    struct result { const int value, res; };

To update, you write a single "result *".

Immutable data structures can be shared between threads safely. However,
they cannot be implemented efficiently without a performant run-time
(allocator and GC), so this is not feasible in C++.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Sep 22 '08 #56
On 23 Sep., 00:22, Jon Harrop <j...@ffconsultancy.com> wrote:
> peter koch wrote:
>> On 22 Sep., 01:51, Jon Harrop <j...@ffconsultancy.com> wrote:
>>> The pointer is mutable and the data structure is typically persistent
>>> and immutable.
>> But I do not see how you avoid the problem that half of the result has
>> been propagated to the other thread together with the new pointer
>> value, while the other half of the result has not:
>>
>> struct result
>> {
>>     result(int v, int r) : value(v), res(r) {}
>>     int value;
>>     int res;
>> };
>>
>> result result_vec[2] = { result(1, 1), result(0, 0) };
>>
>> // global
>> result* pres = &result_vec[0];
>>
>> // thread 1:
>> result temp(2, func(2));
>> result_vec[1] = temp;
>> pres = result_vec + 1;
>>
>> // thread 2:
>> result* local = pres;
>> std::cout << "Result: " << local->value << ", " << local->res;
>>
>> What prevents thread 2 from outputting e.g. a result(2,0)?
>
> You are representing the value as an unboxed mutable struct. You need to
> represent it as a mutable pointer to an immutable data structure:
>
>     struct result { const int value, res; };
>
> To update, you write a single "result *".
>
> Immutable data structures can be shared between threads safely. However,
> they cannot be implemented efficiently without a performant run-time
> (allocator and GC), so this is not feasible in C++.

I still don't get it. Are you telling me that if we changed to your
structure definition and my thread 1 code to

// thread 1:
result temp(2, func(2));
pres = &temp;

(fulfilling both of your requirements), then thread 2 will always
see a consistent result - without any kind of synchronisation?
I say it will not, and that you need a memory barrier when writing the
pointer (and nothing more).

/Peter
Sep 22 '08 #57
peter koch wrote:
> I still don't get it. Are you telling me that if we changed to your
> structure definition and my thread 1 code to
>
> // thread 1:
> result temp(2, func(2));
> pres = &temp;
>
> (fulfilling both of your requirements), then thread 2 will always
> see a consistent result - without any kind of synchronisation?

No, I'm saying this is thread-safe if pointer reads and writes are atomic,
but it is non-deterministic.

> I say it will not, and that you need a memory barrier when writing the
> pointer (and nothing more).

Assuming you mean "deterministic" when you say "consistent", I think you are
wrong. Even if both the reader and the writer lock the pointer (which is
pointless if the operations are already atomic) there is still a race
condition. Specifically, the reader (thread 2) might read either the old or
the new value.

For example, the following program can either lock or not lock the reads and
writes as a counter is incremented. As pointer reads and writes are atomic
on this architecture, introducing locks only serves to make the program run
several times more slowly (on this 2x4-core AMD64 machine in x86 mode):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct pair { long long i, j; };

const int readers = 7;
const int max = 1000000;
struct pair * volatile global;
pthread_mutex_t mutex;

void set(int n, int m)
{
    struct pair * volatile t = (struct pair *)malloc(sizeof(struct pair));
    t->i = n;
    t->j = m;
    /* Atomic pointer write. */
#ifdef LOCKING
    pthread_mutex_lock(&mutex);
#endif
    global = t;
#ifdef LOCKING
    pthread_mutex_unlock(&mutex);
#endif
}

void *PrintHello(void *threadid)
{
    while (0 == 0)
    {
        struct pair * volatile t;
#ifdef LOCKING
        pthread_mutex_lock(&mutex);
#endif
        t = global; /* Atomic pointer read. */
#ifdef LOCKING
        pthread_mutex_unlock(&mutex);
#endif
        int n = t->i, m = t->j;
        if (n != m)
            printf("Thread %d found error: %d != %d.\n", (int)threadid, n, m);
        else if (n == max)
        {
            printf("Thread %d exiting.\n", (int)threadid);
            pthread_exit(NULL);
        }
    }
}

int main()
{
    pthread_t threads[readers];
    int rc, t;
    void *status;

    pthread_mutex_init(&mutex, NULL);
    set(0, 0);

    for (t = 0; t < readers; t++) {
        printf("In main: creating thread %d\n", t);
        rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    for (t = 1; t <= max; ++t)
        set(t, t);

    pthread_mutex_destroy(&mutex);

    pthread_exit(NULL);
}

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Sep 23 '08 #58
On 23 sep, 03:26, Jon Harrop <j...@ffconsultancy.com> wrote:
> [...]
>> I say you will not, and that you need a memory barrier when writing the
>> pointer (and nothing more).
>
> Assuming you mean "deterministic" when you say "consistent", I think you
> are wrong. Even if both the reader and the writer lock the pointer (which
> is pointless if the operations are already atomic) there is still a race
> condition. Specifically, the reader (thread 2) might read either the old
> or the new value.

You are still confusing locks with memory barriers.

You have to take care of the memory-ordering problem, at least on
SMP architectures. Even on a monoprocessor architecture,
instructions can be reordered statically, by the compiler. You use
memory barriers to deal with these problems. In the case of static
reordering, the compiler will see the memory barriers and take them
into account (if it doesn't, then I don't consider that compiler to
handle multithreading).

> [...]
> void set(int n, int m)
> {
>     struct pair * volatile t = (struct pair *)malloc(sizeof(struct pair));
>     t->i = n;
>     t->j = m;
>     /* Atomic pointer write. */
> #ifdef LOCKING
>     pthread_mutex_lock(&mutex);
> #endif
>     global = t;
> #ifdef LOCKING
>     pthread_mutex_unlock(&mutex);
> #endif
> }
> [...]

In your set function, without the locks, the instructions can be
reordered, either at runtime on an SMP architecture or by the compiler.
Let's say we are on a monoprocessor and the compiler reorders the
instructions. Data-dependence analysis won't prevent a reordering like
this:

t->i = n;
global = t;
// what happens if a reader uses global here?
t->j = m;

Here the unfortunate reader will see a correct value for i but an
incorrect one for j. To prevent things like this, you need memory
barriers (and not locks, which are overkill for this case).

Alexandre Courpron.
Sep 23 '08 #59
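The reordering hazard described above can be closed with explicit barriers, as Courpron says. A minimal sketch using GCC's __sync_synchronize() full barrier (a compiler extension, not standard C++03; the names echo Harrop's program but this is not his code):

```cpp
#include <cassert>
#include <cstdlib>

struct pair_t { long long i, j; };

pair_t* g_pair = nullptr;  // shared pointer, published by the writer

// Writer: initialize *all* fields, fence, then publish the pointer.
// The barrier stops both the compiler and the CPU from sinking the
// field stores below the pointer store. (Old pairs are leaked here,
// just as in the program under discussion.)
void set_pair(int n, int m) {
    pair_t* t = new pair_t;
    t->i = n;
    t->j = m;
    __sync_synchronize();  // fields become visible before the pointer
    g_pair = t;
}

// Reader: load the pointer, fence, then read the fields it leads to.
pair_t* get_pair() {
    pair_t* t = g_pair;
    __sync_synchronize();  // pairs with the writer's barrier
    return t;
}
```

On most targets the reader side needs only acquire (or even just data-dependency) ordering, so a full barrier is stronger than strictly required; it is used here for simplicity.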
On 23 sep, 18:36, courp...@gmail.com wrote:
> [...]
> t->i = n;
> global = t;
> // what happens if a reader uses global here?
> t->j = m;
>
> Here the unfortunate reader will see a correct value for i but an
> incorrect one for j.

Note: I should have said that, in this case, j is uninitialized, not
that j gets an incorrect value.

Alexandre Courpron.
Sep 23 '08 #60
co******@gmail.com wrote:
> On 23 sep, 03:26, Jon Harrop <j...@ffconsultancy.com> wrote:
>> [...]
>>> I say you will not, and that you need a memory barrier when writing the
>>> pointer (and nothing more).
>>
>> Assuming you mean "deterministic" when you say "consistent", I think you
>> are wrong. Even if both the reader and writer lock the pointer (which is
>> pointless if the operations are already atomic) there is still a race
>> condition. Specifically, the reader (thread 2) might read either the old
>> or the new value.
>
> You are still confusing locks with memory barriers.

Sorry: I read "synchronization" and assumed Peter was referring to locks.

> You have to take care of the memory-ordering problem, at least on
> SMP architectures...

Yes.

> In your set function, without the locks, the instructions can be
> reordered, either at runtime on an SMP architecture or by the compiler.

Does my use of the "volatile" keyword not prohibit that?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Sep 23 '08 #61
On 23 sep, 21:27, Jon Harrop <j...@ffconsultancy.com> wrote:
> [...]
>> In your set function, without the locks, the instructions can be
>> reordered, either at runtime on an SMP architecture or by the compiler.
>
> Does my use of the "volatile" keyword not prohibit that?

Ah yes, you used a volatile pointer.
It does not prevent this problem, at least not the runtime reordering. In
the case of static reordering, the compiler shall not reorder memory
accesses in the presence of a volatile object (says the standard), but
consulting the compiler manual is a necessity, since volatile does not
always have the standard C++ behavior, depending on the compiler (e.g.
Visual C++).

In an SMP environment, volatile has no effect on the reordering
problem. That is the main problem with the volatile keyword, and why
volatile is often said to be useless in a multithreaded environment.
That is also why memory barriers are needed in this case.

Alexandre Courpron.
Sep 23 '08 #62
On 23 Sep., 21:27, Jon Harrop <j...@ffconsultancy.com> wrote:
> courp...@gmail.com wrote:
>> On 23 sep, 03:26, Jon Harrop <j...@ffconsultancy.com> wrote:
>>> [...]
>>>> I say you will not, and that you need a memory barrier when writing
>>>> the pointer (and nothing more).
>>> Assuming you mean "deterministic" when you say "consistent", I think
>>> you are wrong. Even if both the reader and writer lock the pointer
>>> (which is pointless if the operations are already atomic) there is
>>> still a race condition. Specifically, the reader (thread 2) might read
>>> either the old or the new value.
>> You are still confusing locks with memory barriers.
>
> Sorry: I read "synchronization" and assumed Peter was referring to locks.

Well - I can speak for myself and did not mean locks - simply
synchronisation.

>> You have to take care of the memory-ordering problem, at least on
>> SMP architectures...
>
> Yes.
>
>> In your set function, without the locks, the instructions can be
>> reordered, either at runtime on an SMP architecture or by the compiler.
>
> Does my use of the "volatile" keyword not prohibit that?

I don't believe so. Extensions might give stricter guarantees (and
Microsoft does so, if I understand correctly).
But volatile is not needed as long as you have a memory barrier on the
pointer. If the pointer is written by "normal" means, it is still not
guaranteed to become visible only after all parts of the structure
have been written.

/Peter
Sep 23 '08 #64
On Sep 23, 9:12 pm, courp...@gmail.com wrote:
> On 23 sep, 21:27, Jon Harrop <j...@ffconsultancy.com> wrote:
>> [...]
>>> In your set function, without the locks, the instructions can be
>>> reordered, either at runtime on an SMP architecture or by the compiler.
>> Does my use of the "volatile" keyword not prohibit that?
>
> Ah yes, you used a volatile pointer.
> It does not prevent this problem, at least not the runtime reordering. In
> the case of static reordering, the compiler shall not reorder memory
> accesses in the presence of a volatile object (says the standard), but
> consulting the compiler manual is a necessity, since volatile does not
> always have the standard C++ behavior, depending on the compiler (e.g.
> Visual C++).

Also, IIRC the standard prohibits only reordering of accesses to
volatile objects. Reads and writes to non-volatile objects can still
be reordered, even across a volatile access.
So volatile doesn't even inhibit many compiler reorderings.

--
gpd
Sep 23 '08 #65
On 23 sep, 21:40, gpderetta <gpdere...@gmail.com> wrote:
> On Sep 23, 9:12 pm, courp...@gmail.com wrote:
>> On 23 sep, 21:27, Jon Harrop <j...@ffconsultancy.com> wrote:
>>> [...]
>>>> In your set function, without the locks, the instructions can be
>>>> reordered, either at runtime on an SMP architecture or by the
>>>> compiler.
>>> Does my use of the "volatile" keyword not prohibit that?
>> Ah yes, you used a volatile pointer.
>> It does not prevent this problem, at least not the runtime reordering.
>> In the case of static reordering, the compiler shall not reorder memory
>> accesses in the presence of a volatile object (says the standard), but
>> consulting the compiler manual is a necessity, since volatile does not
>> always have the standard C++ behavior, depending on the compiler (e.g.
>> Visual C++).
>
> Also, IIRC the standard prohibits only reordering of accesses to
> volatile objects. Reads and writes to non-volatile objects can still
> be reordered, even across a volatile access.

You're right.

> So volatile doesn't even inhibit many compiler reorderings.

Indeed - unless you declare all the variables concerned as volatile, which
would probably lead to very inefficient code and still would not
prevent the runtime-reordering problem.

Alexandre Courpron.
Sep 23 '08 #66
peter koch wrote:
> On 23 Sep., 21:27, Jon Harrop <j...@ffconsultancy.com> wrote:
>> courp...@gmail.com wrote:
>>> You are still confusing locks with memory barriers.
>>
>> Sorry: I read "synchronization" and assumed Peter was referring to
>> locks.
>
> Well - I can speak for myself and did not mean locks - simply
> synchronisation.

Are memory barriers a form of synchronization?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Sep 23 '08 #67
co******@gmail.com wrote:
> On 23 sep, 21:27, Jon Harrop <j...@ffconsultancy.com> wrote:
>> [...]
>>> In your set function, without the locks, the instructions can be
>>> reordered, either at runtime on an SMP architecture or by the compiler.
>>
>> Does my use of the "volatile" keyword not prohibit that?
>
> Ah yes, you used a volatile pointer.
> It does not prevent this problem, at least not the runtime reordering...

This raises the question: how can this be written correctly in C or C++?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Sep 23 '08 #68
On Sep 24, 12:57 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> courp...@gmail.com wrote:
>> On 23 sep, 21:27, Jon Harrop <j...@ffconsultancy.com> wrote:
>>> [...]
>>>> In your set function, without the locks, the instructions can be
>>>> reordered, either at runtime on an SMP architecture or by the
>>>> compiler.
>>> Does my use of the "volatile" keyword not prohibit that?
>> Ah yes, you used a volatile pointer.
>> It does not prevent this problem, at least not the runtime reordering...
>
> This raises the question: how can this be written correctly in C or C++?

It can't, in C99 or in C++03, neither of which even acknowledges the
existence of threads. POSIX C requires mutual exclusion around all
shared state that could potentially be concurrently written.

Until C++0x, which will have atomics and memory barriers, you have to
rely on unportable compiler/OS extensions.

--
gpd

Sep 23 '08 #69
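The C++0x atomics gpderetta mentions were indeed standardized (as C++11), and they make the pointer-publication pattern from earlier in the thread expressible portably. A sketch with release/acquire ordering (the names are illustrative, not from the thread):

```cpp
#include <atomic>
#include <cassert>

struct Pair { long long i, j; };

std::atomic<Pair*> g_pub{nullptr};

// Writer: fill in the object completely, then publish it with release
// ordering, which forbids the stores to i and j from sinking below the
// store of the pointer.
void publish(long long n) {
    Pair* t = new Pair{n, n};
    g_pub.store(t, std::memory_order_release);
}

// Reader: an acquire load pairs with the release store, so once the
// pointer is seen, the fields it points at are guaranteed visible too.
const Pair* consume() {
    return g_pub.load(std::memory_order_acquire);
}
```

This replaces both the volatile qualifier and the explicit fences discussed above; the compiler and CPU are constrained exactly where the algorithm needs it and nowhere else.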
On Sep 23, 9:40 pm, gpderetta <gpdere...@gmail.com> wrote:
> On Sep 23, 9:12 pm, courp...@gmail.com wrote:
> Also, IIRC the standard prohibits only reordering of accesses
> to volatile objects. Reads and writes to non-volatile objects
> can still be reordered, even across a volatile access.

Not really. Almost all of the actual specification of volatile
is implementation-defined, so an implementation can pretty much do
what it wants. The expressed intent in C++ is that it should
work as in C, and the expressed intent in C corresponds pretty
much to what you say, but it remains intent, and not a formal
requirement.

In practice, all of the compilers (Sun CC and g++ on Sparc, g++
and VC++ on Intel/AMD) I know of do allow reordering, even when
volatile is used. So the keyword is for all intents and
purposes useless (and when ordering is required, you have to use
some system function or resort to assembler).

> So volatile doesn't even inhibit many compiler reorderings.

Even fewer of them than one would expect from its expressed
intent.
--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Sep 24 '08 #70
On Sep 23, 9:28 pm, peter koch <peter.koch.lar...@gmail.com> wrote:
> [...]
>>> In your set function, without the locks, the instructions
>>> can be reordered, either at runtime on an SMP architecture
>>> or by the compiler.
>> Does my use of the "volatile" keyword not prohibit that?
> I don't believe so. Extensions might give stricter guarantees
> (and Microsoft does so, if I understand correctly).

Microsoft has proposed such an extension to the meaning of
volatile, and presumably will implement it in future versions of
the compiler, but the version I have available (version 14, from
Visual Studio 8) doesn't implement any additional guarantees for
volatile. In fact, it doesn't even prevent reordering or fusing
of successive accesses to a volatile variable. Which is IMHO
contrary to the intent of the standard (but the actual
normative specification is "implementation defined"), and almost
universal.

> But volatile is not needed as long as you have a memory
> barrier on the pointer.

Provided the compiler knows this. Memory barriers are necessary
at the hardware level, but you still have to prevent code
movement by the compiler. This can be done in two ways:

-- the memory barriers (except that I think they're called
fences by Intel) are implemented in a system routine (e.g.
the Interlocked... functions under Windows, or most of the
threading functions under either Windows or Unix), and the
compiler "knows" about this (required by Posix), or

-- the memory barriers are implemented in embedded assembler,
and you've taken steps to prevent code movement by the
compiler across this assembler (or verified that the
compiler doesn't do it).

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Sep 24 '08 #71
On Sep 24, 1:00 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> peter koch wrote:
>> On 23 Sep., 21:27, Jon Harrop <j...@ffconsultancy.com> wrote:
>>> courp...@gmail.com wrote:
>>>> You are still confusing locks with memory barriers.
>>> Sorry: I read "synchronization" and assumed Peter was
>>> referring to locks.

The two are quite different: although Posix (and doubtless
Windows as well) guarantees synchronization across a lock, lock-
free algorithms exist, and they still require synchronization.

>> Well - I can speak for myself and did not mean locks -
>> simply synchronisation.
> Are memory barriers a form of synchronization?

On many machines (e.g. Sparc), they're the only form of memory
synchronization. (I think that Intel refers to them as fences.
I think that Intel also offers some additional guarantees, and
that in particular, if I've understood correctly, it
implicitly generates full memory synchronization around an xchg
instruction. I'm more familiar with Sparc: for Sparc, you
should read section 3.2 of the "Sparc Architecture Manual",
http://www.sparc.org/standards/SPARCV9.pdf)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Sep 24 '08 #72
On 24 sep, 09:53, James Kanze <james.ka...@gmail.com> wrote:
> On Sep 23, 9:40 pm, gpderetta <gpdere...@gmail.com> wrote:
>> Also, IIRC the standard prohibits only reordering of accesses
>> to volatile objects. Reads and writes to non-volatile objects
>> can still be reordered, even across a volatile access.
>
> Not really. Almost all of the actual specification of volatile
> is implementation-defined, so an implementation can pretty much do
> what it wants. The expressed intent in C++ is that it should
> work as in C, and the expressed intent in C corresponds pretty
> much to what you say, but it remains intent, and not a formal
> requirement.

It is implementation-defined, but there is one formal requirement,
[1.9/11]:

"The least requirements on a conforming implementation are:
- At sequence points, volatile objects are stable in the sense that
previous evaluations are complete and subsequent evaluations have not
yet occurred."

So what gpderetta said is correct in the sense of the standard (but,
yes, in practice you have to check the compiler's behavior).

Alexandre Courpron.
Sep 24 '08 #73
On 2008-09-24 06:03:15 -0400, co******@gmail.com said:
> On 24 sep, 09:53, James Kanze <james.ka...@gmail.com> wrote:
>> On Sep 23, 9:40 pm, gpderetta <gpdere...@gmail.com> wrote:
>>> Also, IIRC the standard prohibits only reordering of accesses
>>> to volatile objects. Reads and writes to non-volatile objects
>>> can still be reordered, even across a volatile access.
>>
>> Not really. Almost all of the actual specification of volatile
>> is implementation-defined, so an implementation can pretty much do
>> what it wants. The expressed intent in C++ is that it should
>> work as in C, and the expressed intent in C corresponds pretty
>> much to what you say, but it remains intent, and not a formal
>> requirement.
>
> It is implementation-defined, but there is one formal requirement,
> [1.9/11]:
>
> "The least requirements on a conforming implementation are:
> - At sequence points, volatile objects are stable in the sense that
> previous evaluations are complete and subsequent evaluations have not
> yet occurred."
>
> So what gpderetta said is correct in the sense of the standard (but,
> yes, in practice you have to check the compiler's behavior).

Not quite: that part of 1.9/11 says that modifications of volatile
objects can't be reordered if there is an intervening sequence point.
There is no ordering requirement for two modifications without an
intervening sequence point.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)

Sep 24 '08 #74
Pete Becker wrote:
Not quite: that part of 1.9/11 says that modifications of volatile
objects can't be reordered if there is an intervening sequence point.
There is no ordering requirement for two modifications without an
intervening sequence point.
But what is a sequence point in terms of C or C++ source code? AFAICT,
the languages provide no way to express a sequence point explicitly.

From what I can gather, GCC is not reordering the stores of volatile
variables in this case when optimizing but it is also not emitting memory
barriers that would prevent the CPU from breaking the ordering. So the
lock-free code output by GCC is useless.

This raises some questions for me:

1. Is volatile completely useless?

2. Is it impossible to write lock-free programs in C and C++ without
dropping to assembler?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Sep 24 '08 #75
On Sep 24, 12:03 pm, courp...@gmail.com wrote:
On 24 sep, 09:53, James Kanze <james.ka...@gmail.com> wrote:
On Sep 23, 9:40 pm, gpderetta <gpdere...@gmail.com> wrote:
On Sep 23, 9:12 pm, courp...@gmail.com wrote:
Also, IIRC the standard prohibits only reordering of
accesses to volatile objects. Reads and writes to non
volatile objects can still be reordered even across a
volatile.
Not really. Almost all of the actual specification of
volatile is implementation defined, so an implementation can
pretty much do what it wants. The expressed intent in C++
is that it should work as in C, and the expressed intent in
C corresponds pretty much to what you say, but it remains
intent, and not a formal requirement.
It is implementation defined, but there is one formal requirement :
[1.9/11] :
The least requirements on a conforming implementation are:
— At sequence points, volatile objects are stable in the sense that
previous evaluations are complete and subsequent evaluations have not
yet occurred.
Which means? (For starters, a sequence point is a compile time
characterization, and doesn't exist at runtime.)

Accessing a volatile object is observable behavior, but what
constitutes an access is implementation defined. Technically,
that means that the implementation is required to document it;
I've never been able to find such documentation, but judging
from the code generated by the implementations I've used, the
definition is something along the lines of: a load or store
instruction has been emitted. Since on a modern machine, a load
instruction doesn't guarantee a read from memory, and a store
doesn't guarantee a write to memory, that's a totally useless
(albeit legal) definition.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Sep 24 '08 #76
On 2008-09-24 17:37, Jon Harrop wrote:
Pete Becker wrote:
>Not quite: that part of 1.9/11 says that modifications of volatile
objects can't be reordered if there is an intervening sequence point.
There is no ordering requirement for two modifications without an
intervening sequence point.

But what is a sequence point in terms of C or C++ source code? AFAICT, they
have no way to express sequence points.

From what I can gather, GCC is not reordering the stores of volatile
variables in this case when optimizing but it is also not emitting memory
barriers that would prevent the CPU from breaking the ordering. So the
lock-free code output by GCC is useless.

This raises some questions for me:

1. Is volatile completely useless?
No, it can among other things be useful when writing device-drivers
where a certain memory-location is mapped to registers on the device.
The usage of volatile then assures that the value of the memory-location
is used and not some value in a register or cache. You should probably
also use it for data shared between threads (such as your pointer) but
you need to provide separate synchronisation.
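A sketch of the device-register case. A real driver would map a fixed device address, e.g. `volatile uint32_t* status = reinterpret_cast<volatile uint32_t*>(0xFFFF0000);`; here the "register" is simulated with an ordinary variable so the sketch can actually run, and all names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

static uint32_t fake_register = 0;                       // simulated device register
static volatile uint32_t* const status = &fake_register; // driver's view of it

// Poll until the device sets its ready bit. Because *status is
// volatile, the compiler must re-read it on every iteration rather
// than hoisting the load out of the loop.
bool wait_ready(int max_spins) {
    for (int i = 0; i < max_spins; ++i) {
        if (*status & 0x1u)
            return true;
        fake_register = 0x1u; // stand-in for the device raising the bit
    }
    return false;
}
```

Without `volatile`, the compiler could legally read the location once and spin forever on a cached value.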

--
Erik Wikström
Sep 24 '08 #77
Erik Wikström wrote:
No, it can among other things be useful when writing device-drivers
where a certain memory-location is mapped to registers on the device.
The usage of volatile then assures that the value of the memory-location
is used and not some value in a register or cache. You should probably
also use it for data shared between threads (such as your pointer) but
you need to provide separate synchronisation.
Is it possible to provide the necessary synchronization without locks from C
or C++?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
Sep 24 '08 #78
On 2008-09-24 19:34, Jon Harrop wrote:
Erik Wikström wrote:
>No, it can among other things be useful when writing device-drivers
where a certain memory-location is mapped to registers on the device.
The usage of volatile then assures that the value of the memory-location
is used and not some value in a register or cache. You should probably
also use it for data shared between threads (such as your pointer) but
you need to provide separate synchronisation.

Is it possible to provide the necessary synchronization without locks from C
or C++?
Not in C++03 (nor C99, AFAIK), since neither has any notion of multiple
threads of execution. POSIX extends C to allow threaded execution and
you can use MS-specific stuff on Windows, but C++0x is the first version
of the C++ standard that handles threads, and it does have some
primitives which can be used to synchronise without using normal locks.
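A sketch of lock-free publication using those C++0x (now C++11) primitives; the names here are hypothetical:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;               // plain data, no volatile needed
std::atomic<bool> ready(false);

void producer() {
    payload = 42;                                 // ordinary write...
    ready.store(true, std::memory_order_release); // ...published by a
                                                  // release store
}

int consumer() {
    // An acquire load that observes `true` is guaranteed to also see
    // the earlier write to payload -- no mutex involved.
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    return payload;
}

int run() {
    std::thread t(producer);
    int seen = consumer();
    t.join();
    return seen;
}
```

The release/acquire pair supplies exactly the compiler and hardware ordering that `volatile` alone does not.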

--
Erik Wikström
Sep 24 '08 #79
On 24 sep, 17:39, James Kanze <james.ka...@gmail.com> wrote:
On Sep 23, 9:40 pm, gpderetta <gpdere...@gmail.com> wrote:
Also, IIRC the standard prohibits only reordering of
accesses to volatile objects. Reads and writes to non
volatile objects can still be reordered even across a
volatile.
Not really. Almost all of the actual specification of
volatile is implementation defined, so an implementation can
pretty much do what it wants. The expressed intent in C++
is that it should work as in C, and the expressed intent in
C corresponds pretty much to what you say, but it remains
intent, and not a formal requirement.
It is implementation defined, but there is one formal requirement :
[1.9/11] :
The least requirements on a conforming implementation are:
— At sequence points, volatile objects are stable in the sense that
previous evaluations are complete and subsequent evaluations have not
yet occurred.

Which means? *(For starters, a sequence point is a compile time
characterization, and doesn't exist at runtime.)
That means volatile prevents static reordering of volatile
variable accesses, and just that (we were dissecting all the
effects of volatile). But we have already said for quite a while
that volatile is useless for threading because it doesn't impose
runtime constraints either (i.e. memory barriers). So we certainly
don't disagree here.
Alexandre Courpron.

Sep 24 '08 #80
On 24 Sep., 19:34, Jon Harrop <j...@ffconsultancy.com> wrote:
Erik Wikström wrote:
No, it can among other things be useful when writing device-drivers
where a certain memory-location is mapped to registers on the device.
The usage of volatile then assures that the value of the memory-location
is used and not some value in a register or cache. You should probably
also use it for data shared between threads (such as your pointer) but
you need to provide separate synchronisation.

Is it possible to provide the necessary synchronization without locks from C
or C++?
The current version of C++ has no support for threads whatsoever, so
we are talking platform-specific options. In that context there are
varying degrees of support. The platform I know best (Microsoft C++)
has lots of stuff to support threading without locking - I believe
documentation is available on the internet.
The new version of C++ has direct support for threading, including many
low-level primitives.

/Peter
Sep 24 '08 #81
On Sep 24, 6:30 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
On 2008-09-24 17:37, Jon Harrop wrote:
Pete Becker wrote:
Not quite: that part of 1.9/11 says that modifications of
volatile objects can't be reordered if there is an
intervening sequence point. There is no ordering
requirement for two modifications without an intervening
sequence point.
But what is a sequence point in terms of C or C++ source
code? AFAICT, they have no way to express sequence points.
From what I can gather, GCC is not reordering the stores of
volatile variables in this case when optimizing but it is
also not emitting memory barriers that would prevent the CPU
from breaking the ordering. So the lock-free code output by
GCC is useless.
This raises some questions for me:
1. Is volatile completely useless?
It depends on the compiler, but as implemented by Sun CC and g++
on Sparcs, yes. (Or almost. It does provide certain guarantees
with respect to setjmp/longjmp. But that shouldn't be relevant
to any C++ code. There are also guarantees when it is used on a
sig_atomic_t, but those guarantees only apply to signal handlers
and the thread they interrupted.)
No, it can among other things be useful when writing
device-drivers where a certain memory-location is mapped to
registers on the device.
That's certainly the intent. It appears as an example in the C
standard, and is mentioned in the C rationale. On the other
hand, it's not true with Sun CC or g++ on a Sparc, where the
architecture specification is quite clear: membar instructions
are needed for memory mapped IO. (I don't know what actual
hardware does here. If the memory mapped IO is at specific
addresses, the hardware could recognize it, and implicitly
behave as if the membar instructions were there, even though
the Sparc specification doesn't require it.)
The usage of volatile then assures that the value of the
memory-location is used and not some value in a register or
cache.
Again, that's the intent. It's not the case for g++ or Sun CC
on Sparc.
You should probably also use it for data shared between
threads (such as your pointer) but you need to provide
separate synchronisation.
There's no point in using it on data shared between threads
(except to slow things down). It's not sufficient, and when the
other mechanisms are used correctly, it's not necessary.
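The point above, sketched with POSIX primitives: when the mutex is used correctly, the shared data needs no volatile at all, because the lock/unlock pair provides both mutual exclusion and memory visibility (names hypothetical):

```cpp
#include <pthread.h>
#include <cassert>

static int shared_counter = 0; // plain int, deliberately not volatile
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

// Each thread increments the counter 1000 times under the mutex.
void* increment(void*) {
    for (int i = 0; i < 1000; ++i) {
        pthread_mutex_lock(&mtx);
        ++shared_counter;
        pthread_mutex_unlock(&mtx);
    }
    return 0;
}

int run_two_threads() {
    pthread_t a, b;
    pthread_create(&a, 0, increment, 0);
    pthread_create(&b, 0, increment, 0);
    pthread_join(a, 0);
    pthread_join(b, 0);
    return shared_counter; // no lost updates, no volatile required
}
```

Adding volatile here would change nothing except slow the generated code down, which is exactly the claim made above.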

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Sep 25 '08 #82
On Sep 24, 8:19 pm, courp...@gmail.com wrote:
On 24 sep, 17:39, James Kanze <james.ka...@gmail.com> wrote:
On Sep 23, 9:40 pm, gpderetta <gpdere...@gmail.com> wrote:
Also, IIRC the standard prohibits only reordering of
accesses to volatile objects. Reads and writes to non
volatile objects can still be reordered even across a
volatile.
Not really. Almost all of the actual specification of
volatile is implementation defined, so an implementation can
pretty much do what it wants. The expressed intent in C++
is that it should work as in C, and the expressed intent in
C corresponds pretty much to what you say, but it remains
intent, and not a formal requirement.
It is implementation defined, but there is one formal requirement :
[1.9/11] :
The least requirements on a conforming implementation are:
— At sequence points, volatile objects are stable in the sense that
previous evaluations are complete and subsequent evaluations have not
yet occurred.
Which means? (For starters, a sequence point is a compile time
characterization, and doesn't exist at runtime.)
That means volatile prevents static reordering of volatile
variables accesses, and just that (we were dissecting all the
effects of volatile).
But only if there is a sequence point between them. And only
for whatever the implementation defines "access" to mean: as I
think I said, the Sparc compilers I have access to consider
accessing the value in the memory pipeline sufficient, even
though the memory pipeline is local to each CPU core (not
visible except from the CPU, and not synchronized with anything
outside the CPU).

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Sep 25 '08 #83
