Bytes IT Community

std::deque Thread Safety Situation

P: n/a
I've read a bit online seeing that two writes are not safe, which I
understand, but would 1 thread push()'ing and 1 thread pop()'ing be
thread-safe? Basically my situation is as follows:

--Thread 1--
1. Reads TCPIP Buffer
2. Adds Buffer to Queue (q.size() to check queue isn't full, and
q.push_back(...))
3. Signals Reading Thread Event & Goes Back to Wait for More Messages
on TCPIP

--Thread 2--
1. Waits for Signal
2. while(q.size() > 0) myStruct = q.front();
2a. processes myStruct (doesn't modify it but does copy some
information for another queue).
2b. delete[] myStruct.Buffer
2c. q.pop_front();
3. Goes back to waiting for signal

I've run the app for a few hours (basically saturating the TCPIP
traffic) and there were no apparent problems. The "processes
myStruct" takes a while and I can't have the push_back(...) thread
locked while processing is working.

I can add Critical Section locks around the ".size()", ".front()" and/
or ".pop_front()/.push_back()", but my first inclination is that this
is thread safe?

My worry is, say a .push_back() starts on a deque with 1 node
in it. It sees the 1st node and modifies something in it to point to
the new node about to be added, and then the ".pop_front()" occurs while
that's happening. I have no idea how the queue is implemented
internally, so I'm unsure whether this is a valid worry. Performance is very
important and I would rather have no blocking at all unless it's needed,
which is why I ask here :)

If Critical Sections are needed, would it just be for the
".pop_front()/.push_back()"? Or all member functions I'm using
(.size()/.front())?

Thanks. Any information would be greatly appreciated. These are the
only two threads/functions accessing the queue. I currently
implemented it with locking on all just to be safe, but would like to
remove the locking if it is not needed, or fine tune it.
Aug 29 '08 #1
29 Replies


On 30 Aug., 00:49, NvrBst <nvr...@gmail.com> wrote:
I've read a bit online seeing that two writes are not safe, which I
understand, but would 1 thread push()'ing and 1 thread pop()'ing be
thread-safe?
No
Basically my situation is the follows:

--Thread 1--
1. Reads TCPIP Buffer
2. Adds Buffer to Queue (q.size() to check queue isn't full, and
q.push_back(...))
3. Signals Reading Thread Event & Goes Back to Wait for More Messages
on TCPIP

--Thread 2--
1. Waits for Signal
2. while(q.size() > 0) myStruct = q.front();
2a. processes myStruct (doesn't modify it but does copy some
information for another queue).
2b. delete[] myStruct.Buffer
2c. q.pop_front();
3. Goes back to waiting for signal

I've run the app for a few hours (basically saturating the TCPIP
traffic) and there were no apparent problems. The "processes
myStruct" takes a while and I can't have the push_back(...) thread
locked while processing is working.
If it did not fail, you were simply lucky. You will surely run out of
luck some day.
I can add Critical Section locks around the ".size()", ".front()" and/
or ".pop_front()/.push_back()", but my first inclination is that this
is thread safe?
It is thread-safe only when you protect everything with a mutex (which
could very well be a windows CRITICAL_SECTION).

[snip]
If Critical Sections are needed, would it just be for the
".pop_front()/.push_back()"? or all member functions I'm using
(.size()/.front()).
On all of them.
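A minimal sketch of what "all of them" behind one lock can look like. This is an assumption-laden illustration, not code from the thread: std::mutex stands in for the Windows CRITICAL_SECTION being discussed, and the names LockedQueue/try_pop_front are invented here.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>

// Every member function takes the same mutex, so one thread pushing and
// one thread popping can never see the deque in a half-updated state.
template <typename T>
class LockedQueue {
public:
    void push_back(const T& v) {
        std::lock_guard<std::mutex> g(m_);
        q_.push_back(v);
    }
    // Returns false instead of blocking when the queue is empty.
    bool try_pop_front(T& out) {
        std::lock_guard<std::mutex> g(m_);
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop_front();
        return true;
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> g(m_);  // even size() needs the lock
        return q_.size();
    }
private:
    mutable std::mutex m_;  // mutable so size() can stay const
    std::deque<T> q_;
};
```

The point is that the lock lives inside the queue, so the calling code cannot forget it.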
Thanks. Any information would be greatly appreciated. These are the
only two threads/functions accessing the queue. I currently
implemented it with locking on all just to be safe, but would like to
remove the locking if it is not needed, or fine tune it.
Did you measure the performance without a lock? Presumably, it would
not really make a difference.

/Peter
Aug 29 '08 #2

Ahh thanks for the clarification. I was sorta expecting it not to be
thread safe -_-.
Did you measure the performance without a lock? Presumably, it would
not really make a difference.
The clean locking way, i.e.

lock();
while(q.size() > 0) {
Process(q.front().Buffer);
delete[] ...;
q.pop_front();
}
unlock();

was way too slow (because of the "Process(..)"). Moving the lock
inside the while loop, adding some tmp vars, and also managing the
"q.size()" in the head of the while loop made the code a little more
ugly, but I believe the performance isn't much worse; it just seemed
to me that if I wanted to increase the performance, tackling the
locking/unlocking inside the while loop might be a good thing
to look into (the while loop can run hundreds, maybe thousands of
times depending on how fast the other thread is pumping the queue up).

Thanks
Aug 29 '08 #3

On 30 Aug., 01:27, NvrBst <nvr...@gmail.com> wrote:
Ahh thanks for the clarification. I was sorta expecting it not to be
thread safe -_-.
Did you measure the performance without a lock? Presumably, it would
not really make a difference.

The clean locking way, i.e.

lock();
while(q.size() > 0) {
Process(q.front().Buffer);
delete[] ...;
q.pop_front();
}
unlock();
This is not at all clean. You lock while processing many elements,
whereas you should only lock for the short duration where you pop/push
an element.

Search for "condition variables" for an idea on how to do this kind of
stuff. A producer-consumer structure is really quite basic and most
useful - but it must be implemented correctly.
Was way too slow (because of the "Process(..)"). Moving the lock
inside the while loop, and adding some tmp vars, and also managing the
"q.size()" in the head of the while loop made the code a little more
ugly, but I believe the performance isn't much worse; just seemed
to me that if I wanted to increase the performance, tackling the
locking/unlocking that's inside the while loop might be a good thing
to look into (the while loop can run hundreds, maybe thousands of
times depending on how fast the other thread is pumping the queue up).
A low-level mutex is a very fast beast unless you have heavy
contention. If your data is expensive to copy, that might be a
bottleneck, but wait to deal with that until you have gotten your
basic structure to work.

/Peter
Aug 29 '08 #4

A low-level mutex is a very fast beast unless you have heavy
contention. If your data is expensive to copy, that might be a
bottleneck, but wait to deal with that until you have gotten your
basic structure to work.

/Peter
The information is too much to copy. When I said clean, I meant more-
so clean to look at (understand). What I have now looks like this:

Enter();
while(q.size() > 0) {
Leave();
Process();
Enter();
}
Leave();

It works fine, but as you might guess it's a little harder to understand; I
left out some temp variables, but basically it's not as clean to look
at as:

Enter();
while(q.size() > 0) Process();
Leave();
Is the top way bad? Is there a consumer-producer model I should be
using instead? (Basically the TCPIP reading can't be blocked, and
processing the reading queue takes a while; it sometimes hits
0, but the "while(q.size()) {..}" is usually running 75% of the time.)

Thanks
Aug 29 '08 #5

On 30 Aug., 01:53, NvrBst <nvr...@gmail.com> wrote:
A low-level mutex is a very fast beast unless you have heavy
contention. If your data is expensive to copy, that might be a
bottleneck, but wait to deal with that until you have gotten your
basic structure to work.
/Peter

The information is too much to copy. When I said clean, I meant more-
so clean to look at (understand). What I have now looks like this:

Enter();
while(q.size() > 0) {
Leave();
Process();
Enter();
}
Leave();
This is not so easy to understand - even with correct indentation. And
it is not exception-safe.
It works fine, but as you might guess it's a little harder to understand; I
left out some temp variables, but basically it's not as clean to look
at as:

Enter();
while(q.size() > 0) Process();
Leave();

Is the top way bad? Is there a consumer-producer model I should be
using instead? (Basically the TCPIP reading can't be blocked, and
processing the reading queue takes a while; it sometimes hits
0, but the "while(q.size()) {..}" is usually running 75% of the time.)
As I said, you should look for condition variables. Perhaps boost has
something for you. I am not sure that they have a producer-consumer
class, but they have the building blocks you need, and quite likely
also an example that you can use as a skeleton.
I am confident that there are also other solutions out there.

/Peter
Aug 30 '08 #6

On Aug 29, 7:45 pm, NvrBst <nvr...@gmail.com> wrote:
Everything I've written is pseudo-code, not my actual
code. I can't get the Indentations right here, and would be too hard
to follow if I just paste, but I can outline the basics:

--Pumping Thread--
WaitForReadSignal();
ReadTCPIP(Buffer, size, ...);
//Copies Buffer, and makes myStruct
m_Q[TCPIP].qLock.Enter();
q.push_back(myStruct);
m_Q[TCPIP].qLock.Leave();
SetEvent(receiveEvent);

--Processing Thread--
WaitForObjects(receiveEvent);
m_Q[TCPIP].qLock.Enter();
while(q.size() > 0) {
myStruct = q.front();
q.pop_front();
m_Q[TCPIP].qLock.Leave();
ProcessData(myStruct);
m_Q[TCPIP].qLock.Enter();
}
m_Q[TCPIP].qLock.Leave();

Both threads are in loops, and there is stuff I left out for clarity,
but this, more or less, is the structure. I do have try{} finally{}
blocks set up (to ensure that for every "Enter()" there is a
"Leave()"). I left them out for clarity. ...
Enough try-catch blocks will work, although I'd find proofreading them
a bit much for my taste. A thin wrapper class would get the
proofreading time per instance down to practically nothing.

Please pardon my use of agonizing detail in the following. A really
thin wrapper class WrapMutex should:
* take the actual mutex as a non-const reference parameter in its
constructor
* store a reference to the actual mutex as private data
** I think this implies not default-constructible, as a parameter is
needed to initialize the reference. Usually not a good trait, but
acceptable for this class.
* call Enter on the mutex in its inline constructor
* call Leave on the mutex in its inline non-virtual destructor
* not be copyable
** We should lose the default assignment operator because of the
reference data member, but the copy constructor needs to be rendered
unusable. Besides the textbook method, deriving from
boost::noncopyable will work.
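A wrapper along the lines of that checklist might look like the following sketch. The Mutex class with Enter/Leave is a stand-in (backed here by std::mutex so the example is self-contained); `= delete` is the later-C++ spelling of "not copyable":

```cpp
#include <mutex>

// Stand-in for the poster's mutex type with Enter()/Leave() semantics.
class Mutex {
public:
    void Enter() { m_.lock(); }
    void Leave() { m_.unlock(); }
private:
    std::mutex m_;
};

// Thin RAII wrapper: locks in the constructor, unlocks in the destructor.
class WrapMutex {
public:
    explicit WrapMutex(Mutex& m) : m_(m) { m_.Enter(); }  // lock on entry
    ~WrapMutex() { m_.Leave(); }                          // unlock on scope exit
    WrapMutex(const WrapMutex&) = delete;                 // not copyable
    WrapMutex& operator=(const WrapMutex&) = delete;
private:
    Mutex& m_;  // reference member => no default construction possible
};
```

Because the unlock runs in the destructor, the mutex is released even when the protected code throws, which is what makes the try/finally blocks unnecessary.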

Then your code

m_Q[TCPIP].qLock.Enter();
while(q.size() > 0) {
myStruct = q.front();
q.pop_front();
m_Q[TCPIP].qLock.Leave();
ProcessData(myStruct);
m_Q[TCPIP].qLock.Enter();
}
m_Q[TCPIP].qLock.Leave();

can be replaced by

some_struct_type myStruct;
while(q.size() > 0) {
{
WrapMutex guard(m_Q[TCPIP].qLock);
myStruct = q.front();
q.pop_front();
}
ProcessData(myStruct);
}
As I said, you should look for condition variables. Perhaps boost has
something for you. I am not sure that they have a producer-consumer
class, but they have the building blocks you need, and quite likely
also an example that you can use as a skeleton.
I am confident that there are also other solutions out there.

I'm unsure what boost is,
The main website is http://www.boost.org/ ; it's a very extensive C++
library with a noticeable learning curve. I personally am nonfluent
in it. (I currently use mainly the program-correctness, conditional
compilation of templates, and Boost.Preprocessor sub-libraries.)
Aug 30 '08 #7

On Aug 29, 5:49 pm, NvrBst <nvr...@gmail.com> wrote:
I've read a bit online seeing that two writes are not safe, which I
understand, but would 1 thread push()'ing and 1 thread pop()'ing be
thread-safe?
As both push() and pop() are non-const operations, no. (In general,
non-const operations are "writing" to the data structure.)

Assuming the data type has no mutable data members: Even one non-const
operation in parallel with any number of const operations on the same
data structure is technically unsafe. Also, any operation that
invalidates iterators is unsafe when another thread has iterators in
use (for a std::deque, inserting elements at all comes to mind).

There are plausibly some more subtle issues that an expert in the
field would notice.
Aug 30 '08 #8

Ahh thank you. I find examples a lot easier to follow. One question:
some_struct_type myStruct;
while(q.size() > 0) {
{
WrapMutex guard(m_Q[TCPIP].qLock);
myStruct = q.front();
q.pop_front();
}
ProcessData(myStruct);
}
This would imply the ".size()" doesn't have to be wrapped in the
CriticalSections? The reason I put the locks/unlocks outside the
while loop was solely for that reason. I actually already have a
wrapper like you listed, "CriticalSection::Owner
sLock(m_Q[TCPIP].qLock);", which does what you outlined, but I failed
to realize I could put empty curly brackets around the section I want,
to force it out of scope :)

Most of my confusion comes from my C# background (Java, and then C#, is
what I grew up with). In C#'s Queue, the size property is stored
as an Int32 which, when read, is an atomic operation on almost all
processors. Meaning it either reads it before another thread changes it,
or after; either case is fine (thread safe) if there are only two
threads, and each thread either pops or pushes, and thus can't cause an
exception. This is because in C# the size doesn't get incremented
until after the insertion is completed, so exceptions can't occur in
my situation.

I was thinking ".front()" should also be thread safe for the same (C#-
based) reason, since it's a 32-bit application, and .front() should
be returning a pointer to the first element. Since the thread
calling .front() is the only thread that removes elements, and since
this thread knows that ".size()" shows one element, I would have
assumed ".front()" would be thread-safe as well.

Since .size() was a method call instead of a property, I was thinking
there might be more to it in C++... which I'm now confused on again:
".size()" and ".front()" would both be thread-safe operations for a thread
that removes elements (being the only thread that removes items)? In
C#, Enqueue/Dequeue aren't thread safe, but again, it depends on the
implementation, which is why I asked here; I would be surprised if
they were, but thought maybe.

So in essence, I can change it to the following if I wanted to fine-
tune the thread-safety locking to its bare minimum in my example (and
remove my try statement to boot, yay)?

some_struct_type myStruct;
while(q.size() > 0) {
myStruct = q.front();
{
WrapMutex guard(m_Q[TCPIP].qLock);
q.pop_front();
}
ProcessData(myStruct);
}
Thanks, NB
Aug 30 '08 #9

"Which I'm not confused on again" -> "Which I'm now confused on
again". Sorry, typos, among others; this one changed my meaning a bit
more than I wanted.
Aug 30 '08 #10

On Aug 29, 9:35 pm, NvrBst <nvr...@gmail.com> wrote:
Ahh thank you. I find examples a lot easier to follow. One question:
some_struct_type myStruct;
while(q.size() > 0) {
{
WrapMutex guard(m_Q[TCPIP].qLock);
myStruct = q.front();
q.pop_front();
}
ProcessData(myStruct);
}

This would imply the ".size()" doesn't have to be wrapped in the
CriticalSections?
Yes, but I got that wrong. In general, it should be in the critical
section.

To be truly safe without locking up the deque for the entire loop, the
loop would have to be reorganized so that the q.size() call was within
the critical section.
The reason I put the locks/unlocks outside the
while loop was solely for that reason.
Ok. That is not exception-safe by itself, although you did mention
using try-catch blocks to counter that. The wrapper class approach
shouldn't need the try-catch blocks to be exception-safe.
...

I was thinking ".front()" should also be thread safe for the same (C#-
based) reason, since it's a 32-bit application, and .front() should
be returning a pointer to the first element. Since the thread
I'm not familiar with C#, so I don't know when C# pointers analogize
to either C++ raw pointers, or C++ references. It makes a huge
difference here.

front() and back() return C++ references. begin() and end() return
iterators (which usually have the same behavior as pointers).
Iterators are designed so that a pointer is an iterator for a C-style
array.
calling .front() is the only thread that removes elements, and since
this thread knows that ".size()" shows one element, I would have
assumed ".front()" would be thread-safe as well.
It's important that this is the only thread removing elements. As
long as std::deque is implemented correctly, insertions by other
threads to the front or back will only invalidate all iterators;
references will be fine. That is: *q.begin() is not safe, q.front()
would mostly be safe. [Insertions to the middle will invalidate
references.] However, the loop does assume that no other threads are
inserting to the front.
Since .size() was a method call instead of a property, I was thinking
there might be more to it in C++... which I'm now confused on again:
True. There's more to size() than to empty(), as well.
".size()" and ".front()" would both be thread-safe operations for a thread
that removes elements (being the only thread that removes items)?
q.empty() would come pretty close to being thread-safe (and might well
be for a good implementation on a sufficiently friendly CPU). The
other member functions are likely to be thrown off by temporarily
inconsistent internal state.

As long as all insertions are to the end, it is possible that a good
implementation would have a coincidentally thread-safe q.front().
Insertions to the middle will invalidate this.

The reference from q.front() should be thread-safe regardless, given
that this thread is the only thread removing items. But q.pop_front()
will invalidate the reference, so the assignment is needed.
So in essence, I can change it to the following if I wanted to fine-
tune the thread-safety locking to its bare minimum in my example (and
remove my try statement to boot, yay)?
The try-catch statements can be outright gone, yes.
some_struct_type myStruct;
while(q.size() > 0) {
myStruct = q.front();
{
WrapMutex guard(m_Q[TCPIP].qLock);
q.pop_front();
}
ProcessData(myStruct);
}
A fortunate implementation would let us get away with:

some_struct_type myStruct;
while(!q.empty()) {
myStruct = q.front();
{
WrapMutex guard(m_Q[TCPIP].qLock);
q.pop_front();
}
ProcessData(myStruct);
}

The loop that would be maximally portable would be like:

some_struct_type myStruct;
bool in_loop = false; /* redundant */
do {
in_loop = false;
{
WrapMutex my_lock(m_Q[TCPIP].qLock);
if (!q.empty())
{
myStruct = q.front();
q.pop_front();
in_loop = true;
}
}
if (in_loop) ProcessData(myStruct);
} while (in_loop);
Aug 30 '08 #11

NvrBst <nv****@gmail.com> wrote:
A low-level mutex is a very fast beast unless you have heavy
contention. If your data is expensive to copy, that might be a
bottleneck, but wait to deal with that until you have gotten your
basic structure to work.

/Peter

The information is too much to copy. When I said clean, I meant more-
There is no need to copy. You should write a swap() member function for
swapping the contents of an item in the queue with a local data item. The
swap() method can usually be implemented fast and exception-safe;
std::string and all STL containers already have it.
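For illustration, a hypothetical Item with such a swap() member (the fields are assumptions, not from the thread) could be:

```cpp
#include <string>
#include <vector>

// Illustrative Item type showing the swap() member Paavo describes:
// O(1), no allocation, and it cannot throw for these members.
struct Item {
    std::vector<char> buffer;  // stands in for the TCP payload
    std::string tag;

    void swap(Item& other) {
        buffer.swap(other.buffer);  // containers exchange internal pointers
        tag.swap(other.tag);        // no element copies are made
    }
};
```

Swapping moves the heap-allocated contents between the queue slot and the local variable without copying the payload bytes.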

Fetching the item from the queue would then be something like this (using
the boost::thread library, which helps to get one's mutex/locks/conditions
semantics right):

std::deque<Item> queue;
boost::mutex queue_mutex;
boost::condition queue_condition;

....

Item local;
{
boost::mutex::scoped_lock lock(queue_mutex);
while(queue.empty()) {
// wait until an item arrives
// other behaviors can be imagined here as well
queue_condition.wait(lock);
}
queue.front().swap(local);
queue.pop_front();
}
// ... process local to your heart's content, with no mutex locked.

Pushing:

Item tobepushed;
....
{
boost::mutex::scoped_lock lock(queue_mutex);
queue.push_back(Item());
queue.back().swap(tobepushed);
queue_condition.notify_all();
}
And of course, you should pack the queue in a nice class or template,
possibly with a template policy for different waiting strategies. I
wonder whether boost doesn't already have such a queue; it seems to be a
quite generic thing to me...

hth
Paavo
Aug 30 '08 #12

On Aug 30, 12:55 am, Paavo Helde <nob...@ebi.ee> wrote:
NvrBst <nvr...@gmail.com> wrote:
The information is too much to copy. When I said clean, I meant more-

There is no need to copy. You should write a swap() member function for
swapping the contents of an item in the queue with a local data item.
It depends. If the data is Plain Old Data, this becomes questionable
(and this seems plausible considering we're talking about a struct);
why incur three assignments in place of one?

If there's dynamic memory allocation involved in the assignment
operator, then swap should be implemented regardless, and used
liberally.
....

Fetching the item from the queue would then be something like that (using
boost::thread library, this helps to get one's mutex/locks/conditions
semantics right):
True, but given already-developed code the immediate need to port
becomes questionable. Boost.Thread does have the advantage that it
was rewritten for 1.36.0 to more closely track the C++0x proposals.
....
And of course, you should pack the queue in a nice class or template,
possibly with a template policy for different waiting strategies. I
wonder whether boost already doesn't have such a queue, seems to be a
quite generic thing to me...
Waiting policies (and the initial wait for a non-empty queue) are being
handled by signals currently. I'm not clear on whether there's enough
flexibility to rewrite using Boost.Signals. (Both Boost.Thread and
Boost.Signals are not header-only.)

It would be surprising if Boost libraries directly replicated STL
functionality. I don't see any directly relevant data structures in
Boost 1.36.0, possibly because std::queue<T> (whose default container
is std::deque<T>) fits quite nicely. (The time to notice this was at
the design stage; converting the code now may not be worth it.)
Aug 30 '08 #13

za*****@zaimoni.com wrote:
On Aug 30, 12:55 am, Paavo Helde <nob...@ebi.ee> wrote:
NvrBst <nvr...@gmail.com> wrote:
The information is too much to copy. When I said clean, I meant
more-

There is no need to copy. You should write a swap() member function
for swapping the contents of an item in the queue with a local data
item.

It depends. If the data is Plain Old Data, this becomes questionable
(and this seems plausible considering we're talking about a struct);
why incur three assignments in place of one?
One can use copy of course instead of swap when applicable. I just
answered the OP's claim "The information is too much to copy."
(using the boost::thread library, this helps to get one's
mutex/locks/conditions semantics right):

True, but given already-developed code the immediate need to port
becomes questionable.
I did not suggest porting, though after seeing mention of critical
sections and lock.Leave() calls in the OP's posts I actually feel quite
confident that rewriting the system in boost::thread would take less time
and effort than getting the already-developed code correct. I just used
boost::thread in the example because that is what I know best.

Thanks for the boost development info! We are still using 1.34 as we are
in a project stabilization phase and avoiding upgrades during that time.

regards
Paavo

Aug 30 '08 #14

On 30 Aug., 04:35, NvrBst <nvr...@gmail.com> wrote:
Ahh thank you. I find examples a lot easier to follow. One question:
some_struct_type myStruct;
while(q.size() > 0) {
{
WrapMutex guard(m_Q[TCPIP].qLock);
myStruct = q.front();
q.pop_front();
}
ProcessData(myStruct);
}

This would imply the ".size()" doesn't have to be wrapped in the
CriticalSections?
Yes - that is an error with the code: it should be wrapped.

[snip]
Most of my confusion comes from my C# background (Java, and then C#, is
what I grew up with). In C#'s Queue, the size property is stored
as an Int32 which, when read, is an atomic operation on almost all
processors. Meaning it either reads it before another thread changes it,
or after; either case is fine (thread safe) if there are only two
threads, and each thread either pops or pushes, and thus can't cause an
exception. This is because in C# the size doesn't get incremented
until after the insertion is completed, so exceptions can't occur in
my situation.
This is not correct, and your C# code will not work unless there is
more action involved when reading the Int32 size. The reason is that
even though the size gets written after the data, it might not be seen
this way by another thread. Memory writes undergo a lot of steps
involving caches of different kinds, and the cacheline with the size
might be written to memory before the cache-line containing the data.
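In later C++11 terms, the ordering Peter describes is exactly what release/acquire atomics provide; a single-slot sketch (names illustrative, not from the thread):

```cpp
#include <atomic>

int payload = 0;                 // the "data" written by the producer
std::atomic<bool> ready(false);  // the published flag (like the C# size)

void publish(int v) {            // producer thread
    payload = v;                                   // 1: write the data
    ready.store(true, std::memory_order_release);  // 2: then publish
}

bool consume(int& out) {         // consumer thread
    // acquire pairs with the release above: if we see ready == true,
    // the write to payload is guaranteed to be visible too
    if (!ready.load(std::memory_order_acquire))
        return false;
    out = payload;
    return true;
}
```

With a plain non-atomic flag there is no such pairing, so the consumer could see the flag before the data, which is the failure mode being described.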
I was thinking ".front()" should also be thread safe for the same (C#-
based) reason, since it's a 32-bit application, and .front() should
be returning a pointer to the first element. Since the thread
calling .front() is the only thread that removes elements, and since
this thread knows that ".size()" shows one element, I would have
assumed ".front()" would be thread-safe as well.
You have the same problem here.

[snip]

/Peter
Aug 30 '08 #15

In article <0bbce4f3-4ab7-47fd-8be1-
7d**********@n38g2000prl.googlegroups.com>, nv****@gmail.com says...
I've read a bit online seeing that two writes are not safe, which I
understand, but would 1 thread push()'ing and 1 thread pop()'ing be
thread-safe? Basically my situation is the follows:
Generally speaking, no, it's not safe.

My advice would be to avoid std::deque in such a situation -- in a
multi-threaded situation, it places an undue burden on the client code.
This is a case where it's quite reasonable to incorporate the locking
into the data structure itself to simplify the client code (a lot).

For one example, I've used code like this under Windows for quite a
while:

template<class T, unsigned max = 256>
class queue {
HANDLE space_avail; // signaled => at least one slot empty
HANDLE data_avail; // signaled => at least one slot full
CRITICAL_SECTION mutex; // protect buffer, in_pos, out_pos

T buffer[max];
long in_pos, out_pos;
public:
queue() : in_pos(0), out_pos(0) {
space_avail = CreateSemaphore(NULL, max, max, NULL);
data_avail = CreateSemaphore(NULL, 0, max, NULL);
InitializeCriticalSection(&mutex);
}

void push(T data) {
WaitForSingleObject(space_avail, INFINITE);
EnterCriticalSection(&mutex);
buffer[in_pos] = data;
in_pos = (in_pos + 1) % max;
LeaveCriticalSection(&mutex);
ReleaseSemaphore(data_avail, 1, NULL);
}

T pop() {
WaitForSingleObject(data_avail,INFINITE);
EnterCriticalSection(&mutex);
T retval = buffer[out_pos];
out_pos = (out_pos + 1) % max;
LeaveCriticalSection(&mutex);
ReleaseSemaphore(space_avail, 1, NULL);
return retval;
}

~queue() {
CloseHandle(space_avail);
CloseHandle(data_avail);
DeleteCriticalSection(&mutex);
}
};

Exception safety depends on assignment of T being nothrow, but (IIRC)
not much else is needed. This uses value semantics, so if you're dealing
with something where copying is expensive, you're expected to use some
sort of smart pointer to avoid copying. Of course, a reference counted
smart pointer will usually have some locking of its own on incrementing
and decrementing the reference count, but that's a separate issue.

Contrary to the statements elsethread, I've seen little use for being
able to parameterize how the waits happen and such. While there might be
situations where it could be useful, I haven't encountered a need in
real life.

Likewise, there's optimization that could be done -- for an obvious
example, the current design switches to kernel mode twice for every push
or pop, which isn't exactly fast. In real use, however, I haven't seen
this take enough processor time to bother optimizing it.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Aug 30 '08 #16

Thank you all for your information :) I think I have what I need now.
This is not correct, and your C# code will not work unless there is
more action involved when reading the Int32 size. The reason is that
even though the size gets written after the data, it might not be seen
this way by another thread. Memory writes undergo a lot of steps
involving caches of different kinds, and the cacheline with the size
might be written to memory before the cache-line containing the data.
Only if there were, say, two threads popping. In my case I only have 1
thread popping, and 1 thread pushing. Even if the popping thread reads
0 elements (because, say, the size cache wasn't updated in time, but
there is really 1 element), that is safe; it'll ignore it until size
says 1 element. If it reads 1+ elements then there is definitely 1
element to remove (since no other thread is removing items). The same
logic works for reading when there is a max size you can't go over.

Thanks again for all the information, and enjoy the weekend ;)
Aug 31 '08 #17

Paavo Helde wrote:
[...]
Item local;
{
boost::mutex::scoped_lock lock(queue_mutex);
while(queue.empty()) {
// wait until an item arrives
// other behaviors can be imagined here as well
queue_condition.wait(lock);
}
queue.front().swap(local);
queue.pop_front();
}
// ... process local to your heart's content, with no mutex locked.
Doing the wait on the condition variable while holding the lock is a bad
idea, but you'll recognize that error very early when your application
deadlocks.

--
Thomas
Aug 31 '08 #18

I myself wrote:
Paavo Helde wrote:
[...]
Item local;
{
boost::mutex::scoped_lock lock(queue_mutex);
while(queue.empty()) {
// wait until an item arrives
// other behaviors can be imagined here as well
queue_condition.wait(lock);
}
queue.front().swap(local);
queue.pop_front();
}
// ... process local to your heart's content, with no mutex locked.

Doing the wait on the condition variable while holding the lock is a bad
idea, but you'll recognize that error very early when your application
deadlocks.
Sorry, the code is correct. condition::wait will unlock the mutex while
waiting on the condition variable.

I should have checked the boost.thread documentation earlier.

--
Thomas
Aug 31 '08 #19

"Thomas J. Gritzan" <ph*************@gmx.de> wrote:
I myself wrote:
Paavo Helde wrote:
[...]
Item local;
{
boost::mutex::scoped_lock lock(queue_mutex);
while(queue.empty()) {
// wait until an item arrives
// other behaviors can be imagined here as well
queue_condition.wait(lock);
}
queue.front().swap(local);
queue.pop_front();
}
// ... process local to your heart's content, with no mutex locked.

Doing the wait on the condition variable while holding the lock is a
bad idea, but you'll recognize that error very early when your
application deadlocks.

Sorry, the code is correct. condition::wait will unlock the mutex
while waiting on the condition variable.
Yes, that's one thing I like about the boost::thread library: it makes
some kinds of errors impossible (like forgetting to unlock the mutex
before the wait, or forgetting to relock it after the wait).

regards
Paavo
Aug 31 '08 #20

On 31 Aug., 02:45, NvrBst <nvr...@gmail.com> wrote:
Thank you all for your information :) I think I have what I need now.
This is not correct, and your C# code will not work unless there is
more action involved when reading the Int32 size. The reason is that
even though the size gets written after the data, it might not be seen
this way by another thread. Memory writes undergo a lot of steps
involving caches of different kinds, and the cacheline with the size
might be written to memory before the cache-line containing the data.

Only if there were, say, two threads popping. In my case I only have 1
thread popping, and 1 thread pushing. Even if the popping thread reads
0 elements (because, say, the size cache wasn't updated in time, but
there is really 1 element), that is safe; it'll ignore it until size
says 1 element. If it reads 1+ elements then there is definitely 1
element to remove (since no other thread is removing items). The same
logic works for reading when there is a max size you can't go over.

Thanks again for all the information, and enjoy the weekend ;)
No - you are not correct. Imagine a situation where the pushing thread
pushes the object to the deque and then increases the counter.
However, the size gets written through the cache whereas the object
stays (partly) in the cache. Now the reader will see that there is an
object in the deque, but in reality it was never written to a
location that allows the other thread to see it (it is on another
core), so it reads rubbish where the object should be.

/Peter
Aug 31 '08 #21

On Aug 31, 8:56 am, Paavo Helde <nob...@ebi.ee> wrote:
"Thomas J. Gritzan" <phygon_antis...@gmx.de> wrote:
Sorry, the code is correct. condition::wait will unlock the
mutex while waiting on the condition variable.
Yes, that's one thing I like about the boost::thread library:
it makes some kinds of errors impossible (like forgetting to
unlock the mutex before the wait, or forgetting to relock it
after the wait).
Except that that's not a property of the boost::thread library,
but the way conditions work in general. All Boost provides here
is a very low level (but portable) wrapper for the Posix
interface, plus RAII for the locks held at the application
level (which is, in itself, already a good thing).

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Aug 31 '08 #22

On Aug 30, 7:06 pm, Jerry Coffin <jcof...@taeus.com> wrote:
In article <0bbce4f3-4ab7-47fd-8be1-
7d1b39e7f...@n38g2000prl.googlegroups.com>, nvr...@gmail.com says...
I've read a bit online seeing that two writes are not safe, which I
understand, but would 1 thread push()'ing and 1 thread pop()'ing be
thread-safe? Basically my situation is the follows:
Generally speaking, no, it's not safe.
My advice would be to avoid std::deque in such a situation --
in a multi-threaded situation, it places an undue burden on
the client code.
Avoid it, or wrap it? I use it regularly for communicating
between threads; my Queue class is based on it.
This is a case where it's quite reasonable to incorporate the
locking into the data structure itself to simplify the client
code (a lot).
For one example, I've used code like this under Windows for
quite a while:
template<class T, unsigned max = 256>
class queue {
HANDLE space_avail; // signaled => at least one slot empty
HANDLE data_avail; // signaled => at least one slot full
CRITICAL_SECTION mutex; // protect buffer, in_pos, out_pos

T buffer[max];
long in_pos, out_pos;
And if you replace buffer, in_pos and out_pos with
std::deque<T>, where's the problem.

[...]
Exception safety depends on assignment of T being nothrow, but
(IIRC) not much else is needed. This uses value semantics, so
if you're dealing with something where copying is expensive,
you're expected to use some sort of smart pointer to avoid
copying. Of course, a reference counted smart pointer will
usually have some locking of its own on incrementing and
decrementing the reference count, but that's a separate issue.
My own queues use std::auto_ptr at the interface. This imposes
the requirement that all of the contained objects be dynamically
allocated, and adds some clean-up code in the queue itself
(since you can't put the auto_ptr in the deque), but ensures
that once the message has been posted, the originating thread
won't continue to access it.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Aug 31 '08 #23

No - you are not correct. Imagine a situation where the pushing thread
pushes the object to the deque and then increases the counter.
However, the size gets written through the cache whereas the object
stays (partly) in the cache. Now the reader will see that there is an
object in the deque, but in reality it was never written to a
location that allows the other thread to see it (it is on another
core), so it reads rubbish where the object should be.
I don't quite understand what you say (the "on another core" part), but
I'm pretty sure it is safe. The popping thread is only taking the
first element, and the pushing thread is appending to the end.

The size isn't incremented until after the object is written, so when
the popping thread reads 1 element, that element definitely exists (it
can't be in cache and not already written to the internal array).
Furthermore, there is a "version" number incremented internally
whenever a change is made; this forces the cache to re-validate an
element. If the size reads 0 then there can't be any problem, and if
the size reads 2+ then there can't be any problem with the first
element, so the size reading "1" should be the only possible error
case; and, as stated, the item is available before the size is 1, and
the version is incremented as well.

There is no way the popping thread can attempt to access the 1st
element when it actually isn't available, because it'd read the size as
"0". I think what you were talking about is an old cached _array[0]
that had been popped; however, the push thread added a new element and
incremented the size, then the pop thread took the cached "_array[0]".
Again, this situation can't happen, because the version is incremented
and any request for _array[0] would be re-validated when the version is
seen not to match.

There might be a possible problem if the capacity is reached and it
attempts to automatically grow the internal array; however, as long as
your push thread checks the capacity before adding, there shouldn't be
a problem.
Sep 1 '08 #24

In article <638bd1ab-a5e5-4567-aa53-
90**********@k37g2000hsf.googlegroups.com>, ja*********@gmail.com
says...

[ ... ]
My advice would be to avoid std::deque in such a situation --
in a multi-threaded situation, it places an undue burden on
the client code.

Avoid it, or wrap it? I use it regularly for communicating
between threads; my Queue class is based on it.
I'd generally avoid it. Wrapping it works, but presents trade offs I
rarely find useful.

The problem with std::deque is that it can and will resize itself when
you add an item and there's not currently room to store that item. Doing
so, however, normally involves allocating a block of memory from the
free store. As such, this operation can be fairly slow. That translates
to a fairly substantial amount of time that's being spent just in
storing the data rather than processing the data. It only really makes
sense when your work items are sufficiently coarse-grained that the
dynamic allocation will be substantially faster than processing a single
work item.

I generally prefer to keep the processing sufficiently fine grained that
when/if the queue is full, it's likely to be faster to wait for an
existing item to be processed than to allocate more memory for the
queue. This devotes nearly all CPU time to processing the data instead
of allocating memory to store it.

Now, my own work with multiple threads has mostly been fairly easy to
break up into relatively small "chunks", and has been mostly quite
processor intensive. Those mean that 1) giving up processor time means
the consumer side can run faster, and 2) the time it would take to
allocate memory for queue would exceed the time taken to remove (at
least) one item from the queue.

OTOH, I can certainly imagine situations where those weren't true. An
obvious example would be something like a network router. In this case,
the "consumer" side is basically sending out network packets. More CPU
time won't speed up network transmissions, and transmitting a single
network packet undoubtedly takes longer than allocating a block of
memory.
And if you replace buffer, in_pos and out_pos with
std::deque<T>, where's the problem.
There isn't necessarily a problem -- but I think the choice between the
two is one of some real trade offs rather than just a matter of whether
you manipulate the pointers into the queue yourself, or use some pre-
written code to do the job.

Of course, you can/could also use reserve() and size() on the deque to
manage it as a fixed-size container, but I don't think at that point
you're really gaining much by using a deque at all. OTOH, it might still
make sense when/if you had both fixed- and variable-sized queues, and
this allowed you to share most code between the two.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Sep 1 '08 #25

On Sep 1, 2:34 am, NvrBst <nvr...@gmail.com> wrote:
No - you are not correct. Imagine a situation where the pushing thread
pushes the object to the deque and then increases the counter.
However, the size gets written through the cache whereas the object
stays (partly) in the cache. Now the reader will see that there is an
object in the deque, but in reality it was never written to a
location that allows the other thread to see it (it is on another
core), so it reads rubbish where the object should be.

I don't quite understand what you say (the "on another core" part), but
I'm pretty sure it is safe. The popping thread is only taking the
first element, and the pushing thread is appending to the end.
You're assuming that std::deque has its own internal mutex (or other
compiler trickery reacting to volatile) so that each individual member
function call has its own lock-unlock against that mutex. This is not
required by the C++ standard, but can be remedied by a template
wrapper class at some cost in efficiency.

A quick check on Google suggests this is not true (MSDN Developer
Network search hit) even for a C# Queue; in C# you would have to use
the Queue.Synchronized method to obtain a wrapper that was moderately
thread-safe.

Without that hidden mutex, the problem without synchronization in C++
(with a size field as an implementation detail, that is incremented
before actually doing the push; translate calls to C# as required):

1. Push-thread #1 cache: q.empty(). Main RAM: q.empty(). Pop-thread
   cache: processing former last record.
2. Push-thread #1 cache: 1==q.size(), main queue inconsistent. Main
   RAM: q.empty(). Pop-thread cache: processing former last record.
3. Push-thread #1 cache: 1==q.size(), main queue inconsistent. Main
   RAM: 1==q.size(), main queue inconsistent. Pop-thread cache:
   processing former last record.
4. Push-thread #1 cache: 1==q.size(), main queue inconsistent. Main
   RAM: 1==q.size(), main queue inconsistent. Pop-thread cache: wants
   to test !q.empty(), chooses to refresh cache from main RAM.
5. Push-thread #1 cache: 1==q.size(), main queue inconsistent. Main
   RAM: 1==q.size(), main queue inconsistent. Pop-thread cache:
   1==q.size(), main queue inconsistent; !q.empty() so invokes
   q.front() -- UNDEFINED BEHAVIOR.

Worst-case scenario leaves the queue in a permanently inconsistent
state from the push and pop both trying to execute at once. All of
the above should translate into C# fairly directly for an
unsynchronized Queue.

Without an internal-detail size field, 0 < q.size() will have
undefined behavior in the pop-thread (from inconsistent state).
!q.empty() might be well-defined even then.

With a hidden mutex providing per-method synchronization, the sole
pop-thread is forced to wait until the push-thread has completed, and
testing q.empty() / 0 < q.size() works, exactly as long as there is
exactly one pop-thread; no critical sections at all are needed in the
client code.
Sep 1 '08 #26

On Sep 1, 11:45 am, zaim...@zaimoni.com wrote:
On Sep 1, 2:34 am, NvrBst <nvr...@gmail.com> wrote:
No - you are not correct. Imagine a situation where the
pushing thread pushes the object to the deque and then
increases the counter. However, the size gets written
through the cache whereas the object stays (partly) in the
cache. Now the reader will see that there is an object in
the deque, but in reality it was never written to a
location that allows the other thread to see it (it is on
another core), so it reads rubbish where the object should
be.
I don't quite understand what you say (the "on another core"
part), but I'm pretty sure it is safe. The popping thread
is only taking the first element, and the pushing thread is
appending to the end.
It's absolutely safe only if the std::deque has been specially
implemented so that each member function has "enough"
synchronization.
I don't think that that's possible, given the interface.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Sep 1 '08 #27

On Sep 1, 6:49 pm, Jerry Coffin <jcof...@taeus.com> wrote:
In article <638bd1ab-a5e5-4567-aa53-
9032bd487...@k37g2000hsf.googlegroups.com>, james.ka...@gmail.com
says...
[ ... ]
My advice would be to avoid std::deque in such a situation
-- in a multi-threaded situation, it places an undue
burden on the client code.
Avoid it, or wrap it? I use it regularly for communicating
between threads; my Queue class is based on it.
I'd generally avoid it. Wrapping it works, but presents trade
offs I rarely find useful.
I find the trade off of using tried and tested code, rather than
having to write it myself, useful. (Of course, if you already
had your basic queue working before STL came along...)
The problem with std::deque is that it can and will resize
itself when you add an item and there's not currently room to
store that item. Doing so, however, normally involves
allocating a block of memory from the free store. As such,
this operation can be fairly slow. That translates to a fairly
substantial amount of time that's being spent just in storing
the data rather than processing the data. It only really makes
sense when your work items are sufficiently coarse-grained
that the dynamic allocation will be substantially faster than
processing a single work item.
I suppose that that could be an issue in some cases. For the
most part, my queues never contain very many entries at one
time, although there is a lot of pushing and popping, and the
applications run for very long times (sometimes years without
stopping). Which means that the queue effectively reaches its
largest size very quickly, and then there is no more dynamic
allocation from it. (On the other hand, my messages tend to be
polymorphic, and dynamically allocated.)
I generally prefer to keep the processing sufficiently fine
grained that when/if the queue is full, it's likely to be
faster to wait for an existing item to be processed than to
allocate more memory for the queue. This devotes nearly all
CPU time to processing the data instead of allocating memory
to store it.
I'll admit that I've never had any performance problems with
deque in this regard, although I can easily imagine applications
where it might be the case.
Now, my own work with multiple threads has mostly been fairly
easy to break up into relatively small "chunks", and has been
mostly quite processor intensive. Those mean that 1) giving up
processor time means the consumer side can run faster, and 2)
the time it would take to allocate memory for queue would
exceed the time taken to remove (at least) one item from the
queue.
OTOH, I can certainly imagine situations where those weren't
true. An obvious example would be something like a network
router. In this case, the "consumer" side is basically sending
out network packets. More CPU time won't speed up network
transmissions, and transmitting a single network packet
undoubtedly takes longer than allocating a block of memory.
Yes. One of the advantages of forwarding to a queue, and
letting another process handle it, is that you can get back to
listening at the socket that much faster. And since the system
buffers tend to have a fairly small, fixed size...
And if you replace buffer, in_pos and out_pos with
std::deque<T>, where's the problem.
There isn't necessarily a problem -- but I think the choice
between the two is one of some real trade offs rather than
just a matter of whether you manipulate the pointers into the
queue yourself, or use some pre-written code to do the job.
Of course, you can/could also use reserve() and size() on the
deque to manage it as a fixed-size container, but I don't
think at that point you're really gaining much by using a
deque at all. OTOH, it might still make sense when/if you had
both fixed- and variable-sized queues, and this allowed you to
share most code between the two.
In the end, the main gain was that I didn't have to write any
code to manage the queue. And that other people, maintaining
the software, didn't have to understand code that I'd written.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Sep 1 '08 #28

In article <e78577f0-da15-42c9-96ad-63e4bc2c7638
@w7g2000hsa.googlegroups.com>, ja*********@gmail.com says...

[ ... ]
I'd generally avoid it. Wrapping it works, but presents trade
offs I rarely find useful.

I find the trade off of using tried and tested code, rather than
having to write it myself, useful. (Of course, if you already
had your basic queue working before STL came along...)
The problem is that the part that's tried and tested is downright
trivial compared to what's not -- and at least from the looks of things
in this thread, using that tried and tested code makes it even more
difficult to get the hard part right, so I suspect it's a net loss.

...and yes, if memory serves, my queue code was originally written under
OS/2 1.2 or thereabouts (originally in C, which still shows to some
degree).

[ ... ]
I suppose that that could be an issue in some cases. For the
most part, my queues never contain very many entries at one
time, although there is a lot of pushing and popping, and the
applications run for very long times (sometimes years without
stopping). Which means that the queue effectively reaches its
largest size very quickly, and then there is no more dynamic
allocation from it.
In fairness, though I hadn't really thought about it previously, it's
likely that when I've profiled it, I've tended to use relatively short
runs, which will tend to place an undue emphasis on the initialization,
so it's likely to make less difference overall than it may have looked
like. In any case, I'd agree that it's rarely likely to be a huge factor
in any case.

[ ... ]
In the end, the main gain was that I didn't have to write any
code to manage the queue. And that other people, maintaining
the software, didn't have to understand code that I'd written.
I suppose it might make a difference, but I have a hard time believing
that anybody who could understand the semaphore part would have to pause
for even a moment to understand the queue part. Duplicating it without
any fence-post errors might take a little longer, but even that's still
not exactly rocket science.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Sep 3 '08 #29

On Sep 3, 3:00 am, Jerry Coffin <jcof...@taeus.com> wrote:
In article <e78577f0-da15-42c9-96ad-63e4bc2c7638
@w7g2000hsa.googlegroups.com>, james.ka...@gmail.com says...
[ ... ]
I'd generally avoid it. Wrapping it works, but presents
trade offs I rarely find useful.
I find the trade off of using tried and tested code, rather
than having to write it myself, useful. (Of course, if you
already had your basic queue working before STL came
along...)
The problem is that the part that's tried and tested is
downright trivial compared to what's not -- and at least from
the looks of things in this thread, using that tried and
tested code makes it even more difficult to get the hard part
right, so I suspect it's a net loss.
I don't see where the actual implementation of the queue would
affect the rest, unless you're doing some tricky optimizations
to make it lock-free. (std::deque is obviously not lock-free; you
need locks around the accesses to it.) And your code didn't
seem to be trying to implement a lock free queue.

On the other hand, since I work under Unix, my queues used a
condition rather than semaphores, so the issues are somewhat
different, but basically, all I did was copy the code for the
wait and send from Butenhof, and stick an std::deque in as the
queue itself. You can't get much simpler.
...and yes, if memory serves, my queue code was originally
written under OS/2 1.2 or thereabouts (originally in C, which
still shows to some degree).
Which explains why you find the implementation of the queue so
simple:-). Most things are simple when you've been maintaining
them for so many years. My experience is that unless you
implement the queue trivially, using something like std::list,
getting the border conditions right isn't always that obvious
(but once you've got them right, the code is trivial).
[ ... ]
In the end, the main gain was that I didn't have to write any
code to manage the queue. And that other people, maintaining
the software, didn't have to understand code that I'd written.
I suppose it might make a difference, but I have a hard time
believing that anybody who could understand the semaphore part
would have to pause for even a moment to understand the queue
part. Duplicating it without any fence-post errors might take
a little longer, but even that's still not exactly rocket
science.
Well, my own implementation uses condition variables, rather
than semaphores. But it's really, really simple. FWIW
(modified to use Boost, rather than my home build classes, and
with support for handling timeouts removed):

class QueueBase
{
public:

QueueBase() ;
~QueueBase() ;

void send( void* message ) ;
void* receive() ;

bool isEmpty() const ;

private:
boost::mutex mutex ;
boost::condition_variable
cond ;
std::deque< void* > queue ;
} ;

template< typename T >
class Queue : public QueueBase
{
public:

~Queue()
{
while ( ! isEmpty() ) {
receive() ;
}
}

void send( std::auto_ptr< T > message )
{
QueueBase::send( message.get() ) ;
message.release() ;
}

std::auto_ptr< T > receive()
{
return std::auto_ptr< T >(
static_cast< T* >( QueueBase::receive() ) ) ;
}
} ;

QueueBase::QueueBase()
{
}

QueueBase::~QueueBase()
{
assert( queue.empty() ) ;
}

void
QueueBase::send(
void* message )
{
boost::unique_lock< boost::mutex >
lock( mutex ) ;
queue.push_back( message ) ;
cond.notify_one() ;
}

void*
QueueBase::receive()
{
boost::unique_lock< boost::mutex >
lock( mutex ) ;
while ( queue.empty() ) {
cond.wait( lock ) ;
}
void* result = queue.front() ;
queue.pop_front() ;
return result ;
}

bool
QueueBase::isEmpty() const
{
boost::unique_lock< boost::mutex >
lock( mutex ) ;
return queue.empty() ;
}

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Sep 3 '08 #30
