A little threading problem

Hello all,

I need your wisdom again. I'm working on a multi-threaded application
that handles multiple data sources in small batches each time. The idea
is that there are 3 threads that run simultaneously, each reading a fixed
number of records, and then they wait for each other. After that the main
thread does some processing, and the threads are allowed to continue
reading data.

I summarized this part of the application in the attached Python script,
which locks up rather early, for reasons that I don't understand (I
don't have a computer science education), and I'm pretty sure the
problem is related to what I'm trying to fix in my application. Can
anybody explain what's happening (or maybe even show me a better way of
doing this)?

Regards,

Alban Hertroys,
MAG Productions.

Jul 18 '05 #1
Jeremy Jones wrote:
Notify is called before thread B (in this case) hits the
condAllowed.wait() piece of code. So, it sits at that wait()
forever (because it doesn't get notified, because the notification
already happened), waiting to be notified from the main thread, and the
main thread is waiting on thread B (again, in this case) to call
mainCond.notify(). This approach is a deadlock just wanting to happen
(not waiting, because it already did happen). What is it exactly that
you are trying to accomplish? I'm sure there is a better approach.

Hmm, I already learned something I didn't know by reading through my own
version of the output.
I added an extra counter, printed before waiting in the for-loop in
Main. My wrong assumption was that acquire() would block other threads
from acquiring until release() was called. In that case the for-loop
would wait 3 times (once for each thread), which is what I want.
Unfortunately, in my output I see this:

T-A: acquire mainCond
....
T-B: acquire mainCond
....
T-B: released mainCond
....
T-A: released mainCond

Which is exactly what I was trying to prevent...
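One pattern that produces exactly this interleaving is a wait() call between acquire and release: Condition.wait() releases the underlying lock while it blocks and reacquires it before returning. Whether the attached script does this is an assumption, since the script is not shown here. A minimal Python 3 sketch with hypothetical names:

```python
import threading


def run_demo():
    """Return the order of events when thread A waits and thread B notifies."""
    cond = threading.Condition()
    a_holds_lock = threading.Event()
    events = []

    def thread_a():
        with cond:                          # acquire
            events.append("T-A: acquire")
            a_holds_lock.set()
            cond.wait(timeout=5)            # gives up the lock while blocked
            # wait() has reacquired the lock; it is released on leaving "with"
            events.append("T-A: released")

    def thread_b():
        a_holds_lock.wait()                 # start only once A owns the lock
        with cond:                          # succeeds once A is inside wait()
            events.append("T-B: acquire")
            cond.notify()
            events.append("T-B: released")

    threads = [threading.Thread(target=thread_a),
               threading.Thread(target=thread_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    print("\n".join(run_demo()))
```

So the lock itself is still exclusive; the second thread only gets in because the first one is parked inside wait().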
But even then, as you pointed out, there is still the possibility that
one of the threads sends a notify() while the main loop isn't yet
waiting, no matter how short the timespan is that it's not waiting...
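The usual way to make an early notify() harmless is to pair the condition with a piece of shared state and wait on that state, not on the notification itself. The sketch below is one way to do that in Python 3 (Condition.wait_for needs 3.2 or later); the BatchGate class and its method names are hypothetical and not taken from the attached script.

```python
import threading


class BatchGate:
    """Lets one thread wait until a fixed number of workers have checked in."""

    def __init__(self, parties):
        self._cond = threading.Condition()
        self._parties = parties
        self._arrived = 0

    def arrive(self):
        # Record the arrival under the lock, then notify. If nobody is
        # waiting yet, the counter still remembers that we were here.
        with self._cond:
            self._arrived += 1
            self._cond.notify_all()

    def wait_all(self, timeout=None):
        # wait_for() checks the predicate before blocking, so a notify()
        # that happened earlier is not lost.
        with self._cond:
            done = self._cond.wait_for(
                lambda: self._arrived >= self._parties, timeout)
            if done:
                self._arrived = 0           # reset for the next batch
            return done


if __name__ == "__main__":
    gate = BatchGate(3)
    workers = [threading.Thread(target=gate.arrive) for _ in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()                            # every notify has already happened
    print(gate.wait_all(timeout=1))         # still returns True
```

Here all three notifications are sent before the main thread starts waiting, which is the situation that deadlocked above, yet wait_all() returns immediately because it looks at the counter first.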
As for what I'm trying to do: I'm trying to merge three huge XML files
into single separate database records. Every database record contains
related data from each of the XML files.
For practical purposes this is a two-stage process, where I first store
an uncombined "record" from each XML file into the DB (marked as
'partial'), and then periodically merge the related records into one
final record.

I could first store all data as 'partial' and then merge everything, but
I consider it better to do this with relatively small batches at a time
(queries are faster with smaller amounts of data, and the DB stays
smaller too).
The reason I use threads for this is that (to my knowledge) it is not
possible to pause an xml.parsers.xmlproc.xmlproc.Application object once
it starts parsing XML, but I can pause a thread.

This is a timeline of what I'm trying to do:

Main start |combine XML |comb.
|next batch |next
Application A run>..............*| | >...........*| | etc.
Application B run>.........*| | >..............*|
Application C run>................*| >..........*|

Legend: = thread is active

* = batch ready, wait()
| = timeline delimiter
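
The timeline above is a round-based rendezvous. Purely as a sketch, it can be written with threading.Barrier (Python 3.2 or later), which is a different technique from the one the rest of this thread settles on. All names are hypothetical, and the code assumes every source yields the same number of records, a multiple of BATCH_SIZE; a source that ends early would leave the others stuck until the timeout breaks the barrier.

```python
import threading

BATCH_SIZE = 2  # records per source per round (assumed value)


def run_rounds(sources):
    """Read every source in its own thread, combining after each batch."""
    batches = {name: [] for name in sources}
    combined = []

    def combine():
        # Runs in exactly one thread, after all readers have arrived and
        # before any of them is released into the next round.
        combined.append({name: list(batch) for name, batch in batches.items()})
        for batch in batches.values():
            batch.clear()

    barrier = threading.Barrier(len(sources), action=combine)

    def reader(name, records):
        for count, record in enumerate(records, start=1):
            batches[name].append(record)
            if count % BATCH_SIZE == 0:
                barrier.wait(timeout=5)     # the '*' in the timeline

    threads = [threading.Thread(target=reader, args=item)
               for item in sources.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return combined


if __name__ == "__main__":
    for batch in run_rounds({"A": [1, 2, 3, 4],
                             "B": [10, 20, 30, 40],
                             "C": [100, 200, 300, 400]}):
        print(batch)
```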

Jul 18 '05 #2
Jeremy Jones wrote:
(not waiting, because it already did happen). What is it exactly that
you are trying to accomplish? I'm sure there is a better approach.


I think I saw at least a bit of the light, reading up on readers and
writers (A colleague showed up with a book called "Operating system
concepts" that has a chapter on process synchronization).
It looks like I should be writing and reading 3 Queues instead of trying
to halt and pause the threads explicitly. That looks a lot easier...

Thanks for pointing out the problem area.
Jul 18 '05 #3
Alban Hertroys wrote:
Jeremy Jones wrote:
(not waiting, because it already did happen). What is it exactly
that you are trying to accomplish? I'm sure there is a better approach.

I think I saw at least a bit of the light, reading up on readers and
writers (A colleague showed up with a book called "Operating system
concepts" that has a chapter on process synchronization).
It looks like I should be writing and reading 3 Queues instead of
trying to halt and pause the threads explicitly. That looks a lot
easier...

Thanks for pointing out the problem area.


That's actually along the lines of what I was going to recommend after
getting more detail on what you are doing. A couple of things that may
(or may not) help you are:

* the Queue class in the Python standard library has a "maxsize"
parameter. When you create a queue, you can specify how large you want
it to grow. You can have your three threads busily parsing XML and
extracting data from it and putting it into a queue and when there are a
total of "maxsize" items in the queue, the next put() call (to put data
into the queue) will block until the consumer thread has reduced the
number of items in the queue. I've never used
xml.parsers.xmlproc.xmlproc.Application, but looking at the data, it
seems to resemble a SAX parser, so you should have no problem putting
(potentially blocking) calls to the queue into your handler. The only
thing this really buys you is that you won't have read the whole XML file
into memory.
* the get method on a queue object has a "block" flag. You can
effectively poll your queues something like this:

# untested code
# Python 2 module names; on Python 3 the module is "queue" (queue.Empty)
import Queue
import time

# a_done, b_done and c_done are just checks to see if that particular
# document is done
while not (a_done and b_done and c_done):
    got_a, got_b, got_c = False, False, False
    item_a, item_b, item_c = None, None, None
    while (not a_done) and (not got_a):
        try:
            # the 0 says don't block and raise an Empty exception
            # if there's nothing there
            item_a = queue_a.get(0)
            got_a = True
        except Queue.Empty:
            time.sleep(.3)
    while (not b_done) and (not got_b):
        try:
            item_b = queue_b.get(0)
            got_b = True
        except Queue.Empty:
            time.sleep(.3)
    while (not c_done) and (not got_c):
        try:
            item_c = queue_c.get(0)
            got_c = True
        except Queue.Empty:
            time.sleep(.3)
    put_into_database_or_whatever(item_a, item_b, item_c)

This will allow you to deal with one item at a time, and if the XML files
are different sizes, it should still work - you'll just pass None to
put_into_database_or_whatever for that particular file.
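
The "maxsize" behaviour from the first point can be sketched as follows. This is Python 3 and uses the standard library's xml.sax as a stand-in for xml.parsers.xmlproc, which is an assumption; the "record" element, its "id" attribute and the RecordHandler class are made up for the example.

```python
import queue
import threading
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Pushes one item per <record> element onto a bounded queue."""

    def __init__(self, out_queue):
        super().__init__()
        self._out = out_queue

    def startElement(self, name, attrs):
        if name == "record":
            # put() blocks while the queue already holds maxsize items,
            # which pauses the parser until the consumer catches up.
            self._out.put(attrs.get("id"))

    def endDocument(self):
        self._out.put(None)                 # marks the end of this document


def parse_in_thread(xml_bytes, out_queue):
    """Start parsing in a background thread and return the thread."""
    thread = threading.Thread(
        target=xml.sax.parseString,
        args=(xml_bytes, RecordHandler(out_queue)))
    thread.start()
    return thread


if __name__ == "__main__":
    doc = b'<root><record id="1"/><record id="2"/><record id="3"/></root>'
    q = queue.Queue(maxsize=2)              # at most two parsed records waiting
    worker = parse_in_thread(doc, q)
    for record_id in iter(q.get, None):     # read until the end marker
        print(record_id)
    worker.join()
```

The parser thread never gets more than two records ahead of the consumer, so the amount of parsed data held in memory stays bounded regardless of the file size.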

HTH.

Jeremy Jones
Jul 18 '05 #4
Jeremy Jones wrote:
* the get method on a queue object has a "block" flag. You can
effectively poll your queues something like this:

# untested code
# a_done, b_done and c_done are just checks to see if that particular
# document is done
while not (a_done and b_done and c_done):
    got_a, got_b, got_c = False, False, False
    item_a, item_b, item_c = None, None, None
    while (not a_done) and (not got_a):
        try:
            item_a = queue_a.get(0) # the 0 says don't block and raise an
                                    # Empty exception if there's nothing there


Actually, it is just fine to let get() block, as long as I put(None) on
the queue when I reach document_end and test for it (removing it from
the "list of queues to read" when get() returns 'None').
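
A minimal Python 3 sketch of that idea, not the actual rewritten test script (which is not shown here): each producer ends its queue with None, and the consumer drops a queue from its list once it sees that marker. The function and variable names are hypothetical.

```python
import queue
import threading


def read_rounds(queues):
    """Yield one list of items per round until every queue has sent None."""
    active = list(queues)
    while active:
        batch = []
        for q in list(active):              # copy, since we remove while looping
            item = q.get()                  # blocking get, no polling needed
            if item is None:
                active.remove(q)            # this source is finished
            else:
                batch.append(item)
        if batch:
            yield batch


def feed(q, items):
    """Producer: put every item, then the None end marker."""
    for item in items:
        q.put(item)
    q.put(None)


if __name__ == "__main__":
    data = [["a1", "a2"], ["b1"], ["c1", "c2", "c3"]]
    queues = [queue.Queue(maxsize=1) for _ in data]
    producers = [threading.Thread(target=feed, args=pair)
                 for pair in zip(queues, data)]
    for p in producers:
        p.start()
    for batch in read_rounds(queues):
        print(batch)
    for p in producers:
        p.join()
```

Sources of different lengths simply drop out of later rounds, so there is no need to pass None placeholders to the merge step.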

I rewrote my test script (the one I sent to the NG) to use Queues this
way, and it works well. It's also a lot easier to read/follow. Currently
I'm implementing it in my application.
I'm glad I don't get paid by the number of lines I write; there are
going to be fewer lines at the end of today ;)

Thanks a lot for the pointers.
Jul 18 '05 #5
