
design choice: multi-threaded / asynchronous wxpython client?

I am a complete ignoramus and newbie when it comes to designing and
coding networked clients (or servers for that matter). I have a copy
of Goerzen (Foundations of Python Network Programming) and once
pointed in the best direction should be able to follow my nose and get
things sorted... but I am not quite sure which is the best path to
take and would be grateful for advice from networking gurus.

I am writing a program to display horse racing tote odds in a desktop
client program. I have access to an HTTP source of XML data (I open one
of several URLs and get back an XML doc with some data... not XML-RPC),
which I am able to parse and munge with no difficulty at all.
I have written and successfully tested a simple command line program
which allows me to repeatedly poll the server and parse the XML. Easy
enough, but the real world production complications are:

1) The data for the race about to start updates every (say) 15
seconds, and the data for earlier and later races updates only every
(say) 5 minutes. There is no point for me to be hammering the server
with requests every 15 seconds for data for races after the upcoming
race... I should query for this perhaps every 150s to be safe. But for
the upcoming race, I must not miss any updates and should query every
~7s to be safe. So... in the middle of a race meeting the situation
might be:
race 1 (race done with, no longer querying), race 2 (race done with,
no longer querying), race 3 (about to start, data on server for this
race updating every 15s, my client querying every 7s), races 4-8 (data
on server for these races updating every 5 mins, my client querying
every 2.5 mins).

2) After a race has started and betting is cut off and there are
consequently no more tote updates for that race (it is possible to
determine when this occurs precisely because of an attribute in the
XML data), I need to stop querying (say) race 3 every 7s and remove
race 4 from the 150s query group and begin querying its data every 7s.

3) I need to dump this data (for all races, not just current about to
start race) to text files, store it as BLOBs in a DB *and* update real
time display in a wxpython windowed client.
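The bookkeeping in points 1) and 2) can be reduced to one small function that is recomputed after every poll. This is a minimal sketch, assuming the 7 s / 150 s rates quoted above and a hypothetical `closed` set filled from the XML attribute that marks the end of betting; the names are illustrative only.

```python
FAST_INTERVAL = 7      # seconds, for the race about to start
SLOW_INTERVAL = 150    # seconds, for later races


def poll_intervals(races, closed):
    """Return {race: interval} for every race still worth polling.

    races  -- race numbers in running order, e.g. [1, 2, ..., 8]
    closed -- races whose betting has closed (hypothetically read from
              an attribute in each race's XML document)
    The first race with betting still open is polled fast, every later
    open race is polled slowly, and closed races are not polled at all.
    """
    intervals = {}
    fast_assigned = False
    for race in races:
        if race in closed:
            continue
        if not fast_assigned:
            intervals[race] = FAST_INTERVAL
            fast_assigned = True
        else:
            intervals[race] = SLOW_INTERVAL
    return intervals


if __name__ == "__main__":
    races = list(range(1, 9))
    print(poll_intervals(races, closed={1, 2}))
    # {3: 7, 4: 150, 5: 150, 6: 150, 7: 150, 8: 150}
```

Because the mapping is derived rather than stored, adding race 3 to `closed` automatically moves race 4 into the fast group, so there is no separate "promotion" step to get wrong.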

My initial thought was to have two threads for the different update
polling cycles. In addition I would probably need another thread to
handle UI stuff, and perhaps another for dealing with file/DB data
write out. But I wonder if using Twisted is a better idea? I will
still need to handle some threading myself, but (I think) only for
keeping wxpython happy by doing all this other stuff off the main
thread, and perhaps also persisting received data in yet another thread.

I have zero experience with these kinds of design choices and would be
very happy if those with experience could point out the pros and cons
of each (synchronous/multithreaded, or Twisted) for dealing with the
two differing sample rates problem outlined above.

Many TIA!


Jun 27 '08 #1

IMHO using Twisted will give you the best performance and framework. Since it
uses callbacks for every request, your machine could handle a LOT of different
external queries and keep everything updated in WX. It might be a little tricky to
get working with WX, but I recall Googling for something like this not long ago,
and there appeared to be sufficient information on how to get it working.

http://twistedmatrix.com/projects/co...g-reactor.html

Twisted even automatically uses threads to keep SQL database storage routines
from blocking (see Chapter 4 of Twisted Network Programming Essentials).
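Twisted's own API is not shown here. As a rough illustration of the same single-threaded, event-loop style, the sketch below swaps in the standard-library `asyncio` loop (not Twisted) and runs one polling task per race on one thread. `fetch_race` is a hypothetical placeholder for the real HTTP fetch and XML parse, and wiring such a loop into wxPython is not covered.

```python
import asyncio


async def fetch_race(race):
    """Hypothetical stand-in for an HTTP GET plus XML parse."""
    await asyncio.sleep(0)          # a real fetch would await network I/O here
    return {"race": race}


async def poll_race(race, interval, results, polls):
    """Poll one race `polls` times, `interval` seconds apart."""
    for _ in range(polls):
        data = await fetch_race(race)
        results.append(data["race"])   # e.g. hand the data to GUI/DB code here
        await asyncio.sleep(interval)


async def main(intervals, polls):
    results = []
    # All pollers share one thread; the loop interleaves them while each sleeps.
    await asyncio.gather(*(poll_race(race, interval, results, polls)
                           for race, interval in intervals.items()))
    return results


if __name__ == "__main__":
    # Tiny intervals so the demo finishes quickly; real values would be 7 and 150.
    out = asyncio.run(main({3: 0.01, 4: 0.05}, polls=3))
    print(sorted(out))   # [3, 3, 3, 4, 4, 4]
```

No locks are needed between the pollers because only one of them runs at a time; they give up control at each `await`.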

This is an ambitious project, good luck.

-Larry
Jun 27 '08 #2
Hi, that does look like a lot of fun... You might consider breaking
that into 2 separate programs. Write one that's threaded to keep a db
updated properly, and write a completely separate one to handle
displaying data from your db. This would allow you to later change or
add a web interface without having to muck with the code that handles
data.
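One way to picture that split, assuming SQLite as the shared store and a hypothetical `snapshots` table: the collector only inserts rows, and the display side only reads the newest snapshot per race. Either side could then be replaced (for example by a web front end) without touching the other.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshots (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    race     INTEGER NOT NULL,
    fetched  TEXT    NOT NULL,   -- ISO-8601 time the XML was fetched
    xml      BLOB    NOT NULL    -- raw document as received
)
"""


def store_snapshot(conn, race, fetched, xml_bytes):
    """Collector side: append one fetched document."""
    with conn:                          # commits on success
        conn.execute(
            "INSERT INTO snapshots (race, fetched, xml) VALUES (?, ?, ?)",
            (race, fetched, xml_bytes))


def latest_per_race(conn):
    """Display side: newest snapshot for each race, keyed by race number."""
    rows = conn.execute(
        "SELECT race, fetched, xml FROM snapshots "
        "WHERE id IN (SELECT MAX(id) FROM snapshots GROUP BY race) "
        "ORDER BY race")
    return {race: (fetched, bytes(xml)) for race, fetched, xml in rows}


if __name__ == "__main__":
    # In practice the two programs would open the same database file.
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    store_snapshot(conn, 3, "2008-04-27T16:07:11", b"<race n='3' v='1'/>")
    store_snapshot(conn, 3, "2008-04-27T16:07:18", b"<race n='3' v='2'/>")
    store_snapshot(conn, 4, "2008-04-27T16:07:11", b"<race n='4' v='1'/>")
    print(latest_per_race(conn)[3][1])   # b"<race n='3' v='2'/>"
```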
Jun 27 '08 #3
> 1) The data for the race about to start updates every (say) 15
> seconds, and the data for earlier and later races updates only every
> (say) 5 minutes. There is no point for me to be hammering the server
> with requests every 15 seconds for data for races after the upcoming

Try using an HTTP HEAD request instead to check if the data has
changed since last time.
Jun 27 '08 #4
On Apr 27, 10:10 pm, David <wizza...@gmail.com> wrote:
>> 1) The data for the race about to start updates every (say) 15
>> seconds, and the data for earlier and later races updates only every
>> (say) 5 minutes. There is no point for me to be hammering the server
>> with requests every 15 seconds for data for races after the upcoming
>
> Try using an HTTP HEAD request instead to check if the data has
> changed since last time.
Thanks for the suggestion... am I going about this the right way here?

import urllib2
request = urllib2.Request("http://get-rich.quick.com")
request.get_method = lambda: "HEAD"
http_file = urllib2.urlopen(request)

print http_file.headers

->>>
Age: 0
Date: Sun, 27 Apr 2008 16:07:11 GMT
Content-Length: 521
Content-Type: text/xml; charset=utf-8
Expires: Sun, 27 Apr 2008 16:07:41 GMT
Cache-Control: public, max-age=30, must-revalidate
Connection: close
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Via: 1.1 jcbw-nc3 (NetCache NetApp/5.5R4D6)

Date is the time of the server response and not of the last data update. Date
is definitely the time of the server's response to my request and bears no
relation to when the live XML data was updated. I know this for a fact
because right now there is no active race meeting and any data still
available is static and many hours old. I would not feel confident
rejecting incoming data as a duplicate based only on an unchanged
Content-Length. Am I missing something here?

Actually there doesn't seem to be too much difficulty performance-wise
in fetching and parsing (minidom) the XML data and checking the
internal (it's an attribute) update time stamp in the parsed doc. If
timings got really tight, presumably I could more quickly check each
doc's time stamp with SAX (time stamp comes early in data as one might
reasonably expect) before deciding whether to go the whole hog with
minidom if the time stamp has in fact changed since I last polled the
server.
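A sketch of that early-exit SAX check, assuming (hypothetically) that the time stamp is an `updated` attribute on an element near the top of the document; the real attribute name would come from the feed. The handler raises an exception as soon as it has the value, so SAX stops before reaching the rest of the document. The whole document has still been downloaded, so the saving is in parsing only.

```python
import xml.sax


class _Found(Exception):
    """Raised to stop parsing once the time stamp has been read."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


class TimestampHandler(xml.sax.ContentHandler):
    def __init__(self, attr="updated"):      # hypothetical attribute name
        super().__init__()
        self.attr = attr

    def startElement(self, name, attrs):
        if self.attr in attrs:
            raise _Found(attrs[self.attr])   # abort: nothing else is needed


def read_timestamp(xml_bytes, attr="updated"):
    """Return the first `attr` value found, or None if there is none."""
    try:
        xml.sax.parseString(xml_bytes, TimestampHandler(attr))
    except _Found as hit:
        return hit.value
    return None


def needs_full_parse(xml_bytes, last_seen):
    """Only hand the document to minidom if the time stamp has moved."""
    return read_timestamp(xml_bytes) != last_seen
```

Usage would be `if needs_full_parse(data, last_stamp): doc = minidom.parseString(data)`.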

But if there is something I don't get about HTTP HEAD approach, please
let me know as a simple check like this would obviously be a good
thing for me.
Jun 27 '08 #5
bullockbefriending bard wrote:
> 3) I need to dump this data (for all races, not just current about to
> start race) to text files, store it as BLOBs in a DB *and* update real
> time display in a wxpython windowed client.
Why in a BLOB? Why not into specific data types and normalized tables? You
can also save the BLOB for backup or auditing, but on its own that won't let you
use your DB to the best of its capabilities... It will just act as a data
container, the same as a network share (which wouldn't cost you much in
opening and closing connections).
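A minimal sketch of doing both, assuming SQLite and a hypothetical XML layout (`<race n=".."><runner no=".." win=".."/></race>`; the real feed's element and attribute names will differ). The raw document is kept as a BLOB for auditing, while the per-runner win odds go into a typed table that SQL can query directly.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE IF NOT EXISTS raw_doc (
    id   INTEGER PRIMARY KEY,
    race INTEGER NOT NULL,
    xml  BLOB    NOT NULL            -- untouched copy for backup/auditing
);
CREATE TABLE IF NOT EXISTS win_odds (
    doc_id INTEGER NOT NULL REFERENCES raw_doc(id),
    runner INTEGER NOT NULL,
    odds   REAL    NOT NULL
);
"""


def save(conn, xml_bytes):
    """Store the raw document and its parsed win odds in one transaction."""
    root = ET.fromstring(xml_bytes)
    race = int(root.get("n"))
    with conn:
        cur = conn.execute("INSERT INTO raw_doc (race, xml) VALUES (?, ?)",
                           (race, xml_bytes))
        doc_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO win_odds (doc_id, runner, odds) VALUES (?, ?, ?)",
            [(doc_id, int(r.get("no")), float(r.get("win")))
             for r in root.iter("runner")])
    return doc_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    save(conn, b'<race n="3"><runner no="1" win="4.5"/>'
               b'<runner no="2" win="2.1"/></race>')
    # The shortest-priced runner is now a plain SQL query, not a BLOB scan.
    print(conn.execute(
        "SELECT runner, odds FROM win_odds ORDER BY odds LIMIT 1").fetchone())
```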
Jun 27 '08 #6
On 2008-04-27, David <wi******@gmail.com> wrote:
>> 1) The data for the race about to start updates every (say) 15
>> seconds, and the data for earlier and later races updates only every
>> (say) 5 minutes. There is no point for me to be hammering the server
>> with requests every 15 seconds for data for races after the upcoming
>
> Try using an HTTP HEAD request instead to check if the data has
> changed since last time.
A GET with If-Modified-Since is better still
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html 14.25)
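A conditional GET can be sketched with Python 3's `urllib.request` (the thread's own snippet uses Python 2's `urllib2`). This assumes the server sends a `Last-Modified` header; if it does not, every request simply returns the full document. `urllib` reports a 304 Not Modified as an `HTTPError`, which the function below turns into a `None` body.

```python
import urllib.error
import urllib.request


def conditional_get(url, last_modified=None, timeout=10):
    """Fetch `url` unless it is unchanged since `last_modified`.

    Returns (body, last_modified): body is None on 304 Not Modified;
    last_modified is the value to send next time (it stays None if the
    server never provides a Last-Modified header).
    """
    req = urllib.request.Request(url)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read(), resp.headers.get("Last-Modified", last_modified)
    except urllib.error.HTTPError as err:
        if err.code == 304:               # unchanged: nothing to parse
            return None, last_modified
        raise
```

A caller keeps the returned `last_modified` per URL and skips parsing whenever the body comes back as `None`.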

--
Jarkko Torppa
Jun 27 '08 #7
I think twisted is overkill for this problem. Threading, elementtree
and urllib should more than suffice. One thread polling the server for
each race, with the desired polling interval. Each time new data is
processed, that thread sends a signal containing information about what
changed. The gui listens to the signal and will, if needed, update
itself with the new information. The database handler also listens to
the signal and updates the db.
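The "signal" in this design can be as simple as a list of callbacks. Below is a minimal observer sketch with hypothetical listener names. Note that in a real wxPython program the GUI listener must not touch widgets from the polling thread directly; it would typically forward the update with `wx.CallAfter`, which is not shown here.

```python
import threading


class Signal:
    """Tiny observer: listeners are called with whatever emit() receives."""

    def __init__(self):
        self._listeners = []
        self._lock = threading.Lock()   # pollers may emit from several threads

    def connect(self, listener):
        with self._lock:
            self._listeners.append(listener)

    def emit(self, *args):
        with self._lock:
            listeners = list(self._listeners)
        for listener in listeners:      # call outside the lock
            listener(*args)


race_changed = Signal()

if __name__ == "__main__":
    gui_updates, db_rows = [], []
    # Hypothetical listeners standing in for the GUI and the DB handler.
    race_changed.connect(lambda race, odds: gui_updates.append(race))
    race_changed.connect(lambda race, odds: db_rows.append((race, odds)))

    # A polling thread emits after it has parsed fresh data.
    t = threading.Thread(target=race_changed.emit, args=(3, {"1": 4.5}))
    t.start()
    t.join()
    print(gui_updates, db_rows)   # [3] [(3, {'1': 4.5})]
```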

--
mvh Björn
Jun 27 '08 #8
On Apr 27, 11:27 pm, "BJörn Lindqvist" <bjou...@gmail.com> wrote:
> I think twisted is overkill for this problem. Threading, elementtree
> and urllib should more than suffice. One thread polling the server for
> each race, with the desired polling interval. Each time new data is
> processed, that thread sends a signal containing information about what
> changed. The gui listens to the signal and will, if needed, update
> itself with the new information. The database handler also listens to
> the signal and updates the db.
So, if I understand you correctly:

Assuming 8 races and we are just about to start race 1, we would
have 8 polling threads, with the race 1 thread polling at a faster rate
than the others. After betting on race 1 closes, we could dispense with
that thread, change the race 2 thread to poll faster, and so on...? I had
been rather stupidly thinking of just two polling threads, one for the
current race and one for races not yet run... but starting out with a
thread for each extant race seems simpler, given there is then no need
to handle the mechanics of shifting the polling of races from the
omnibus slow thread to the current race fast thread.
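A sketch of one thread per race whose interval can be changed, or which can be halted, from outside. It waits with `threading.Event.wait(timeout)` rather than `time.sleep`, so a change takes effect immediately instead of after the current 150 s sleep. `fetch` is a hypothetical callable that polls one race and returns True once betting on it has closed.

```python
import threading


class RacePoller(threading.Thread):
    """One polling thread per race; its interval can be changed while it runs."""

    def __init__(self, race, interval, fetch):
        super().__init__(daemon=True)
        self.race = race
        self.interval = interval            # seconds between polls
        self.fetch = fetch                  # hypothetical: returns True once betting has closed
        self._wake = threading.Event()      # set() interrupts the current wait
        self._halt_requested = False

    def set_interval(self, seconds):
        """E.g. move the next race from the 150 s group to 7 s."""
        self.interval = seconds
        self._wake.set()

    def halt(self):
        """Stop polling this race."""
        self._halt_requested = True
        self._wake.set()

    def run(self):
        while not self._halt_requested:
            if self.fetch(self.race):       # betting closed: nothing left to poll
                break
            self._wake.wait(self.interval)  # returns early if set_interval/halt ran
            self._wake.clear()


if __name__ == "__main__":
    polls = []

    def fake_fetch(race):
        polls.append(race)
        return len(polls) >= 3      # pretend betting closes on the third poll

    poller = RacePoller(4, interval=150, fetch=fake_fetch)
    poller.start()
    poller.set_interval(0.01)       # race 4 becomes the race about to start
    poller.join(timeout=2)
    print(polls)                    # [4, 4, 4]
```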

Having got my minidom parser working nicely, I'm inclined to stick
with it for now while I get other parts of the problem licked into
shape. However, I do take your point that it's probably overkill for
this simple kind of structured, mostly numerical data and will try to
find time to experiment with the elementtree approach later. No harm
at all in shaving the odd second off document parse times.
Jun 27 '08 #9
> Date is the time of the server response and not of the last data update. Date
> is definitely the time of the server's response to my request and bears no
> relation to when the live XML data was updated. I know this for a fact
> because right now there is no active race meeting and any data still
> available is static and many hours old. I would not feel confident
> rejecting incoming data as a duplicate based only on an unchanged
> Content-Length. Am I missing something here?
It looks like the data is dynamically generated on the server, so the
web server doesn't know if/when the data changed. You will usually only see
a Last-Modified header that tracks real changes for static content (images,
HTML files, etc). You could go by the Cache-Control line and only fetch
data every 30 seconds, but it's possible for you to miss some updates this way.
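Going by the Cache-Control line could look like the sketch below, which pulls `max-age` out of the header and falls back to a hypothetical default when there is none. As the reply warns, a 30 s delay would skip some of the 15 s updates for the race about to start, so this suits the slower races better.

```python
import re

_MAX_AGE = re.compile(r"(?:^|,)\s*max-age\s*=\s*(\d+)", re.IGNORECASE)


def max_age(cache_control):
    """Return the max-age value in seconds, or None if absent."""
    if not cache_control:
        return None
    m = _MAX_AGE.search(cache_control)
    return int(m.group(1)) if m else None


def next_poll_delay(headers, default=7):
    """Wait as long as the server says the response stays fresh.

    `headers` can be a dict or the headers object of an HTTP response;
    `default` (hypothetical) is used when there is no max-age.
    """
    age = max_age(headers.get("Cache-Control"))
    return age if age is not None else default


if __name__ == "__main__":
    # Header value taken from the HEAD response shown earlier in the thread.
    print(next_poll_delay({"Cache-Control": "public, max-age=30, must-revalidate"}))  # 30
```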

Another thing you could try (if necessary; this is a bit of overkill):
download the first part of the XML (a GET request with a Range
header) and check the timestamp you mentioned. If that has changed,
re-request the whole doc (resuming the download is risky, since the
XML might change between your two requests).
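A sketch of that partial-download check, assuming a hypothetical `updated="..."` attribute near the start of the document. It sends a `Range` header but also caps the read, so it still behaves if the server ignores ranges and answers 200 with the full body. A regular expression is used on the prefix because a truncated document is not well-formed XML; if no time stamp is found, it errs on the side of re-fetching.

```python
import re
import urllib.request

_STAMP = re.compile(rb'updated="([^"]*)"')   # hypothetical attribute name


def fetch_prefix(url, nbytes=512, timeout=10):
    """Return at most the first `nbytes` bytes of the document."""
    req = urllib.request.Request(
        url, headers={"Range": "bytes=0-%d" % (nbytes - 1)})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # 206 Partial Content if ranges are honoured, otherwise 200 + full body.
        return resp.read(nbytes)


def stamp_from_prefix(prefix):
    """Extract the time stamp from a (possibly truncated) document."""
    m = _STAMP.search(prefix)
    return m.group(1).decode("utf-8", "replace") if m else None


def changed_since(url, last_stamp):
    """True if the full document should be re-fetched (not resumed)."""
    return stamp_from_prefix(fetch_prefix(url)) != last_stamp
```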

David.
Jun 27 '08 #10
> 3) I need to dump this data (for all races, not just current about to
> start race) to text files, store it as BLOBs in a DB *and* update real
> time display in a wxpython windowed client.
A few important questions:

1) How real-time must the display be? (should update immediately after
you get new XML data, or is it ok to update a few seconds later?).

2) How much data is being processed at peak? (100 records a second, 1000?)

3) Does your app need to share fetched data with other apps? If so,
how? (read from db, download HTML, RPC, etc).

4) Does your app need to use data from previous executions? (eg: if
you restart it, does it need to have a fully populated UI, or can it
start from an empty UI and start updating as it downloads new XML
updates).

How you answer the above questions determines what kind of algorithm
will work best.

David.

PS: I suggest that you contact the people you're downloading the XML
from if you haven't already. eg: it might be against their TOS to
constantly scrape data (I assume not, since they provide XML). You
don't want them to black-list your IP address ;-). Also, maybe they
have ideas for efficient data retrieval (eg: RSS feeds).
Jun 27 '08 #11
> Tempting thought, but one of the problems with this kind of horse
> racing tote data is that a lot of it is for combinations of runners
> rather than single runners. Whilst there might be (say) 14 horses in a
> race, there are 91 quinella price combinations (1-2 through 13-14,
> i.e. the 2-subsets of range(1, 15)) and 364 trio price combinations.
> It is not really practical (I suspect) to have database tables with
> columns for that many combinations?
If you normalise your tables correctly, these will be represented as
one-to-many or many-to-many relationships in your database. Like the
other poster, I don't know the first thing about horses, and I may be
misunderstanding something, but here is one (basic) normalised db
schema:

tables & descriptions:

- horse - holds info about each horse
- race - one record per race. Has times, etc
- race_horse - holds records linking horses and races together.

You can derive all possible horse combinations from the above info.
You don't need to store it in the db unless you need to link something
else to it (eg: betting data). In which case:

- combination - represents one combination of horses.
- combination_horse - links a combination to 1 horse. 1 of these per
horse per combination.
- bet - Represents a bet. Has a foreign-key relationship with combination
(and other tables, eg: bettor, race)

With a structure like the above you don't need hundreds of database columns :-)
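A minimal SQLite sketch of the tables listed above, using the reply's table names with hypothetical columns (the `bet` table is left out). `itertools.combinations` generates the quinella pairs and trio triples, which also confirms the counts quoted earlier: 91 pairs and 364 triples for 14 runners.

```python
import sqlite3
from itertools import combinations

SCHEMA = """
CREATE TABLE horse       (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE race        (id INTEGER PRIMARY KEY, start_time TEXT);
CREATE TABLE race_horse  (race_id  INTEGER REFERENCES race(id),
                          horse_id INTEGER REFERENCES horse(id));
CREATE TABLE combination (id INTEGER PRIMARY KEY,
                          race_id INTEGER REFERENCES race(id),
                          pool TEXT);              -- e.g. 'QIN' or 'TRI' (hypothetical codes)
CREATE TABLE combination_horse (
    combination_id INTEGER REFERENCES combination(id),
    horse_id       INTEGER REFERENCES horse(id)   -- one row per horse per combination
);
"""


def add_pool(conn, race_id, horse_ids, pool, size):
    """Insert every `size`-horse combination for one pool; return the count."""
    count = 0
    with conn:
        for combo in combinations(horse_ids, size):
            cur = conn.execute(
                "INSERT INTO combination (race_id, pool) VALUES (?, ?)",
                (race_id, pool))
            conn.executemany(
                "INSERT INTO combination_horse VALUES (?, ?)",
                [(cur.lastrowid, h) for h in combo])
            count += 1
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    runners = range(1, 15)                      # 14 runners; ids double as saddle numbers here
    conn.executemany("INSERT INTO horse (id) VALUES (?)", [(h,) for h in runners])
    conn.execute("INSERT INTO race (id) VALUES (1)")
    print(add_pool(conn, 1, runners, "QIN", 2))   # 91
    print(add_pool(conn, 1, runners, "TRI", 3))   # 364
```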

David.
Jun 27 '08 #12
bullockbefriending bard wrote:
> 1) The data for the race about to start updates every (say) 15
> seconds, and the data for earlier and later races updates only every
> (say) 5 minutes. There is no point for me to be hammering the
> server with requests every 15 seconds for data for races after the
> upcoming race... I should query for this perhaps every 150s to be
> safe. But for the upcoming race, I must not miss any updates and
> should query every ~7s to be safe. So... in the middle of a race
> meeting the situation might be:
I don't fully understand this, but can't you design the server in a
way that you can connect to it and it notifies you about important
things? IMHO, polling isn't ideal.
> My initial thought was to have two threads for the different
> update polling cycles. In addition I would probably need another
> thread to handle UI stuff, and perhaps another for dealing with
> file/DB data write out.
No need for any additional threads. UI, networking and file I/O can
operate asynchronously. Using wxPython's timers with callback
functions, you should need only standard Python modules (except
wx).
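wxPython's timers are not shown here. The sketch below illustrates the same single-threaded, timer-plus-callback idea with the standard `sched` module, a swapped-in stand-in for `wx.Timer`. Each callback polls once and re-arms itself with whatever interval currently applies, so promoting a race is just a matter of re-arming it with a shorter interval. A fake clock lets the demo run instantly; in a GUI, the toolkit's event loop would play the scheduler's role.

```python
import sched


class FakeClock:
    """Simulated time so the demo runs instantly (real code: time.monotonic/time.sleep)."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def run_demo(until=21):
    """Poll race 3 every 7 s and race 4 every 150 s, all on one thread."""
    clock = FakeClock()
    scheduler = sched.scheduler(clock.time, clock.sleep)
    log = []

    def poll(race, interval):
        log.append((clock.time(), race))      # a real callback would fetch + parse here
        if clock.time() + interval <= until:  # re-arm, like restarting a one-shot timer
            scheduler.enter(interval, 1, poll, argument=(race, interval))

    scheduler.enter(0, 1, poll, argument=(3, 7))
    scheduler.enter(0, 1, poll, argument=(4, 150))
    scheduler.run()
    return log


if __name__ == "__main__":
    print(run_demo())   # [(0.0, 3), (0.0, 4), (7.0, 3), (14.0, 3), (21.0, 3)]
```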
> But, I wonder if using Twisted is a better idea?
IMHO that's only advisable if you'd like to create your own protocols and
reuse them in different apps, or need full-featured customisable
implementations of advanced protocols.

Additionally, you'd *have to* use multiple threads: One for the
Twisted event loop and one for the wxPython one.

There is a wxreactor in Twisted which integrates the wxPython event
loop, but I stopped using it due to strange deadlock problems which
began with some wxPython version. Also, it seems it's no longer in
development. But my alternative works perfectly (main thread with
Twisted, and a GUI thread for wxPython, communicating over Python
standard queues).
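A minimal sketch of that two-thread arrangement, with plain threads and `queue.Queue` standing in for Twisted and wxPython (neither library is used here). The network side only `put`s messages; the GUI side drains its queue without blocking, as it would from a periodic timer, so its event loop never waits on the network. The message format is hypothetical.

```python
import queue
import threading

to_gui = queue.Queue()        # network thread -> GUI thread
to_net = queue.Queue()        # GUI thread -> network thread (commands)


def network_worker():
    """Stand-in for the Twisted side: serve commands until told to quit."""
    while True:
        race = to_net.get()
        if race is None:                          # sentinel: shut down
            break
        to_gui.put(("odds", race, {"1": 4.5}))    # hypothetical parsed result


def drain_gui_queue():
    """Call from a GUI timer: handle everything queued, never block."""
    handled = []
    while True:
        try:
            handled.append(to_gui.get_nowait())
        except queue.Empty:
            return handled


if __name__ == "__main__":
    worker = threading.Thread(target=network_worker, daemon=True)
    worker.start()
    for race in (3, 4):
        to_net.put(race)
    to_net.put(None)
    worker.join()
    print(drain_gui_queue())   # [('odds', 3, {'1': 4.5}), ('odds', 4, {'1': 4.5})]
```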

You'd only need additional threads if you would do heavy number
crunching inside the wxPython or Twisted thread. For the respective
event loop not to hang, it's advisable to use a separate thread for
long-running calculations.
> I have zero experience with these kinds of design choices and
> would be very happy if those with experience could point out the
> pros and cons of each (synchronous/multithreaded, or Twisted) for
> dealing with the two differing sample rates problem outlined
> above.
I'd favor an "as few threads as necessary" approach. In my experience
this saves pain (i.e. deadlocks and boilerplate queueing code).

Regards,
Björn

--
BOFH excuse #27:

radiosity depletion

Jun 27 '08 #13
