I am a complete ignoramus and newbie when it comes to designing and
coding networked clients (or servers for that matter). I have a copy
of Goerzen (Foundations of Python Network Programming) and once
pointed in the best direction should be able to follow my nose and get
things sorted... but I am not quite sure which is the best path to
take and would be grateful for advice from networking gurus.
I am writing a program to display horse racing tote odds in a desktop
client program. I have access to an HTTP (open one of several URLs,
and I get back an XML doc with some data... not XML-RPC.) source of
XML data which I am able to parse and munge with no difficulty at all.
I have written and successfully tested a simple command line program
which allows me to repeatedly poll the server and parse the XML. Easy
enough, but the real world production complications are:
1) The data for the race about to start updates every (say) 15
seconds, and the data for earlier and later races updates only every
(say) 5 minutes. There is no point for me to be hammering the server
with requests every 15 seconds for data for races after the upcoming
race... I should query for this perhaps every 150s to be safe. But for
the upcoming race, I must not miss any updates and should query every
~7s to be safe. So... in the middle of a race meeting the situation
might be:
race 1 (race done with, no-longer querying), race 2 (race done with,
no longer querying) race 3 (about to start, data on server for this
race updating every 15s, my client querying every 7s), races 4-8 (data
on server for these races updating every 5 mins, my client querying
every 2.5 mins)
2) After a race has started and betting is cut off and there are
consequently no more tote updates for that race (it is possible to
determine when this occurs precisely because of an attribute in the
XML data), I need to stop querying (say) race 3 every 7s and remove
race 4 from the 150s query group and begin querying its data every 7s.
3) I need to dump this data (for all races, not just current about to
start race) to text files, store it as BLOBs in a DB *and* update real
time display in a wxpython windowed client.
My initial thought was to have two threads for the different update
polling cycles. In addition I would probably need another thread to
handle UI stuff, and perhaps another for dealing with file/DB data
write out. But, I wonder if using Twisted is a better idea? I will
still need to handle some threading myself, but (I think) only for
keeping wxpython happy by doing all this other stuff off the main
thread + perhaps also persisting received data in yet another thread.
I have zero experience with these kinds of design choices and would be
very happy if those with experience could point out the pros and cons
of each (synchronous/multithreaded, or Twisted) for dealing with the
two differing sample rates problem outlined above.
Many TIA!
IMHO using Twisted will give you the best performance and framework. Since it
uses callbacks for every request, your machine could handle a LOT of different
external queries and keep everything updated in wxPython. It might be a little
tricky to get working with wxPython, but I recall Googling for something like this
not long ago and there appeared to be sufficient information on how to get it working:
http://twistedmatrix.com/projects/co...g-reactor.html
Twisted even automatically uses threads to keep SQL database storage routines
from blocking (see Chapter 4 of Twisted Network Programming Essentials)
This is an ambitious project, good luck.
-Larry
Hi, that does look like a lot of fun... You might consider breaking
that into 2 separate programs. Write one that's threaded to keep a db
updated properly, and write a completely separate one to handle
displaying data from your db. This would allow you to later change or
add a web interface without having to muck with the code that handles
data.
> 1) The data for the race about to start updates every (say) 15
> seconds, and the data for earlier and later races updates only every
> (say) 5 minutes. There is no point for me to be hammering the server
> with requests every 15 seconds for data for races after the upcoming
Try using an HTTP HEAD instruction instead to check if the data has
changed since last time.
On Apr 27, 10:10 pm, David <wizza...@gmail.com> wrote:
>> 1) The data for the race about to start updates every (say) 15
>> seconds, and the data for earlier and later races updates only every
>> (say) 5 minutes. There is no point for me to be hammering the server
>> with requests every 15 seconds for data for races after the upcoming
> Try using an HTTP HEAD instruction instead to check if the data has
> changed since last time.
Thanks for the suggestion... am I going about this the right way here?
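import urllib2
# Ask for the headers only by overriding the method urllib2 sends.
request = urllib2.Request("http://get-rich.quick.com")
request.get_method = lambda: "HEAD"
http_file = urllib2.urlopen(request)
# Print the response headers (no body is returned for HEAD).
print http_file.headers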
->>>
Age: 0
Date: Sun, 27 Apr 2008 16:07:11 GMT
Content-Length: 521
Content-Type: text/xml; charset=utf-8
Expires: Sun, 27 Apr 2008 16:07:41 GMT
Cache-Control: public, max-age=30, must-revalidate
Connection: close
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Via: 1.1 jcbw-nc3 (NetCache NetApp/5.5R4D6)
Date is the time of the server response, not of the last data update. It
is definitely the time of the server's response to my request and bears no
relation to when the live XML data was updated. I know this for a fact
because right now there is no active race meeting and any data still
available is static and many hours old. I would not feel confident
rejecting incoming data as duplicate based only on an identical
Content-Length. Am I missing something here?
Actually there doesn't seem to be too much difficulty performance-wise
in fetching and parsing (minidom) the XML data and checking the
internal (it's an attribute) update time stamp in the parsed doc. If
timings got really tight, presumably I could more quickly check each
doc's time stamp with SAX (time stamp comes early in data as one might
reasonably expect) before deciding whether to go the whole hog with
minidom if the time stamp has in fact changed since I last polled the
server.
But if there is something I don't get about HTTP HEAD approach, please
let me know as a simple check like this would obviously be a good
thing for me.
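A minimal sketch of the "check the time stamp with SAX first" idea, in Python 3. The element name `odds` and the attribute name `updated` are made up for illustration, since the real feed's names are not given in this thread. The handler stops parsing as soon as it has seen the attribute, so the rest of the document is never processed.

```python
import xml.sax


class _Found(Exception):
    """Raised to abandon parsing once the time stamp has been read."""


class _StampHandler(xml.sax.ContentHandler):
    def __init__(self, element, attribute):
        super().__init__()
        self.element = element
        self.attribute = attribute
        self.stamp = None

    def startElement(self, name, attrs):
        # Stop at the first matching element that carries the attribute.
        if name == self.element and self.attribute in attrs:
            self.stamp = attrs[self.attribute]
            raise _Found


def read_stamp(xml_bytes, element="odds", attribute="updated"):
    """Return the time stamp attribute, or None if it is not present.

    'odds' and 'updated' are hypothetical names, not the real feed's.
    """
    handler = _StampHandler(element, attribute)
    try:
        xml.sax.parseString(xml_bytes, handler)
    except _Found:
        pass  # we deliberately cut the parse short
    return handler.stamp
```

The caller would compare the returned value with the one seen on the previous poll and only hand the document to minidom when it differs.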
bullockbefriending bard wrote:
> 3) I need to dump this data (for all races, not just current about to
> start race) to text files, store it as BLOBs in a DB *and* update real
> time display in a wxpython windowed client.
Why in a BLOB? Why not into specific data types and normalized tables? You
can also save the BLOB for backup or auditing, but this won't allow you to
use your DB to the best of its capabilities... It will just act as a data
container, the same as a network share (which would not penalize you too
much to have connections open/closed).
On 2008-04-27, David <wi******@gmail.com> wrote:
>> 1) The data for the race about to start updates every (say) 15 seconds, and the data for earlier and later races updates only every (say) 5 minutes. There is no point for me to be hammering the server with requests every 15 seconds for data for races after the upcoming
> Try using an HTTP HEAD instruction instead to check if the data has
> changed since last time.
A GET with If-Modified-Since is still better
( http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html 14.25)
--
Jarkko Torppa
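For what it's worth, a conditional GET could be sketched like this in Python 3. The function names are invented for the example. It only helps if the server actually sends a Last-Modified header and honours If-Modified-Since, which the headers posted earlier in this thread suggest it may not.

```python
import urllib.error
import urllib.request


def conditional_request(url, last_modified=None):
    """Build a GET request that carries If-Modified-Since when we have a date."""
    request = urllib.request.Request(url)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_modified(url, last_modified=None, timeout=10):
    """Return (body, last_modified); body is None when the server answers 304."""
    request = conditional_request(url, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the copy we already have
            return None, last_modified
        raise
```

Each poll would pass in the Last-Modified value returned by the previous call, so an unchanged document costs only a header exchange.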
I think twisted is overkill for this problem. Threading, elementtree
and urllib should more than suffice. One thread polling the server for
each race with the desired polling interval. Each time some data is
treated, that thread sends a signal containing information about what
changed. The gui listens to the signal and will, if needed, update
itself with the new information. The database handler also listens to
the signal and updates the db.
--
mvh Björn
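The thread-per-race design suggested above might look roughly like this in Python 3. It is only a sketch: `RacePoller`, the `fetch` callable and the listener signature are all invented for the example, and the "signal" is modelled as plain callbacks.

```python
import threading


class RacePoller(threading.Thread):
    """Poll one race at its own interval and notify listeners of changes."""

    def __init__(self, race_no, interval, fetch, listeners):
        super().__init__(daemon=True)
        self.race_no = race_no
        self.interval = interval    # seconds between polls
        self.fetch = fetch          # hypothetical: fetch(race_no) -> parsed data
        self.listeners = listeners  # callables taking (race_no, data)
        self._stop_event = threading.Event()
        self._last = None

    def run(self):
        while not self._stop_event.is_set():
            data = self.fetch(self.race_no)
            if data != self._last:  # only signal real changes
                self._last = data
                for listener in self.listeners:
                    listener(self.race_no, data)
            # wait() doubles as a sleep that stop() can interrupt
            self._stop_event.wait(self.interval)

    def stop(self):
        self._stop_event.set()
```

Note that the listeners run in the polling thread, so a GUI listener would have to hand the update over to the GUI thread (with wxPython, for instance, via `wx.CallAfter`) rather than touch widgets directly.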
On Apr 27, 11:27 pm, "BJörn Lindqvist" <bjou...@gmail.com> wrote:
> I think twisted is overkill for this problem. Threading, elementtree
> and urllib should more than suffice. One thread polling the server for
> each race with the desired polling interval. Each time some data is
> treated, that thread sends a signal containing information about what
> changed. The gui listens to the signal and will, if needed, update
> itself with the new information. The database handler also listens to
> the signal and updates the db.
So, if I understand you correctly:
Assuming 8 races and we are just about to start race 1, we would
have 8 polling threads, with the race 1 thread polling at a faster rate
than the other ones. After race 1 betting closes, I could dispense with
that thread, change the race 2 thread to poll faster, and so on...? I had
been rather stupidly thinking of just two polling threads, one for the
current race and one for races not yet run... but starting out with a
thread for each extant race seems simpler, given there is then no need
to handle the mechanics of shifting the polling of races from the
omnibus slow thread to the current race fast thread.
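The hand-over just described can be reduced to one small function that says how often each race should be polled. This is a sketch only: the function name is invented, and the 7s and 150s figures are the ones from the original post.

```python
FAST, SLOW = 7, 150  # seconds, as suggested in the original post


def poll_intervals(race_numbers, closed):
    """Map each race to its polling interval, or None once betting has closed.

    The first race that is still open is treated as the upcoming one.
    """
    intervals = {}
    upcoming_found = False
    for race in sorted(race_numbers):
        if race in closed:
            intervals[race] = None   # stop polling this race
        elif not upcoming_found:
            intervals[race] = FAST   # upcoming race
            upcoming_found = True
        else:
            intervals[race] = SLOW   # later races
    return intervals
```

For example, `poll_intervals(range(1, 9), {1, 2})` puts race 3 on the fast interval and races 4 to 8 on the slow one; once race 3 is added to `closed`, race 4 becomes the fast one.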
Having got my minidom parser working nicely, I'm inclined to stick
with it for now while I get other parts of the problem licked into
shape. However, I do take your point that it's probably overkill for
this simple kind of structured, mostly numerical data and will try to
find time to experiment with the elementtree approach later. No harm
at all in shaving the odd second off document parse times.
> Date is the time of the server response, not of the last data update. It
> is definitely the time of the server's response to my request and bears no
> relation to when the live XML data was updated. I know this for a fact
> because right now there is no active race meeting and any data still
> available is static and many hours old. I would not feel confident
> rejecting incoming data as duplicate based only on an identical
> Content-Length. Am I missing something here?
It looks like the data is dynamically generated on the server, so the
web server doesn't know if/when the data changed. You will usually see
a last-modified time only for static content (images, HTML files, etc).
You could go by the Cache-Control line and only fetch data every 30
seconds, but it's possible for you to miss some updates this way.
Another thing you could try (if necessary, this is a bit of
overkill): download the first part of the XML (GET request with a
Range header), and check the timestamp you mentioned. If that changed
then re-request the doc (a download resume is risky, as the XML might
change between your 2 requests).
David.
> 3) I need to dump this data (for all races, not just current about to
> start race) to text files, store it as BLOBs in a DB *and* update real
> time display in a wxpython windowed client.
A few important questions:
1) How real-time must the display be? (should update immediately after
you get new XML data, or is it ok to update a few seconds later?).
2) How much data is being processed at peak? (100 records a second, 1000?)
3) Does your app need to share fetched data with other apps? If so,
how? (read from db, download HTML, RPC, etc).
4) Does your app need to use data from previous executions? (eg: if
you restart it, does it need to have a fully populated UI, or can it
start from an empty UI and start updating as it downloads new XML
updates).
How you answer the above questions determines what kind of algorithm
will work best.
David.
PS: I suggest that you contact the people you're downloading the XML
from if you haven't already. eg: it might be against their TOS to
constantly scrape data (I assume not, since they provide XML). You
don't want them to black-list your IP address ;-). Also, maybe they
have ideas for efficient data retrieval (eg: RSS feeds).
> Tempting thought, but one of the problems with this kind of horse
> racing tote data is that a lot of it is for combinations of runners
> rather than single runners. Whilst there might be (say) 14 horses in a
> race, there are 91 quinella price combinations (1-2 through 13-14,
> i.e. the 2-subsets of range(1, 15)) and 364 trio price combinations.
> It is not really practical (I suspect) to have database tables with
> columns for that many combinations?
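As a quick check of the figures quoted above, the 2-subsets and 3-subsets of range(1, 15) can be enumerated with the standard library (Python 3.8 or later for `math.comb`). The variable names are just for illustration.

```python
from itertools import combinations
from math import comb

runners = range(1, 15)  # 14 horses numbered 1..14

quinellas = list(combinations(runners, 2))  # unordered pairs
trios = list(combinations(runners, 3))      # unordered triples

print(len(quinellas), quinellas[0], quinellas[-1])  # 91 (1, 2) (13, 14)
print(len(trios))                                   # 364

# The same counts follow from the binomial coefficients C(14, 2) and C(14, 3).
assert len(quinellas) == comb(14, 2)
assert len(trios) == comb(14, 3)
```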
If you normalise your tables correctly, these will be represented as
one-to-many or many-to-many relationships in your database. Like the
other poster I don't know the first thing about horses, and I may be
misunderstanding something, but here is one (basic) normalised db
schema:
tables & descriptions:
- horse - holds info about each horse
- race - one record per race. Has times, etc
- race_horse - holds records linking horses and races together.
You can derive all possible horse combinations from the above info.
You don't need to store it in the db unless you need to link something
else to it (eg: betting data). In which case:
- combination - represents one combination of horses.
- combination_horse - links a combination to 1 horse. 1 of these per
horse per combination.
- bet - Represents a bet. Has a foreign key relationship with combination
(and other tables, eg: bettor, race)
With a structure like the above you don't need hundreds of database columns :-)
David.
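To make the schema above a bit more concrete, here is a rough sketch using sqlite3 from the Python 3 standard library. The column names, the `pool` and `price` fields, the horse names and the start time are all invented for the example; only the table names come from the list above, and the bet table is left out.

```python
import sqlite3
from itertools import combinations

SCHEMA = """
CREATE TABLE horse (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE race (id INTEGER PRIMARY KEY, start_time TEXT);
CREATE TABLE race_horse (
    race_id   INTEGER NOT NULL REFERENCES race(id),
    horse_id  INTEGER NOT NULL REFERENCES horse(id),
    runner_no INTEGER NOT NULL,
    PRIMARY KEY (race_id, horse_id)
);
CREATE TABLE combination (
    id      INTEGER PRIMARY KEY,
    race_id INTEGER NOT NULL REFERENCES race(id),
    pool    TEXT NOT NULL,   -- e.g. 'quinella' or 'trio' (assumed field)
    price   REAL             -- latest tote price (assumed field)
);
CREATE TABLE combination_horse (
    combination_id INTEGER NOT NULL REFERENCES combination(id),
    horse_id       INTEGER NOT NULL REFERENCES horse(id),
    PRIMARY KEY (combination_id, horse_id)
);
"""


def build_demo(runners=14):
    """Create an in-memory database with one race and its quinella rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    # Made-up demo data: one race, horses named by their runner number.
    db.execute("INSERT INTO race (id, start_time) VALUES (1, '2008-04-27 16:30')")
    for no in range(1, runners + 1):
        db.execute("INSERT INTO horse (id, name) VALUES (?, ?)", (no, "Horse %d" % no))
        db.execute("INSERT INTO race_horse VALUES (1, ?, ?)", (no, no))
    # One combination row per pair, plus one link row per horse in the pair.
    for pair in combinations(range(1, runners + 1), 2):
        cur = db.execute("INSERT INTO combination (race_id, pool) VALUES (1, 'quinella')")
        db.executemany("INSERT INTO combination_horse VALUES (?, ?)",
                       [(cur.lastrowid, horse) for horse in pair])
    return db
```

With 14 runners this gives 91 rows in combination and 182 in combination_horse, i.e. rows rather than columns grow with the number of combinations.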
bullockbefriending bard wrote:
> 1) The data for the race about to start updates every (say) 15
> seconds, and the data for earlier and later races updates only
> every (say) 5 minutes. There is no point for me to be hammering the
> server with requests every 15 seconds for data for races after the
> upcoming race... I should query for this perhaps every 150s to be
> safe. But for the upcoming race, I must not miss any updates and
> should query every ~7s to be safe. So... in the middle of a race
> meeting the situation might be:
I don't fully understand this, but can't you design the server in a
way that you can connect to it and it notifies you about important
things? IMHO, polling isn't ideal.
> My initial thought was to have two threads for the different
> update polling cycles. In addition I would probably need another
> thread to handle UI stuff, and perhaps another for dealing with
> file/DB data write out.
No need for any additional threads. UI, networking and file I/O can
operate asynchronously. Using wxPython's timers with callback
functions, you should need only standard Python modules (except
wx).
> But, I wonder if using Twisted is a better idea?
IMHO that's only advisable if you like to create your own protocols and
reuse them in different apps, or need full-featured customisable
implementations of advanced protocols.
Additionally, you'd *have to* use multiple threads: One for the
Twisted event loop and one for the wxPython one.
There is a wxreactor in Twisted which integrates the wxPython event
loop, but I stopped using it due to strange deadlock problems which
began with some wxPython version. Also, it no longer seems to be under
development. But my alternative works perfectly (main thread with
Twisted, and a GUI thread for wxPython, communicating over Python
standard queues).
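The queue hand-over between the two threads could be sketched as follows in Python 3, using only the standard library. The function names and the `None` sentinel are choices made for the example; the Twisted and wxPython parts are left out, with a plain worker thread standing in for the networking side.

```python
import queue
import threading


def network_worker(updates, results):
    """Stand-in for the networking thread: push each parsed update."""
    for item in results:
        updates.put(item)
    updates.put(None)  # sentinel: nothing more to come


def drain(updates, handle):
    """Call periodically from the GUI thread; never blocks.

    Returns True once the sentinel has been seen.
    """
    finished = False
    while True:
        try:
            item = updates.get_nowait()
        except queue.Empty:
            return finished
        if item is None:
            finished = True
        else:
            handle(item)


updates = queue.Queue()
worker = threading.Thread(
    target=network_worker,
    args=(updates, [("race 3", 2.5), ("race 4", 7.0)]),  # made-up data
)
worker.start()
worker.join()

seen = []
done = drain(updates, seen.append)
```

In a real GUI, `drain` would be called from something like a timer callback, so the GUI thread only ever does a quick non-blocking check of the queue.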
You'd only need additional threads if you did heavy number
crunching inside the wxPython or Twisted thread. For the respective
event loop not to hang, it's advisable to use a separate thread for
long-running calculations.
> I have zero experience with these kinds of design choices and
> would be very happy if those with experience could point out the
> pros and cons of each (synchronous/multithreaded, or Twisted) for
> dealing with the two differing sample rates problem outlined
> above.
I'd favor an "as few threads as necessary" approach. In my experience
this saves pain (i.e. deadlocks and boilerplate queueing code).
Regards,
Björn
--
BOFH excuse #27:
radiosity depletion