
Problem: 'Threads' in Python?

Hi,
I've got a small problem with my Python script. It is a CGI script which is
called regularly (e.g. every 5 minutes) and returns an XML data structure.
This script calls a very slow function, with a duration of 10-40 seconds. To
avoid delays, I inserted a cache for the data. So, when the script is called,
it returns the last calculated data structure, and then the function is called
again and the new data is stored in the cache. (Using slightly older data is
no problem, and it is much faster.)

My problem is that the client (a Java program, browser, or command line)
waits until the whole script has ended, so the cache is worthless. How
can I tell the client/browser/... that after the last print line there is no
more data and it can proceed? Or how can I tell the Python script that
everything after the return of the data (the retrieval of the new data and
the storage in a file) can be done in another thread or in the background?

Greetings

Ralph
Jul 18 '05 #1

Ralph Sluiters wrote in message ...
> Hi,
> I've got a small problem with my Python script. It is a CGI script which is
> called regularly (e.g. every 5 minutes) and returns an XML data structure.
> This script calls a very slow function, with a duration of 10-40 seconds. To
> avoid delays, I inserted a cache for the data. So, when the script is called,
> it returns the last calculated data structure, and then the function is
> called again and the new data is stored in the cache. (Using slightly older
> data is no problem, and it is much faster.)
>
> My problem is that the client (a Java program, browser, or command line)
> waits until the whole script has ended, so the cache is worthless. How
> can I tell the client/browser/... that after the last print line there is no
> more data and it can proceed? Or how can I tell the Python script that
> everything after the return of the data (the retrieval of the new data and
> the storage in a file) can be done in another thread or in the background?


Wouldn't a better approach be to decouple the cache mechanism from the cgi
script? Have a long-running Python process act as a memoizing cache and
delegate requests to the slow function. The cgi scripts then connect to
this cache process (via your favorite IPC mechanism). If the cache process
has a record of the call/request, it returns the previous value immediately,
and updates its cache in the meantime. If it doesn't have a record, then it
blocks the cgi script until it gets a result.
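
A rough sketch of what such a cache process could look like, using a
Unix-domain socket as the IPC mechanism (the socket path and the import of
get_data are only illustrative assumptions, not part of your script):

import os
import socket
from yourmodule import get_data   # wherever the slow function lives (assumed)

SOCKET_PATH = '/tmp/datacache.sock'   # illustrative
cache = {}                            # folder ID -> last computed XML string

def handle(conn):
    folder_id = conn.recv(1024).strip()
    if folder_id not in cache:
        # first request for this folder: block until we have data
        cache[folder_id] = get_data(folder_id)
    conn.sendall(cache[folder_id])
    conn.close()
    # the client already has its answer, so recomputing here costs it nothing
    cache[folder_id] = get_data(folder_id)

if os.path.exists(SOCKET_PATH):
    os.unlink(SOCKET_PATH)
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCKET_PATH)
server.listen(5)
while 1:
    conn, addr = server.accept()
    handle(conn)

The cgi script then just connects to SOCKET_PATH, sends the folder ID, and
prints whatever comes back.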

How can threading help you if the cgi-process dies after each request unless
you store the value somewhere else? And if you store the value somewhere,
why not have another process manage that storage? If it's possible to
output a complete page before the cgi script terminates (I don't know if the
server blocks until the script terminates), then you could do the cache
updating afterwards. In this case I guess you could use a pickled
dictionary or something as your cache, and you don't need a separate
process. But even here you wouldn't necessarily use threads.
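
For the pickled-dictionary variant, something along these lines would do
(the file name and error handling are only a sketch):

import cPickle

CACHE_FILE = '/tmp/xmlcache.pickle'   # illustrative

def load_cache():
    try:
        f = open(CACHE_FILE, 'rb')
        cache = cPickle.load(f)
        f.close()
        return cache
    except (IOError, EOFError):
        return {}                      # no cache yet

def save_cache(cache):
    f = open(CACHE_FILE, 'wb')
    cPickle.dump(cache, f)
    f.close()

The cgi script loads the dictionary, prints the cached entry (or computes it
if missing), and only afterwards recomputes and saves the cache before
exiting.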

Threads are up there with regexps: powerful, but avoid as much as possible.
--
Francis Avila

Jul 18 '05 #2
> Wouldn't a better approach be to decouple the cache mechanism from the cgi
> script? Have a long-running Python process act as a memoizing cache and
> delegate requests to the slow function. The cgi scripts then connect to
> this cache process (via your favorite IPC mechanism). If the cache process
> has a record of the call/request, it returns the previous value immediately,
> and updates its cache in the meantime. If it doesn't have a record, then it
> blocks the cgi script until it gets a result.

The caching cannot be decoupled, because the cgi-script gets a folder ID and
returns only data from this "folder". So if I decouple the processes, I don't
know which folders to cache, and I cannot cache all folders, because the
routine is too slow. So I have to get the current folder from cgi, cache that
one as long as the user stays in this folder and pulls data every 2 minutes,
and cache another folder if the user changes his folder.
> How can threading help you if the cgi-process dies after each request unless
> you store the value somewhere else? And if you store the value somewhere,
> why not have another process manage that storage? If it's possible to
> output a complete page before the cgi script terminates (I don't know if the
> server blocks until the script terminates), then you could do the cache
> updating afterwards. In this case I guess you could use a pickled
> dictionary or something as your cache, and you don't need a separate
> process. But even here you wouldn't necessarily use threads.

The data is too large to keep in memory, and with this method, as you
said, threading wouldn't help, so I store the data on disk.

My code:

import string

# Read cached data from file
try:
    oldfile = open(filename, "r")
    oldresult = string.joinfields(oldfile.readlines(), '\r\n')
    oldfile.close()
except IOError:
    # No cache yet: run the slow routine now
    oldresult = get_data(ID)   # Get XML data

# Print header, so that the data is returned via HTTP
print string.joinfields(header, '\r\n')
print oldresult

# ***

# Run the slow routine again for the next request
result = get_data(ID)   # Get XML data

# Save to file (the cache)
newfile = open(filename, "w")
newfile.writelines(result)
newfile.close()
# END

At the position *** the rest of the script must be decoupled, so that the
client can proceed with the current data, while the new data for the next
request is generated and stored in a file.

Ralph
Jul 18 '05 #3
Ralph Sluiters fed this fish to the penguins on Tuesday 06 January 2004
02:07 am:

> The caching cannot be decoupled, because the cgi-script gets a folder ID
> and returns only data from this "folder". So if I decouple the processes,
> I don't know which folders to cache, and I cannot cache all folders,
> because the routine is too slow. So I have to get the current folder from
> cgi, cache that one as long as the user stays in this folder and pulls
> data every 2 minutes, and cache another folder if the user changes his
> folder.
I've been having some difficulty following this thread but...

Isn't this what Cookies are for? Obtaining some sort of user ID/state
that can be passed into the processing to allow for continuing from a
previous connection?

HTTP is normally stateless. The client requests a page, the page
contents are obtained (either a static page, or some CGI-style
computation generates the immediate page data), the page is returned,
and the connection ends. If the page needs to be updated, that is a
completely separate transaction.

Cookies are used to link these separate transactions into one "whole";
the first time the client requests the page, a cookie is generated. On
subsequent requests (updates) the (now) existing cookie is sent back to
the server to identify the user and allow for selecting the proper
continuation state.
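
In Python CGI terms that boils down to roughly this (the cookie name and the
way the session id is made up here are only illustrative):

import os
import time
import Cookie

# read back the cookie the browser sent, if any
incoming = Cookie.SimpleCookie(os.environ.get('HTTP_COOKIE', ''))
if 'session' in incoming:
    session_id = incoming['session'].value
else:
    session_id = str(time.time())     # invent an id for a first-time client

# send the (possibly new) cookie back with the response headers
outgoing = Cookie.SimpleCookie()
outgoing['session'] = session_id
print outgoing.output()               # emits the Set-Cookie: header
print "Content-Type: text/xml"
print                                 # blank line ends the headers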

> At the position *** the rest of the script must be decoupled, so that
> the client can proceed with the current data, while the new data for
> the next request is generated and stored in a file.
I've not coded CGI stuff (don't have access to a server that permits
user CGI) but my rough view of this task would be:

CGI ******
    if no cookie
        generate a cookie for this user
    endif
    pass (received or generated) cookie to background process
    wait for return-data from background process (if a new cookie, this
        will take time to compute, otherwise the background process should
        already have computed it)
    return web-page with cookie and data

Background ********
    loop
        scan "cache" list for expired cookies (unused threads)
            terminate the related processing thread (the thread should clean
                up any disk files it used)
            clean up (delete) the cookie from the "cache" list
        get request (and cookie) from CGI
        if the cookie is not in the "cache" list
            create a new processing thread
        endif
        use the cookie to identify the (existing) processing thread and read
            the next data batch from it (Queue.Queue perhaps, one queue per
            cookie)
        return data (processing thread continues to compute the next update)
    endloop
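
As a bare-bones sketch of that per-cookie bookkeeping in Python (the names,
and the assumption that get_data(folder) is the slow call, are made up for
illustration):

import Queue
import threading

workers = {}   # cookie -> (thread, queue) pair

def worker(folder, outq):
    while 1:
        outq.put(get_data(folder))    # always keep the next batch ready

def get_batch(cookie, folder):
    if cookie not in workers:
        q = Queue.Queue(maxsize=1)
        t = threading.Thread(target=worker, args=(folder, q))
        t.setDaemon(1)
        t.start()
        workers[cookie] = (t, q)
    return workers[cookie][1].get()   # blocks only until a batch is ready

Expiry handling (dropping entries whose cookie has not been seen for a
while) would sit in the surrounding loop.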
You probably want to include, in "Background" a bit of logic to track
"last request time" and terminate processing threads if no client has
asked for an update in some period of time. The Cookies should also
have expiration times associated so that reconnecting after a period of
time will force a new cookie.

As for the folder? If the user physically navigates to other folders,
that can be passed to the background process and used to update the
threads (or create a new thread, if you assume the cookie identifies a
folder).

Caching would be semi-automatic here. The processing threads could be
folder specific, and when the thread is terminated (on lack of update
requests... let's see, you expect 2-minute update period, allow for a
slow net, say you terminate a process after 5 minutes of disuse...) you
can clean up the disk space (folder) that process was using. The cookie
expiration time would be updated on each update.

The master web page should have whatever HTML tags force a timed
reload to do a new request every 2 minutes.
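
For example, something like this in the page's head section (a 2-minute
interval assumed):

<meta http-equiv="refresh" content="120">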

--
============================================================== <
wl*****@ix.netcom.com | Wulfraed Dennis Lee Bieber KD6MOG <
wu******@dm.net | Bestiaria Support Staff <
============================================================== <
Bestiaria Home Page: http://www.beastie.dm.net/ <
Home Page: http://www.dm.net/~wulfraed/ <


Jul 18 '05 #4
You did everything but answer my question. I know what cookies are, but
I don't need cookies here. And you said in your answer "pass cookie to
background process"; that was exactly my question: how can I start a
background process?

But I've solved it now,

Ralph
Jul 18 '05 #5
Simply put the last part in an extra file 'cachedata.py', then use

import os
os.spawnlp(os.P_NOWAIT, 'python', 'python', 'cachedata.py')

to call this as a child process and NOT wait for it to finish.
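
For reference, a sketch of what cachedata.py could contain (roughly the part
after ***). How the ID and the cache file name reach the child process is up
to you; in this sketch they come from the command line, so the spawnlp call
would need them as two extra arguments, and the import is only an assumption:

# cachedata.py
import sys
from yourmodule import get_data   # wherever the slow function lives (assumed)

ID, filename = sys.argv[1], sys.argv[2]

result = get_data(ID)             # the slow 10-40 second call
newfile = open(filename, "w")
newfile.writelines(result)
newfile.close()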

Ralph
Jul 18 '05 #6
