473,466 Members | 1,406 Online
Bytes | Software Development & Data Engineering Community
Create Post

Home Posts Topics Members FAQ

Running a python farm

What's the pythonic way of sending out a set of requests in parallel?

My program throws an image at the server and then waits for the result. I'm
currently using this bit of socket code to send an image to server on
another machine.
import socket
import SocketServer
import cPickle

def oneclient(host, port, array):
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))
sent = sock.sendall(cPickle.dumps(array, 0))
sock.shutdown(1)
rfile = sock.makefile('rb')
return cPickle.loads(rfile.read())
Processing the image can take several minutes and I have loads of images to
process, but I also have several servers available, so I'd like save some
time by distributing the images around the servers. So for 'n' servers,
throw 'n' images at them. Then as each server finishes, give it another
image to work on until I've done all the images.

What should I be looking for, for a pythonic solution? I'm stuck for
terminology? Threads? Load balancing? Processors farms? Any pointers or
suggestions welcome.

I don't have to do this too often, so I would prefer a simple solution over
performance.
Thanks.
--
"Thinks: I can't think of a thinks. End of thinks routine": Blue Bottle

** Aunty Spam says: Remove the trailing x from the To: field to reply **
Jul 18 '05 #1
5 2517
On Mon, Oct 27, 2003 at 02:28:05PM +0000, Ian McConnell wrote:
What's the pythonic way of sending out a set of requests in parallel?

My program throws an image at the server and then waits for the result. I'm
currently using this bit of socket code to send an image to server on
another machine.
import socket
import SocketServer
import cPickle

def oneclient(host, port, array):
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))
sent = sock.sendall(cPickle.dumps(array, 0))
sock.shutdown(1)
rfile = sock.makefile('rb')
return cPickle.loads(rfile.read())
Processing the image can take several minutes and I have loads of images to
process, but I also have several servers available, so I'd like save some
time by distributing the images around the servers. So for 'n' servers,
throw 'n' images at them. Then as each server finishes, give it another
image to work on until I've done all the images.


Probably the simplest way is to have n worker threads, and a main thread
that pushes the images to be processed onto a Queue.Queue instance.

So, your worker thread function might look something like (using your
function above):

def worker(host, port, queue):
while 1:
array = queue.get()
if array is None:
return
result = oneclient(host, port, array)
# do something with the result, perhaps store in a shared
# list or dictionary?

Then you just need a main function like:

import Queue, threading

servers = [('host1', 1111), ('host2', 2222)] # etc...

def main():
q = Queue.Queue()
workers = []
for host, port in servers:
t = threading.Thread(target=worker, args=(host, port, q))
t.start()
workers.append(t)

# assume we get arrays to be processed from somewhere
for array in arrays:
q.put(array)

# Tell the worker threads to stop when there's nothing left to do
for i in range(len(workers)):
q.put(None)

# Wait for all the treads to finish
for t in workers:
t.join()

# Do something with the results gathered by the workers
pass

-Andrew.
Jul 18 '05 #2

"Ian McConnell" <ia*@emit.demon.co.ukx> wrote in message
news:87************@emit.demon.co.uk...
What's the pythonic way of sending out a set of requests in parallel?

My program throws an image at the server and then waits for the result. I'm currently using this bit of socket code to send an image to server on
another machine.
import socket
import SocketServer
import cPickle

def oneclient(host, port, array):
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))
sent = sock.sendall(cPickle.dumps(array, 0))
sock.shutdown(1)
rfile = sock.makefile('rb')
return cPickle.loads(rfile.read())
Processing the image can take several minutes and I have loads of images to process, but I also have several servers available, so I'd like save some
time by distributing the images around the servers. So for 'n' servers,
throw 'n' images at them. Then as each server finishes, give it another
image to work on until I've done all the images.

What should I be looking for, for a pythonic solution? I'm stuck for
terminology? Threads? Load balancing? Processors farms? Any pointers or
suggestions welcome.

I don't have to do this too often, so I would prefer a simple solution over performance.
All the solutions I know of have adequate performance. Andrew's
suggestion of using a thread to handle coordination with each of the
server processors is the conventional textbook approach; there's
much to recommend doing it the way everyone else does.

I've frequently found it simpler to not deal with thread coordination
problems (which can get real nasty) by using the select or poll
services from the select module (7.3 in the 2.2.3 documentation).
This leads to an event driven style that is closer to GUI programming
than the procedure oriented style typical of threads.

Any way you do it, it's going to complicate your program.

John Roth
Thanks.

Jul 18 '05 #3
Ian McConnell wrote:
What's the pythonic way of sending out a set of requests in parallel?

My program throws an image at the server and then waits for the result. I'm
currently using this bit of socket code to send an image to server on
another machine.
import socket
import SocketServer
import cPickle

def oneclient(host, port, array):
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))
sent = sock.sendall(cPickle.dumps(array, 0))
sock.shutdown(1)
rfile = sock.makefile('rb')
return cPickle.loads(rfile.read())


You don't need threads here. We can leverage the fact that
select.select can poll to see if results are ready, how about something
like:

class Client:
def __init__(host, port, array):
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.sock = sock
self.host = host
self.port = port
sock.connect((host, port))
sent = sock.sendall(cPickle.dumps(array, 0))
sock.shutdown(1)
self.rfile = sock.makefile('rb')

def result(self):
r,w,e = selelect.select([self.rfile], [], [], 1)
if not r:
return None
else:
return cPickle.loads(self.rfile.read())

all_images = [...]
hosts = [(host,port), (host1,port1)]
client_list = []
# assign images to hosts
for host, port in hosts:
client_list.append(Client(host, port, all_images.pop()))

# process
while client_list:
next_client_list = []
for o in client_list:
res = o.result()
if o is not None:
# do something with image result
if all_images: # if we have images assign to completed host
next_client_list.append(Client(o.host, o.port,
all_images.pop())
else:
next_client_list.append(o)
client_list = next_client_list

Jul 18 '05 #4
Ian McConnell <ia*@emit.demon.co.ukx> wrote in message news:<87************@emit.demon.co.uk>...
Processing the image can take several minutes and I have loads of images to
process, but I also have several servers available, so I'd like save some
time by distributing the images around the servers. So for 'n' servers,
throw 'n' images at them. Then as each server finishes, give it another
image to work on until I've done all the images.


That's all fine. In the short term, that's the way to get things done.
You many also want to look at xmlrpclib.

For longer term, you may want to do extra decoupling, to make things
more scalable/maintainable/fault-tolerant. A few ideas I got recently
were: (1) prepare tasks well enough so they can kind of run "off-line"
in a pool, you may use GUIDs (globally unique ids) to tag the tasks
(2) do remote-to-local calls instead of local-to-remote calls. This
means you need broadcast capability. (3) Write your own task manager,
an installable on all machines (local and remote), so that you can
dispatch programs and data files more easily, start and stop remote
services. For instance, you now may have a remote server to do image
processing, but what in the future you want to have servers to do
sound file processing? What if you need to upgrade your remote
programs?

regards,

Hung Jung
Jul 18 '05 #5
Ian McConnell <ia*@emit.demon.co.ukx> writes:
What's the pythonic way of sending out a set of requests in parallel?

My program throws an image at the server and then waits for the result. I'm
currently using this bit of socket code to send an image to server on
another machine.


Once again, thanks to all those who replied.

In the end I went with the select and poll method as I only have a small
number of machines to run my code on and this method fitted in easiest with
my existing program.

Actually, it also turns out that my code isn't quite as parallelised as I
thought it was and having results coming back in a different order confused
some of the later bookkeeping, so I've also done a version (pclient2) that
maintains the image order. Even with this limitation, I do get a good speed
up.

I'm sure this code can be made more efficient and is probably fairly easy to
deadlock, but it works for me.


import sys
import socket
import cPickle
import select
import string

class Machine:
def __init__(self, hostname):
colon = string.rfind(hostname, ':')
if colon >= 0:
self.host = hostname[0:colon]
self.port = int(hostname[colon+1:])
else:
self.host = hostname
self.port = 10000 # default port
# Should check for host being alive and possible endian issues
def connect(self):
return (self.host, self.port)
def __repr__(self):
return "%s:%d" % (self.host, self.port)

class Client:
def __init__(self, host, array, opts):
print 'NEW CLIENT', host, array.shape
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.host = host
sock.connect(host.connect())
sent = sock.sendall(cPickle.dumps((array, opts), 0))
sock.shutdown(1)
self.rfile = sock.makefile('rb')

def result(self):
r,w,e = select.select([self.rfile], [], [], 1)
if not r:
return None
else:
return cPickle.loads(self.rfile.read())
def __repr__(self):
return "%s" % self.host
def name(self):
return self.host
def pclient(hostnames, all_images, opts):
"""
hostnames is a list of machine names (as host:port) that are willing
to process data
all_image is a list of images
opts is a dictionary of processing options.
"""

client_list = []
result = []
# assign images to hosts
hosts = []
while len(hostnames) > 0:
host = Machine(hostnames.pop())
if host is not None:
hosts.append(host)
for host in hosts:
client_list.append(Client(host, all_images.pop(), opts))

# process
while client_list:
print len(client_list), 'clients busy'
next_client_list = []
for o in client_list:
res = o.result()
if res is not None:
result.append(res)
if all_images: # if we have images assign to completed host
next_client_list.append(Client(o.host, all_images.pop(),
opts))
else:
next_client_list.append(o)
client_list = next_client_list
return result

#
#
#
# Extra code to ensure results return in same order as submitted.
#
#
#
class fifo:
"""Simple implementation of a First-In-First-Out structure."""
def __init__(self):
self.in_stack = []
def push(self, obj):
self.in_stack.append(obj)
# print 'fifo push', self.in_stack
def pop(self):
return self.in_stack.pop(0)
def __repr__(self):
return str(self.in_stack)
def __len__(self):
return len(self.in_stack)
def head(self):
return self.in_stack[0]

class Workers:
def __init__(self, hosts, opts):
self.idle = hosts
self.busy = fifo()
self.opts = opts

def free(self):
return len(self.idle)

def newjob(self, array):
if array == [] or array == None:
# Delete idle list
self.idle = []
return
host = self.idle.pop()
self.busy.push(Client(host, array, self.opts))

def poll(self):
if len(self.busy) == 0:
return None
host = self.busy.head()
return host.result()

def done(self):
if len(self.busy) == 0:
return None
res = None
while res == None:
res = self.poll()
host = self.busy.pop()
self.idle.append(host.name())
print ' idle:', self.idle, ' busy:', self.busy
return res

def pclient2(hostnames, all_images, opts):
"""
hostnames is a list of machine names (as host:port) that are willing
to process data
all_image is a list of images
opts is a dictionary of processing options.
Returns results in same order as input.
"""
result = []
hosts = []
while len(hostnames) > 0:
host = Machine(hostnames.pop())
if host is not None:
hosts.append(host)
workers = Workers(hosts, opts)

while workers.free() > 0:
workers.newjob(all_images.pop())

print 'PCLIENT2 idle:', workers.idle, ' busy:', workers.busy
while 1:
res = workers.done()
if res == None:
break
result.append(res)
# Queue up another job if there are images left to process
if all_images:
workers.newjob(all_images.pop())
return result
--
"Thinks: I can't think of a thinks. End of thinks routine": Blue Bottle

** Aunty Spam says: Remove the trailing x from the To: field to reply **
Jul 18 '05 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

14
by: Wolfgang Keller | last post by:
Hello, as a non-developer I am currently participating in an industrial "research" project to develop a so-called "web application". This application serves at the same time as middleware to...
53
by: john67 | last post by:
The company I work for is about to embark on developing a commercial application that will cost us tens-of-millions to develop. When all is said and done it will have thousands of business...
3
by: Steve | last post by:
Hi What is difference between web farm and web garden? What i understand so far Web Farm Multi Server Environment Support Session share Application object not supported Caching not...
53
by: Michael Tobis | last post by:
Someone asked me to write a brief essay regarding the value-add proposition for Python in the Fortran community. Slightly modified to remove a few climatology-related specifics, here it is. I...
2
by: ozcankanbur | last post by:
How can I run an asp.net project on distributed servers? (WEB SERVER / APPLICATION SERVER / DATABASE SERVER) My asp.net project runs with ui.dll,biz.dll and data.dll so you know all the assemblies...
852
by: Mark Tarver | last post by:
How do you compare Python to Lisp? What specific advantages do you think that one has over the other? Note I'm not a Python person and I have no axes to grind here. This is just a question for...
33
by: llothar | last post by:
I'm afraid that the GIL is killing the usefullness of python for some types of applications now where 4,8 oder 64 threads on a chip are here or comming soon. What is the status about that for...
1
by: =?Utf-8?B?VmlqYXkgQ2hpa3Rl?= | last post by:
Hi Experts, With Session Affinity and Web Server Farm on ISA Server 2006 accessing 2 backend IIS servers, I’m getting error “Validation of ViewState MAC failed. If this application is hosted...
0
by: =?Utf-8?B?UGhpbCBKb2huc29u?= | last post by:
Hi, We have an asp.net web application with an .ashx handler that is called from javascript from the same web app. We used to run the web app on one server and the ashx calls from the...
3
by: latif87 | last post by:
An example of how the XML file is structured: <xml> <farm name="NAME1"> <size x="INTEGER1" y="INTEGER2" /> <neighbor name="NAME2" /> <crop name="PLANT" area="INTEGER3" /> <crop … /> … </farm>...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
1
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
0
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.