Bytes IT Community

Threads vs Processes

Alright, based on a discussion on this mailing list, I've started to
wonder: why use threads vs. processes? So, if I have a system that has a
large area of shared memory, which would be better? I've been leaning
towards threads, and I'm going to say why.

Processes seem fairly expensive from my research so far. Each fork
copies the entire contents of memory into the new process. There's also
a more expensive context switch between processes. So if I have a
system that would fork 50+ child processes, my memory usage would be huge
and I'd burn more cycles than I have to. I understand that there
are ways of doing IPC, but aren't these also more expensive?

So threads seem faster and more efficient for this scenario. That
alone makes me want to stay with threads, but I get the feeling from
people on this list that processes are better and that threads are
overused. I don't understand why, so can anyone shed any light on this?
Thanks,

-carl

--

Carl J. Van Arsdall
cv*********@mvista.com
Build and Release
MontaVista Software

Jul 26 '06 #1
35 Replies


On Wed, 26 Jul 2006 10:54:48 -0700, Carl J. Van Arsdall wrote:
Alright, based on a discussion on this mailing list, I've started to
wonder, why use threads vs processes. So, If I have a system that has a
large area of shared memory, which would be better? I've been leaning
towards threads, I'm going to say why.

Processes seem fairly expensive from my research so far. Each fork
copies the entire contents of memory into the new process. There's also
a more expensive context switch between processes. So if I have a
system that would fork 50+ child processes my memory usage would be huge
and I burn more cycles that I don't have to. I understand that there
are ways of IPC, but aren't these also more expensive?

So threads seems faster and more efficient for this scenario. That
alone makes me want to stay with threads, but I get the feeling from
people on this list that processes are better and that threads are over
used. I don't understand why, so can anyone shed any light on this?
Thanks,

-carl
Not quite that simple. Most modern OSes have something called
COW - copy on write. When you fork a process, the child starts out
sharing the parent's pages; a page is only copied when the forked
process actually writes to it. So it isn't quite as bad.

Secondly, with context switching, if the OS is smart it might not
flush the entire TLB. Since most applications are pretty "local" as
far as execution goes, it may very well be the case that the page (or
pages) are already in memory.

As far as Python goes, what you need to determine is how much
real parallelism you want. Since there is a global lock in Python,
a thread will only execute a few (as in tens of) instructions before
switching to another thread. In the case of true processes you
have two independent Python virtual machines. That may make things
go much faster.
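[Editor's sketch of Chance's point about independent virtual machines, using today's multiprocessing module (which did not exist when this thread was written) and the POSIX-only "fork" start method; the function and queue here are illustrative, not from the original post:]

```python
import multiprocessing

def count_down(n, q):
    # purely CPU-bound loop; each process runs it under its own interpreter
    # and its own GIL, so two processes are not serialized by a single lock
    while n:
        n -= 1
    q.put(n)

ctx = multiprocessing.get_context("fork")   # POSIX-only start method
q = ctx.SimpleQueue()
procs = [ctx.Process(target=count_down, args=(200000, q)) for _ in range(2)]
for p in procs:
    p.start()
for p in procs:
    p.join()
results = [q.get(), q.get()]
```

With threads, the same two loops would take turns holding the GIL; as separate processes they can run on two CPUs at once.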

Another issue is the libraries you use. A lot of them aren't
thread safe. So you need to watch out.

Chance
Jul 26 '06 #2


Chance Ginger wrote:
On Wed, 26 Jul 2006 10:54:48 -0700, Carl J. Van Arsdall wrote:
Alright, based on a discussion on this mailing list, I've started to
wonder, why use threads vs processes. So, If I have a system that has a
large area of shared memory, which would be better? I've been leaning
towards threads, I'm going to say why.

Processes seem fairly expensive from my research so far. Each fork
copies the entire contents of memory into the new process. There's also
a more expensive context switch between processes. So if I have a
system that would fork 50+ child processes my memory usage would be huge
and I burn more cycles that I don't have to. I understand that there
are ways of IPC, but aren't these also more expensive?

So threads seems faster and more efficient for this scenario. That
alone makes me want to stay with threads, but I get the feeling from
people on this list that processes are better and that threads are over
used. I don't understand why, so can anyone shed any light on this?
Thanks,

-carl

Not quite that simple. In most modern OS's today there is something
called COW - copy on write. What happens is when you fork a process
it will make an identical copy. Whenever the forked process does
write will it make a copy of the memory. So it isn't quite as bad.

Secondly, with context switching if the OS is smart it might not
flush the entire TLB. Since most applications are pretty "local" as
far as execution goes, it might very well be the case the page (or
pages) are already in memory.

As far as Python goes what you need to determine is how much
real parallelism you want. Since there is a global lock in Python
you will only execute a few (as in tens) instructions before
switching to the new thread. In the case of true process you
have two independent Python virtual machines. That may make things
go much faster.

Another issue is the libraries you use. A lot of them aren't
thread safe. So you need to watch out.

Chance
It's all about performance (and sometimes the "perception" of
performance). Even though the thread support (and performance) in
Python is fairly weak (as explained by Chance), it's nonetheless very
useful. My applications thread a lot and it proves to be invaluable,
particularly with GUI-type applications. I am the type of user that
gets annoyed very quickly and easily if the program doesn't respond to
me when I click something. So, as a rule of thumb, if the code has to
do much of anything that takes, say, a tenth of a second or more, I
thread.

I posted a simple demo program yesterday to the PythonCard list to show
why somebody would want to thread an app. You can probably see it in the
archive.

Jul 26 '06 #3

"Carl J. Van Arsdall" <cv*********@mvista.com> writes:
Processes seem fairly expensive from my research so far. Each fork
copies the entire contents of memory into the new process.
No, you get two processes whose address spaces get the data. It's
done with the virtual memory hardware. The data isn't copied. The
page tables of both processes are just set up to point to the same
physical pages. Copying only happens if a process writes to one of
the pages. The OS detects this using a hardware trap from the VM
system.
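[Editor's note: Paul's description is easy to verify on a POSIX system. A minimal sketch (modern Python syntax, POSIX-only since it uses os.fork): the child overwrites its view of the data, which triggers copy-on-write, and the parent's view is untouched:]

```python
import os

def cow_demo():
    """Fork a child, let it overwrite its copy, show the parent is unaffected."""
    data = bytearray(b"parent")
    pid = os.fork()
    if pid == 0:
        data[:] = b"child!"   # write triggers copy-on-write in the child only
        os._exit(0)
    os.waitpid(pid, 0)        # wait for the child to finish
    return bytes(data)        # parent still sees b"parent"
```

Until that write happens, both processes' page tables point at the same physical pages.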
Jul 26 '06 #4

Another issue is the libraries you use. A lot of them aren't
thread safe. So you need to watch out.
This is something I have a streak of paranoia about (after discovering
that the current xmlrpclib has some thread safety issues). Is there a
list maintained anywhere of the modules that aren't thread safe?

Russ

Jul 26 '06 #5

Oops - minor correction... xmlrpclib is fine (I think/hope). It is
SimpleXMLRPCServer that currently has issues. It uses
thread-unfriendly sys.exc_value and sys.exc_type... this is being
corrected.

Jul 26 '06 #6


Carl J. Van Arsdall wrote:
Alright, based on a discussion on this mailing list, I've started to
wonder, why use threads vs processes. So, If I have a system that has a
large area of shared memory, which would be better? I've been leaning
towards threads, I'm going to say why.

Processes seem fairly expensive from my research so far. Each fork
copies the entire contents of memory into the new process. There's also
a more expensive context switch between processes. So if I have a
system that would fork 50+ child processes my memory usage would be huge
and I burn more cycles that I don't have to. I understand that there
are ways of IPC, but aren't these also more expensive?

So threads seems faster and more efficient for this scenario. That
alone makes me want to stay with threads, but I get the feeling from
people on this list that processes are better and that threads are over
used. I don't understand why, so can anyone shed any light on this?
Thanks,

-carl

--

Carl J. Van Arsdall
cv*********@mvista.com
Build and Release
MontaVista Software
Carl,
OS writers provide many more tools for debugging, tracing, changing
the priority of, and sand-boxing processes than threads (in general). It
*should* be easier to get a process-based solution up and running
and have it be more robust, when compared to a threaded solution.

- Paddy (who shies away from threads in C and C++ too ;-)

Jul 26 '06 #7


>
Carl,
OS writers provide much more tools for debugging, tracing, changing
the priority of, sand-boxing processes than threads (in general) It
*should* be easier to get a process based solution up and running
and have it be more robust, when compared to a threaded solution.

- Paddy (who shies away from threads in C and C++ too ;-)
That mythical "processes are more robust than threads" application
paradigm again.

No wonder there are so many boring software applications around.

Granted, threaded programs force you to think and design your
application much more carefully (to avoid race conditions, dead-locks,
....), but there is nothing inherently *non-robust* about threaded
applications.

Jul 26 '06 #8

On 2006-07-26 21:02:59, John Henry wrote:
Granted. Threaded program forces you to think and design your
application much more carefully (to avoid race conditions, dead-locks,
...) but there is nothing inherently *non-robust* about threaded
applications.
You just need to make sure that every piece of code you're using is
thread-safe, while OTOH making sure it is all "process safe" is the job
of the OS, so to speak :)
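[Editor's sketch of what "thread-safe" means in practice (modern Python syntax; the counter is illustrative): an unguarded `counter += 1` is a read-modify-write that can interleave between threads, and the standard fix is serializing it with a lock:]

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:            # serialize the read-modify-write
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is exactly 4 * 10000; without the lock, updates can be lost
```

Every library you call from multiple threads has to do the equivalent internally, which is exactly what you can't take for granted.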

Gerhard

Jul 27 '06 #9

John Henry wrote:
>
>>Carl,
OS writers provide much more tools for debugging, tracing, changing
the priority of, sand-boxing processes than threads (in general) It
*should* be easier to get a process based solution up and running
and have it be more robust, when compared to a threaded solution.

- Paddy (who shies away from threads in C and C++ too ;-)


That mythical "processes are more robust than threads" application
paradigm again.

No wonder there are so many boring software applications around.

Granted. Threaded program forces you to think and design your
application much more carefully (to avoid race conditions, dead-locks,
...) but there is nothing inherently *non-robust* about threaded
applications.
In this particular case, the OP (in a different thread)
mentioned that his application will be extended by
random individuals who can't necessarily be trusted
to design their extensions correctly. In that case,
segregating the untrusted code, at least, into
separate processes seems prudent.

The OP also mentioned that:
If I have a system that has a large area of shared memory,
which would be better?
IMO, if you're going to be sharing data structures with
code that can't be trusted to clean up after itself,
you're doomed. There's just no way to make that
scenario work reliably. The best you can do is insulate
that data behind an API (rather than giving untrusted
code direct access to the data -- IOW, don't use threads,
because if you do, they can go around your API and screw
things up), and ensure that each API call leaves the
data structures in a consistent state.
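[Editor's sketch of the API-insulation idea JK describes (modern Python syntax; the class and method names are hypothetical, not from the post): untrusted code only ever calls these methods, and every call takes a lock so the shared data is left in a consistent state:]

```python
import threading

class SafeStore:
    """Illustrative API guarding shared data; callers never touch _data directly."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def put(self, key, value):
        with self._lock:          # each call leaves _data consistent
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)
```

The point in the post stands, though: in the same address space nothing *forces* callers through this API, whereas a process boundary does.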

-- JK
Jul 27 '06 #10

Carl J. Van Arsdall wrote:
Alright, based on a discussion on this mailing list, I've started to
wonder, why use threads vs processes.
In many cases, you don't have a choice. If your Python program
is to run other programs, the others get their own processes.
There's no threads option on that.

If multiple lines of execution need to share Python objects,
then the standard Python distribution supports threads, while
processes would require some heroic extension. Don't confuse
sharing memory, which is now easy, with sharing Python
objects, which is hard.

So, If I have a system that has a
large area of shared memory, which would be better? I've been leaning
towards threads, I'm going to say why.

Processes seem fairly expensive from my research so far. Each fork
copies the entire contents of memory into the new process.
As others have pointed out, not usually true with modern OS's.
There's also
a more expensive context switch between processes. So if I have a
system that would fork 50+ child processes my memory usage would be huge
and I burn more cycles that I don't have to.
Again, not usually true. Modern OS's share code across
processes. There's no way to tell the size of 100
unspecified processes, but the number is nothing special.
So threads seems faster and more efficient for this scenario. That
alone makes me want to stay with threads, but I get the feeling from
people on this list that processes are better and that threads are over
used. I don't understand why, so can anyone shed any light on this?
Yes, someone can, and that someone might as well be you.
How long does it take to create and clean up 100 trivial
processes on your system? How about 100 threads? What
portion of your user waiting time is that?
--
--Bryan

Jul 27 '06 #11

John Henry wrote:

Carl,
OS writers provide much more tools for debugging, tracing, changing
the priority of, sand-boxing processes than threads (in general) It
*should* be easier to get a process based solution up and running
and have it be more robust, when compared to a threaded solution.

- Paddy (who shies away from threads in C and C++ too ;-)

That mythical "processes are more robust than threads" application
paradigm again.

No wonder there are so many boring software applications around.

Granted. Threaded program forces you to think and design your
application much more carefully (to avoid race conditions, dead-locks,
...) but there is nothing inherently *non-robust* about threaded
applications.
Indeed. Let's just get rid of all preemptive multitasking while we're
at it; MacOS9's cooperative, non-memory-protected system wasn't
inherently worse as long as every application was written properly.
There was nothing inherently non-robust about it!

The key difference between threads and processes is that threads share
all their memory, while processes have memory protection except with
particular segments of memory they choose to share.

The next most important difference is that certain languages have
different support for threads/procs. If you're writing a Python
application, you need to be aware of the GIL and its implications on
multithreaded performance. If you're writing a Java app, you're
handicapped by the lack of support for multiprocess solutions.

The third most important difference--and it's a very distant
difference--is the performance difference. In practice, most
well-designed systems will be pooling threads/procs and so startup time
is not that critical. For some apps, it may be. Context switching
time may differ, and likewise that is not usually a sticking point but
for particular programs it can be. On some OSes, launching a
copy-on-write process is difficult--that used to be a reason to choose
threads over procs on Windows, but nowadays all modern Windows OSes
offer a CreateProcessEx call that allows full-on COW processes.

In general, though, if you want to share _all_ memory or if you have
measured and context switching sucks on your OS and is a big factor in
your application, use threads. In general, if you don't know exactly
why you're choosing one or the other, or if you want memory protection,
robustness in the face of programming errors, access to more 3rd-party
libraries, etc, then you should choose a multiprocess solution.

(OS designers spent years of hard work writing OSes with protected
memory--why voluntarily throw that out?)

Jul 27 '06 #12

Russell Warren wrote:
This is something I have a streak of paranoia about (after discovering
that the current xmlrpclib has some thread safety issues). Is there a
list maintained anywhere of the modules that are aren't thread safe?

It's much safer to work the other way: assume that libraries are _not_
thread safe unless they're listed as such. Even something like the
standard C library on mainstream Linux distributions is only about 7
years into being thread-safe by default; anything at all esoteric you
should assume is not, until you investigate and find documentation to
the contrary.

Jul 27 '06 #13

sj*******@yahoo.com wrote:
John Henry wrote:
Granted. Threaded program forces you to think and design your
application much more carefully (to avoid race conditions, dead-locks,
...) but there is nothing inherently *non-robust* about threaded
applications.

Indeed. Let's just get rid of all preemptive multitasking while we're
at it
Also, race conditions and deadlocks are equally bad in multiprocess
solutions as in multithreaded ones. Any time you're doing parallel
processing you need to consider them.

I'd actually submit that initially writing multiprocess programs
requires more design and forethought, since you need to determine
exactly what you want to share instead of just saying "what the heck,
everything's shared!" The payoff in terms of getting _correct_
behavior more easily, having much easier maintenance down the line, and
being more robust in the face of program failures (or unforeseen
environment issues) is usually well worth it, though there are
certainly some applications where threads are a better choice.

Jul 27 '06 #14

br***********************@yahoo.com <br***********************@yahoo.com> wrote:
Yes, someone can, and that someone might as well be you.
How long does it take to create and clean up 100 trivial
processes on your system? How about 100 threads? What
portion of your user waiting time is that?
Here is a test prog...

The results are on my 2.6GHz P4 linux system

Forking
1000 loops, best of 3: 546 usec per loop
Threading
10000 loops, best of 3: 199 usec per loop

Indicating that starting up and tearing down new threads is 2.5 times
quicker than starting new processes under python.

This is probably irrelevant in the real world though!
"""
Time threads vs fork
"""

import os
import timeit
import threading

def do_child_stuff():
    """Trivial function for children to run"""
    # print "hello from child"
    pass

def fork_test():
    """Test forking"""
    pid = os.fork()
    if pid == 0:
        # child
        do_child_stuff()
        os._exit(0)
    # parent - wait for child to finish
    os.waitpid(pid, 0)

def thread_test():
    """Test threading"""
    t = threading.Thread(target=do_child_stuff)
    t.start()
    # wait for child to finish
    t.join()

def main():
    print "Forking"
    timeit.main(["-s", "from __main__ import fork_test", "fork_test()"])
    print "Threading"
    timeit.main(["-s", "from __main__ import thread_test", "thread_test()"])

if __name__ == "__main__":
    main()

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jul 27 '06 #15

Carl J. Van Arsdall wrote:
Paul Rubin wrote:
>>"Carl J. Van Arsdall" <cv*********@mvista.com> writes:

>>>Processes seem fairly expensive from my research so far. Each fork
copies the entire contents of memory into the new process.

No, you get two processes whose address spaces get the data. It's
done with the virtual memory hardware. The data isn't copied. The
page tables of both processes are just set up to point to the same
physical pages. Copying only happens if a process writes to one of
the pages. The OS detects this using a hardware trap from the VM
system.

Ah, alright. So if that's the case, why would you use python threads
versus spawning processes? If they both point to the same address space
and python threads can't run concurrently due to the GIL what are they
good for?
Well, of course they can interleave essentially independent
computations, which is why threads (formerly "lightweight processes")
were traditionally defined.

Further, some thread-safe extension (compiled) libraries will release
the GIL during their work, allowing other threads to execute
simultaneously - and even in parallel on multi-processor hardware.
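[Editor's sketch of Steve's point (modern Python syntax): time.sleep() is a stand-in for any blocking call that releases the GIL, so the threads' waits overlap instead of running back-to-back:]

```python
import threading
import time

def sleeper():
    time.sleep(0.2)   # sleep() blocks with the GIL released, so sleeps overlap

start = time.time()
threads = [threading.Thread(target=sleeper) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start   # roughly 0.2s, not 5 * 0.2s
```

The same overlap happens during blocking I/O and inside thread-safe C extensions that release the GIL around their work.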

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Jul 27 '06 #16

On 2006-07-26 19:10:14, Carl J. Van Arsdall wrote:
Ah, alright. So if that's the case, why would you use python threads
versus spawning processes? If they both point to the same address space
and python threads can't run concurrently due to the GIL what are they
good for?
Nothing runs concurrently on a single core processor (pipelining aside).
Processes don't run any more concurrently than threads. The scheduling is
different, but they still run sequentially.

Gerhard

Jul 27 '06 #17

br***********************@yahoo.com wrote:
Carl J. Van Arsdall wrote:
>Alright, based on a discussion on this mailing list, I've started to
wonder, why use threads vs processes.

In many cases, you don't have a choice. If your Python program
is to run other programs, the others get their own processes.
There's no threads option on that.

If multiple lines of execution need to share Python objects,
then the standard Python distribution supports threads, while
processes would require some heroic extension. Don't confuse
sharing memory, which is now easy, with sharing Python
objects, which is hard.

Ah, alright, I think I understand, so threading works well for sharing
Python objects. Would a scenario for this be something like a job
queue (say Queue.Queue), for example? This is a situation in which each
process/thread needs access to the Queue to get the next task it must
work on. Does that sound right? Would the same apply to multiple
threads needing access to a dictionary? A list?

Now if you are just passing ints and strings around, use processes with
some type of IPC; does that sound right as well? Or does the term
"shared memory" mean something more low-level, like some bits that don't
necessarily mean anything to Python but might mean something to your
application?

Sorry if you guys think I'm beating this to death; I'm just really trying
to get a firm grasp on what you are telling me. And again, thanks for
taking the time to explain all of this to me!

-carl
--

Carl J. Van Arsdall
cv*********@mvista.com
Build and Release
MontaVista Software

Jul 27 '06 #18


sj*******@yahoo.com wrote:
sj*******@yahoo.com wrote:
John Henry wrote:
Granted. Threaded program forces you to think and design your
application much more carefully (to avoid race conditions, dead-locks,
...) but there is nothing inherently *non-robust* about threaded
applications.
Indeed. Let's just get rid of all preemptive multitasking while we're
at it

Also, race conditions and deadlocks are equally bad in multiprocess
solutions as in multithreaded ones. Any time you're doing parallel
processing you need to consider them.
Only in the sense that you are far more likely to be dealing with
shared resources in a multi-threaded application. When I start a
sub-process, I know I am doing that to *avoid* resource sharing. So
the chance of a dead-lock is less, only because I would do it far
less.
I'd actually submit that initially writing multiprocess programs
requires more design and forethought, since you need to determine
exactly what you want to share instead of just saying "what the heck,
everything's shared!" The payoff in terms of getting _correct_
behavior more easily, having much easier maintenance down the line, and
being more robust in the face of program failures (or unforeseen
environment issues) is usually well worth it, though there are
certainly some applications where threads are a better choice.
If you're sharing things, I would thread. I would not want to pay the
expense of a process.

It's too bad that programmers are not threading more often.

Jul 27 '06 #19

John Henry wrote:
If you're sharing things, I would thread. I would not want to pay the
expense of a process.
This is generally a false cost. There are very few applications where
thread/process startup time is at all a fast path, and there are
likewise few where the difference in context-switching time matters at
all. Indeed, in a Python program on a multiprocessor system, processes
are potentially faster than threads, not slower.

Moreover, to get at best a small performance gain you pay a huge cost
by sacrificing memory protection within the threaded process.

You can share things between processes, but you can't memory protect
things between threads. So if you need some of each (some things
shared and others protected), processes are the clear choice.

Now, for a few applications threads make sense. Usually that means
applications that have to share a great number of complex data
structures (and normally, making the choice for performance reasons
means your design is flawed and you could help performance greatly by
reworking it--though there may be some exceptions). But the general
rule when choosing between them should be "use processes when you can,
and threads when you must".

Sadly, too many programmers greatly overuse threading. That problem is
exacerbated by the number of beginner-level programming books that talk
about how to use threads without ever mentioning processes (and without
going into the design of multi-execution apps).

Jul 27 '06 #20

Nick Craig-Wood wrote:
>
Here is test prog...
<snip>

Here's a more real-life-like program done in both single-threaded mode
and multi-threaded mode. You'll need PythonCard to try this. Just to
make the point, you will notice that the core code is identical between
the two (method on_menuFileStart_exe). The only difference is in the
setup code. I wanted to dismiss the myth that multi-threaded programs
are inherently *evil*, or that they're difficult to code, or that they're
unsafe..... (whatever dirty water people wish to throw at them).

Don't ask me to try this with processes!

To have fun, first run it in single threaded mode (change the main
program to invoke the MyBackground class, instead of the
MyBackgroundThreaded class):

Change:

app = model.Application(MyBackgroundThreaded)

to:

app = model.Application(MyBackground)

Start the process by selecting File->Start, and then try to stop the
program by clicking File->Stop. Note the performance of the program.

Now, run it in multi-threaded mode. Click File->Start several times
(up to 4) and then try to stop the program by clicking File->Stop.

If you want to show off, add several more StaticText items in the
resource file, add them to the textAreas list in MyBackgroundThreaded
class and let it rip!

BTW: This app also demonstrates a weakness in Python threads - the
threads don't get preempted equally (not even close).

:-)

Two files follows (test.py and test.rsrc.py):

#!/usr/bin/python

"""
__version__ = "$Revision: 1.1 $"
__date__ = "$Date: 2004/10/24 19:21:46 $"
"""

import wx
import threading
import thread
import time

from PythonCard import model

class MyBackground(model.Background):

    def on_initialize(self, event):
        # if you have any initialization
        # including sizer setup, do it here
        self.running(False)
        self.textAreas = (self.components.TextArea1,)
        return

    def on_menuFileStart_select(self, event):
        self.on_menuFileStart_exe(self.textAreas[0])
        return

    def on_menuFileStart_exe(self, textArea):
        textArea.visible = True
        self.running(True)
        for i in range(10000000):
            textArea.text = "Got up to %d" % i
            ## print i
            for j in range(i):
                k = 0
                time.sleep(0)
                if not self.running(): break
            try:
                wx.SafeYield(self)
            except:
                pass
            if not self.running(): break
        textArea.text = "Finished at %d" % i
        return

    def on_menuFileStop_select(self, event):
        self.running(False)

    def on_Stop_mouseClick(self, event):
        self.on_menuFileStop_select(event)
        return

    def running(self, flag=None):
        if flag != None:
            self.runningFlag = flag
        return self.runningFlag

class MyBackgroundThreaded(MyBackground):

    def on_initialize(self, event):
        # if you have any initialization
        # including sizer setup, do it here
        self.myLock = thread.allocate_lock()
        self.myThreadCount = 0
        self.running(False)
        self.textAreas = [self.components.TextArea1, self.components.TextArea2,
                          self.components.TextArea3, self.components.TextArea4]
        return

    def on_menuFileStart_select(self, event):
        res = MyBackgroundWorker(self).start()

    def on_menuFileStop_select(self, event):
        self.running(False)
        self.menuBar.setEnabled("menuFileStart", True)

    def on_Stop_mouseClick(self, event):
        self.on_menuFileStop_select(event)

    def running(self, flag=None):
        self.myLock.acquire()
        if flag != None:
            self.runningFlag = flag
        flag = self.runningFlag
        self.myLock.release()
        return flag

class MyBackgroundWorker(threading.Thread):
    def __init__(self, parent):
        threading.Thread.__init__(self)
        self.parent = parent
        self.parent.myLock.acquire()
        threadCount = self.parent.myThreadCount
        self.parent.myLock.release()
        self.textArea = self.parent.textAreas[threadCount]

    def run(self):
        self.parent.myLock.acquire()
        self.parent.myThreadCount += 1
        if self.parent.myThreadCount == len(self.parent.textAreas):
            self.parent.menuBar.setEnabled("menuFileStart", False)
        self.parent.myLock.release()

        self.parent.on_menuFileStart_exe(self.textArea)

        self.parent.myLock.acquire()
        self.parent.myThreadCount -= 1
        if self.parent.myThreadCount == 0:
            self.parent.menuBar.setEnabled("menuFileStart", True)
        self.parent.myLock.release()

        return

if __name__ == '__main__':
    app = model.Application(MyBackgroundThreaded)
    app.MainLoop()

Here's the associated resource file:

{'application':{'type':'Application',
    'name':'Template',
    'backgrounds': [
        {'type':'Background',
         'name':'bgTemplate',
         'title':'Standard Template with File->Exit menu',
         'size':(400, 300),
         'style':['resizeable'],

         'menubar': {'type':'MenuBar',
             'menus': [
                 {'type':'Menu',
                  'name':'menuFile',
                  'label':'&File',
                  'items': [
                      {'type':'MenuItem',
                       'name':'menuFileStart',
                       'label':u'&Start',
                      },
                      {'type':'MenuItem',
                       'name':'menuFileStop',
                       'label':u'Sto&p',
                      },
                      {'type':'MenuItem',
                       'name':'menuFile--',
                       'label':u'--',
                      },
                      {'type':'MenuItem',
                       'name':'menuFileExit',
                       'label':'E&xit',
                       'command':'exit',
                      },
                  ]
                 },
             ]
         },
         'components': [

             {'type':'StaticText',
              'name':'TextArea1',
              'position':(10, 100),
              'text':u'This is a test',
              'visible':False,
             },

             {'type':'StaticText',
              'name':'TextArea2',
              'position':(160, 100),
              'text':u'This is a test',
              'visible':False,
             },

             {'type':'StaticText',
              'name':'TextArea3',
              'position':(10, 150),
              'text':u'This is a test',
              'visible':False,
             },

             {'type':'StaticText',
              'name':'TextArea4',
              'position':(160, 150),
              'text':u'This is a test',
              'visible':False,
             },

         ] # end components
        } # end background
    ] # end backgrounds
} }

Jul 27 '06 #21


Carl J. Van Arsdall wrote:
Ah, alright, I think I understand, so threading works well for sharing
python objects. Would a scenario for this be something like a job
queue (say Queue.Queue) for example. This is a situation in which each
process/thread needs access to the Queue to get the next task it must
work on. Does that sound right?
That's a reasonable and popular technique. I'm not sure what "this"
refers to in your question, so I can't say if it solves the
problem of which you are thinking.
Would the same apply to multiple
threads needing access to a dictionary? A list?
The Queue class is popular with threads because it already has
locking around its basic methods. You'll need to serialize your
operations when sharing most kinds of objects.
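[Editor's sketch of the job-queue pattern Bryan endorses, in modern Python syntax (the module is now spelled `queue`; it was `Queue.Queue` in the Python of this thread's era, and the sentinel shutdown shown here is just one common idiom):]

```python
import queue
import threading

def worker(jobs, results):
    while True:
        item = jobs.get()
        if item is None:          # sentinel: time to shut down
            break
        results.put(item * item)  # "work" on the task

jobs, results = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(3)]
for w in workers:
    w.start()
for n in range(10):
    jobs.put(n)
for _ in workers:
    jobs.put(None)                # one sentinel per worker
for w in workers:
    w.join()
squares = sorted(results.get() for _ in range(10))
```

The Queue's own internal locking is what makes this safe; sharing a plain dict or list the same way would need explicit locks around each operation.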
Now if you are just passing ints and strings around, use processes with
some type of IPC, does that sound right as well?
Also reasonable and popular. You can even pass many Python objects
by value using pickle, though you lose some safety.
Or does the term
"shared memory" mean something more low-level like some bits that don't
necessarily mean anything to python but might mean something to your
application?
Shared memory means the same memory appears in multiple processes,
possibly at different address ranges. What any of them writes to
the memory, they can all read. The standard Python distribution
now offers shared memory via os.mmap(), but lacks cross-process
locks.
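A sketch of that shared-memory arrangement using the stdlib mmap module with an anonymous map inherited across `fork()`. POSIX-only as written, and note there is still no cross-process lock here, which is exactly the gap mentioned above:

```python
# Anonymous shared memory between related processes: mmap.mmap(-1, n)
# creates a region that survives fork(), so parent and child see the
# same bytes at (possibly different) addresses.
import mmap
import os

shared = mmap.mmap(-1, 32)            # 32 anonymous, shared bytes
pid = os.fork()
if pid == 0:                          # child: write into the region
    shared.seek(0)
    shared.write(b'hello from child')
    os._exit(0)
else:                                 # parent: wait, then read it back
    os.waitpid(pid, 0)
    shared.seek(0)
    data = shared.read(16)
    print(data)                       # b'hello from child'
```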

Python doesn't support allocating objects in shared memory, and
doing so would be difficult. That's what the POSH project is
about, but it looks stuck in alpha.
--
--Bryan

Jul 27 '06 #22

On 2006-07-27, sj*******@yahoo.com <sj*******@yahoo.com> wrote:
>If you're sharing things, I would thread. I would not want to
pay the expense of a process.

This is generally a false cost. There are very few
applications where thread/process startup time is at all a
fast path.
Even if it were, on any sanely designed OS, there really isn't
any extra expense for a process over a thread.
Moreover, to get at best a small performance gain you pay a
huge cost by sacrificing memory protection within the threaded
process.
Threading most certainly shouldn't be done in some attempt to
improve performance over a multi-process model. It should be
done because it fits the algorithm better. If the execution
contexts don't need to share data and can communicate in a
simple manner, then processes probably make more sense. If the
contexts need to operate jointly on complex shared data, then
threads are usually easier.

--
Grant Edwards grante Yow! My life is a patio
at of fun!
visi.com
Jul 27 '06 #23

br***********************@yahoo.com wrote:
Carl J. Van Arsdall wrote:
>Ah, alright, I think I understand, so threading works well for sharing
python objects. Would a scenario for this be something like a job
queue (say Queue.Queue) for example. This is a situation in which each
process/thread needs access to the Queue to get the next task it must
work on. Does that sound right?

That's a reasonable and popular technique. I'm not sure what "this"
refers to in your question, so I can't say if it solves the
problem of which you are thinking.

> Would the same apply to multiple
threads needing access to a dictionary? A list?

The Queue class is popular with threads because it already has
locking around its basic methods. You'll need to serialize your
operations when sharing most kinds of objects.

Yes yes, of course. I was just making sure we are on the same page, and
I think I'm finally getting there.

>Now if you are just passing ints and strings around, use processes with
some type of IPC, does that sound right as well?

Also reasonable and popular. You can even pass many Python objects
by value using pickle, though you lose some safety.
I actually do use pickle (not for this, but for other things), could you
elaborate on the safety issue?

>
> Or does the term
"shared memory" mean something more low-level like some bits that don't
necessarily mean anything to python but might mean something to your
application?

Shared memory means the same memory appears in multiple processes,
possibly at different address ranges. What any of them writes to
the memory, they can all read. The standard Python distribution
now offers shared memory via the mmap module, but lacks cross-process
locks.

Python doesn't support allocating objects in shared memory, and
doing so would be difficult. That's what the POSH project is
about, but it looks stuck in alpha.

--

Carl J. Van Arsdall
cv*********@mvista.com
Build and Release
MontaVista Software

Jul 27 '06 #24


Carl J. Van Arsdall wrote:
[...]
I actually do use pickle (not for this, but for other things), could you
elaborate on the safety issue?
From http://docs.python.org/lib/node63.html :
Warning: The pickle module is not intended to be secure
against erroneous or maliciously constructed data. Never
unpickle data received from an untrusted or unauthenticated
source.

A corrupted pickle can crash Python. An evil pickle could probably
hijack your process.
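A harmless demonstration of that warning: unpickling is not passive data loading, because a pickle can name a callable to be invoked during `loads()`. The `Evil` class and `record` function below are invented for illustration; a real attack would name something like `os.system` instead:

```python
# Why untrusted pickles are dangerous: __reduce__ lets a pickle say
# "to rebuild me, call this function with these arguments", and
# pickle.loads() obligingly makes the call.
import pickle

calls = []

def record(tag):
    calls.append(tag)      # stand-in for something far nastier
    return tag

class Evil:
    def __reduce__(self):
        return (record, ('pwned',))

payload = pickle.dumps(Evil())
pickle.loads(payload)      # "deserializing" runs record('pwned')
print(calls)               # ['pwned']
```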
--
--Bryan

Jul 27 '06 #25

br***********************@yahoo.com wrote:
Carl J. Van Arsdall wrote:
[...]
>I actually do use pickle (not for this, but for other things), could you
elaborate on the safety issue?
From http://docs.python.org/lib/node63.html :

Warning: The pickle module is not intended to be secure
against erroneous or maliciously constructed data. Never
unpickle data received from an untrusted or unauthenticated
source.

A corrupted pickle can crash Python. An evil pickle could probably
hijack your process.
Ah, if the data is coming from someone else. I understand. Thanks.

--

Carl J. Van Arsdall
cv*********@mvista.com
Build and Release
MontaVista Software

Jul 27 '06 #26

On Wed, 26 Jul 2006 10:54:48 -0700, Carl J. Van Arsdall wrote:
Alright, based a on discussion on this mailing list, I've started to
wonder, why use threads vs processes.
The debate should not be about "threads vs processes", it should be
about "threads vs events". Dr. John Ousterhout (creator of Tcl,
Professor of Comp Sci at UC Berkeley, etc), started a famous debate
about this 10 years ago with the following simple presentation.

http://home.pacbell.net/ouster/threads.pdf

That sentiment has largely been ignored and thread usage dominates but,
if you have been programming for as long as I have, and have used both
thread based architectures AND event/reactor/callback based
architectures, then that simple presentation above should ring very
true. Problem is, young people merely equate newer == better.

On large systems and over time, thread based architectures often tend
towards chaos. I have seen a few thread based systems where the
programmers become so frustrated with subtle timing issues etc, and they
eventually overlay so many mutexes etc, that the implementation becomes
single threaded in practice anyhow(!), and very inefficient.

BTW, I am fairly new to python but I have seen that the python Twisted
framework is a good example of the event/reactor design alternative to
threads. See

http://twistedmatrix.com/projects/co...wto/async.html .

Douglas Schmidt is a famous designer and author (ACE, Corba Tao, etc)
who has written much about reactor design patterns, see
"Pattern-Oriented Software Architecture, Vol 2", Wiley 2000, amongst
many other references of his.

Jul 27 '06 #27

mark wrote:
The debate should not be about "threads vs processes", it should be
about "threads vs events".
We are so lucky as to have both debates.
Dr. John Ousterhout (creator of Tcl,
Professor of Comp Sci at UC Berkeley, etc), started a famous debate
about this 10 years ago with the following simple presentation.

http://home.pacbell.net/ouster/threads.pdf
The Ousterhout school finds multiple lines of execution
unmanageable, while the Tannenbaum school finds asynchronous I/O
unmanageable.

What's so hard about single-line-of-control (SLOC) event-driven
programming? You can't call anything that might block. You have to
initiate the operation, store all the state you'll need in order
to pick up where you left off, then return all the way back to the
event dispatcher.
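A toy sketch of that single-line-of-control shape: the handler must initiate, stash its state, and return to the dispatcher, so one logical sequence gets split across two callbacks. All names here are invented for illustration, and the deque stands in for a real readiness mechanism such as select():

```python
# One logical request, chopped in two around the point where real
# code would block: start_request() initiates and stashes its state
# (in a closure), then returns; the dispatcher later runs the
# continuation.
from collections import deque

ready = deque()                       # callbacks runnable now
log = []

def start_request(name):
    log.append('start:' + name)       # step 1: initiate the "I/O"
    ready.append(lambda: finish_request(name))   # stash resume point

def finish_request(name):
    log.append('done:' + name)        # step 2: pick up where we left off

start_request('a')
start_request('b')
while ready:                          # the event dispatcher
    ready.popleft()()

print(log)   # ['start:a', 'start:b', 'done:a', 'done:b']
```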
That sentiment has largely been ignored and thread usage dominates but,
if you have been programming for as long as I have, and have used both
thread based architectures AND event/reactor/callback based
architectures, then that simple presentation above should ring very
true. Problem is, young people merely equate newer == better.
Newer? They're both old as the trees. That can't be why the whiz
kids like them. Threads and processes rule because of their success.
On large systems and over time, thread based architectures often tend
towards chaos.
While large SLOC event-driven systems surely tend to chaos. Why?
Because they *must* be structured around where blocking operations
can happen, and that is not the structure anyone would choose for
clarity, maintainability and general chaos avoidance.

Even the simplest of modular structures, the procedure, gets
broken. Whether you can encapsulate a sequence of operations in a
procedure depends upon whether it might need to do an operation
that could block.

Going farther, consider writing a class supporting overriding of
some method. Easy; we Pythoneers do it all the time; that's what
O.O. inheritance is all about. Now what if the subclass's version
of the method needs to look up external data, and thus might
block? How does a method override arrange for the call chain to
return all the way back to the event loop, and then pick up
again with the same call chain when the I/O comes in?
I have seen a few thread based systems where the
programmers become so frustrated with subtle timing issues etc, and they
eventually overlay so many mutexes etc, that the implementation becomes
single threaded in practice anyhow(!), and very inefficient.
While we simply do not see systems as complex as modern DBMS's
written in the SLOC event-driven style.
BTW, I am fairly new to python but I have seen that the python Twisted
framework is a good example of the event/reactor design alternative to
threads. See

http://twistedmatrix.com/projects/co...wto/async.html .
And consequently, to use Twisted you rewrite all your code as
those 'deferred' things.
--
--Bryan

Jul 28 '06 #28

It seems that both ways are here to stay. If one were so much inferior
and problem-prone, we wouldn't be talking about it now; it would have been
forgotten on the same shelf as a stack of punch cards.

The rule of thumb is 'the right tool for the right job.'

The threading model is very useful for long CPU-bound processing, as it can
potentially take advantage of multiple CPUs/cores (alas, not in Python
now because of the GIL). Events will not work as well here. But note:
if there is not much sharing of resources between threads, processes
could be used! It turns out that there are very few cases where threads
are simply indispensable.

The event model is usually well suited for I/O, or for cases where a large
number of shared resources would require lots of synchronization if
threads were used.

DBMSs are not a good example of a typical large system, so saying 'see,
DBMSs use threads -- therefore threads are better' doesn't make a good
argument. DBMSs are highly optimized; only a few of them actually manage
to successfully take advantage of multiple execution units. One could
just as well cite a hundred other projects and say 'see, it uses an event
model -- therefore event models are better' and so on. Again, "right
tool for the right job". A good programmer should know both...
And consequently, to use Twisted you rewrite all your code as
those 'deferred' things.
Then, try re-writing Twisted using threads in the same number of lines
having the same or better performance. I bet you'll end up having a
whole bunch of 'locks', 'waits' and 'notify's instead of a bunch of
"those 'deferred' things." Debugging all those threads would be a
project in and of itself.

-Nick
Jul 28 '06 #29

"Dennis Lee Bieber" <wl*****@ix.netcom.com> wrote:

| On Thu, 27 Jul 2006 09:17:56 -0700, "Carl J. Van Arsdall"
| <cv*********@mvista.com> declaimed the following in comp.lang.python:
|
| Ah, alright, I think I understand, so threading works well for sharing
| python objects. Would a scenario for this be something like a job
| queue (say Queue.Queue) for example. This is a situation in which each
| process/thread needs access to the Queue to get the next task it must
| work on. Does that sound right? Would the same apply to multiple
| threads needing access to a dictionary? A list?
| >
| Python's Queue module is only (to my knowledge) an internal
| (thread-shared) communication channel; you'd need something else to work
| IPC -- VMS mailboxes, for example (more general than UNIX pipes with
| their single reader/writer concept)
|
| "shared memory" mean something more low-level like some bits that don't
| necessarily mean anything to python but might mean something to your
| application?
| >
| Most OSs support creation and allocation of memory blocks with an
| attached name; this allows multiple processes to map that block of
| memory into their address space. The contents of said memory block is
| totally up to application agreements (won't work well with Python native
| objects).
|
| mmap()
|
| is one such system. By rough description, it maps a disk file into a
| block of memory, so the OS handles loading the data (instead of, say,
| file.seek(somewhere_long) followed by file.read(some_data_type) you
| treat the mapped memory as an array and use x = mapped[somewhere_long];
| if somewhere_long is not yet in memory, the OS will page swap that part
| of the file into place). The "file" can be shared, so different
| processes can map the same file, and thereby, the same memory contents.
|
| This can be useful, for example, with multiple identical processes
| feeding status telemetry. Each process is started with some ID, and the
| ID determines which section of mapped memory it is to store its status
| into. The controller program can just do a loop over all the mapped
| memory, updating a display with whatever is current -- doesn't matter if
| process_N manages to update a field twice while the monitor is
| scanning... The display always shows the data that was current at the
| time of the scan.
|
| Carried further -- special memory cards can (at least they were
| where I work) be obtained. These cards have fiber-optic connections. In
| a closely distributed system, each computer has one of these cards, and
| the fiber-optics link them in a cycle. Each process (on each computer)
| maps the memory of the card -- the cards then have logic to relay all
| memory changes, via fiber, to the next card in the link... Thus, all the
| closely linked computers "share" this block of memory.

This is nice to share inputs from the real world - but there are some hairy
issues if it is to be used for general purpose consumption - unless there are
hardware restrictions to stop machines stomping on each other's memories - i.e.
the machines have to be *polite* and *well behaved* - or you can easily have a
major smash...
A structure has to agreed on, and respected...

- Hendrik
Jul 28 '06 #30

[mark]
http://twistedmatrix.com/projects/co...wto/async.html .
At my work, we started writing a web app using the twisted framework,
but it was somehow too twisted for the developers, so actually they
chose to do threading rather than using twisted's async methods.

--
Tobias Brox, 69°42'N, 18°57'E
Jul 28 '06 #31

On Thu, 27 Jul 2006 20:53:54 -0700, Nick Vatamaniuc wrote:
Debugging all those threads should be a project in an of itself.
Ahh, debugging - I forgot to bring that one up in my argument! Thanks
Nick ;)

Certainly I agree of course that there are many applications which suit
a threaded design. I just think there is a general over-emphasis on
using threads and see it applied very often where an event based
approach would be cleaner and more efficient. Thanks for your comments
Bryan and Nick, an interesting debate.

Jul 28 '06 #32

Chance Ginger wrote:
Not quite that simple. In most modern OS's today there is something
called COW - copy on write. What happens is when you fork a process
it will make an identical copy. Only when the forked process does a
write will it make a copy of that memory. So it isn't quite as bad.
A notable exception is a toy OS from a manufacturer in Redmond,
Washington. It does not do COW fork. It does not even fork.

To make a server system scale well on Windows you need to use threads,
not processes. That is why the global interpreter lock sucks so badly
on Windows.

Jul 28 '06 #33


sturlamolden wrote:
A notable exception is a toy OS from a manufacturer in Redmond,
Washington. It does not do COW fork. It does not even fork.

To make a server system scale well on Windows you need to use threads,
not processes.
Here's one to think about: if you have a bunch of threads running,
and you fork, should the child process be born running all the
threads? Neither answer is very attractive. It's a matter of which
will probably do the least damage in most cases (and the answer
the popular threading systems choose is 'no'; the child process
runs only the thread that called fork).

MS-Windows is more thread-oriented than *nix, and it avoids this
particular problem by not using fork() to create new processes.
That is why the global interpreter lock sucks so badly
on Windows.
It sucks about the same on Windows and *nix: hardly at all on
single-processors, moderately on multi-processors.
--
--Bryan

Jul 28 '06 #34

sturlamolden wrote:
Chance Ginger wrote:
Not quite that simple. In most modern OS's today there is something
called COW - copy on write. What happens is when you fork a process
it will make an identical copy. Only when the forked process does a
write will it make a copy of that memory. So it isn't quite as bad.

A notable exception is a toy OS from a manufacturer in Redmond,
Washington. It does not do COW fork. It does not even fork.
That's only true for Windows 98/95/Windows 3.x and other DOS-based
Windows versions.

NTCreateProcess with SectionHandle=NULL creates a new process with a
COW version of the parent process's address space.

It's not called "fork", but it does the same thing. There's a new name
for it in Win2K or XP (maybe CreateProcessEx?) but the functionality
has been there since the NT 3.x days at least and is in all modern
Windows versions.

Jul 28 '06 #35

mark wrote:
On Wed, 26 Jul 2006 10:54:48 -0700, Carl J. Van Arsdall wrote:
Alright, based a on discussion on this mailing list, I've started to
wonder, why use threads vs processes.

The debate should not be about "threads vs processes", it should be
about "threads vs events".
Events serve a separate problem space.

Use event-driven state machine models for efficient multiplexing and
fast network I/O (e.g. writing an efficient static HTTP server)

Use multi-execution models for efficient multiprocessing. No matter
how scalable your event-driven app is, it's not going to take advantage
of multi-CPU systems, or modern multi-core processors.

Event-driven state machines can be harder to program and maintain than
multi-process solutions, but they are usually easier than
multi-threaded solutions.

On-topic: If your problem is one where event-driven state machines are
a good solution, Python generators can be a _huge_ help.
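One way to read that last remark: a generator keeps its local state between events, so a multi-step handler can be written top-to-bottom instead of being split into callbacks. A toy sketch using generator `send()` (added in Python 2.5, shown here in Python 3 syntax; all names invented for illustration):

```python
# A three-step "protocol" as one generator: each yield hands a reply
# to the dispatcher and waits for the next event, while the local
# variables (name, cmd) survive in between.
def session():
    name = yield 'who are you?'
    cmd = yield 'hello ' + name
    yield 'bye: ' + cmd

s = session()
replies = [next(s),            # run to the first yield
           s.send('carl'),     # deliver an event, resume after yield
           s.send('quit')]
print(replies)                 # ['who are you?', 'hello carl', 'bye: quit']
```

An event loop holding one such generator per connection gets the sequential readability of threads without multiple OS-level lines of execution.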

Jul 28 '06 #36
