On Wed, 09 May 2007 18:16:32 -0700, Kay Schluehr wrote:
Every once in a while Erlang style [1] message passing concurrency [2]
is discussed for Python, which implies not only Stackless tasklets [3]
but also some process isolation semantics that lets the runtime easily
distribute tasklets ( or logical 'processes' ) across physical
processes. Syntactically a tasklet might grow out of a generator by
reusing the yield keyword for sending messages:
yield_expr : 'yield' ([testlist] | testlist 'to' testlist)
where the second form is specific for tasklets ( one could also use a
new keyword like "emit" if this becomes confusing - the semantics is
quite different ) and the addition of a new keyword for assigning the
"mailbox" e.g:
required_stmt: 'required' ':' suite
So tasklets could be identified on a lexical level ( just like
generators today ) and compiled accordingly. I just wonder about sharing
semantics. Would copy-on-read / copy-on-write and new opcodes be needed?
What would happen if sharing isn't dropped at all, but a tasklet is
pickled and separated only on need, when the runtime moves it into
another OS level thread / process? I think it would be cleaner to
separate it completely but what are the costs?
What do you think?
[1] http://en.wikipedia.org/wiki/Erlang_...mming_language
[2] http://en.wikipedia.org/wiki/Actor_model
[3] http://www.stackless.com/
Funnily enough, I'm working on a project right now that is designed for
exactly that: PARLEY, http://osl.cs.uiuc.edu/parley . (An announcement
should show up in clp-announce as soon as the moderators release it.) My
essential thesis is that syntactic sugar should not be necessary -- that a
nice library would be sufficient. I do admit that Erlang's pattern
matching would be nice, although you can get pretty far by using uniform
message formats that can easily be dispatched on -- the tuple
(tag, sender, args, kwargs)
in the case of PARLEY, which maps nicely to instance methods of a
dispatcher class.
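
To illustrate what dispatching on a uniform message format can look like, here is a minimal sketch. It is not PARLEY's actual API: the class names, the "on_" method prefix, and the unhandled() fallback are all assumptions made up for this example. Only the (tag, sender, args, kwargs) tuple shape comes from the text above.

```python
class Dispatcher:
    """Route (tag, sender, args, kwargs) tuples to methods named after the tag.

    Hypothetical sketch; the "on_" naming convention is an assumption,
    not something taken from PARLEY.
    """

    def dispatch(self, message):
        tag, sender, args, kwargs = message
        # Look up a handler method such as on_add for the tag "add".
        handler = getattr(self, "on_" + tag, None)
        if handler is None:
            return self.unhandled(message)
        return handler(sender, *args, **kwargs)

    def unhandled(self, message):
        # Fallback for tags without a matching method.
        raise ValueError("no handler for tag %r" % (message[0],))


class Counter(Dispatcher):
    def __init__(self):
        self.total = 0

    def on_add(self, sender, amount, scale=1):
        self.total += amount * scale
        return self.total


counter = Counter()
result = counter.dispatch(("add", "some_sender", (5,), {"scale": 2}))
```

The tag plays the role that the pattern would play in an Erlang receive clause; anything finer-grained (matching on argument shape, say) would have to be done by hand inside the handler.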
The question of sharing among multiple physical processes is interesting.
Implicit distribution of actors may not even be necessary if it is easy
enough for two hosts to coordinate with each other. In terms of the
general question of assigning actors to tasklets, threads, and processes,
there are added complications in terms of the physical limitations of
Python and Stackless Python:
- because of the GIL, actors in the same process do not gain the
advantage of true parallel computation
- all tasklet I/O has to be non-blocking
- tasklets are cooperative, while threads are preemptive
- communication across processes is slower, has to be serialized, etc.
- using both threads and tasklets in a single process is tricky
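
The serialization point in the list above can be made concrete with a small sketch. It uses only the standard pickle module and sends nothing over a real process boundary; the message contents are invented for illustration. The thing to notice is that a pickled message arrives as an independent copy, so nothing mutable is shared between sender and receiver.

```python
import pickle

# A message in the (tag, sender, args, kwargs) shape with a mutable payload.
payload = [1, 2, 3]
message = ("update", "actor_a", (payload,), {})

# What would travel between processes: a byte string, not the objects.
wire = pickle.dumps(message)
received = pickle.loads(wire)

# Mutating the receiver's copy does not touch the sender's list.
received[2][0].append(4)
```

Within a single process, passing the tuple directly would hand the receiver the very same list, so the two cases behave differently unless messages are copied or treated as immutable.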
PARLEY currently only works within a single process, though one can choose
to use either tasklets or threads. My next goal is to figure out I/O, at
which point I get to tackle the fun question of distribution.
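
For readers who want a picture of the thread-based variant, here is a rough sketch of an actor that owns a mailbox and runs in its own thread. It is an assumption-laden illustration using only the standard queue and threading modules; ThreadActor, send(), stop() and the None sentinel are hypothetical names, not PARLEY's interface.

```python
import queue
import threading


class ThreadActor:
    """Hypothetical actor: one thread draining one mailbox."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.seen = []  # tags of the messages processed so far
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        # Senders only ever touch the mailbox, never the actor's state.
        self.mailbox.put(message)

    def _run(self):
        while True:
            message = self.mailbox.get()  # blocks until a message arrives
            if message is None:           # sentinel: shut down
                break
            tag, sender, args, kwargs = message
            self.seen.append(tag)

    def stop(self):
        self.mailbox.put(None)
        self._thread.join()


actor = ThreadActor()
actor.send(("ping", "main", (), {}))
actor.send(("pong", "main", (), {}))
actor.stop()
```

A tasklet-based version would have the same shape, with a Stackless channel in place of the queue and cooperative scheduling in place of the blocking get().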
So far, I've not run into any cases where I've wanted to change the
interpreter, though I'd be interested in hearing ideas in this direction
(especially with PyPy as such a tantalizing platform!).
--
Jacob Lee <ar*****@freeshell.org>