Bytes IT Community

"A Fundamental Turn Toward Concurrency in Software"

Hello!

Just went through an article via Slashdot titled "The Free Lunch Is Over: A
Fundamental Turn Toward Concurrency in Software"
[http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that the
continuous CPU performance gains we've seen are finally over, and that future
gains will primarily come from software concurrency taking advantage of
hyperthreading and multicore architectures.

Perhaps something the Python interpreter team can ponder.
Jul 18 '05 #1
16 Replies


aurora <au******@gmail.com> writes:
Just went through an article via Slashdot titled "The Free Lunch Is
Over: A Fundamental Turn Toward Concurrency in Software"
[http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that
the continuous CPU performance gains we've seen are finally over, and
that future gains will primarily come from software concurrency
taking advantage of hyperthreading and multicore architectures.


Well, another gain could be had in making the software less wasteful
of cpu cycles.

I'm a pretty experienced programmer by most people's standards but I
see a lot of systems where I can't for the life of me figure out how
they manage to be so slow. It might be caused by environmental
pollutants emanating from Redmond.
Jul 18 '05 #2

On Fri, Jan 07, 2005 at 01:35:46PM -0800, aurora wrote:
Hello!

Just went through an article via Slashdot titled "The Free Lunch Is Over: A
Fundamental Turn Toward Concurrency in Software"
[http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that the
continuous CPU performance gains we've seen are finally over, and that future
gains will primarily come from software concurrency taking advantage of
hyperthreading and multicore architectures.


It got most things right: AMD & Intel are moving toward multiple cores on
a chip, so programmers will adapt. I don't see this as a big deal; the current
trend is rack farms of cheap boxes for heavy computing needs. Multi-core CPUs
will help those kinds of applications more than single-threaded ones. Existing
threaded apps don't have to worry at all.

His picking on Intel to graph CPU speeds was a mistake (I'll be generous and
not say deliberate). Intel screwed up and pursued a megahertz-at-all-costs
strategy for marketing reasons. AMD didn't worry about MHz, just about CPUs
that did more work and so AMD is eating Intel's lunch. Intel has abandoned
their "faster" line of processors and is using their CPUs that are slower in
MHz but get more work done. So the author's "MHz plateau" graph isn't all
Moore's law breaking down; it is the result of Intel's marketing dept breaking
down.

-Jack

ps, I started a python corner to my blog, http://jackdied.com/python
Only one substantial post yet, and the RSS feed isn't up, but there ya go.

Jul 18 '05 #3

Jack Diederich wrote:
On Fri, Jan 07, 2005 at 01:35:46PM -0800, aurora wrote:
Hello!

Just went through an article via Slashdot titled "The Free Lunch Is Over: A
Fundamental Turn Toward Concurrency in Software"
[http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that the
continuous CPU performance gains we've seen are finally over, and that future
gains will primarily come from software concurrency taking advantage of
hyperthreading and multicore architectures.

It got most things right, AMD & Intel are moving towards multiple cores on
a chip so programmers will adapt. I don't see this as a big deal, the current
trend is rack farms of cheap boxes for heavy computing needs. Multi-core CPUs
will help those kinds of applications more than single threaded ones. Existing
threaded apps don't have to worry at all.

But my understanding is that the current Python VM is single-threaded internally,
so even if the program creates multiple threads, just one core will be dividing
its time between those "threads".
His picking on Intel to graph CPU speeds was a mistake (I'll be generous and
not say deliberate). Intel screwed up and pursued a megahertz-at-all-costs
strategy for marketing reasons. AMD didn't worry about MHz, just about CPUs
that did more work and so AMD is eating Intel's lunch. Intel has abandoned
their "faster" line of processors and is using their CPUs that are slower in
MHz but get more work done. So the author's "MHz plateau" graph isn't all
Moore's law breaking down, it is the result of Intel's marketing dept breaking
down.


You may be right, but I agree with the thrust of the article that multicore
looks to be the new in thing at the moment.

Steve
Jul 18 '05 #4


"aurora" <au******@gmail.com> wrote in message
news:op**************@news.cisco.com...
Hello!

Just went through an article via Slashdot titled "The Free Lunch Is Over: A
Fundamental Turn Toward Concurrency in Software"
[http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that the
continuous CPU performance gains we've seen are finally over, and that future
gains will primarily come from software concurrency taking advantage of
hyperthreading and multicore architectures.

Perhaps something the Python interpreter team can ponder.


Well, yes. However, it's not as bad as it looks. I've spent a good part
of my professional life with multiprocessors (IBM mainframes) and
I have yet to write a multi-thread program for performance reasons.
All of those systems ran multiple programs, not single programs
that had to take advantage of the multiprocessor environment.
Your typical desktop is no different. My current system has 42
processes running, and I'd be willing to bet that the vast majority
of them aren't multi-threaded.

There are a relatively small number of places where multi-threading
is actually useful; many programmers will never run into an application
where they need to use it.

I think it would be a good idea for the Python team to address
decent support for multiprocessors, but I hardly think it is a crisis.

John Roth

Jul 18 '05 #5

aurora wrote:
Just went through an article via Slashdot titled "The Free Lunch Is Over: A
Fundamental Turn Toward Concurrency in Software"
[http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that the
continuous CPU performance gains we've seen are finally over, and that future
gains will primarily come from software concurrency taking advantage of
hyperthreading and multicore architectures.


Well, I think it's reasonable to expect that _eventually_ (whether soon
or relatively soon or not for a long time) Moore's law will fail, and
the exponential increase in computing power over time will cease to
continue. At that point, it seems reasonable to assume you'll do your
best to take advantage of this with parallelization -- if your CPU won't
get faster, just put more and more in the box. So I've always had it in
the back of my mind that languages that can easily support massive
(especially automatic) parallelization will have their day in the sun,
at least someday.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
All the people in my neighborhood turn around and get mad and sing
-- Public Enemy
Jul 18 '05 #6

Steve Horsley wrote:
But my understanding is that the current Python VM is single-threaded
internally, so even if the program creates multiple threads, just one core
will be dividing its time between those "threads".


Not really.

The CPython interpreter does have a thing called the 'Global Interpreter Lock'
which synchronises access to the internals of the interpreter. If that wasn't
there, Python threads could corrupt the data structures. In order to do anything
useful, Python code must hold this lock, which leads to the frequent
misapprehension that Python is 'single-threaded'.

However, the threads created by the Python threading mechanism are real OS
threads, and the work load can be distributed between different cores.

In practice, this doesn't happen for a pure Python program, since any running
Python code must hold the interpreter lock. The Python threads end up getting
timesliced instead of running in parallel. Genuine concurrency with pure Python
requires running things in separate processes (to reliably get multiple
instances of the Python interpreter up and running).

Python threads are mainly intended to help deal with 'slow' I/O operations like
disk and network access - the C code that implements those operations *releases*
the GIL before making the slow call, allowing other Python threads to run while
waiting for the I/O call to complete. This behaviour means threading can give
*big* performance benefits on even single-CPU machines, and is likely to be the
biggest source of performance improvements from threading.
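That I/O behaviour is easy to demonstrate. In the sketch below, time.sleep()
stands in for a blocking I/O call (sleep releases the GIL just as the I/O
wrappers do), so five "requests" overlap instead of running back to back:

```python
import threading
import time

def fake_io(results, i):
    # time.sleep() releases the GIL, like a real blocking I/O call,
    # so all five of these "requests" wait concurrently.
    time.sleep(0.2)
    results[i] = i * 2

results = {}
threads = [threading.Thread(target=fake_io, args=(results, i))
           for i in range(5)]

start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

print(results)   # all five results collected
print(elapsed)   # roughly 0.2s total, not 5 * 0.2s
```

Run sequentially, the same five calls would take about a second; threaded,
they finish in about the time of one call.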

However, on multi-processor machines, it is also handy if a CPU-intensive
operation can be handled on one core, while another core keeps running Python code.

Again, this is handled by the relevant extension releasing the GIL before
performing its CPU-intensive operations and reacquiring the GIL when it is done.

So Python's concurrency is built in a couple of layers:

Python-level concurrency:
Multiple processes for true concurrency
Time-sliced concurrency within a process (based on the GIL)

C-level concurrency:
True concurrency if GIL is released when not needed
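The multiple-processes layer can be sketched with nothing but the standard
library (a Unix-only sketch, since it relies on os.fork). Each process gets
its own interpreter, and therefore its own GIL, so the two computations below
can genuinely run on separate cores:

```python
import os
import pickle

def cpu_bound(n):
    # stand-in for a CPU-intensive task that would hold the GIL
    total = 0
    for i in range(n):
        total += i * i
    return total

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child process: its own interpreter, its own GIL.
    os.close(r)
    os.write(w, pickle.dumps(cpu_bound(100000)))
    os.close(w)
    os._exit(0)
else:
    # Parent process: computes in parallel, then collects the child's result.
    os.close(w)
    parent_result = cpu_bound(50000)
    data = b""
    while True:
        chunk = os.read(r, 65536)
        if not chunk:
            break
        data += chunk
    os.close(r)
    child_result = pickle.loads(data)
    os.waitpid(pid, 0)

print(parent_result, child_result)
```

Within one process, two threads running cpu_bound would simply be timesliced;
the fork buys true concurrency at the cost of pickling results across a pipe.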

In some cases, problems with multi-threading are caused by invocation of
extensions which don't correctly release the GIL, effectively preventing *any*
other Python threads from running (since the executing extension never releases it).

As an example, I frequently use SWIG to access hardware API's from Python. My
standard 'exception translator' (which SWIG automatically places around every
call to the extension) now looks something like:

%exception {
    Py_BEGIN_ALLOW_THREADS
    try {
        $action
    } catch (...) {
        Py_BLOCK_THREADS
        SWIG_exception(SWIG_RuntimeError, "Unexpected exception");
    }
    Py_END_ALLOW_THREADS
}

The above means that every call into my extension releases the GIL
automatically, and reacquires it when returning to Python. I usually don't call
the Python C API from the extension, but if I did, I would need to reacquire the
GIL with PyGILState_Ensure() before doing so.

Without those threading API calls in place, operations which access the hardware
always block the entire program, even if the Python program is multi-threaded.

See here for some more info on Python's threading:
http://www.python.org/doc/2.4/api/threads.html

Cheers,
Nick.

--
Nick Coghlan | nc******@email.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net
Jul 18 '05 #7

> <snip> So I've always had it in
the back of my mind that languages that can easily support massive
(especially automatic) parallelization will have their day in the sun, at least someday.


and the language of the future will be called ... FORTRAN!

:-)

(joking, but it is the only language I know supporting massive
parallelization ...)
Michele Simionato

Jul 18 '05 #8

[http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that the
continuous CPU performance gains we've seen are finally over, and that future
gains will primarily come from software concurrency taking advantage of
hyperthreading and multicore architectures.
Well, yes. However, it's not as bad as it looks. I've spent a good part
of my professional life with multiprocessors (IBM mainframes) and
I have yet to write a multi-thread program for performance reasons.
All of those systems ran multiple programs, not single programs
that had to take advantage of the multiprocessor environment.
Your typical desktop is no different. My current system has 42
processes running, and I'd be willing to bet that the vast majority
of them aren't multi-threaded.

Exactly. If every one of your processes had its own 2 Ghz processor
running nothing else, I think you would be pretty happy. Your OS
had better be well-written to deal with concurrent access to
memory and disks, but I think for general application development
there will be huge speed boosts with little need for new
programming paradigms.
Jul 18 '05 #9


Jp> How often do you run 4 processes that are all bottlenecked on CPU?

In scientific computing I suspect this happens rather frequently.

"More is never enough." -- Bob Saltzman

Skip
Jul 18 '05 #10

In article <11*********************@f14g2000cwb.googlegroups.com>,
<mi***************@gmail.com> wrote:
Michele deleted an attribution:

<snip> So I've always had it in
the back of my mind that languages that can easily support massive
(especially automatic) parallelization will have their day in the sun,
at least someday.


and the language of the future will be called ... FORTRAN!

:-)

(joking, but it is the only language I know supporting massive
parallelization ...)


Less of a joke than you think, perhaps. Back in the early 1980s, a
family friend said something like, "In the year 2000, there will be a
programming language. I don't know what it will look like, and I don't
know what it will do. But I do know one thing: it will be called
FORTRAN."

After all, FORTRAN 2003 contains OOP support....
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"19. A language that doesn't affect the way you think about programming,
is not worth knowing." --Alan Perlis
Jul 18 '05 #11

Quoth Skip Montanaro <sk**@pobox.com>:
|
| Jp> How often do you run 4 processes that are all bottlenecked on CPU?
|
| In scientific computing I suspect this happens rather frequently.

I think he was trying to say more or less the same thing - responding
to "(IBM mainframes) ... All those systems ran multiple programs ...
My current system has 42 processes running ...", his point was that
however many processes on your desktop, on the rare occasion that
your CPU is pegged, it will be 1 process. The process structure of
a system workload doesn't make it naturally take advantage of SMP.
So "there will still need to be language innovations" etc. -- to
accommodate scientific computing or whatever. Your 4 processes are
most likely not a natural architecture for the task at hand, but
rather a complication introduced specifically to exploit SMP.

Personally I wouldn't care to predict anything here. For all I know,
someday we may decide that we need cooler and more efficient computers
more than we need faster ones.

Donn Cave, do**@drizzle.com
Jul 18 '05 #12


"Donn Cave" <do**@drizzle.com> wrote in message
news:41**********@127.0.0.1...
Quoth Skip Montanaro <sk**@pobox.com>:
|
| Jp> How often do you run 4 processes that are all bottlenecked on
CPU?
|
| In scientific computing I suspect this happens rather frequently.

I think he was trying to say more or less the same thing - responding
to "(IBM mainframes) ... All those systems ran multiple programs ...
My current system has 42 processes running ...", his point was that
however many processes on your desktop, on the rare occasion that
your CPU is pegged, it will be 1 process. The process structure of
a system workload doesn't make it naturally take advantage of SMP.
So "there will still need to be language innovations" etc. -- to
accommodate scientific computing or whatever. Your 4 processes are
most likely not a natural architecture for the task at hand, but
rather a complication introduced specifically to exploit SMP.
Exactly. I wasn't addressing some of the known areas where one
can take advantage of multiple processors, or where one can take
advantage of threading on a single processor to avoid delays.

At this point in time, though, I see multithreading for compute
intensive tasks to be an intermediate step. The final step is to
restructure it so it can take advantage of cluster architectures.
Then you can simply ignore all of the complexity of threads.

That still leaves putting long running tasks (such as printing)
into the background so the UI stays responsive.
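That responsiveness pattern is straightforward with a daemon thread. A minimal
sketch (the simulated print job and the polling loop standing in for a real
spooler and a real UI event loop are illustrative, not a real API):

```python
import threading
import time

progress = []

def print_job(pages):
    # Simulate a slow background task, e.g. spooling pages to a printer.
    for page in range(pages):
        time.sleep(0.05)
        progress.append(page)

worker = threading.Thread(target=print_job, args=(4,))
worker.daemon = True   # don't block interpreter exit on this job
worker.start()

# The "UI" (here, the main thread) keeps handling events meanwhile.
ticks = 0
while worker.is_alive():
    ticks += 1
    time.sleep(0.01)
worker.join()

print(progress)   # all four pages spooled
print(ticks)      # the main loop ran many times while the job was spooling
```

The main thread never blocks on the slow job; it only polls (or, in a real
GUI, services its event queue) until the worker reports completion.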

Personally I wouldn't care to predict anything here. For all I know,
someday we may decide that we need cooler and more efficient computers
more than we need faster ones.
Chuckle. I basically think of shared memory multiprocessing
as being perverse: the bottleneck is memory, not compute
speed, so adding more processors accessing the same memory
doesn't strike me as exactly sane. Nor does pushing compute
speed up and up and up when it just stresses the memory
bottleneck.
Donn Cave, do**@drizzle.com


Jul 18 '05 #13

John Roth wrote:
I have yet to write a multi-thread program for performance reasons.


If we include in the set of things covered by the term
"performance" not only throughput, but also latency, then
I suspect you actually have written some multithreaded programs
for "performance" reasons.

*I* certainly have: that's easily the reason for threading
in 95% of the cases I've dealt with, and I suspect those of
many others.

-Peter
Jul 18 '05 #14

Of course there are many performance bottlenecks: CPU, memory, I/O, network,
all the way up to the software design and implementation. As a software
guy myself I would say that, by far, better software design leads to the
greatest performance gains. But that doesn't mean hardware engineers can sit
back and declare this "software's problem". Even if we are not writing
CPU-intensive applications, we will certainly welcome the "free performance
gain" coming from a faster CPU or a more optimized compiler.

I think this is significant because it might signify a paradigm shift.
This might well be hype, but let's just assume it is the future direction
of CPU design. Then we might as well start experimenting now. I would just
throw out some random ideas: parallel execution at the statement level,
predictive lookup of symbols and attributes, parallelized hash functions,
dictionary lookup, sorting, list comprehensions, background just-in-time
compilation, etc.

One of the author's ideas is that many of today's mainstream technologies
(like OO) did not come about suddenly but accumulated years of research before
becoming widely used. A lot of these ideas may not work, or may not seem
to matter much today. But in 10 years we might be really glad that we tried.

aurora <au******@gmail.com> writes:
Just went through an article via Slashdot titled "The Free Lunch Is
Over: A Fundamental Turn Toward Concurrency in Software"
[http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that
the continuous CPU performance gains we've seen are finally over, and
that future gains will primarily come from software concurrency
taking advantage of hyperthreading and multicore architectures.


Well, another gain could be had in making the software less wasteful
of cpu cycles.

I'm a pretty experienced programmer by most people's standards but I
see a lot of systems where I can't for the life of me figure out how
they manage to be so slow. It might be caused by environmental
pollutants emanating from Redmond.


Jul 18 '05 #15


"Peter Hansen" <pe***@engcorp.com> wrote in message
news:b9********************@powergate.ca...
John Roth wrote:
I have yet to write a multi-thread program for performance reasons.
If we include in the set of things covered by the term
"performance" not only throughput, but also latency, then
I suspect you actually have written some multithreaded programs
for "performance" reasons.

*I* certainly have: that's easily the reason for threading
in 95% of the cases I've dealt with, and I suspect those of
many others.


Actually, I've never written a multi-threaded program for
any reason. There were only two times I had to deal with concurrency:
one was a very nice co-routine implementation (HASP,
the predecessor to the JES2 subsystem on MVS), and
the other was event driven (on an IBM SP). The former
system didn't have a threading library, let alone a lightweight
one, and the event driven design was a lot simpler for the
second application - and I did consider all three options.

John Roth

-Peter


Jul 18 '05 #16

On Sat, 08 Jan 2005 11:52:03 -0800, aurora <au******@gmail.com> wrote:
One of the author's ideas is that many of today's mainstream technologies
(like OO) did not come about suddenly but accumulated years of research before
becoming widely used. A lot of these ideas may not work, or may not seem
to matter much today. But in 10 years we might be really glad that we tried.


One thing that I would love to see included in Python is a native
library for fast inter-process communication, including support for
message passing primitives and remote object calls. I know that there
are a number of offerings in this arena: third-party libraries such as
Pyro, message-passing libraries such as the ones from SciPy, and
standard libraries such as the XMLRPC ones. The key requirements are:
"fast", and "native".

By fast, I mean, highly optimized, and at least as fast (in the same
order of magnitude, let's say) as any other competitive environment
available. By native, it means that it has to be included in the
standard distribution, and has to be as transparent and convenient as
possible. In other words, it has to feel like a native part of the
language.
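For illustration, here is a minimal, Unix-only sketch of the kind of primitive
being asked for: length-prefixed pickled messages exchanged over a socketpair,
with a forked child acting as the remote end. The framing helpers (send_msg,
recv_msg) are hypothetical names for this sketch, not an existing API:

```python
import os
import pickle
import socket
import struct

def send_msg(sock, obj):
    # Length-prefixed pickle: a minimal message-passing framing scheme.
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n):
    # Read exactly n bytes, looping over short reads.
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise EOFError("peer closed the connection")
        data += chunk
    return data

def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return pickle.loads(recv_exact(sock, length))

parent_end, child_end = socket.socketpair()
pid = os.fork()
if pid == 0:
    # Child: a trivial "remote object" that echoes requests back.
    parent_end.close()
    request = recv_msg(child_end)
    send_msg(child_end, {"echo": request})
    os._exit(0)
else:
    # Parent: issues a call and waits for the reply.
    child_end.close()
    send_msg(parent_end, [1, 2, 3])
    reply = recv_msg(parent_end)
    os.waitpid(pid, 0)

print(reply)
```

A "fast and native" facility would of course need to do far better than
pickle-over-sockets, but the framing shown here is the core of most
message-passing and remote-call layers.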

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: ca********@gmail.com
mail: ca********@yahoo.com
Jul 18 '05 #17
