Towards faster Python implementations - theory

Some faster Python implementations are under development.
Jython has been around for a while, and PyPy and Shed Skin
continue to move forward. It's worth thinking about what slows
down Python implementations.

It isn't the dynamism, really. As others have pointed out
in the Python literature, most of the time, the more elaborate
dynamic features aren't being used for any given variable or
object. If code has something like "x = 1.0", the odds are that
"x" is going to stay a floating point number, and not suddenly turn
into a list or an object reference. The type inference of Shed Skin
builds on that assumption, adding some restrictions about changing of
variable types.

The Shed Skin effort indicates that explicit typing, via 'decorators'
or otherwise, isn't really necessary. What's necessary is the avoidance
of "surprises" in the language. In this context, a "surprise" is
the use of a dynamic feature in a way that can't be seen at compile time.

A typical "surprise" would be the use of "setattr" on an object from
outside the compilation unit that defines the object. Within a module,
"setattr" on an object in that module is no problem; the compiler can see
it and generate the extra machinery needed to make an object dynamically
alterable at run time. But if an object doesn't need that extra machinery
and associated dictionary, it's a big win to discard the excess baggage
and use a simpler fixed-size representation, comparable to a C struct,
for the object.
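A small illustration of the sort of thing meant here (names invented for the example); a compiler reading only the class body has no hint that the extra fields will ever exist:

# A compiler looking only at this class sees two float fields.
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1.0, 2.0)

# The "surprise": code far away (imagine another module) can still do
setattr(p, "colour", "red")   # a field the class never mentions
setattr(Point, "dims", 2)     # even the class object can be altered
assert p.colour == "red" and p.dims == 2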

On the typing front, the neatest way to express typing is via
initialization. With the Shed Skin restrictions, if all variables are
initialized before use (preferably in __init__), there's no need to
maintain an "undefined" flag for a variable. And, of course, if
the type of a variable is simple and can't change, it doesn't have to
be "boxed" (enclosed in an object), which is a big win.

The point here is that we don't need language changes or declarations
to make Python much faster. All we need are a few restrictions that
ensure that, when you're doing something unusual, the compiler can
tell.

John Nagle
May 8 '07
Paul Boddie wrote:
On 9 May, 08:09, "Hendrik van Rooyen" <m...@microcorp.co.za> wrote:
> I am relatively new on this turf, and from what I have seen so far, it
> would not bother me at all to tie a name's type to its first use, so that
> the name can only be bound to objects of the same type as the type
> of the object that it was originally bound to.


But it's interesting to consider the kinds of names you could restrict
in this manner and what the effects would be. In Python, the only kind
of name that can be considered difficult to arbitrarily modify "at a
distance" - in other words, from outside the same scope - are locals,
and even then there are things like closures and perverse
implementation-dependent stack hacks which can expose local namespaces
to modification, although any reasonable "conservative Python"
implementation would disallow the latter.
Modifying "at a distance" is exactly what I'm getting at. That's the
killer from an optimizing compiler standpoint. The compiler, or a
maintenance programmer, looks at a block of code, and there doesn't seem
to be anything unusual going on. But, if in some other section of
code, something does a "setattr" to mess with the first block of code,
something unusual can be happening. This is tough on both optimizing
compilers and maintenance programmers.

Python has that capability mostly because it's free in an
"everything is a dictionary" implementation. ("When all you have
is a hash, everything looks like a dictionary".) But that limits
implementation performance. Most of the time, nobody is using
"setattr" to mess with the internals of a function, class, or
module from far, far away. But the cost for that flexibility is
being paid, unnecessarily.

I'm suggesting that the potential for "action at a distance" somehow
has to be made more visible.

One option might be a class "simpleobject", from which other classes
can inherit. ("object" would become a subclass of "simpleobject".)
"simpleobject" classes would have the following restrictions:

- New fields and functions cannot be introduced from outside
the class. Every field and function name must explicitly appear
at least once in the class definition. Subclassing is still
allowed.
- Unless the class itself uses "getattr" or "setattr" on itself,
no external code can do so. This lets the compiler eliminate the
object's dictionary unless the class itself needs it.

This lets the compiler see all the field names and assign them fixed slots
in a fixed-size object representation. Basically, this means simple objects
have a C/C++ like internal representation, with the performance that comes
with that representation.

With this, plus the "Shed Skin" restrictions, plus the array features of
"numarray", it should be possible to get computationally intensive code
written in Python up to C/C++ levels of performance. Yet all the dynamic
machinery of Python remains available if needed.

All that's necessary is not to surprise the compiler.

John Nagle
May 9 '07 #11
John Nagle wrote:
>
Modifying "at a distance" is exactly what I'm getting at. That's the
killer from an optimizing compiler standpoint. The compiler, or a
maintenance programmer, looks at a block of code, and there doesn't seem
to be anything unusual going on. But, if in some other section of
code, something does a "setattr" to mess with the first block of code,
something unusual can be happening. This is tough on both optimizing
compilers and maintenance programmers.
Agreed. It's tempting to look at some code and say which types one
thinks are being manipulated, but the actual program behaviour can be
quite different. Adding explicit type declarations "anchors" the names
to a very restrictive set of types - typically those permitted through
interfaces or inheritance in many object-oriented languages - and the
challenge, as we've already discussed, is to attempt to "anchor" the
names when they are in a sense "floating free" without any explicit
annotations from the programmer.

Python also presents other challenges. Unlike systems programming
languages, almost nothing is a local operation: each Python function
is mostly a product of function calls and the occasional conditional
or looping construct, themselves wired up using yet more function
calls. Whilst the expectation is that most of these will return sensible
results (eg. a call to the iter method on a sequence returns an
iterator), there isn't any guarantee that this will be the case, and
even then the precise iterator involved can vary. Some might claim
that the class of operations which could be done locally are precisely
the ones which would benefit from identification and optimisation,
namely those involving primitive types, yet some pretty good solutions
for such cases exist already: Pyrex, various numeric packages, and so
on.
Python has that capability mostly because it's free in an
"everything is a dictionary" implementation. ("When all you have
is a hash, everything looks like a dictionary".) But that limits
implementation performance. Most of the time, nobody is using
"setattr" to mess with the internals of a function, class, or
module from far, far away. But the cost for that flexibility is
being paid, unnecessarily.
I've heard claims that virtual machines are evolving to make
dictionary-based access more performant, and I guess Microsoft's DLR
and any JVM improvements might point in that direction, but Shed Skin
shows the way by avoiding such things entirely.
I'm suggesting that the potential for "action at a distance" somehow
has to be made more visible.

One option might be a class "simpleobject", from which other classes
can inherit. ("object" would become a subclass of "simpleobject".)
"simpleobject" classes would have the following restrictions:

- New fields and functions cannot be introduced from outside
the class. Every field and function name must explicitly appear
at least once in the class definition. Subclassing is still
allowed.
I suppose one could mandate that only methods defined inside class
blocks belong to the class and subclasses, and that various kinds of
optimisations can then be applied to the code in those methods such
that the self parameter's type is constrained in a way probably
already imposed by CPython. This would forbid the adding of functions
to classes and instances after the fact, but in my fairly conservative
world of programming, I hardly ever do such things. I do add
attributes to instances outside those instances (by the above
definition) quite often, though.
- Unless the class itself uses "getattr" or "setattr" on itself,
no external code can do so. This lets the compiler eliminate the
object's dictionary unless the class itself needs it.
I think this is a natural progression from determining the access
needs of particular classes and their objects.
This lets the compiler see all the field names and assign them fixed slots
in a fixed sized object representation. Basically, this means simple objects
have a C/C++ like internal representation, with the performance that comes
with that representation.
Yes, I agree that this would be desirable.
With this, plus the "Shed Skin" restrictions, plus the array features of
"numarray", it should be possible to get computationally intensive code
written in Python up to C/C++ levels of performance. Yet all the dynamic
machinery of Python remains available if needed.

All that's necessary is not to surprise the compiler.
One thing I had cause to think about again today was the cost of
function invocations. Python not only has pervasive dynamic dispatch,
but there are also such things as *args and **kwargs and the cost of
dealing with them. However, such things are very useful in integrating
components and extending class hierarchies, so I wouldn't want to see
them vanish in their entirety.
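For example, the sort of glue that depends on them might look like this (a made-up wrapper, purely to show the idiom); the wrapper cannot know the wrapped signature, which is exactly what makes such calls hard to compile cheaply:

def with_default_timeout(func):
    # wraps any callable without knowing its signature
    def wrapper(*args, **kwargs):
        kwargs.setdefault('timeout', 30)
        return func(*args, **kwargs)
    return wrapper

def fetch(url, timeout):
    return (url, timeout)

fetch = with_default_timeout(fetch)
assert fetch("http://example.org") == ("http://example.org", 30)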

Paul

May 9 '07 #12
On May 9, 10:02 am, John Nagle <n...@animats.com> wrote:
One option might be a class "simpleobject", from which other classes
can inherit. ("object" would become a subclass of "simpleobject".)
"simpleobject" classes would have the following restrictions:

- New fields and functions cannot be introduced from outside
the class. Every field and function name must explicitly appear
at least once in the class definition. Subclassing is still
allowed.
- Unless the class itself uses "getattr" or "setattr" on itself,
no external code can do so. This lets the compiler eliminate the
object's dictionary unless the class itself needs it.

This lets the compiler see all the field names and assign them fixed slots
in a fixed sized object representation. Basically, this means simple objects
have a C/C++ like internal representation, with the performance that comes
with that representation.
Hey look, it already exists:
>>> class A(object):
...     __slots__ = 'a b c d'.split()
...
>>> a = A()
>>> a.e = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute 'e'
>>> hasattr(a, '__dict__')
False

-Mike

May 9 '07 #13
"Terry Reedy" <t..y@u,,.eduwr ote:
"Hendrik van Rooyen" <ma**@microcorp .co.zawrote in message
news:013d01c792 10$5e441280$030 00080@hendrik.. .
| I am relatively new on this turf, and from what I have seen so far, it
| would not bother me at all to tie a name's type to its first use, so that
| the name can only be bound to objects of the same type as the type
| of the object that it was originally bound to.
|
| But maybe I am missing the point of dynamism.
|
| Would an implementation of the above break lots of stuff in practice?

For function local variables, if you mean 'originally bound to' in the
current call invocation, that would sometimes be ok (and that is sort of
what Psyco does). But if you mean in the original binding in the first
call invocation, then that would cripple many functions.
Errr - I was thinking a bit simplistically - I know I can write:

def f(x):
    for y in x:
        print y    # using print here as short for doing something complicated

And that would currently "work" with any iterable, as x could
currently be anything.

It seems that such functions are the problem, as something like this:

x = [1,2,3,4,5]
for y in x:
    print y

does not have the same hassle for x, but can shift the problem to y:

x = [1,2,3,4,(1,2)]
for y in x:
    print y

I can't see an easy way to put the patient back into his straitjacket.

Makes you want to use predefined globals...

- Hendrik

May 10 '07 #14
"John Nagle" <na***@animats. comwrote:

Paul Boddie wrote:
On 9 May, 08:09, "Hendrik van Rooyen" <m...@microcorp.co.za> wrote:
> I am relatively new on this turf, and from what I have seen so far, it
> would not bother me at all to tie a name's type to its first use, so that
> the name can only be bound to objects of the same type as the type
> of the object that it was originally bound to.

But it's interesting to consider the kinds of names you could restrict
in this manner and what the effects would be. In Python, the only kind
of name that can be considered difficult to arbitrarily modify "at a
distance" - in other words, from outside the same scope - are locals,
and even then there are things like closures and perverse
implementation-dependent stack hacks which can expose local namespaces
to modification, although any reasonable "conservative Python"
implementation would disallow the latter.

Modifying "at a distance" is exactly what I'm getting at. That's the
killer from an optimizing compiler standpoint. The compiler, or a
maintenance programmer, looks at a block of code, and there doesn't seem
to be anything unusual going on. But, if in some other section of
code, something does a "setattr" to mess with the first block of code,
something unusual can be happening. This is tough on both optimizing
compilers and maintenance programmers.

Python has that capability mostly because it's free in an
"everything is a dictionary" implementation. ("When all you have
is a hash, everything looks like a dictionary".) But that limits
implementation performance. Most of the time, nobody is using
"setattr" to mess with the internals of a function, class, or
module from far, far away. But the cost for that flexibility is
being paid, unnecessarily.

I'm suggesting that the potential for "action at a distance" somehow
has to be made more visible.

One option might be a class "simpleobject", from which other classes
can inherit. ("object" would become a subclass of "simpleobject".)
"simpleobject" classes would have the following restrictions:

- New fields and functions cannot be introduced from outside
the class. Every field and function name must explicitly appear
at least once in the class definition. Subclassing is still
allowed.
- Unless the class itself uses "getattr" or "setattr" on itself,
no external code can do so. This lets the compiler eliminate the
object's dictionary unless the class itself needs it.

This lets the compiler see all the field names and assign them fixed slots
in a fixed sized object representation. Basically, this means simple objects
have a C/C++ like internal representation, with the performance that comes
with that representation.

With this, plus the "Shed Skin" restrictions, plus the array features of
"numarray", it should be possible to get computationally intensive code
written in Python up to C/C++ levels of performance. Yet all the dynamic
machinery of Python remains available if needed.

All that's necessary is not to surprise the compiler.
If this is all it takes, I would even be happy to have to declare which things
could be surprising - some statement like:

x can be anything

Would that help?

It kind of inverts the logic - and states that if you want what is now
the default behaviour, you have to ask for it.

- Hendrik

May 10 '07 #15
On May 8, 5:53 pm, John Nagle <n...@animats.com> wrote:
The point here is that we don't need language changes or declarations
to make Python much faster. All we need are a few restrictions that
insure that, when you're doing something unusual, the compiler can
tell.
Franz, CMUCL, SBCL and GCL teams made Lisp almost as fast as C. A
dynamic language can be fast if the implementation is good.

If you look at SBCL and GCL, no code is interpreted. It's all compiled
on the fly to native machine code. The compiler begins with some code
and some input data, and compiles as much as it can. Then the RT
executes the machine code, compiles again, etc. Often long stretches
of code can be compiled without break, and tight performance critical
loops are usually compiled only once. In addition to this, one needs
an efficient system to cache compiled code, in order to do the
compilation work only once. Making a dynamic language fast is not
rocket science.

We should have something like "GPython", a Python RT on top of a GCC
backend, similar to what the GCL team did for Lisp. There is no
obvious reason as to why Lisp should have better performance than
Python.




May 10 '07 #16
sturlamolden wrote:
On May 8, 5:53 pm, John Nagle <n...@animats.com> wrote:
> The point here is that we don't need language changes or declarations
to make Python much faster. All we need are a few restrictions that
insure that, when you're doing something unusual, the compiler can
tell.

Franz, CMUCL, SBCL and GCL teams made Lisp almost as fast as C. A
dynamic language can be fast if the implementation is good.

If you look at SBCL and GCL, no code is interpreted. It's all compiled
on the fly to native machine code. The compiler begins with some code
and some input data, and compiles as much as it can. Then the RT
executes the machine code, compiles again, etc. Often long stretches
of code can be compiled without break, and tight performance critical
loops are usually compiled only once. In addition to this, one needs
an efficient system to cache compiled code, in order to do the
compilation work only once. making a dynamic language fast is not
rocket science.

We should have somthing like "GPython", a Python RT on top of a GCC
backend, similar to what the GCL team did for Lisp. There is no
obvious reason as to why Lisp should have better performance than
Python.
I doubt if anyone disputes the gist of what you're
saying[*], viz that Python could be made faster by using
technique (a), (b) or (c) which have been successful
elsewhere. At least that it's worth investigating.

But the relevant bit of your last paragraph is at the start:
"We should...". Unless someone or someones has the time,
inclination, money, backing, wherewithal etc. to implement
this or any other measure of speeding-up, it's all
pie-in-the-sky. Useful, maybe, as discussion of what
options are viable, but a project of this magnitude
doesn't just happen in some developer's lunchbreak.

Many people (and I include myself) are quite
happy with Python's speed. In fact, I'm quite happy
with most things about Python. Others would like to
see it faster. That's great. But unless people
put their money where their mouths are, I don't
see it happening.

TJG
[*] Actually, knowing this community, I'm sure *someone's*
going to!
May 10 '07 #17

"sturlamold en" <st**********@y ahoo.nowrote in message
news:11******** **************@ y80g2000hsf.goo glegroups.com.. .
| Franz, CMUCL, SBCL and GCL teams made Lisp almost as fast as C. A
| dynamic language can be fast if the implementation is good.
|
| If you look at SBCL and GCL, no code is interpreted. It's all compiled
| on the fly to native machine code.

Unfortunately, native machine code depends on the machine, or at least the
machine being emulated by the hardware. Fortunately or not, the dominance
of the x86 model makes this less of a problem.

| The compiler begins with some code
| and some input data, and compiles as much as it can. Then the RT
| executes the machine code, compiles again, etc. Often long stretches
| of code can be compiled without break, and tight performance critical
| loops are usually compiled only once. In addition to this, one needs
| an efficient system to cache compiled code, in order to do the
| compilation work only once. making a dynamic language fast is not
| rocket science.

This sounds somewhat similar to Psyco. If a code section is entered with
different machine types, recompilation is needed. One of the problems with
Psyco is that its efficient caching of multiple versions of compiled code
is space-hungry.
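For anyone who hasn't tried it, Psyco's usual entry points are just a call or two (this assumes the psyco extension is installed on a 32-bit CPython 2.x):

import psyco

def dot(a, b):
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

psyco.bind(dot)    # specialize and compile this one function
# psyco.full()     # or, more bluntly, compile everything it can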

| We should have somthing like "GPython", a Python RT on top of a GCC
| backend, similar to what the GCL team did for Lisp. There is no
| obvious reason as to why Lisp should have better performance than
| Python.

In the 1980s, during the Lisp/AI boom, perhaps a billion dollars was put
into making Lisp run faster. Plus decades of work in Computer Science
departments. Other than that, I agree. Now, Python is slowly making
inroads into academia as a subject of research and theses, and the PyPy
group got at least one large EU grant.

Terry Jan Reedy

May 10 '07 #18
On May 9, 5:02 pm, John Nagle <n...@animats.com> wrote:
Paul Boddie wrote:

Python has that capability mostly because it's free in an
"everything is a dictionary" implementation. ("When all you have
is a hash, everything looks like a dictionary".) But that limits
implementation performance. Most of the time, nobody is using
"setattr" to mess with the internals of a function, class, or
module from far, far away. But the cost for that flexibility is
being paid, unnecessarily.
I've had this idea in the back of my head for a year or so now, but
have been too lazy or busy or both to give the implementation a try.

The idea is to create a special dictionary for python internals that
contains a pointer-to-a-pointer for the value side of the dictionary,
instead of just the standard PyObject*. Then when you start to eval a
code block, you could do a one-time lookup of the pointer-to-the-
pointer for each method, attribute, etc; and store them in pre-
allocated slots. The calls in the block to various GET_ and SET_
functions can then just deal with pointer dereferencing instead of a
dictionary lookup, and the dictionary can still get changed while this
is going on without things blowing up.

I think you'd get a big speed up by changing all those dictionary
function calls to inlined pointer deref code.
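Not the C-level change being proposed, of course, but the shape of the idea can be sketched in plain Python with one-element "cells" as the dictionary's values; the name is looked up once, and later reads and writes go through the cached cell rather than the dictionary:

class CellDict(object):
    """Toy model: each value lives in a one-element list (a "cell")."""
    def __init__(self):
        self._cells = {}

    def get_cell(self, name):
        # one-time lookup; the caller keeps the cell afterwards
        return self._cells.setdefault(name, [None])

    def __setitem__(self, name, value):
        self.get_cell(name)[0] = value

    def __getitem__(self, name):
        return self.get_cell(name)[0]

ns = CellDict()
ns['counter'] = 0
cell = ns.get_cell('counter')   # done once when the code block starts

cell[0] = cell[0] + 1           # pointer-style dereference, no dict lookup
ns['counter'] = 41              # outside changes still hit the same cell
assert cell[0] == 41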

Anyone else buy this approach? Or see any fatal flaws?

May 10 '07 #19

<ol*****@verizon.net> wrote in message
news:11**********************@w5g2000hsg.googlegroups.com...
| The idea is to create a special dictionary for python internals that
| contains a pointer-to-a-pointer for the value side of the dictionary,
| instead of just the standard PyObject*. Then when you start to eval a
| code block, you could do a one-time lookup of the pointer-to-the-
| pointer for each method, attribute, etc; and store them in pre-
| allocated slots. The calls in the block to various GET_ and SET_
| functions can then just deal with pointer derefencing instead of a
| dictionary lookup, and the dictionary can still get changed while this
| is going on without things blowing up.
|
| I think you'd get a big speed up by changing all those dictionary
| function calls to inlined pointer deref code.
|
| Anyone else buy this approach? Or see any fatal flaws?

Within CPython functions, local variables are usually implemented as
numbered slots in a C array (of PyObject pointers). That is why a) the
compiler makes a first pass thru a function body (to determine what is
local and what is not) and b) LOAD_FAST is faster than LOAD_GLOBAL.
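The difference is easy to see with the dis module (exact offsets and column layout vary by interpreter version):

import dis

GLOBAL_X = 1

def f():
    local_x = 1
    return local_x + GLOBAL_X

dis.dis(f)
# local_x is fetched with LOAD_FAST (an index into the frame's array);
# GLOBAL_X with LOAD_GLOBAL (a dictionary lookup by name).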

Terry Jan Reedy

May 10 '07 #20
