Bytes IT Community

Speed: bytecode vs C API calls

I have a program in which I make very good use of a memoizer:

def memoize(callable):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, callable(*args))
    return proxy

which is functionally equivalent to

class memoize:

    def __init__(self, callable):
        self.cache = {}
        self.callable = callable

    def __call__(self, *args):
        try: return self.cache[args]
        except KeyError:
            return self.cache.setdefault(args, self.callable(*args))

though the latter is about twice as slow.
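For reference, the gap between the two can be measured with a small timeit sketch; the squaring workload and the "twice as slow" figure are illustrative, and exact numbers depend on machine and Python version:

```python
import timeit

def memoize_fn(callable):
    # closure-based memoizer, as in the post above
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, callable(*args))
    return proxy

class memoize_cls:
    # class-based memoizer, as in the post above
    def __init__(self, callable):
        self.cache = {}
        self.callable = callable
    def __call__(self, *args):
        try: return self.cache[args]
        except KeyError:
            return self.cache.setdefault(args, self.callable(*args))

f1 = memoize_fn(lambda x: x * x)
f2 = memoize_cls(lambda x: x * x)
f1(10); f2(10)  # warm both caches so we time the cache-hit path only

t1 = timeit.timeit(lambda: f1(10), number=100000)
t2 = timeit.timeit(lambda: f2(10), number=100000)
print("closure: %.3fs  class: %.3fs" % (t1, t2))
```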

I've got to the stage where my program is still not fast enough, and
calls to the memoizer proxy are topping the profiler output table. So I
thought I'd try to see whether I can speed it up by recoding it in C.

The closure-based version seems impossible to recode in C (anyone know
a way?), so I decided to write an extension type equivalent to "class
memoize" above. This seems to run about 10% faster than the pure
Python version ... which is still a lot slower than the pure Python
closure-based version.

I wasn't expecting C extensions that make lots of calls to Python C API
functions to be spectacularly fast, but I'm still a little
disappointed with what I've got. Does this sort of speedup (or rather,
lack of it) seem right to those of you experienced with this sort of
thing? Or does it look like I'm doing it wrong?

Could anyone suggest how I could squeeze more speed out of the
memoizer? (I'll include the core of my memoize extension type below.)

[What's the current state of the art wrt profiling Python, and its
extension modules? I've tried using hotshot (though not extensively
yet), but it seems to show even less information than profile, at
first blush.]
The memoize extension type is based around the following:

typedef struct {
    PyObject_HEAD
    PyObject* x_attr;
    PyObject* cache;
    PyObject* fn;
} memoizeObject;

static int
memoize_init(memoizeObject* self, PyObject* args, PyObject* kwds) {
    PyArg_ParseTuple(args, "O", &(self->fn));
    Py_INCREF(self->fn);
    self->cache = PyDict_New();
    return 0;
}

static PyObject*
memoize_call(memoizeObject* self, PyObject* args) {
    PyObject* value = PyDict_GetItem(self->fn, args);
    if (! value) {
        PyDict_SetItem(self->cache, args,
                       PyObject_CallObject(self->fn, args));
        value = PyDict_GetItem(self->cache, args);
    }
    //Py_INCREF(value);
    return value;
}

Thanks,

Jul 18 '05 #1
34 Replies


Jacek Generowicz <ja**************@cern.ch> writes:
Could anyone suggest how I could squeeze more speed out of the
memoizer? (I'll include the core of my memoize extension type at the
below.)


You *might* get a speed bump by making your memoized callable a method
of the memoizer object rather than implementing tp_call.

Cheers,
mwh

--
M-x psych[TAB][RETURN]
-- try it
Jul 18 '05 #2

Jacek Generowicz <ja**************@cern.ch> writes:
I was expecting C-extensions which make lots of calls to Python C API
functions, not to be spectacularly fast, but I'm still a little
disapponited with what I've got. Does this sort of speedup (or rather,
lack of it) seem right, to those of you experienced with this sort of
thing? or does it look like I'm doing it wrong?

static int
memoize_init(memoizeObject* self, PyObject* args, PyObject* kwds) {
    PyArg_ParseTuple(args, "O", &(self->fn));

Obvious bug: &(self->cache) !!

    Py_INCREF(self->fn);
    self->cache = PyDict_New();
    return 0;
}


Fixing the bug gives me a factor of 3 speedup over the class-based
version.

That's a bit more like what I was expecting.
Jul 18 '05 #3


"Jacek Generowicz" <ja**************@cern.ch> wrote in message
news:ty*************@pcepsft001.cern.ch...
I have a program in which I make very good use of a memoizer:

def memoize(callable):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, callable(*args))
    return proxy

....

I've got to the stage where my program is still not fast enough, and
calls to the memoizer proxy are topping profiler output table. So I
thought I'd try to see whether I can speed it up by recoding it in C.


Have you tried psyco on above? Or rather, on the proxies?
Can any of your callables be memoized by list rather than dict?
(ie, any count or restricted int args?)

If you have just a few wrapped functions, faster, special-cased proxies
(with exact arg lists) would be feasible and possibly easier for psyco.
Special-casing would include inlining the callable when possible
-- or rather, inlining the memoization within the function.

TJR
Jul 18 '05 #4

In article <ty*************@pcepsft001.cern.ch>,
Jacek Generowicz <ja**************@cern.ch> wrote:

I have a program in which I make very good use of a memoizer:

def memoize(callable):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, callable(*args))
    return proxy

I've got to the stage where my program is still not fast enough, and
calls to the memoizer proxy are topping profiler output table. So I
thought I'd try to see whether I can speed it up by recoding it in C.


I'm wondering whether you're tackling something the wrong way or perhaps
you need to throw hardware at the problem. Dicts are one of the most
highly optimized parts of Python, and if your cache hit rate is higher
than 90%, you'll be dealing with exceptions only rarely, so the speed of
cache.setdefault() isn't where your time is going. I'm wondering whether
you're hitting some kind of memory issue or something, or possibly you're
having cache effects (in CPU/memory), or maybe even your key is poorly
constructed so that you're getting hash duplicates.

OTOH, if the cache hit rate is smaller than that, you'll be paying the
penalty for raised exceptions *and* the callable() call, in which case
redesigning that part will pay the highest dividends.
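A quick way to test the hit-rate assumption is to instrument the memoizer. This counting variant is a diagnostic sketch, not something from the thread:

```python
def memoize_counting(callable):
    # memoizer that also records cache hits and misses
    cache = {}
    stats = {'hits': 0, 'misses': 0}
    def proxy(*args):
        try:
            result = cache[args]
            stats['hits'] += 1
            return result
        except KeyError:
            stats['misses'] += 1
            return cache.setdefault(args, callable(*args))
    proxy.stats = stats  # expose the counters on the proxy
    return proxy

f = memoize_counting(lambda x: x * 2)
for x in [1, 2, 1, 1, 2, 3]:
    f(x)
print(f.stats)  # → {'hits': 3, 'misses': 3}
```

If the miss count dominates, the exception path and the underlying callable, not the dict lookup, are where the time goes.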
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

Weinberg's Second Law: If builders built buildings the way programmers wrote
programs, then the first woodpecker that came along would destroy civilization.
Jul 18 '05 #5

"Terry Reedy" <tj*****@udel.edu> writes:
"Jacek Generowicz" <ja**************@cern.ch> wrote in message
news:ty*************@pcepsft001.cern.ch...
I have a program in which I make very good use of a memoizer:

def memoize(callable):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, callable(*args))
    return proxy
...
I've got to the stage where my program is still not fast enough, and
calls to the memoizer proxy are topping profiler output table. So I
thought I'd try to see whether I can speed it up by recoding it in C.


Have you tried psyco on above?


Nope, but psyco would be politically unacceptable for my users, in
this case. OTOH, I've been wanting to play with psyco for ages ...
Can any of your callables be memoized by list rather than dict?
(ie, any count or restricted int args?)


You've lost me there.
Jul 18 '05 #6

aa**@pythoncraft.com (Aahz) writes:
In article <ty*************@pcepsft001.cern.ch>,
Jacek Generowicz <ja**************@cern.ch> wrote:

I have a program in which I make very good use of a memoizer:

def memoize(callable):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, callable(*args))
    return proxy

I've got to the stage where my program is still not fast enough, and
calls to the memoizer proxy are topping profiler output table. So I
thought I'd try to see whether I can speed it up by recoding it in C.
I'm wondering whether you're tackling something the wrong way


There _is_ always that possibility, but I don't think that's what my
problem is here (it may well be elsewhere in my program).
or perhaps you need to throw hardware at the problem.
The definition of "not fast enough" in this case, is "much slower than
a competing C++ implementation", so throwing hardware at it advances
me not one bit, I'm afraid.
Dicts are one of the most highly optimized parts of Python,
Which is exactly why I'm using them extensively.
and if your cache hit rate is higher than 90%, you'll be dealing
with exceptions only rarely,
My cache hit rate is well over 90%.
so the speed of cache.setdefault() isn't where your time is going.
I'm wondering whether you're hitting some kind of memory issue or
something, or possibly you're having cache effects (in CPU/memory),
or maybe even your key is poorly constructed so that you're getting
hash duplicates.


No, I think that my memoizer works just fine. The "problem" is that
I've memoized a lot of functions, and quite a few of them are being
called in my inner loops. If I want to compete on speed with C++, I'm
simply going to have to eliminate Python bytecode from my inner loops[*].

[*] Unless my overall algorithm is completely wrong, of course.
Jul 18 '05 #7

Jacek Generowicz <ja**************@cern.ch> writes:
Jacek Generowicz <ja**************@cern.ch> writes:
static int
memoize_init(memoizeObject* self, PyObject* args, PyObject* kwds) {
    PyArg_ParseTuple(args, "O", &(self->fn));

Obvious bug: &(self->cache) !!

    Py_INCREF(self->fn);
    self->cache = PyDict_New();
    return 0;

Apologies, I misreported the bug. self->fn is appropriate here
... using self->fn (instead of self->cache) as the dictionary in the
other function is what was wrong, of course.
Fixing the bug gives me a factor of 3 speedup over the class-based
version.


This is still true.
Jul 18 '05 #8

In article <ty*************@pcepsft001.cern.ch>,
Jacek Generowicz <ja**************@cern.ch> wrote:
aa**@pythoncraft.com (Aahz) writes:
In article <ty*************@pcepsft001.cern.ch>,
Jacek Generowicz <ja**************@cern.ch> wrote:

I have a program in which I make very good use of a memoizer:

def memoize(callable):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, callable(*args))
    return proxy

I've got to the stage where my program is still not fast enough, and
calls to the memoizer proxy are topping profiler output table. So I
thought I'd try to see whether I can speed it up by recoding it in C.


I'm wondering whether you're tackling something the wrong way


There _is_ always that possibility, but I don't think that's what my
problem is here (it may well be elsewhere in my program).
or perhaps you need to throw hardware at the problem.


The definition of "not fast enough" in this case, is "much slower than
a competing C++ implementation", so throwing hardware at it advances
me not one bit, I'm afraid.
Dicts are one of the most highly optimized parts of Python,


Which is exactly why I'm using them extensively.
and if your cache hit rate is higher than 90%, you'll be dealing
with exceptions only rarely,


My cache hit rate is well over 90%.


All your comments taken as a set make no sense. There's just no way
that a bunch of dict accesses in Python can be much slower than a C++
program. (C program, *maybe* -- not C++)
so the speed of cache.setdefault() isn't where your time is going.
I'm wondering whether you're hitting some kind of memory issue or
something, or possibly you're having cache effects (in CPU/memory),
or maybe even your key is poorly constructed so that you're getting
hash duplicates.


No, I think that my memoizer works just fine. The "problem" is that
I've memoized a lot of functions, and quite a few of them are being
called in my inner loops. If I want to compete on speed with C++, I'm
simply going have to eliminate Python bytecode from my inner loops[*].

[*] Unless my overall algorithm is completely wrong, of course.


Waitasec .... okay, I see what your problem is, assuming you're
describing it correctly. The problem is that you're calling Python
functions in an inner loop. I think you're going to need to write a
solution that doesn't call functions (function unrolling, so to speak).
Instead of returning a memoizer function, just use a dict of dicts.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

Weinberg's Second Law: If builders built buildings the way programmers wrote
programs, then the first woodpecker that came along would destroy civilization.
Jul 18 '05 #9

"Terry Reedy" <tj*****@udel.edu> writes:
"Jacek Generowicz" <ja**************@cern.ch> wrote in message
news:ty*************@pcepsft001.cern.ch...
I have a program in which I make very good use of a memoizer:

def memoize(callable):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, callable(*args))
    return proxy

...
I've got to the stage where my program is still not fast enough, and
calls to the memoizer proxy are topping profiler output table. So I
thought I'd try to see whether I can speed it up by recoding it in C.


Have you tried psyco on above? Or rather, on the proxies?


I doubt that would help -- the largest overhead is likely the function
call, and I think calling across the psyco/python boundary is
expensive.

Cheers,
mwh

--
Screaming 14-year-old boys attempting to prove to each other that
they are more 3133t than j00.
-- Reason #8 for quitting slashdot today, from
http://www.cs.washington.edu/homes/k.../slashdot.html
Jul 18 '05 #10

aa**@pythoncraft.com (Aahz) writes:
Waitasec .... okay, I see what your problem is, assuming you're
describing it correctly. The problem is that you're calling Python
functions in an inner loop.
.... indeedy ...
I think you're going to need to write a solution that doesn't call
functions (function unrolling, so to speak). Instead of returning a
memoizer function, just use a dict of dicts.


Sounds interesting. But how do I get the dictionary to use the
function which it is memoizing, to calculate values which haven't been
cached yet?

Hmm ... maybe by subclassing dict, I can add the necessary
functionality ... but even then I don't see how to avoid wrapping the
dictionary access in a function call ... and then we're back to square
one.

Jul 18 '05 #11

Jacek Generowicz wrote:
aa**@pythoncraft.com (Aahz) writes:
Waitasec .... okay, I see what your problem is, assuming you're
describing it correctly. The problem is that you're calling Python
functions in an inner loop.
... indeedy ...
I think you're going to need to write a solution that doesn't call
functions (function unrolling, so to speak). Instead of returning a
memoizer function, just use a dict of dicts.


Sounds interesting. But how do I get the dictionary to use the
function which it is memoizing, to calculate values which haven't been
cached yet?


Guessing into the blue:

try:
    result = cache[fun, args]
except KeyError:
    result = cache[fun, args] = fun(args)

should make a fast inner loop if cache misses are rare.
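Filled out into a runnable loop (the squaring function and the argument stream are illustrative), the idea looks like:

```python
def fun(x):
    return x * x  # stand-in for an expensive callable

cache = {}
results = []
for args in [2, 3, 2, 2, 3]:
    # inline caching: on a hit there is no extra function call at all
    try:
        result = cache[fun, args]
    except KeyError:
        result = cache[fun, args] = fun(args)
    results.append(result)

print(results)  # → [4, 9, 4, 4, 9]
```

Keying the cache on (fun, args) lets one dict serve several memoized functions.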

Hmm ... maybe by subclassing dict, I can add the necessary
functionality ... but even then I don't see how to avoid wrapping the
dictionary access in a function call ... and then we're back to square
one.


Subclassing dict will not make it faster. Maybe it's time to show some inner
loop lookalike to the wider public for more to the point suggestions...

Peter

Jul 18 '05 #12


"Jacek Generowicz" <ja**************@cern.ch> wrote in message
news:ty*************@pcepsft001.cern.ch...
"Terry Reedy" <tj*****@udel.edu> writes:
Can any of your callables be memoized by list rather than dict?
(ie, any count or restricted int args?)
You've lost me there.


Lookup tables. Dicts can memoize any function. Lists can memoize or
partially memoize functions with one or more periodic domains. Perhaps I am
being too obvious or simple for *you*, but I have seen people memoize fib()
or fac() with a generic dict-based wrapper, which is overkill compared to a
simple list.
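As a sketch of the list idea (the factorial example and the fixed table size are illustrative assumptions): for a function over small non-negative ints, a list indexed by the argument replaces the dict entirely:

```python
def memoize_list(callable, size):
    # valid only for integer args in range(size); None marks an empty slot,
    # so this breaks if the memoized function can itself return None
    cache = [None] * size
    def proxy(n):
        result = cache[n]
        if result is None:
            result = cache[n] = callable(n)
        return result
    return proxy

def fac(n):
    return 1 if n < 2 else n * fac(n - 1)

fast_fac = memoize_list(fac, 100)
print(fast_fac(5))  # → 120
```

A list index avoids hashing the key, which is the point of Terry's suggestion.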

In another post, you said
If I want to compete on speed with C++, I'm
simply going have to eliminate Python bytecode from my inner loops[*].
[*] Unless my overall algorithm is completely wrong, of course.


For speed, you need to eliminate *executed* and *expensive* bytecodes, not
static, possibly skipped-over, bytecodes. I'll take that as what you meant.
It is certainly what our comments are aimed at. Hence my other suggestion
to eliminate a layer of (expensive) function calls by combining value
calculation and storage in one function (more space, less time). More
specific suggestions will require more specific examples.

Terry J. Reedy
Jul 18 '05 #13

In article <ty*************@pcepsft001.cern.ch>,
Jacek Generowicz <ja**************@cern.ch> wrote:
aa**@pythoncraft.com (Aahz) writes:

I think you're going to need to write a solution that doesn't call
functions (function unrolling, so to speak). Instead of returning a
memoizer function, just use a dict of dicts.


Sounds interesting. But how do I get the dictionary to use the
function which it is memoizing, to calculate values which haven't been
cached yet?


Same way you decide which function to call currently. Essentially, you
do something like this:

def foo():
    pass

def bar():
    pass

lookup = {
    'foo': {'func': foo, 'cache': {}},
    'bar': {'func': bar, 'cache': {}},
}

for operation, args in WorkQueue:
    curr = lookup[operation]
    try:
        return curr['cache'][args]
    except KeyError:
        return curr['cache'].setdefault(args, curr['func'](*args))

(Note: untested; you'll have to fill in the blanks)
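Filling in the blanks (here WorkQueue is assumed to be a list of (name, args) pairs, and the returns become appends to a result list) gives something runnable:

```python
def foo(x):
    return x + 1

def bar(x):
    return x * 2

lookup = {
    'foo': {'func': foo, 'cache': {}},
    'bar': {'func': bar, 'cache': {}},
}

# hypothetical work queue: (operation name, argument tuple) pairs
WorkQueue = [('foo', (1,)), ('bar', (3,)), ('foo', (1,))]

results = []
for operation, args in WorkQueue:
    curr = lookup[operation]
    try:
        results.append(curr['cache'][args])  # cache hit: plain dict lookup
    except KeyError:
        results.append(curr['cache'].setdefault(args, curr['func'](*args)))

print(results)  # → [2, 6, 2]
```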
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

Weinberg's Second Law: If builders built buildings the way programmers wrote
programs, then the first woodpecker that came along would destroy civilization.
Jul 18 '05 #14

"Terry Reedy" <tj*****@udel.edu> writes:
"Jacek Generowicz" <ja**************@cern.ch> wrote in message
news:ty*************@pcepsft001.cern.ch...
"Terry Reedy" <tj*****@udel.edu> writes:
Can any of your callables be memoized by list rather than dict?
(ie, any count or restricted int args?)
You've lost me there.


Lookup tables. Dicts can memoize any function. Lists can memoize or
partially memoize functions with one or more periodic domains. Perhaps I am
being too obvious or simple for *you*, but I have seen people memoize fib()
or fac() with a generic dict-based wrapper, which is overkill compared to a
simple list.


The arguments passed to the functions I have memoized are usually
types, so, no, I cannot memoize them by list.
In another post, you said
If I want to compete on speed with C++, I'm
simply going have to eliminate Python bytecode from my inner loops[*].
[*] Unless my overall algorithm is completely wrong, of course.


For speed, you need to eliminate *executed* and *expensive* bytecodes, not
static, possibly skipped-over, bytecodes. I'll take that as what you meant.
It is certainly what our comments are aimed at. Hence my other suggestion
to eliminate a layer of (expensive) function calls by combining value
calculation and storage in one function (more space, less time).


You are suggesting that I hand-memoize my functions, rather than using
the generic memoizer I posted at the top of the thread? in order to
save, in the case of cache misses, the function call overhead to the
memoized function? If I had a high proportion of cache misses, then I
suspect that would be worthwhile. In my case, however, I doubt that
would have any appreciable effect. Almost all the calls to the
memoized functions are cache hits, so effectively there is no value
calculation ... it's pretty much all lookup, so my "expensive"
function is *effectively*:
def proxy(*args):
    try:
        return cache[args]
    except KeyError: pass

(where proxy is a closure over cache). Yes, there is something other
than "pass" in the handler, but we (essentially) never go there.

There's really not much left to play with (though I'm open to
suggestions), which is why I wanted to try recoding in C.
Jul 18 '05 #15

Peter Otten <__*******@web.de> writes:
Jacek Generowicz wrote:
aa**@pythoncraft.com (Aahz) writes:
I think you're going to need to write a solution that doesn't call
functions (function unrolling, so to speak). Instead of returning a
memoizer function, just use a dict of dicts.
Sounds interesting. But how do I get the dictionary to use the
function which it is memoizing, to calculate values which haven't been
cached yet?


Guessing into the blue:

try:
result = cache[fun, args]
except KeyError:
result = cache[fun, args] = fun(args)


Allow me to remind you of the memoizer I posted at the top of the thread:

def memoize(callable):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, callable(*args))
    return proxy

One key feature of this is that I can use it to replace any[*]
function with a functionally equivalent but faster one, without having
to change any of the call sites of the original.

IIUC, Aahz suggested to replace the proxy function with a
dictionary. This sounds reasonable, as, after all, the proxy maps its
arguments to its return values, while a dictionary maps keys to
values. The whole point of this is that function call overhead is
large compared to dictionary lookup, which is (almost) all that the
proxy does. However, it's the "almost" that creates the problem. If
I'm prepared to mess around with the calls to the original functions
and replace them with your suggestion, then, of course the problem is
solved ... however, I _really_ do not want to replace the calls to the
original ...
Maybe it's time to show some inner loop lookalike to the wider
public for more to the point suggestions...

Sure.
def foo(some_type):
    ...
    return something
foo = memoize(foo)

data = [int, float, my_class, his_class, str, other_class] * 10**6

map(foo, data) [+]
[*] ... with hashable arguments, etc. etc.

[+] Can you see why I don't want to mess with the call site ?
Jul 18 '05 #16

aa**@pythoncraft.com (Aahz) writes:
In article <ty*************@pcepsft001.cern.ch>,
Jacek Generowicz <ja**************@cern.ch> wrote:
aa**@pythoncraft.com (Aahz) writes:

I think you're going to need to write a solution that doesn't call
functions (function unrolling, so to speak). Instead of returning a
memoizer function, just use a dict of dicts.
^^^^^^^^

(Maybe this is where we are misunderstanding eachother. On first
reading, I thought that "memoizer" was just a typo. I don't return a
memoizer function, I return a memoized function.)
Sounds interesting. But how do I get the dictionary to use the
function which it is memoizing, to calculate values which haven't been
cached yet?
Same way you decide which function to call currently.


The way I decide which function to call currently is to type its name
directly into the source at the call site. Deciding which function to
call is not the problem: calling any function at all, once I've
reduced myself to having _just_ a dictionary, is the problem.

Maybe I didn't make the use of "memoize" clear enough.

You use it thus:

def foo():
    ...

foo = memoize(foo)

And from now on foo is a lot faster than it was before[*].
Essentially, you do something like this:

def foo():
    pass

def bar():
    pass

lookup = {
    'foo': {'func': foo, 'cache': {}},
    'bar': {'func': bar, 'cache': {}},
}

for operation, args in WorkQueue:
    curr = lookup[operation]
    try:
        return curr['cache'][args]
    except KeyError:
        return curr['cache'].setdefault(args, curr['func'](*args))


Nononononooooo ! :-)

In any given loop, I know _which_ function I want to call, so the
dictionary of dictionaries is pointless. Let's recap the original
memoizer:

def memoize(callable):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, callable(*args))
    return proxy

I thought that you were suggesting to replace "proxy" with just the
dictionary "cache", in order to save the function call overhead, as
proxy is (almost) just a wrapper around the dictionary lookup. That's
great, but what do I do in the (very rare) cases where I'm presented
with an argument/key I haven't seen yet? Somehow, I need to recognize
that the key is not present, call the original function, and add the
key-value pair to the cache. I can't see how to do that without
re-introducing the function call. I could do it by inlining the try
block, but then I wouldn't be able to use the memoized function inside
map ... and turning a map into a for-loop would lose me more than I'd
gained.
[*] With a whole bunch of exceptions which are beside the point here.
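For completeness: in Python 2.5 and later (which postdates this thread), a dict subclass with __missing__ comes close to what's being asked for here: plain d[args] indexing stays on the fast path, and a miss triggers the computation automatically, though it does change call sites from f(args) to d[args]. A sketch:

```python
class MemoDict(dict):
    # dict that computes and caches a value on first lookup of a key
    def __init__(self, callable):
        dict.__init__(self)
        self.callable = callable
    def __missing__(self, args):
        # invoked by d[args] only on a cache miss
        result = self[args] = self.callable(*args)
        return result

cache = MemoDict(lambda x: x * x)
# keys are argument tuples; the bound __getitem__ works directly inside map
print(list(map(cache.__getitem__, [(2,), (3,), (2,)])))  # → [4, 9, 4]
```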
Jul 18 '05 #17

On 10/12/2003, at 8:55 AM, Jacek Generowicz wrote:
Peter Otten <__*******@web.de> writes:
Jacek Generowicz wrote:
aa**@pythoncraft.com (Aahz) writes:

I think you're going to need to write a solution that doesn't call
functions (function unrolling, so to speak). Instead of returning a
memoizer function, just use a dict of dicts.

Sounds interesting. But how do I get the dictionary to use the
function which it is memoizing, to calculate values which haven't been
cached yet?
Guessing into the blue:

try:
    result = cache[fun, args]
except KeyError:
    result = cache[fun, args] = fun(args)


Allow me to remind you of the memoizer I posted at the top of the
thread:

def memoize(callable):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, callable(*args))
    return proxy

One key feature of this is that I can use it to replace any[*]
function with a functionally equivalent but faster one, without having
to change any of the call sites of the original.


proxy does. However, it's the "almost" that creates the problem. If
I'm prepared to mess around with the calls to the original functions
and replace them with your suggestion, then, of course the problem is
solved ... however, I _really_ do not want to replace the calls to the
original ...


About the only thing that could be done then is:

def memoize(callable):
    cache = {}
    def proxy(*args, _cache=cache):
        try: return _cache[args]
        except KeyError: return cache.setdefault(args, callable(*args))
    return proxy

Might help a little bit, although probably not enough.

- --
Stuart Bishop <st****@stuartbishop.net>
http://www.stuartbishop.net/
Jul 18 '05 #18

Stuart Bishop wrote in message ...
About the only thing that could be done then is:

def memoize(callable):
    cache = {}
    def proxy(*args, _cache=cache):


Not legal. *args must always come after default args. There's no way to do
this without changing the calling characteristics of the function, which the
OP has stated is vital.

It is probably possible to custom-build a code object which makes _cache a
local reference to the outer cache, and play with the bytecode to make the
LOAD_DEREF into a LOAD_FAST, but it's not worth the effort. I don't know
how fast/slow a LOAD_DEREF is compared to a LOAD_FAST, though. In any case,
it won't help more than a few percent.

Jacek, I think this is as fast as pure Python will get you with this
approach. If the speed is simply not acceptable, there are some things you
need to consider: is this the *real* bottleneck?

Consider this before you delve into C solutions, which greatly increase
headaches. An easy way to check is to cut out the map and memoization and
just use a dictionary preloaded with known argument:value pairs for a given
set of test data.

e.g.,

def func(...):
    ...

data = [foo, bar, ....]
cache = dict([(i, func(i)) for i in data])

def timetrial(_data=data, _cache=cache):
    return map(_cache.__getitem__, _data)
    # [operator.getitem(_cache, i) for i in _data] *might* be faster.

then timeit.py -s"import ttrial" "ttrial.timetrial()"

If it's still too slow, then either Python simply isn't for this task, or
you need to optimize your functions.

If this isn't the real bottleneck:
- Perhaps the functions I'm memoizing need optimization themselves?
- Are there any modules out there which implement these in C already?
- How does the competing C++ code do it? Does it get its speed from some
wicked data structure for the memoizing, or are the individual functions
faster, or what?

If this *is* the only significant bottleneck, but you want to try harder to
speed it up, you can pursue the C closure path you were mentioning. With
the caveat that I don't do any CPython hacking, I know that closures are
implemented using co_cellvars and the LOAD_DEREF bytecode. The Python C api
does describe an api to cell objects, so that will be the place to look, if
it's possible to do a closure in C. It may not be faster than your
straightforward approach, of course.

However, comparing your Python to a C++ version on the basis of speed is
most certainly the wrong way to go. If the C++ is well designed with good
algorithms, it's going to be faster than Python. Period. The advantage of
Python is that it's much faster and easier to *write* (and less likely to
introduce bugs), easier to *read* and maintain, and can model your problem
more intuitively and flexibly. But if you need maximum speed at all costs,
the best Python can do for you is act as a glue language to bits and pieces
of hand-crafted C.

Do you really *need* the speed, or are you just bothered that it's not as
fast as the C++ version?
--
Francis Avila

Jul 18 '05 #19

"Francis Avila" <fr***********@yahoo.com> writes:
It is probably possible to custom-build a code object which makes _cache a
local reference to the outer cache, and play with the bytecode to make the
LOAD_DEREF into a LOAD_FAST, but it's not worth the effort.
I'm tempted to agree.
Jacek, I think this is as fast as pure Python will get you with this
approach.
That's pretty much what I thought ... which is why I was (reluctantly)
investigating the C approach.
If the speed is simply not acceptable, there are some things you
need to consider, Is this the *real* bottleneck?
Well, when I profile the code, proxy comes out at the top in terms of
total time. Yes, there are other functions that are hotspots too. Yes,
maybe I'm being completely stupid, and 99% of those calls are unnecessary.

However, I simply thought that if I can increase the speed of "proxy"
by a factor of 3 or 7 or something, without too much pain, by recoding
it in C, then I'd get a noticeable speed gain overall. If it involves
re-hacking the interpreter, then I won't bother.
Consider this before you delve into C solutions, which greatly increase
headaches.
You don't need to tell me that :-)
An easy way to check is to cut out the map and memoization and
just use a dictionary preloaded with known argument:value pairs for a given
set of test data.

e.g.,

def func(...):
    ...

data = [foo, bar, ....]
cache = dict([(i, func(i)) for i in data])

def timetrial(_data=data, _cache=cache):
    return map(_cache.__getitem__, _data)
    # [operator.getitem(_cache, i) for i in _data] *might* be faster.

then timeit.py -s"import ttrial" "ttrial.timetrial()"
I tried something similar, namely inlining the

try: return cache[args]
except KeyError: return cache.setdefault(args, callable(*args))

rather than using proxy. This gives a little over a factor of 3
speedup, but it's not something I can use everywhere, as often the
proxy is called inside map(...)
If it's still too slow, then either Python simply isn't for this task, or
you need to optimize your functions.

If this isn't the real bottleneck:
- Perhaps the functions I'm memoizing need optimization themselves?
But, to a pretty good approximation, they _never_ get called (the
value is (almost) always looked up). How would optimizing a function
that doesn't get called, help ?
- Are there any modules out there which implement these in C
already?
Other than the competing C++ implementation, no.
- How does the competing C++ code do it? Does it get its speed from some
wicked data structure for the memoizing, or are the individual functions
faster, or what?
I have about 10 functions, written in Python, which the profiler
highlights as taking up most of the time, between them (of which
"proxy" is the main culprit). The C++ version doesn't have _any_
Python in its inner loops. I think it's as simple as that.

(Admittedly, a group of 4 of my functions was initially written with
development ease in mind, and should be replaced by a single mean and
lean C(++) function. That's probably where the real bottleneck is.)
If this *is* the only significant bottleneck,
No, it's not the _only_ bottleneck.

Yes, I am pursuing other paths too. But I feel more confident that I
know what I am doing in the other cases, so I won't bother the list
with those. If you remember, my original question was about what I
could expect in terms of speedup when recoding a Python function in C,
which will make many calls to the Python C API. It's not that I'm
determined to fix my whole speed problem just in this one area, but I
would like to understand a bit more about what to expect when going to
C.
but you want to try harder to speed it up, you can pursue the C
closure path you were mentioning.
Yes, I think that this could be edifying. It's also likely to be a
huge pain, so I'm not that determined to pursue it, as I do have more
productive things to be getting on with.
However, comparing your Python to a C++ version on the basis of speed is
most certainly the wrong way to go. If the C++ is well designed with good
algorithms, its going to be faster than Python. Period.
Not if my inner loops are C.

(BTW, I don't believe that the C++ version is all that wonderfully
designed, but enough about that :-)
But if you need maximum speed at all costs, the best Python can do
for you is act as a glue language to bits and pieces of hand-crafted
C.
I think that it's the other way around, in this case. I build up a
whole structure in Python ... and just need C(++) to call a very small
subset of the functionality I provide, very efficiently.
Do you really *need* the speed, or are you just bothered that it's not as
fast as the C++ version?
Politically speaking, yes, I need the speed. If a silly benchmark (one
which completely misrepresents reality, BTW), shows that the Python
version is slower than the C++ one, then the Python one is terminated.

However, as you point out:
The advantage of Python is that it's much faster and easier to
*write* (and less likely to introduce bugs), easier to *read* and
maintain, and can model your problem more intuitively and flexibly.


.... and for these reasons I am very keen to ensure that the Python
version survives ... which, ironically enough, means that it must be
fast, even though I personally don't care that much about the speed.
Jul 18 '05 #20

P: n/a
You presented an example function as taking a type (including new classes)
as parameter. If your type domain is boundedly finite and preknowable*, you
could precompute the function for *all* inputs and replace the call with a
dict lookup (or bound method for map).

(*) If 'preknowable', could either create classes with factory function that
adds them to cache or give metaclass that does same.
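As a minimal sketch of this precomputation idea (the domain and the function are made up):

```python
def func(t):                      # stand-in for the real function of a type
    return t.__name__

domain = [int, float, str]        # assumed bounded, preknowable domain
table = dict((t, func(t)) for t in domain)

# the hot path becomes pure dict lookups; the bound method suits map()
data = [int, str, int, float]
results = [table[t] for t in data]
results2 = list(map(table.__getitem__, data))
```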

Terry
Jul 18 '05 #21

In article <ty*************@pcepsft001.cern.ch>,
Jacek Generowicz <ja**************@cern.ch> wrote:

I tried something similar, namely inlining the

try: return cache[args]
except KeyError: return cache.setdefault(args, callable(*args))

rather than using proxy. This gives a little over a factor of 3
speedup, but it's not something I can use everywhere, as often the
proxy is called inside map(...)


Why are you using map()? Generally speaking, map() won't be any faster
than a for loop.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

Weinberg's Second Law: If builders built buildings the way programmers wrote
programs, then the first woodpecker that came along would destroy civilization.
Jul 18 '05 #22

[Too busy to snip, sorry]

In article <ty*************@lxplus025.cern.ch>,
Jacek Generowicz <ja**************@cern.ch> wrote:
aa**@pythoncraft.com (Aahz) writes:
In article <ty*************@pcepsft001.cern.ch>,
Jacek Generowicz <ja**************@cern.ch> wrote:
aa**@pythoncraft.com (Aahz) writes:

I think you're going to need to write a solution that doesn't call
functions (function unrolling, so to speak). Instead of returning a
memoizer function, just use a dict of dicts.
^^^^^^^^

(Maybe this is where we are misunderstanding each other. On first
reading, I thought that "memoizer" was just a typo. I don't return a
memoizer function, I return a memoized function.)
Sounds interesting. But how do I get the dictionary to use the
function which it is memoizing, to calculate values which haven't been
cached yet?


Same way you decide which function to call currently.


The way I decide which function to call currently is to type its name
directly into the source at the call site. Deciding which function to
call is not the problem: calling any function at all, once I've
reduced myself to having _just_ a dictionary, is the problem.

Maybe I didn't make the use of "memoize" clear enough.

You use it thus:

def foo():
    ...

foo = memoize(foo)

And from now on foo is a lot faster than it was before[*].


That's why I changed the Subject: to "Function unrolling"; you're going
to need to change your code to avoid function calls in inner loops. One
technique you might try is to create a generalized inner loop function:

def innerloop(iterable, func, cache):
    l = []
    for args in iterable:
        try:
            l.append(cache[args])
        except KeyError:
            l.append(cache.setdefault(args, func(args)))

    return l

Yes, it's a performance hack, but I bet it's a lot clearer than C++
code.
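At a call site this would be used roughly as follows (a self-contained sketch with made-up names; the function is restated here so the snippet runs on its own):

```python
def innerloop(iterable, func, cache):
    # one function call per loop; per-element cache lookups are inline
    l = []
    for args in iterable:
        try:
            l.append(cache[args])
        except KeyError:
            l.append(cache.setdefault(args, func(args)))
    return l

def slowfn(x):          # stand-in for the expensive function
    return x + 1

cache = {}
out = innerloop([1, 2, 1, 3], slowfn, cache)
```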
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

Jul 18 '05 #23

aa**@pythoncraft.com (Aahz) writes:
Generally speaking, map() won't be any faster than a for loop.


This is at odds with my own experience (including a lot of timing runs
in the last two days), countless posts on this newsgroup over the
years, and, for example, this article

http://www.python.org/doc/essays/list2str.html

by Guido, in which he concludes that

"An implied loop in map() is faster than an explicit for loop"
Jul 18 '05 #24

In article <ty*************@pcepsft001.cern.ch>,
Jacek Generowicz <ja**************@cern.ch> wrote:
aa**@pythoncraft.com (Aahz) writes:

Generally speaking, map() won't be any faster than a for loop.


This is at odds with my own experience (including a lot of timing runs
in the last two days), countless posts on this newsgroup over the
years, and, for example, this article

http://www.python.org/doc/essays/list2str.html


All right, what I should have said is that map() isn't enough faster
than a for loop in the cases where it is faster to make much difference,
and there are definitely cases where it's slower than the equivalent for
loop, and it's usually less readable than some other way of getting
speed.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

Jul 18 '05 #25

aa**@pythoncraft.com (Aahz) writes:
That's why I changed the Subject: to "Function unrolling"; you're going
to need to change your code to avoid function calls in inner loops. One
technique you might try is to create a generalized inner loop function:

def innerloop(iterable, func, cache):
    l = []
    for args in iterable:
        try:
            l.append(cache[args])
        except KeyError:
            l.append(cache.setdefault(args, func(args)))

    return l
OK, I see what you mean.

I agree, I can do this, in some places, and toy tests suggest that
getting rid of the function call could give me a factor of 3
improvement. I'll have to change the code in many places, but that's
simpler than going to C.

(However, we disagree on the specific formula you give above: I think
that map is faster: see other post in the thread. [A quick test
suggests that a map is faster than the loop by a factor of 2.])
Yes, it's a performance hack, but I bet it's a lot clearer than C++
code.


No doubt about that :-)
Cheers,
Jul 18 '05 #26

"Terry Reedy" <tj*****@udel.edu> writes:
You presented an example function as taking a type (including new classes)
as parameter. If your type domain is boundedly finite
It's unboundedly finite :-)
and preknowable*, you could precompute the function for *all* inputs
and replace the call with a dict lookup (or bound method for map).

(*) If 'preknowable', could either create classes with factory function that
adds them to cache or give metaclass that does same.


Hmmm ... all the non-builtin types in my domain are actually
dynamically created, and are themselves instances of a metaclass,
which could be instructed to add information about the new classes, to
the caches, as the classes are being created.

However, there are a number of problems I foresee (which I don't think
are particularly interesting to discuss here). Still, it could be put
to good use in some of the cases. I'll bear it in mind, thanks.
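For what it's worth, the metaclass route might look something like this minimal sketch (all names hypothetical; slow_func stands in for the real computation):

```python
# Classes created through this metaclass register a precomputed result
# in the memo cache at class-creation time, so the memoized function
# never misses on them.
cache = {}

def slow_func(cls):                   # stand-in for the expensive function
    return cls.__name__.lower()

class Registering(type):
    def __new__(mcs, name, bases, ns):
        cls = super(Registering, mcs).__new__(mcs, name, bases, ns)
        cache[cls] = slow_func(cls)   # fill the cache as the class is made
        return cls

# dynamically created classes, as in the scenario above
Foo = Registering('Foo', (object,), {})
Bar = Registering('Bar', (Foo,), {})
```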

But first, I think that I'll try inlining the exception handler in
those cases where it can be done.
Cheers,
Jul 18 '05 #27

Aahz wrote in message ...
In article <ty*************@pcepsft001.cern.ch>,
Jacek Generowicz <ja**************@cern.ch> wrote:
aa**@pythoncraft.com (Aahz) writes:
http://www.python.org/doc/essays/list2str.html


All right, what I should have said is that map() isn't enough faster
than a for loop in the cases where it is faster to make much difference,
and there are definitely cases where it's slower than the equivalent for
loop, and it's usually less readable than some other way of getting
speed.


That essay is a bit old, so not everything it says may still be true.

But in 2.3 at least, one case in which map is almost always much faster is
when the function used is a builtin. In this case, map (which is also
builtin) calls the C function from C, and I'm pretty sure it gets this speed
by staying outside of the Python eval loop. This can be wickedly fast (in
Python terms), and faster than list comps or for-loops with explicit list
appending, as a simple timeit will show.

For *Python* functions and lambdas (non-builtins), map can very often be
slower than a for-loop or list comprehension, for reasons not quite clear to
me. And of course, where a for-loop or list comp can avoid calls
altogether, it will always be faster than the equivalent function-using map
(unless map uses None as the function).
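A rough way to see this for yourself, sketched with the timeit module (no numbers quoted here, since they vary with Python version and machine):

```python
import timeit

setup = "data = list(range(1000))"

# builtin through map: the per-element loop stays in C
t_map_builtin = timeit.timeit("list(map(abs, data))",
                              setup=setup, number=200)

# Python-level function through map: one interpreted call per element
t_map_pyfunc = timeit.timeit("list(map(lambda x: abs(x), data))",
                             setup=setup, number=200)

# list comprehension avoiding function calls altogether
t_listcomp = timeit.timeit("[-x if x < 0 else x for x in data]",
                           setup=setup, number=200)
```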

I think in the OP's case, any little drop of speed he can get more than
justifies the very, very slightly un-idiomatic use of map instead of a
for-loop. Be warned that map() may go away in future Pythons, though....
--
Francis Avila

Jul 18 '05 #28

"Francis Avila" <fr***********@yahoo.com> writes:
Aahz wrote in message ...
[about map]
and it's usually less readable than some other way of getting
speed.


Actually, I use map in my code because I find it very elegant and very
readable[*]. Usually, the speed has nothing to do with it. But if I
already have a map, and I'm interested in speed then I'm loathe to
replace it with something slower.
I think in the OP's case, any little drop of speed he can get more than
justifies the very, very slightly un-idiomatic use of map instead of a
for-loop.
Why do you consider my use of map to be un-idiomatic? I want to
transform a sequence into another one, by applying a function to each
element of the original. This is exactly the point of map. What am I missing?
Be warned that map() may go away in future Pythons, though....


Yes, I've heard idle rumours to this effect. I think it would be a
great shame.

Yes, I realize that there are list comprehensions. I appreciate both,
and find that sometimes one looks more natural than the other, and
sometimes it's the other way around. I find this ability to pick the
right-looking idiom worth more than a One Way To Do it principle.

[*] There seems to be a faction in the Python community which
considers map, reduce, filter and lambda to be evil infiltrators
from those perfidious inventions that are functional languages,
and therefore the claim is made that they have no place in
Python. Some newbies believe this **** and the myth spreads. Just
my 2 cents.
Jul 18 '05 #29

Jacek Generowicz wrote:
One key feature of this is that I can use it to replace any[*]
function with a functionally equivalent but faster one, without having
to change any of the call sites of the original.
Understood, but not until this post.
Strictly speaking, all optimizations that fulfill the above criterion should
be built into the compiler, while memoizing will not, for its memory
tradeoffs and dependency on the input data.
IIUC, Aahz suggested to replace the proxy function with a
dictionary. This sounds reasonable, as, after all, the proxy maps its
arguments to its return values, while a dictionary maps keys to
values. The whole point of this is that function call overhead is
large compared to dictionary lookup, which is (almost) all that the
proxy does. However, it's the "almost" that creates the problem. If
I'm prepared to mess around with the calls to the original functions
and replace them with your suggestion, then, of course the problem is
solved ... however, I _really_ do not want to replace the calls to the
original ...
See below for my ideas towards a generalized memoizing framework. I've
addressed the problem by writing a map() variant that knows about result
caches (essentially Aahz' innerloop(), I see). You could use it to shadow
the builtin.
def foo(some_type):
    ...
    return something

foo = memoize(foo)

data = [int, float, my_class, his_class, str, other_class] * 10**6

map(foo, data) [+]
Do you discard the result for real?
[+] Can you see why I don't want to mess with the call site ?


No.

Now to my code. The idea is to make memoized functions available to
different modules from a central registry, and to expose the cached results
for hand-optimized hotspots. Otherwise it was all mentioned in this thread,
if not the original post.

<memoize.py>
import __builtin__

class RegistryItem:
    """ provide the necessary infrastructure to transparently speed up
        one function
    """
    def __init__(self, func, argc):
        cache = self.cache = {}
        def memoized(*args):
            try:
                return cache[args]
            except KeyError:
                result = cache[args] = func(*args)
                return result

        assert argc > 0
        if argc == 1:
            def callseq(seq):
                result = []
                append = result.append
                for arg in seq:
                    try:
                        append(cache[arg])
                    except KeyError:
                        value = cache[arg] = func(arg)
                        append(value)
                return result
        else:
            def callseq(*seqs):
                result = []
                append = result.append
                if len(seqs) == 1:
                    # multiple args as one tuple in the sequence
                    for args in seqs[0]:
                        try:
                            append(cache[args])
                        except KeyError:
                            value = cache[args] = func(*args)
                            append(value)
                else:
                    # multiple args in multiple sequences
                    # XXX special case (too lazy for now)
                    return map(memoized, *seqs)
                return result
        self.wrapped = memoized
        self.callseq = callseq
        self.original = func

class Registry:
    """ Registry for function variants using a result cache
        fast = register(slow, <number of args>)
    """
    def __init__(self):
        self.data = {}
    def __call__(self, func, argc):
        assert argc > 0
        try:
            return self.data[func].wrapped
        except KeyError:
            ri = RegistryItem(func, argc)
            self.data[func] = ri
            self.data[ri.wrapped] = ri
            return ri.wrapped

    def __getitem__(self, func):
        return self.data[func]

    def getmap(self):
        """ provide a cache-enhanced version of map
        """
        data = self.data
        def memomap(func, *seqs):
            try:
                callseq = data[func].callseq
            except KeyError:
                return __builtin__.map(func, *seqs)
            return callseq(*seqs)
        return memomap

registry = register = Registry()
</memoize.py>

<example.py>
# for improved clarity of this example I have omitted
# rebinding the original names and instead used fastXXX
import memoize, time

def slow(arg):
    print "no cache or cache miss with %r" % arg
    time.sleep(0.1)
    return "<%s>" % arg

sample = range(10) * 100

# preparation code

fast = memoize.register(slow, 1)
# typically used like so:
# slow = memoize.register(slow, 1)

fastmap = memoize.registry.getmap()
# could also be used like so:
# map = memoize.registry.getmap()

# some tests
assert fast is memoize.registry[fast].wrapped
assert fast is memoize.registry[slow].wrapped
assert slow is memoize.registry[slow].original

# stray function call
print slow(10)

# stray function call using cache
print fast(20)

# built-in loop support
fastmap(slow, sample)
fastmap(fast, sample)

# hand-crafted code using the cache
cache = memoize.registry[slow].cache
for arg in sample:
    cache[arg]

</example.py>

Feel free to play with the above. Be warned that there will be bugs.
However, before working on the code I will have to profile to make at least
sure it does not slow down things... maybe this weekend.

Peter
Jul 18 '05 #30

On Tue, 9 Dec 2003 23:50:06 -0500, "Francis Avila" <fr***********@yahoo.com> wrote:
Stuart Bishop wrote in message ...
About the only thing that could be done then is:

def memoize(callable):
cache = {}
def proxy(*args, _cache=cache):
Not legal. *args must always come after default args. There's no way to do

Never say never ;-)
A function has the __get__ method, which is what getattr magic uses to make a bound method.
You can use this to put cache in the "self" position of a bound method, like
>>> def sq(x): return x*x
...
>>> def memoize(func): # callable is a builtin ;-)
...     def proxy(cache, *args):
...         try: return cache[args]
...         except KeyError: return cache.setdefault(args, func(*args))
...     return proxy.__get__({}, proxy.__class__)
...
>>> msq = memoize(sq)
>>> msq
<bound method function.proxy of {}>
>>> msq(3)
9
>>> msq
<bound method function.proxy of {(3,): 9}>
>>> msq(3)
9
>>> msq(2)
4
>>> msq
<bound method function.proxy of {(3,): 9, (2,): 4}>

Looking at the code, it might be worth timing, if you have 90%+ cache hits.
I'd be curious to see what you get
>>> import dis
>>> dis.dis(msq)
3 0 SETUP_EXCEPT 12 (to 15)
3 LOAD_FAST 0 (cache)
6 LOAD_FAST 1 (args)
9 BINARY_SUBSCR
10 RETURN_VALUE
11 POP_BLOCK
12 JUMP_FORWARD 41 (to 56)

4 >> 15 DUP_TOP
16 LOAD_GLOBAL 2 (KeyError)
19 COMPARE_OP 10 (exception match)
22 JUMP_IF_FALSE 29 (to 54)
25 POP_TOP
26 POP_TOP
27 POP_TOP
28 POP_TOP
29 LOAD_FAST 0 (cache)
32 LOAD_ATTR 3 (setdefault)
35 LOAD_FAST 1 (args)
38 LOAD_DEREF 0 (func)
41 LOAD_FAST 1 (args)
44 CALL_FUNCTION_VAR 0
47 CALL_FUNCTION 2
50 RETURN_VALUE
51 JUMP_FORWARD 2 (to 56)
>> 54 POP_TOP
55 END_FINALLY
>> 56 LOAD_CONST 0 (None)

59 RETURN_VALUE

this without changing the calling characteristics of the function, which the
OP has stated is vital.

Should be ok, but I wonder about non-hashable args. Never expect them?

It is probably possible to custom-build a code object which makes _cache a
local reference to the outer cache, and play with the bytecode to make the
LOAD_DEREF into a LOAD_FAST, but it's not worth the effort. I don't know
how fast/slow a LOAD_DEREF is compared to a LOAD_FAST, though. In any case,
it won't help more than a few percent.

The above seems to do it for the cache-hit case ;-)
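For reference, the bound-method trick can be restated as a plain script (same mechanism as the interactive session above, minus the prompts):

```python
# proxy.__get__(cache, ...) binds the dict into the "self" slot, so the
# cache arrives as a fast positional argument rather than a closure cell.
def sq(x):
    return x * x

def memoize(func):
    def proxy(cache, *args):
        try:
            return cache[args]
        except KeyError:
            return cache.setdefault(args, func(*args))
    return proxy.__get__({}, proxy.__class__)

msq = memoize(sq)
first = msq(3)    # cache miss: computes and stores
second = msq(3)   # cache hit: plain dict lookup
```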
Jacek, I think this is as fast as pure Python will get you with this
approach. If the speed is simply not acceptable, there are some things you
need to consider. Is this the *real* bottleneck?


Regards,
Bengt Richter
Jul 18 '05 #31

Peter Otten <__*******@web.de> writes:
See below for my ideas towards a generalized memoizing framework.
[I haven't the time to have a look at it in any detail right now, so
forgive me for not responding/commenting in full, just yet.]
def foo(some_type):
    ...
    return something

foo = memoize(foo)

data = [int, float, my_class, his_class, str, other_class] * 10**6

map(foo, data) [+]


Do you discard the result for real?


No, sorry, of course not.
[+] Can you see why I don't want to mess with the call site ?


No.


It would mean taking it out of the map and that would mean losing the
C-coded loop that map provides.
Now to my code. The idea is to make memoized functions available to
different modules from a central registry, and to expose the cached results
for hand-optimized hotspots. Otherwise it was all mentioned in this thread,
if not the original post.


I'll make sure to find the time to study this.
Jul 18 '05 #32

bo**@oz.net (Bengt Richter) writes:
On Tue, 9 Dec 2003 23:50:06 -0500, "Francis Avila" <fr***********@yahoo.com> wrote:
Stuart Bishop wrote in message ...
About the only thing that could be done then is:

def memoize(callable):
cache = {}
def proxy(*args, _cache=cache):

To be honest, I have trouble believing that a local variable lookup
would be significantly faster than a closure lookup ... or is there
some other point to this ?
Not legal. *args must always come after default args. There's no way to do

Never say never ;-)
A function has the __get__ method, which is what getattr magic uses
to make a bound method. You can use this to put cache in the "self"
position of a bound method, like
>>> def sq(x): return x*x
...
>>> def memoize(func): # callable is a builtin ;-)
...     def proxy(cache, *args):
...         try: return cache[args]
...         except KeyError: return cache.setdefault(args, func(*args))
...     return proxy.__get__({}, proxy.__class__)


Looking at the code, it might be worth timing, if you have 90%+ cache hits.
I'd be curious to see what you get


I get your version being almost imperceptibly slower.

I used this:

import timing
import sys

def Amemoize(func):
    def proxy(cache, *args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, func(*args))
    return proxy.__get__({}, proxy.__class__)

def Bmemoize(func):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, func(*args))
    return proxy
N = int(sys.argv[1])

def foo(n):
    if n < 2: return 1
    return foo(n-1) + foo(n-2)

Afoo = Amemoize(foo)
Bfoo = Bmemoize(foo)
Acode = """
def Acall():
    """ + "Afoo(10);" * N + "\n"

Bcode = """
def Bcall():
    """ + "Bfoo(10);" * N + "\n"

exec Acode
exec Bcode

timing.start()
Acall()
timing.finish()
print timing.micro()

timing.start()
Bcall()
timing.finish()
print timing.micro()
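(The timing module used above is indeed listed as obsolete; for anyone replaying this, a hypothetical stand-in that keeps the best of several runs, to damp out unrelated system activity, might look like this:)

```python
# A rough replacement for the obsolete `timing` module: time a callable
# with time.perf_counter and keep the fastest of `repeat` runs.
import time

def measure(fn, repeat=5):
    best = None
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best   # seconds for the fastest single run

t = measure(lambda: sum(range(1000)))
```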
Jul 18 '05 #33

On 12 Dec 2003 11:17:16 +0100, Jacek Generowicz <ja**************@cern.ch> wrote:
bo**@oz.net (Bengt Richter) writes:
On Tue, 9 Dec 2003 23:50:06 -0500, "Francis Avila" <fr***********@yahoo.com> wrote:
>Stuart Bishop wrote in message ...
>>About the only thing that could be done then is:
>>
>>def memoize(callable):
>> cache = {}
>> def proxy(*args, _cache=cache):
To be honest, I have trouble believing that a local variable lookup
would be significantly faster than a closure lookup ... or is there
some other point to this ?

No, I think you are right. I was mainly reacting to the "there's no
way" statement ;-)

>Not legal. *args must always come after default args. There's no way to do

Never say never ;-)
A function has the __get__ method, which is what getattr magic uses
to make a bound method. You can use this to put cache in the "self"
position of a bound method, like
>>> def sq(x): return x*x

...
>>> def memoize(func): # callable is a builtin ;-)

...     def proxy(cache, *args):
...         try: return cache[args]
...         except KeyError: return cache.setdefault(args, func(*args))
...     return proxy.__get__({}, proxy.__class__)


Looking at the code, it might be worth timing, if you have 90%+ cache hits.
I'd be curious to see what you get


I get your version being almost imperceptibly slower.

I used this:

When things are that close, it's very hard to draw conclusions except that they are
probably close ;-) But to split hairs you probably have to run the two tests separately,
in quiescent system situations, so that the first is not improving or worsening the
environment for the second (i.e., in terms of interpreter state re memory, or even cpu
cache in some cases, though that usually only applies to the first time through a loop.)
And also keep the fastest of 5-10 trials, to eliminate glitches due to spurious irrelevant
system activity. I don't have your "timing" module, so I don't know what it does. If it
randomly reordered multiloop trials of A vs B and took minimums, then maybe interactions
could be washed out, but otherwise A & B in the same test source leaves some ambiguity
about the real ranking (which of course will not matter, practically speaking;-)

import timing
import sys

[...]

Regards,
Bengt Richter
Jul 18 '05 #34

bo**@oz.net (Bengt Richter) writes:
On 12 Dec 2003 11:17:16 +0100, Jacek Generowicz <ja**************@cern.ch> wrote:
bo**@oz.net (Bengt Richter) writes:

To be honest, I have trouble believing that a local variable lookup
would be significantly faster than a closure lookup ... or is there
some other point to this ?
No, I think you are right. I was mainly reacting to the "there's no
way" statement ;-)


Sure ... I was sort of answering to a few posts upthread, as I hadn't
done that yet.
I get your version being almost imperceptibly slower.

When things are that close, it's very hard to draw conclusions
except that they are probably close ;-) But to split hairs you
probably have to run the two tests separately, in quiescent system
situations,
.... when it's that close, I just don't care :-)

When it's that close, the Python version, CPU, temperature, phase of
the moon etc. are probably more significant than the algorithm
.... and, most importantly, my time is better spent on finding more
effective ways of speeding up my code.
I don't have your "timing" module,


Are you sure? It's on every Python installation I currently have
access to, so I assumed it was in the standard distribution. Hmm
.... OK, the library reference lists it under "Obsolete".

Jul 18 '05 #35
