Bytes | Software Development & Data Engineering Community

Trivial performance questions

I have noticed in the book of words that hasattr works by calling getattr
and trapping the exception raised if no such attribute exists. If I need the
value in any case, am I better off using getattr within a try statement
myself, or is there some clever implementation enhancement which makes this a
bad idea?

i.e. should I prefer:

    if hasattr(self, "datum"):
        datum = getattr(self, "datum")
    else:
        datum = None
        self.datum = None

over:

    try:
        datum = getattr(self, "datum")
    except:
        self.datum = None
        datum = None

The concept of deliberately raising an error is still foreign to this python
newbie, but I really like the trapping facilities. I just worry about the
performance implications and memory usage of such things, especially since
I'm writing for Zope.

And while I'm here: Is there a difference in performance when checking:
datum is None
over:
datum == None

and similarly:
if x is None or y is None:
or:
if None in (x,y):

I appreciate that these are trivial in the extreme, but I seem to be writing
dozens of them, and I may as well use the right one and squeeze what
performance I can.

Many thanks,
Christopher Boomer.
Jul 18 '05 #1
"Brian Patterson" <bp@computastore.com> writes:
I have noticed in the book of words that hasattr works by calling
getattr and raising an exception if no such attribute exists. If I
need the value in any case, am I better off using getattr within a
try statement myself, or is there some clever implementation
enhancement which makes this a bad idea?

i.e. should I prefer:

    if hasattr(self, "datum"):
        datum = getattr(self, "datum")
    else:
        datum = None
        self.datum = None

over:

    try:
        datum = getattr(self, "datum")
    except:

Don't do that; do "except AttributeError:" instead.

        self.datum = None
        datum = None

The concept of deliberately raising an error is still foreign to
this python newbie, but I really like the trapping facilities. I
just worry about the performance implications and memory usage of
such things, especially since I'm writing for Zope.
The answer to which of these runs the fastest depends on how often you
expect the attribute to exist. If it is usually absent, the first form will
probably be quicker; if it is usually present, the second.

However, there's a third form:

    datum = getattr(self, "datum", None)

which is what you want here.
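A minimal sketch of the three-argument form in action (the class name `Node` is invented here for illustration):

```python
class Node:
    pass

n = Node()
datum = getattr(n, "datum", None)   # no exception, no separate hasattr check
assert datum is None

n.datum = 42
assert getattr(n, "datum", None) == 42
```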
And while I'm here: Is there a difference in performance when checking:
datum is None
over:
datum == None
Yes (I think...), but you shouldn't care much about it.
and similarly:
if x is None or y is None:
or:
if None in (x,y):
Pass. Time it if you really care (my bet's on the former being
quicker).
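For what it's worth, in modern Python the standard-library timeit module makes settling such bets easy; a sketch (the iteration count is arbitrary):

```python
import timeit

# time each idiom; each call returns total seconds for `number` runs
t_or = timeit.timeit("x is None or y is None", setup="x = 1; y = 2", number=100000)
t_in = timeit.timeit("None in (x, y)", setup="x = 1; y = 2", number=100000)

# compare t_or and t_in on your own machine before drawing conclusions
```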
I appreciate that these are trivial in the extreme, but I seem to be
writing dozens of them, and I may as well use the right one and
squeeze what performance I can.


This is an unhelpful attitude. You're writing in Python after all!

If profiling shows some of this code to be a hotspot, *then* and only
then is it an appropriate time to worry about such trivial performance
gains.

Cheers,
mwh

--
The only problem with Microsoft is they just have no taste.
-- Steve Jobs, (From _Triumph of the Nerds_ PBS special)
and quoted by Aahz on comp.lang.python
Jul 18 '05 #2
Michael Hudson wrote:

If profiling shows some of this code to be a hotspot, *then* and only
then is it an appropriate time to worry about such trivial performance
gains.


And never forget to include the second criterion for bothering to
worry about performance: the code does not meet its performance
requirements.

Even if profiling shows you a hotspot (as it almost always will),
you are still wasting your time if you don't actually *need* the
code to be faster. Shaving a few seconds off the runtime of a program
that takes a minute to run is likely to be a waste of your time
in the long run, especially when you consider how many times the
program will have to be run to pay back the investment in
optimization and the resultant increase in maintenance costs.

And remember that use of Python in the first place includes an
implicit acceptance that performance is not your biggest concern.

-Peter
Jul 18 '05 #3
"Brian Patterson" <bp@computastore.com> wrote in
news:bm**********@titan.btinternet.com:
I have noticed in the book of words that hasattr works by calling
getattr and raising an exception if no such attribute exists. If I
need the value in any case, am I better off using getattr within a try
statement myself, or is there some clever implementation enhancement
which makes this a bad idea?
If you were writing the check inline but testing the existence of an
attribute in more than one place, you would naturally extract the duplicated
code into a single function called from each location. The presence of
'hasattr' means the potentially duplicated code is already extracted into a
support function, so you should use it *where appropriate* to make your code
shorter and easier to read.
i.e. should I prefer:

    if hasattr(self, "datum"):
        datum = getattr(self, "datum")
    else:
        datum = None
        self.datum = None

over:

    try:
        datum = getattr(self, "datum")
    except:
        self.datum = None
        datum = None

Probably you should prefer:

    datum = getattr(self, "datum", None)

although this doesn't have the side effect of setting self.datum if it was
unset. Alternatively you could set self.datum every time with:

    self.datum = datum = getattr(self, "datum", None)
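A quick sketch of that chained assignment (the `Holder` class is invented for illustration):

```python
class Holder:
    pass

h = Holder()
# one statement binds the local name and caches the attribute:
h.datum = datum = getattr(h, "datum", None)
assert datum is None and h.datum is None

h2 = Holder()
h2.datum = 7
h2.datum = datum = getattr(h2, "datum", None)
assert datum == 7 and h2.datum == 7
```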

--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
Jul 18 '05 #4
>> I appreciate that these are trivial in the extreme, but I seem to be
writing dozens of them, and I may as well use the right one and
squeeze what performance I can.
This is an unhelpful attitude. You're writing in Python after all!


I have never considered using the fastest available option to be an
unhelpful attitude, especially when it does not impact readability. It
occurred to me that someone more knowledgeable might know whether there was a
'right' answer to these trivial questions.

However, it appears not. Sorry for wasting your time :(

Thanks for the tip on the getattr default. This is much cleaner to read,
almost certainly quicker, and will serve the purpose well. I had convinced
myself that it was not available in 2.1.3.

Jul 18 '05 #5
Brian Patterson wrote:
I have noticed in the book of words that hasattr works by calling getattr
and raising an exception if no such attribute exists. If I need the value
in any case, am I better off using getattr within a try statement myself,
or is there some clever implementation enhancement which makes this a bad
idea?


In rare cases, i.e. when attribute access has side effects, there are not
only differences in performance, but also in the result:

    class AskMeOnce(object):
        def __getattribute__(self, name):
            result = object.__getattribute__(self, name)
            delattr(self, name)
            return result

    t = AskMeOnce()
    t.color = "into the blue"

    # raises an exception
    # if hasattr(t, "color"):
    #     print t.color

    # works
    try:
        print t.color
    except AttributeError:
        pass

:-)
Peter

Jul 18 '05 #6
"Brian Patterson" <bp@computastore.com> writes:
I appreciate that these are trivial in the extreme, but I seem to be
writing dozens of them, and I may as well use the right one and
squeeze what performance I can.
This is an unhelpful attitude. You're writing in Python after all!


I have never considered using the fastest available option to be an
unhelpful attitude, especially when it does not impact on readability.


That's not what you said!
It occurred to me that someone more knowledgable might know whether
there was a 'right' answer to these trivial questions.

However, it appears not. Sorry for wasting your time :(
Well, it was hardly a waste.
Thanks for the tip on the getattr default. This is much cleaner to read,
almost certainly quicker, and will serve the purpose well. I had convinced
myself that it was not available in 2.1.3.


No, I think it's (at least) 1.5.2 vintage...

Cheers,
mwh

--
ARTHUR: Why should a rock hum?
FORD: Maybe it feels good about being a rock.
-- The Hitch-Hikers Guide to the Galaxy, Episode 8
Jul 18 '05 #7
Brian Patterson wrote:
...
newbie, but I really like the trapping facilities. I just worry about the
performance implications and memory usage of such things, especially since
I'm writing for Zope.

And while I'm here: Is there a difference in performance when checking:
datum is None
over:
datum == None

and similarly:
if x is None or y is None:
or:
if None in (x,y):

I appreciate that these are trivial in the extreme, but I seem to be
writing dozens of them, and I may as well use the right one and squeeze
what performance I can.


I see you've already been treated to almost all the standard "performance
does not matter" arguments (pretty well presented). They're right (and
I would have advanced them myself if others hadn't already done so quite
competently), *BUT*...

...but, when you're wondering which of two equivalently readable and
maintainable idioms is "the one obvious way to do it", there is
nothing wrong with finding out the performance to help you. After
all, which one is right is not necessarily obvious unless you're
Dutch! To put it another way: there is nothing wrong in getting
into the habit of always using one idiom over another when they appear
to be equivalent; such stylistic uniformity can indeed often be
preferable to choosing haphazardly in each case. And all other things
being equal it IS better to choose, as one's habitual style, the
microscopically faster one -- why not, after all?

So, for this kind of task as well as for many others, what you
need is timeit.py from Python 2.3. I'm not sure it's compatible
with Python 2.1.3, which I understand you're constrained to use
due to Zope -- I think so, but haven't tried. It is certainly
compatible with Python 2.2. I've copied it into my ~/bin and
done a chmod+x, and now when I wonder about performance it's easy
to check it (sometimes there are tricky parts, but not often); if
I need to check for a specific release, I can explicitly say e.g.
$ python2.2 ~/bin/timeit.py ...
or whatever.

So, for Python 2.3 on my machine:

[alex@lancelot clean]$ timeit.py -c -s'datum=23' 'datum==None'
1000000 loops, best of 3: 0.47 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'datum=23' 'datum is None'
1000000 loops, best of 3: 0.29 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'datum=None' 'datum is None'
1000000 loops, best of 3: 0.29 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'datum=None' 'datum == None'
1000000 loops, best of 3: 0.41 usec per loop

no doubt here, then: "datum is None" wins hands-down over
"datum == None" whether datum is None or not. And indeed,
it so happens that experienced Pythonistas generally prefer
'is' for this specific test (this also has other reasons,
such as the preference for words over punctuation, and the
fact that if datum is an instance of a user-coded class
there are no bounds to the complications its __eq__ or
__cmp__ might cause, while 'is' doesn't run ANY such risk).
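The point about user-coded __eq__/__cmp__ can be made concrete with a small, deliberately pathological sketch:

```python
class AlwaysEqual:
    def __eq__(self, other):
        return True   # pathological user-coded __eq__: claims equality with everything

datum = AlwaysEqual()
assert (datum == None) is True    # runs __eq__, which here misleads
assert (datum is None) is False   # identity test runs no user code
```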

Similarly:

[alex@lancelot clean]$ timeit.py -c -s'x=1' -s'y=2' 'None in (x,y)'
1000000 loops, best of 3: 1 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'x=1' -s'y=2' 'x is None or y is None'
1000000 loops, best of 3: 0.48 usec per loop

again, the form with more words and no punctuation (the more readable
one by Pythonistas' usual tastes) is faster -- confirming it's the
preferable style.

These measurements also help put such things in perspective: we ARE
after all talking about differences of 120 to 500 nanoseconds (on
my 30-months-old box, a dinosaur by today's standards). Still, if
they're executed in some busy inner loop, that MIGHT easily pile up
to several milliseconds' worth; and given that choosing a consistent
style IS preferable anyway, and that the indications you get from
these measurements will often push you towards readability and
Pythonicity, timing them doesn't seem a bad idea to me.

Now, about the hasattr vs getattr issue...:

[alex@lancelot clean]$ timeit.py -c 'hasattr([], "pop")'
1000000 loops, best of 3: 0.95 usec per loop
[alex@lancelot clean]$ timeit.py -c 'getattr([], "pop", None)'
1000000 loops, best of 3: 1.11 usec per loop
[alex@lancelot clean]$ timeit.py -c 'hasattr([], "pok")'
100000 loops, best of 3: 2.4 usec per loop
[alex@lancelot clean]$ timeit.py -c 'getattr([], "pok", None)'
100000 loops, best of 3: 2.6 usec per loop

you can see that three-arg getattr always takes a tiny little
bit longer than hasattr -- about 0.2 microseconds. More time for
both when getting non-existent attributes, of course, since the
exception is raised and handled in that case. But in any case,
given that getattr has already done all the work you needed,
while hasattr may be just the beginning (you still need to get
the attribute if it's there), you also need to consider:

[alex@lancelot clean]$ timeit.py -c '[].pop'
1000000 loops, best of 3: 0.48 usec per loop

and that attribute fetch consumes 2-3 times longer than the
speed-up of hasattr vs 3-arg getattr. So, if the attribute
will be present at least 30%-50% of the time, we could expect
3-arg getattr to be a winner; for rarely present
attributes, though, hasattr may still be faster (by a tiny
little bit).

We can also measure the try/except approach:
[alex@lancelot clean]$ timeit.py -c '
try: [].pop
except AttributeError: pass
'
1000000 loops, best of 3: 0.6 usec per loop

[alex@lancelot clean]$ timeit.py -c '
try: [].pok
except AttributeError: pass
'
100000 loops, best of 3: 8.1 usec per loop

If the exception doesn't occur try/except is quite fast,
but, if it does, it's far slower than any of the others.
So, if performance matters, it should only be considered
if the attribute is _overwhelmingly_ more likely to be
present than absent.

We can put together these solutions in small functions,
e.g. a.py:

    def hasattr_pop(obj=[]):
        if hasattr(obj, 'pop'):
            return obj.pop
        else:
            return None

    def getattr_pop(obj=[]):
        return getattr(obj, 'pop', None)

    def tryexc_pop(obj=[]):
        try: return obj.pop
        except AttributeError: return None

and similarly for pok instead of pop. Now:

[alex@lancelot clean]$ timeit.py -c -s'import a' 'a.hasattr_pop()'
100000 loops, best of 3: 2.1 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'import a' 'a.getattr_pop()'
100000 loops, best of 3: 1.9 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'import a' 'a.tryexc_pop()'
1000000 loops, best of 3: 1.46 usec per loop

for an attribute that's present, small advantage to the try/except,
getattr by a nose faster than the hasattr check. But:

[alex@lancelot clean]$ timeit.py -c -s'import a' 'a.hasattr_pok()'
100000 loops, best of 3: 3.4 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'import a' 'a.getattr_pok()'
100000 loops, best of 3: 3.5 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'import a' 'a.tryexc_pok()'
100000 loops, best of 3: 12.3 usec per loop

here, by the tiniest of margins, hasattr beats getattr -- and
try/except is the pits.

So, in this case, I don't think a single approach can be
universally recommended. getattr is most compact; but you
should also keep the try/except in your quiver for those
extremely performance-sensitive cases where the attribute
will almost always be present, AND the hasattr as the best
compromise for attributes that are absent some reasonable
percentage of the time AND whose default value takes effort to
construct (by using None as the default we've favoured the
getattr approach, which always constructs the default object,
by giving it a very cheap-to-construct one -- try with some
default object that DOES take work to build, and you'll see).
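That last caveat is easy to demonstrate: three-argument getattr evaluates its default eagerly, even when the attribute exists. A sketch (the call counter is just instrumentation):

```python
calls = []

def expensive_default():
    calls.append(1)        # record each time the default gets built
    return "fallback"      # stands in for an object that takes real work to build

obj = []
# the default expression is evaluated before getattr even runs, although
# 'pop' exists and the freshly built default is then thrown away:
value = getattr(obj, "pop", expensive_default())
# the hasattr route only builds the default when it is actually needed:
value = obj.pop if hasattr(obj, "pop") else expensive_default()

assert len(calls) == 1    # built once by the getattr line, skipped by hasattr
```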

That much being said, you'll almost always see me using
getattr for this -- it's just too compact, handy, readable --
I'll optimize it out only when dealing with a real bottleneck,
or avoid using it when the effort of constructing the default
object is "obviously" quite big.
Alex

Jul 18 '05 #8
Alex Martelli wrote:

So, for this kind of tasks as well as for many others, what you
need is timeit.py from Python 2.3. I'm not sure it's compatible
with Python 2.1.3, which I understand you're constrained to use
due to Zope -- I think so, but haven't tried. It's sure quite
compatible with Python 2.2.


It is also quite compatible with Python 2.0, based on running
successfully several of your following examples, so one would
assume it will also run fine with Python 2.1.

A quick inspection of the code backs up the empirical evidence,
showing no Python 2.2+ dependencies that don't have automatic
fallbacks (as with the attempt to include itertools).

(Thanks for the tutorial on timeit.py Alex. I've finally stuck
it in all my older Python installations, after your repeated
helpful promptings!)

-Peter
Jul 18 '05 #9
Quoting Brian Patterson (bp@computastore.com):
I have noticed in the book of words that hasattr works by calling getattr
and raising an exception if no such attribute exists. If I need the value
in any case, am I better off using getattr within a try statement myself, or
is there some clever implementation enhancement which makes this a bad idea?

i.e. should I prefer:

    if hasattr(self, "datum"):
        datum = getattr(self, "datum")
    else:
        datum = None
        self.datum = None

over:

    try:
        datum = getattr(self, "datum")
    except:
        self.datum = None
        datum = None

The concept of deliberately raising an error is still foreign to
this python newbie, but I really like the trapping facilities. I
just worry about the performance implications and memory usage of
such things, especially since I'm writing for Zope.
Generally prefer

    d = getattr(self, "datum", None)
    if d is None:
        self.datum = None

This won't always result in the fastest code though. In particular,
there's a slight performance edge to be had if you're doing lots of
lookups (many tens of thousands) and you expect a low percentage
(single digit? is that too many?) to be None. Then try/except becomes
faster. If you think you're likely to be in this situation, it should
be pretty trivial to write some test cases to find the actual tradeoff
point.
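Such a test harness is a few lines with modern Python's timeit module; a sketch (the class and function names are invented):

```python
import timeit

class Thing:
    pass

present = Thing()
present.datum = 1      # attribute exists on this object
absent = Thing()       # attribute missing on this one

def via_getattr(obj):
    return getattr(obj, "datum", None)

def via_try(obj):
    try:
        return obj.datum
    except AttributeError:
        return None

# time each idiom against objects that do and don't have the attribute
for label, obj in [("present", present), ("absent", absent)]:
    t_get = timeit.timeit(lambda: via_getattr(obj), number=100000)
    t_try = timeit.timeit(lambda: via_try(obj), number=100000)
    # compare t_get and t_try to locate the tradeoff point on your machine
```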
And while I'm here: Is there a difference in performance when checking:
datum is None
over:
datum == None
Generally prefer the former, but the difference is likely to be masked
by other factors.
and similarly:
if x is None or y is None:
or:
if None in (x,y):
I've preferred the latter thinking it was less work on the
interpreter, under the general premise that the code for the "in"
operation was one swatch of C, while is / or / is was three different
swatches of C with "python internals" gluing them together. My
analysis is obviously pretty surface though.
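One wrinkle worth knowing: the in test checks identity and then falls back to == per element, so a pathological user-coded __eq__ can fool "None in (x, y)" in a way the explicit is tests cannot be. A deliberately contrived sketch:

```python
class Chameleon:
    def __eq__(self, other):
        return True    # pathological: claims equality with everything

x, y = Chameleon(), 1
# membership falls back to equality, and the reflected __eq__ answers:
assert (None in (x, y)) is True
# identity tests run no user code and cannot be fooled:
assert (x is None or y is None) is False
```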
I appreciate that these are trivial in the extreme, but I seem to be
writing dozens of them, and I may as well use the right one and
squeeze what performance I can.


Maybe I'm a heretic, but I think this is a healthy attitude to have.
If you can write it optimally the first time with no significant
increase in effort, then nobody's going to hafta come back and rewrite
it later: that's a big maintenance win.

--G.

--
Geoff Gerrietts <geoff at gerrietts net>
"A man can't be too careful in the choice of his enemies." --Oscar Wilde

Jul 18 '05 #10
Geoff Gerrietts wrote:

Maybe I'm a heretic, but I think this is a healthy attitude to have.
If you can write it optimally the first time with no significant
increase in effort, then nobody's going to hafta come back and rewrite
it later: that's a big maintenance win.


Not unless you add Alex' constraint that the two alternatives under
consideration are equally readable. Otherwise the less readable one
is always going to cost you more at maintenance time. And I'd add
my own constraint that you actually have to *need* the speed. Otherwise
even the "insignificant" increase in effort that it will cost you will
not be paying for itself.

http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast

Making it right means making it readable too. Optimization should
always come later, and not at all if you don't actually need it.

My group has invested almost thirty person-years writing Python code in
the last few years. To the best of my ability to recall, only two of
the tasks we've worked on in that time were directly related to
performance concerns and the resulting optimization for speed. Given
that the combined optimization efforts consumed perhaps a few weeks
of our time, we spent something like 0.4% of our time focusing on
performance. This seems to me a healthy amount.

(Curiously enough, when we coded more in C, I suspect we spent a
substantially larger amount of time caught up in performance issues.
This change is due merely to greater experience, not because of
the change in language, though the two are related.)

-Peter
Jul 18 '05 #11
Quoting Peter Hansen (pe***@engcorp.com):

Not unless you add Alex' constraint that the two alternatives under
consideration are equally readable. Otherwise the less readable one
is always going to cost you more at maintenance time.
Yes to your first sentence, not so sure to the second. The implication
is the code will always be touched, and my contention is that if you
don't pay at least trivial attention to writing something optimal --
includes avoiding geometric algorithms -- then you're significantly
increasing the amount of maintenance work necessary.

Example: pulling out list.sort(lambda x, y: cmp(x[0],y[0])) and
putting in an abstract transform_sort is /only responsible/. The
list.sort(callable) idiom might be more readable to a novice -- it has
been to the novices I've worked with -- but its performance
implications on nontrivial lists are astonishing.
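The transform Geoff alludes to is the classic decorate-sort-undecorate idiom; modern Python spells the same idea with a key function. A sketch:

```python
data = [(3, "c"), (1, "a"), (2, "b")]

# decorate-sort-undecorate: build the sort keys once, sort, strip them
# (the index keeps the sort stable and avoids comparing the payloads)
decorated = [(item[0], i, item) for i, item in enumerate(data)]
decorated.sort()
dsu_sorted = [item for _key, _i, item in decorated]

# modern Python's key= argument performs the same decoration internally,
# replacing the per-comparison cmp callable of old:
key_sorted = sorted(data, key=lambda item: item[0])

assert dsu_sorted == key_sorted == [(1, "a"), (2, "b"), (3, "c")]
```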
And I'd add my own constraint that you actually have to *need* the
speed. Otherwise even the "insignificant" increase in effort that
it will cost you will not be paying for itself.
Capitalism has bred a real reliance on "good enough": when you hit
your payoff point, you don't go any farther. It's a useful metric to
apply, but a dangerous premise to base all your decisions on. "Good
enough" needs to be critically evaluated for both the short term and
the long term.

A half-million micro-optimizations may not pay for themselves
individually. But in the long term, when confronted with a total
system rewrite because the collected work can no longer perform
adequately, and standard optimization techniques have met with
diminishing returns, you're going to regret not having paid attention
the first time through, when you didn't hafta re-teach yourself what
the code is doing. The little bits where you're just /paying
attention/ to the performance implications of what you're doing
aggregate over time to reduce the maintenance overhead.
http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast
It's an interesting formulation, but it stinks of propaganda to me.
When generic catchphrases are re-interpreted by almost every viewer,
it's a pretty fair bet they're not precise enough to be really useful.
The discussion on this page makes me think of Biblical scholars
debating the meaning of ambiguous passages.
Making it right means making it readable too. Optimization should
always come later, and not at all if you don't actually need it.
I won't disagree with that.
My group has invested almost thirty person-years writing Python code in
the last few years. To the best of my ability to recall, only two of
the tasks we've worked on in that time was directly related to
performance concerns and the resulting optimization for speed. Given
that the combined optimization efforts consumed perhaps a few weeks
of our time, we spend something like 0.4% of our time focusing on
performance. This seems to me a healthy amount.
My group has invested probably something like 15 person-years writing
Python code in the last few years. We have probably put about one of
those person years into trying to account for performance bottlenecks.
Management is presently of the opinion that a drastic rewrite is the
only way to resolve the remaining issues. Perhaps the most distinct
difference between your group and mine is that many of our developers
are fairly novice, and prone to select solutions that are not
well-informed about performance issues and algorithm complexity. On
the other hand, maybe our code is just more heavily used?
(Curiously enough, when we coded more in C, I suspect we spent a
substantially larger amount of time caught up in performance issues.
This change is due merely to greater experience, not because of
the change in language, though the two are related.)


Yes. Younger engineers tend to emphasize performance too much, because
it's a huge nebulous area that they don't understand, and which may
well bite them in the ass HARD. Older engineers can automatically
navigate through the most dangerous fields of landmines, and tend to
underemphasize performance too much, because the most important
aspects are habit and the less important aspects can be safely
ignored.

At first blush, I thought "maybe there's an equilibrium that needs to
be found". But I don't think so now. I think it's important for
younger (intermediate?) developers to be obsessed with performance, so
they can learn the dangers of bad algorithms, how to recognize them,
how to avoid them. And it's worth building good habits where you
choose an optimal idiom rather than a slower one.

You can disagree, but I've done a lot of reading and thinking on the
matter, in part because my experience and my beliefs have been at odds
in the past. Consequently, you're going to hafta try harder than
invoking the divine authority of Kent Beck (or even Knuth!) to
persuade me. Still, I can yet be persuaded; my mind is quite
tractable.

--G.

--
Geoff Gerrietts "I don't think it's immoral to want to
<geoff at gerrietts net> make money." -- Guido van Rossum

Jul 18 '05 #12
Geoff Gerrietts wrote:

Quoting Peter Hansen (pe***@engcorp.com):

Not unless you add Alex' constraint that the two alternatives under
consideration are equally readable. Otherwise the less readable one
is always going to cost you more at maintenance time.
Yes to your first sentence, not so sure to the second. The implication
is the code will always be touched, and my contention is that if you
don't pay at least trivial attention to writing something optimal --
includes avoiding geometric algorithms -- then you're significantly
increasing the amount of maintenance work necessary.


I won't disagree with most of that (we're rapidly reaching near total
agreement here! :-) but I do think that assuming "the code will always
be touched" is a very healthy attitude, in the same way you think that
at least trivial attention to performance is a healthy attitude.

We certainly have code that hasn't been touched during maintenance,
but nobody could have predicted which areas of the code that would be.
Capitalism has bred a real reliance on "good enough": when you hit
your payoff point, you don't go any farther. It's a useful metric to
apply, but a dangerous premise to base all your decisions on. "Good
enough" needs to be critically evaluated for both the short term and
the long term.
As an XP team, we tend to consider that critical evaluation to be
the domain of the customer, so we basically don't worry about it
until there is feedback that we're doing the wrong thing. This,
in cooperation with the customer, makes the best use of the our
resources (for which the customer is paying, in effect). But,
yeah, that's just the XP view of things.
A half-million micro-optimizations may not pay for themselves
Phew! I seriously hope your group hasn't examined that many
pieces of code with performance concerns in mind! We don't have
even that many lines of code, let alone areas that could be
micro-optimized.
individually. But in the long term, when confronted with a total
system rewrite because the collected work can no longer perform
adequately, and standard optimization techniques have met with
diminishing returns, you're going to regret not having paid attention
the first time through,
There's some truth in that, but I can't shake the nagging feeling
that simply by using Python, we've moved into a realm where the
best way to optimize a serious problem area is to rewrite in C
or Pyrex, or get a faster processor. (Like you, I can be
persuaded, but this is what _my_ experience has taught me.)
http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast


It's an interesting formulation but it stinks of propaganda to me.
When generic catchphrases are re-interpreted by almost every viewer
its a pretty fair bet they're not precise enough to be really useful.
The discussion on this page makes me think of Biblical scholars
debating the meaning of ambiguous passages.


Actually, it's probably just that re-interpretation and discussion
which proves so very useful, not the phrase itself. Like a Zen
koan or something, it's too short (or ambiguous) to have direct,
hard meaning, but the meme it carries is a valuable one with which
to be infected. ;-)

The same probably holds true about ambiguous biblical passages,
I hate to admit.
My group has invested probably something like 15 person-years writing
Python code in the last few years. We have probably put about one of
those person years into trying to account for performance bottlenecks.
Management is presently of the opinion that a drastic rewrite is the
only way to resolve the remaining issues. Perhaps the most distinct
difference between your group and mine is that many of our developers
are fairly novice, and prone to select solutions that are not
well-informed about performance issues and algorithm complexity. On
the other hand, maybe our code is just more heavily used?
I'd vote for the latter. My group has been heavily junior in flavour.
Perhaps another cause of the difference is our greater (?) emphasis
on XP and test-driven development? I doubt anyone could say, but
for sure your code is more heavily used. I don't even need to know
what it does to say that. :-)

Maybe one example: we used += with strings a lot in the early days.
Partly junior developers, a greater part due to inexperience with
Python. I think only one or two bits of our code has been re-written
to use [].append() and ''.join() instead, because only those bits
came to the fore when performance was an issue. The rest is still
merrily chewing up CPU time doing wasteful += on strings, but nobody
cares. We refactor that (for consistency, mainly, I think) when we
get to them for other reasons, and new code probably doesn't use +=
so much, but that's about the extent of it.
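The refactoring Peter describes, in sketch form:

```python
parts = ["spam", "eggs", "ham"]

# the idiom the early code used: repeated += can go quadratic,
# since each step may copy everything accumulated so far
concat = ""
for s in parts:
    concat += s

# the idiom it was refactored toward: one linear pass
joined = "".join(parts)

assert concat == joined == "spameggsham"
```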
At first blush, I thought "maybe there's an equilibrium that needs to
be found". But I don't think so now. I think it's important for
younger (intermediate?) developers to be obsessed with performance, so
they can learn the dangers of bad algorithms, how to recognize them,
how to avoid them. And it's worth building good habits where you
choose an optimal idiom rather than a slower one.
I would agree that new developers would benefit from that kind of
experience. One of the few reasons why a (good) university or
college education can be of value to a programmer. So can critical
reading of some decent books or web pages on the topic.
You can disagree, but I've done a lot of reading and thinking on the
matter, in part because my experience and my beliefs have been at odds
in the past. Consequently, you're going to hafta try harder than
invoking the divine authority of Kent Beck (or even Knuth!) to
persuade me. Still, I can yet be persuaded; my mind is quite
tractable.


I think Kent is merely on a par with the Pope, but is not Himself
divine. ;-) Knuth is another story, perhaps. :-)

-Peter
Jul 18 '05 #13
Peter Hansen <pe***@engcorp.com> writes:
individually. But in the long term, when confronted with a total
system rewrite because the collected work can no longer perform
adequately, and standard optimization techniques have met with
diminishing returns, you're going to regret not having paid attention
the first time through,


There's some truth in that, but I can't shake the nagging feeling
that simply by using Python, we've moved into a realm where the
best way to optimize a serious problem area is to rewrite in C
or Pyrex, or get a faster processor. (Like you, I can be
persuaded, but this is what _my_ experience has taught me.)


That's not always either feasible or desirable. For example, I once
worked on the user interface of an ATM switch. It had to display a
connection list in sorted order, when they were stored in memory in
random order. It did this by finding the smallest numbered
connection, then the next smallest, etc., an O(N**2) algorithm which
worked fine when the switch was originally designed and could handle
no more than 16 connections or something like that, but which ate a
lot of not-too-plentiful embedded cpu time when hardware enhancements
made hundreds of connections possible. OK, you say, rip out that
algorithm and put in a better one. The problem is that the "sorting"
code was intimately intermixed with the selection code which banged on
the hardware registers and dealt with all kinds of fault conditions,
and the display code, which was knee deep in formatting cruft, and had
grown like a jungle over years of maintenance as new releases of the
hardware kept sprouting new features. In short it was typical
embedded code written by electrical (i.e. hardware) engineers who,
while they were not stupid people, just didn't have much understanding
of software technology or methodology. We are not talking about some
three-line loop like

concat = ''
for s in stringlist:
    concat += s

that can be rewritten into a ''.join call. This UI module was 5000 or
so lines of extremely crufty code and there was no way to fix it
without a total rewrite. And a total rewrite couldn't ever be
scheduled, because there were always too many fires to put out in the
product. The module therefore got worse and worse. So that's a
real-world example of where a little bit more up-front design caution
would have saved an incredible amount of headache for years to come.
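The repeated-minimum approach described above can be sketched in miniature (the connection numbers here are made up, and the real code was embedded C, not Python):

```python
# O(N**2): scan the remaining entries for the smallest, over and over
def display_order_quadratic(conns):
    remaining = list(conns)
    ordered = []
    while remaining:
        smallest = min(remaining)   # O(N) scan per pass
        remaining.remove(smallest)  # another O(N) pass
        ordered.append(smallest)
    return ordered

# O(N log N): sort once and be done
def display_order_sorted(conns):
    return sorted(conns)

connections = [14, 3, 9, 1, 7]  # hypothetical connection IDs in memory order
assert display_order_quadratic(connections) == display_order_sorted(connections)
```

At 16 connections nobody notices; at hundreds, the quadratic version eats the CPU alive.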

And sure, there are all kinds of methodological platitudes about how
to stop that situation from happening, but they are based on wishful
thinking. They just do not always fit the real-world constraints that
real projects find imposed on them (e.g. that a complicated hardware
product is staffed mostly by hardware engineers, who bang out "grunt"
code without too much sense of how to organize large programs). All
you can do is recognize that you have a little bit of programming
sophistication available, and try to maximize your leverage in
applying it where it makes the most difference. Regardless of what
one thinks of C++, reading Stroustrup's C++ book after being through
experiences like the above makes it clear Stroustrup had had similar
experiences. It's visible in his book, how various design choices of
C++ were motivated by the tensions inherent in those experiences.
Jul 18 '05 #14
Quoting Peter Hansen (pe***@engcorp.com):

I won't disagree with most of that (we're rapidly reaching near total
agreement here! :-) but I do think that assuming "the code will always
be touched" is a very healthy attitude, in the same way you think that
at least trivial attention to performance is a healthy attitude.
Yes, I think we're pretty close to in accord here.
As an XP team, we tend to consider that critical evaluation to be
the domain of the customer, so we basically don't worry about it
until there is feedback that we're doing the wrong thing. This,
in cooperation with the customer, makes the best use of the our
resources (for which the customer is paying, in effect). But,
yeah, that's just the XP view of things.
And I'm working from the perspective of an internal customer. But I
also think that with an external customer, special care ought to be
paid to those pieces of software that won't live
exclusively inside the project.
A half-million micro-optimizations may not pay for themselves


Phew! I seriously hope your group hasn't examined that many
pieces of code with performance concerns in mind! We don't have
even that many lines of code, let alone areas that could be
micro-optimized.


....well, no, we haven't. But we are approaching that many lines of
code. And a good deal of it is naive code, none of which we will be
able to reclaim the lost performance from without more profound reason
to refactor. Some of it we probably should, but it's a challenge to
effectively profile our code.
There's some truth in that, but I can't shake the nagging feeling
that simply by using Python, we've moved into a realm where the
best way to optimize a serious problem area is to rewrite in C
or Pyrex, or get a faster processor. (Like you, I can be
persuaded, but this is what _my_ experience has taught me.)
Probably some truth in that, too.
Actually, it's probably just that re-interpretation and discussion
which proves so very useful, not the phrase itself. Like a Zen
koan or something, it's too short (or ambiguous) to have direct,
hard meaning, but the meme it carries is a valuable one with which
to be infected. ;-)

The same probably holds true about ambiguous biblical passages,
I hate to admit.
There's an ambiguous koan-like meme that I like to break out now and
again -- I think it's due to Robert Anton Wilson but the years have
not been kind to my respect for authority:

Any proposition is true in some way, false in some way, and in some
way not pertinent to the matter at hand at all.

Spend enough time with the meme and it justifies both sides of the
discussion.
I'd vote for the latter. My group has been heavily junior in
flavour. Perhaps another cause of the difference is our greater (?)
emphasis on XP and test-driven development? I doubt anyone could
say, but for sure your code is more heavily used. I don't even need
to know what it does to say that. :-)
I'll believe you. :) We've scaled up to the point where we're happy
but bursting at the seams.
I would agree that new developers would benefit from that kind of
experience. One of the few reasons why a (good) university or
college education can be of value to a programmer. So can critical
reading of some decent books or web pages on the topic.
Yes. It's something of a rite of passage, in some ways. And maybe the
right way to respond to optimization questions is "focus on
algorithms, and learn which built-in constructs use lousy ones". I'm
not sure, but I find "thinking about optimization before your
processors melt is premature" to be more than a little disingenuous.
I think Kent is merely on a par with the Pope, but is not Himself
divine. ;-) Knuth is another story, perhaps. :-)


Great minds, but human -- all too human. ;)

--G.

--
Geoff Gerrietts <geoff at gerrietts net> http://www.gerrietts.net/
"Now, now my good man, this is no time for making enemies."
--Voltaire, on his deathbed, when asked to renounce Satan

Jul 18 '05 #15
Paul Rubin wrote:

Peter Hansen <pe***@engcorp.com> writes:
individually. But in the long term, when confronted with a total
system rewrite because the collected work can no longer perform
adequately, and standard optimization techniques have met with
diminishing returns, you're going to regret not having paid attention
the first time through,
There's some truth in that, but I can't shake the nagging feeling
that simply by using Python, we've moved into a realm where the
best way to optimize a serious problem area is to rewrite in C
or Pyrex, or get a faster processor. (Like you, I can be
persuaded, but this is what _my_ experience has taught me.)


That's not always either feasible or desirable. For example, I once
worked on the user interface of an ATM switch.


Wait, wait, wait.... hang on a second. We were talking in a Python
newsgroup about Python development and optimization, and my comments
above relate solely and exclusively to that context. You are talking
about embedded development, certainly not with Python (though we use
it in that way, but we're nearly unique I think), and I agree totally
with you and Geoff and anyone else who wants to put forward the position
that worrying about performance in advance is an important step in,
say, more static and less readable languages where testing is harder
and refactoring is next in line after suicide in terms of dangerous pastimes.

Python is a different story. Python is *trivial* to refactor if one
has adequate tests (as one should, even if just because of the dynamic
nature of Python) and, more to the point, it is definitely lower in
performance no matter what I or anyone else says for your "average"
programming task than, say, C.

I say that since we're talking *exclusively* in the context of
Python (I was, in case that wasn't clear) one can *always* consider
a fallback to C or Pyrex and the option for faster hardware to be
possible steps, instead of merely doing profiling and local micro-
optimization ala some of Geoff's points (not to diminish the importance
of anything he said).
It had to display a
connection list in sorted order, when they were stored in memory in
random order. It did this by finding the smallest numbered
connection, then the next smallest, etc., an O(N**2) algorithm which
worked fine when the switch was originally designed and could handle
no more than 16 connections or something like that, but which ate a
lot of not-too-plentiful embedded cpu time when hardware enhancements
made hundreds of connections possible. OK, you say, rip out that
algorithm and put in a better one.
Oh, dang, you aren't even addressing my original points, as I thought
you were about to. :-) You're talking about the typical O(xx) analysis
that is the basis of all serious performance analysis. But then you're
saying that with really crappy code written by incompetent designers
and coders, without a decent process to control the spread of the
disease they create, one can't avoid the resulting predicament, with
my approaches or anyone else's. No surprise there...
that can be rewritten into a ''.join call. This UI module was 5000 or
so lines of extremely crufty code and there was no way to fix it
without a total rewrite. And a total rewrite couldn't ever be
scheduled, because there were always too many fires to put out in the
product. The module therefore got worse and worse. So that's a
real-world example of where a little bit more up-front design caution
would have saved an incredible amount of headache for years to come.
Maybe. Maybe (and I think it's more likely) when you have electrical
engineers writing code, the only thing up-front design will do for you
is postpone the point at which they start coding, and then you're
doomed anyway.
And sure, there are all kinds of methodological platitudes about how
to stop that situation from happening, but they are based on wishful
thinking.
The only platitude about that is that those kinds of people shouldn't
be allowed near the keyboard.
They just do not always fit the real-world constraints that
real projects find imposed on them (e.g. that a complicated hardware
product is staffed mostly by hardware engineers, who bang out "grunt"
code without too much sense of how to organize large programs).


I'll consider myself fortunate to be in a real-world ("modern"?)
situation where the software people contribute intimately to the
hardware designs and write all the code and, often, things work
much better as a result. I've been in your shoes in the past and
know whereof you speak... I just don't think that's how it has to be
any more.

Anyway Paul, I can't really see where you disagree with much that I've
said, and in any case I mostly agree with what you said, other than
that I don't think it's inevitable that a bunch of hardware types have
to be the ones writing code (certainly not in my company anyway) so
I don't agree that my approach is inappropriate, except perhaps in
some companies which, in my opinion, are doomed anyway. :-) Heck,
at this point I figure anyone writing embedded code that isn't done
in a test-driven style(*) is probably doomed, so I'm pretty much an
extremist in this respect!

-Peter

(*) We have a very nice in-house developed simulator, all in Python,
which we're turning into a most excellent test-driven development
framework for our embedded code. Very promising early results, and
if this kind of thing catches on it could well revolutionize, for
at least some companies, embedded development and improve quality
immensely. Nothing to do with performance, mind you, except that
other than a few experiments with Psyco I haven't bothered worrying
about the performance of the simulator since it's (a) in Python, and
(b) already something like 1/60 the speed of the native CPU at 12MHz,
and this on a 600MHz Pentium III!!
Jul 18 '05 #16
Geoff Gerrietts wrote:

And maybe the
right way to respond to optimization questions is "focus on
algorithms, and learn which built-in constructs use lousy ones". I'm
not sure, but I find "thinking about optimization before your
processors melt is premature" to be more than a little disingenuous.


With that I can fully agree. Any comments I make which might suggest
the latter are merely in reaction to comments about likely unnecessary
micro-optimizations made by people who don't add the "required" Knuthism
about the root of all evils, etc.

Basically, newbies might be in danger of learning too little about
algorithmic complexity and the effect on performance, but they're
equally in danger of becoming so obsessed with unnecessary performance
gains that it could waste literally years of their fledgling careers.

IMHO :-)

-Peter
Jul 18 '05 #17
Peter Hansen wrote:
Geoff Gerrietts wrote:

Maybe I'm a heretic, but I think this is a healthy attitude to have.
If you can write it optimally the first time with no significant
increase in effort, then nobody's going to hafta come back and rewrite
it later: that's a big maintenance win.


Not unless you add Alex' constraint that the two alternatives under
consideration are equally readable. Otherwise the less readable one
is always going to cost you more at maintenance time. And I'd add
my own constraint that you actually have to *need* the speed. Otherwise
even the "insignificant" increase in effort that it will cost you will
not be paying for itself.


If you *get into the habit* of always checking with "if x is None:"
rather than "if x == None:" -- two equally readable constructs -- it
will cost you no increase in effort whatsoever to always use the
idiom you've gotten used to. So, whether the (tiny) extra speed and
readability are important or not, it's still a good habit to pick up.
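The relative cost of the two spellings is easy to measure with timeit (a sketch; absolute numbers vary by machine and Python version):

```python
import timeit

# Benchmark the two equally readable None checks
t_is = timeit.timeit("x is None", setup="x = None", number=1_000_000)
t_eq = timeit.timeit("x == None", setup="x = None", number=1_000_000)
print(f"is None: {t_is:.3f}s    == None: {t_eq:.3f}s")
```

On typical builds the `is` form comes out slightly ahead, since it compares identities directly without dispatching to any comparison method.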

I would (more hesitantly) claim the same for ''.join vs += in putting
"a few" strings together. If they're really just a few performance
will not matter. But as soon as they're not so few it does suddenly
matter A LOT. So, even though the effort may not be trivial in this
case, I still think of this as "a good habit to pick up" to the point
where it becomes automatic... but it's not _quite_ as clear-cut.
Alex

Jul 18 '05 #18
Peter Hansen wrote:
...
A quick inspection of the code backs up the empirical evidence,
showing no Python 2.2+ dependencies that don't have automatic
fallbacks (as with the attempt to include itertools).
My, but that timbot guy IS good. Envy, envy...

(Thanks for the tutorial on timeit.py Alex. I've finally stuck
it in all my older Python installations, after your repeated
helpful promptings!)


You're welcome, but Tim Peters is the one who really deserves
our collective gratitude!
Alex

Jul 18 '05 #19
Alex Martelli wrote:

I would (more hesitantly) claim the same for ''.join vs += in putting
"a few" strings together. If they're really just a few performance
will not matter. But as soon as they're not so few it does suddenly
matter A LOT. So, even though the effort may not be trivial in this
case, I still think of this as "a good habit to pick up" to the point
where it becomes automatic... but it's not _quite_ as clear-cut.


Agreed that it's not as clear-cut, the more so because the construct
is inherently less readable. (More line noise, so to speak.) There
are times when I'll consciously choose += for a couple of lines
(yes, where I subconsciously know performance won't matter) to ensure
readability of the code.

Nevertheless, your (snipped) point is valid, and I have certainly
tried to develop both the "is None" and the "''.join" habits for
my own work.

-Peter
Jul 18 '05 #20
> If you *get into the habit* of always checking with "if x is None:"
rather than "if x == None:" -- two equally readable constructs -- it
will cost you no increase in effort whatsoever to always use the
idiom you've gotten used to. So, whether the (tiny) extra speed and
readability are important or not, it's still a good habit to pick up.

Alex


can you explain in more detail why "if x is None:" is better/faster than "if x == None:"? i guess i don't fully
understand "is". my fingers always seem to want to type "if not x:", but that is probably worse still since it includes
(), [], {}, '', False, None.

thanks,

bryan

Jul 18 '05 #21
Bryan wrote:
> If you *get into the habit* of always checking with "if x is None:"
rather than "if x == None:" -- two equally readable constructs -- it
will cost you no increase in effort whatsoever to always use the
idiom you've gotten used to. So, whether the (tiny) extra speed and
readability are important or not, it's still a good habit to pick up.
>
Alex


can you explain in more detail why "if x is None:" is better/faster than
"if x == None:"? i guess i don't fully
understand "is". my fingers always seem to want to type "if not x:", but
that is probably worse still since it includes (), [], {}, '', False, None


If you WANT to include other false values, then of course "if not x:" is
just perfect. But with either 'is None' or '== None' you're checking
specifically for None -- quite different semantics.

"a is None" is a bit more readable than "a == None" because it uses readable
words rather than punctuation. It's a bit faster, because 'is' gets the
id (machine addresses) of its operands and just compares them -- it looks
for IDENTITY, the SAME objects, as opposed to two separate objects which
happen to have the same value; '==' necessarily must do a bit more work,
because many objects can be equal without being the same object.
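Concretely (a small sketch):

```python
a = [1, 2]
b = [1, 2]
c = a

# '==' compares values; 'is' compares identities
# (in CPython, effectively machine addresses)
assert a == b       # equal values...
assert a is not b   # ...but two distinct objects
assert a is c       # the very same object

# None is a singleton, so the identity test is the natural one
x = None
assert x is None
```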
Alex

Jul 18 '05 #22

"Bryan" <be*****@yahoo.com> wrote in message
news:Klckb.807073$YN5.806740@sccrnsc01...
can you explain in more detail why "if x is None:" is better/faster than "if x == None:"? i guess i don't fully understand "is".

'is' means same object (same identity). For CPython, this is a trivial
comparison of addresses. '==' means same value, and that comparison
starts, I believe, with calling type(x) and type(None) to see if they
are comparable.
my fingers always seem to want to type "if not x:", but that is

probably worse still since it includes (), [], {}, '', False, None .

It is good or erroneous depending on whether 'not x' is or is not the
appropriate condition ;-)

Terry J. Reedy

Jul 18 '05 #23
In article <Klckb.807073$YN5.806740@sccrnsc01>,
Bryan <be*****@yahoo.com> wrote:

can you explain in more detail why "if x is None:" is better/faster
than "if x == None:"? i guess i don't fully understand "is". my fingers
always seem to want to type "if not x:", but that is probably worse
still since it includes (), [], {}, '', False, None .


When you want to check whether a target specifically points to the None
object, you can't use == because it calls a Python method that might
return None. That's in addition to the fact that ``is`` is faster.
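The pitfall can be demonstrated with a contrived, hypothetical class whose __eq__ answers yes to everything:

```python
class Agreeable:
    """Contrived example: == is unreliable for a None check here."""
    def __eq__(self, other):
        return True  # claims equality with anything, including None

obj = Agreeable()
assert obj == None      # misleading: __eq__ simply said yes
assert obj is not None  # identity gives the right answer
```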
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan
Jul 18 '05 #24
Alex Martelli <al***@aleax.it> writes:
Peter Hansen wrote:
...
A quick inspection of the code backs up the empirical evidence,
showing no Python 2.2+ dependencies that don't have automatic
fallbacks (as with the attempt to include itertools).


My, but that timbot guy IS good. Envy, envy...


a) I'm pretty sure timeit.py was written by Guido (he's not bad, either :-)

b) Revision 1.6 has this checkin comment:

Broke down and made it work for Python 2.0 and up. (Older versions
would have required refraining from using string methods -- too
painful.)

Changed the -s option so that multiple -s options are cumulative.

Cheers,
mwh

--
Java sucks. [...] Java on TV set top boxes will suck so hard it
might well inhale people from off their sofa until their heads
get wedged in the card slots. --- Jon Rabone, ucam.chat
Jul 18 '05 #25
Michael Hudson wrote:
Alex Martelli <al***@aleax.it> writes:
Peter Hansen wrote:
...
> A quick inspection of the code backs up the empirical evidence,
> showing no Python 2.2+ dependencies that don't have automatic
> fallbacks (as with the attempt to include itertools).


My, but that timbot guy IS good. Envy, envy...


a) I'm pretty sure timeit.py was written by Guido (he's not bad, either


Ooops yes -- it mentions Tim Peters in the docstring in a way that
should make it pretty obvious it's not written by Tim (only Julius
Caesar wrote of himself in 3rd person in just that way...)... silly me...!
Alex

Jul 18 '05 #26
